TileDB-backed objects for array- and matrix-like data

tiledbarray

This is the Python equivalent of Bioconductor's TileDBArray package, providing a representation of TileDB-backed arrays within the delayedarray framework. The idea is to allow users to store, manipulate and operate on large datasets without loading them into memory, in a manner that is trivially compatible with other data structures in the BiocPy ecosystem.

Installation

This package can be installed from PyPI with the usual commands:

pip install tiledbarray

Quick start

Let's mock up a dense array:

import numpy
import tiledb

data = numpy.random.rand(40, 50)
tiledb.from_numpy("dense.tiledb", data)

We can now represent it as a TileDbArray:

import tiledbarray
arr = tiledbarray.TileDbArray("dense.tiledb", attribute_name="")
# <40 x 50> TileDbArray object of type 'float64'
# [[0.96316214, 0.90187013, 0.55767551, ..., 0.81663263, 0.57660051,
#   0.3986336 ],
#  [0.72578394, 0.06328588, 0.9473141 , ..., 0.89977069, 0.34617884,
#   0.09208036],
#  [0.87291607, 0.01714908, 0.96570953, ..., 0.28404601, 0.20394673,
#   0.6454273 ],
#  ...,
#  [0.21565857, 0.11721607, 0.45146332, ..., 0.18565937, 0.348599  ,
#   0.16050929],
#  [0.95061188, 0.71917657, 0.33039149, ..., 0.60267692, 0.28035863,
#   0.56416845],
#  [0.40462116, 0.61058508, 0.5067807 , ..., 0.64234988, 0.5881812 ,
#   0.17138409]]

This is just a subclass of a DelayedArray and can be used anywhere in the BiocPy framework. Parts of the NumPy API are also supported - for example, we could apply a variety of delayed operations:

scaling = numpy.random.rand(50)  # one factor per column, broadcast across the 40 rows
transformed = numpy.log1p(arr / scaling)
# <40 x 50> DelayedArray object of type 'float64'
# [[1.29646391, 2.05014167, 0.48661736, ..., 0.90574803, 2.38890685,
#   1.1277655 ],
#  [1.09916863, 0.38865342, 0.72500505, ..., 0.96463182, 1.93797807,
#   0.39371608],
#  [1.22596458, 0.12107778, 0.73496894, ..., 0.41384292, 1.50457489,
#   1.47747976],
#  ...,
#  [0.46673182, 0.63114795, 0.41040352, ..., 0.28897665, 1.94394461,
#   0.61032586],
#  [1.28695229, 1.85595293, 0.31579293, ..., 0.73604123, 1.76033915,
#   1.37526146],
#  [0.74949037, 1.71968269, 0.45082104, ..., 0.76976215, 2.40698455,
#   0.64080734]]
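
To make the idea of a delayed operation concrete, here is a minimal, dependency-free sketch. It is not how delayedarray is implemented; ToyDelayed, load_row and extract_row are hypothetical names invented for this illustration, and plain nested lists stand in for the on-disk array. The point is only that queuing an operation reads nothing, and the queued operations run on just the part of the data that is eventually extracted.

```python
import math


class ToyDelayed:
    """Hypothetical stand-in for a delayed array (not the delayedarray API)."""

    def __init__(self, load_row, shape, ops=()):
        self._load_row = load_row  # callable: row index -> list of values
        self.shape = shape
        self._ops = tuple(ops)     # queued functions of (value, column index)

    def apply(self, op):
        # Queuing an operation is cheap: nothing is read or computed here.
        return ToyDelayed(self._load_row, self.shape, self._ops + (op,))

    def extract_row(self, i):
        # Only now is the backing store read, and only for the requested row.
        row = self._load_row(i)
        for op in self._ops:
            row = [op(value, j) for j, value in enumerate(row)]
        return row


# A 4 x 3 table standing in for an on-disk array; every read is logged.
backing = [[float(r * 3 + c + 1) for c in range(3)] for r in range(4)]
reads = []


def load_row(i):
    reads.append(i)
    return list(backing[i])


scaling = [1.0, 2.0, 4.0]  # one factor per column

arr = ToyDelayed(load_row, shape=(4, 3))
transformed = arr.apply(lambda v, j: v / scaling[j]).apply(lambda v, j: math.log1p(v))

reads_before = len(reads)          # still zero: no data touched so far
row2 = transformed.extract_row(2)  # reads row 2 only, then applies both steps
```

The scaling vector has one entry per column, which mirrors the broadcasting rule in the example above: a one-dimensional operand has to match the length of the last dimension of the array.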

Check out the documentation for more details.

Sparse Matrices

We can perform similar operations on a sparse matrix. Let's mock up a sparse matrix and store it as a TileDB array.

import numpy as np
import tiledb

dir_path = "sparse_array.tiledb"
dom = tiledb.Domain(
     tiledb.Dim(name="rows", domain=(0, 4), tile=5, dtype=np.int32),
     tiledb.Dim(name="cols", domain=(0, 4), tile=5, dtype=np.int32),
)
schema = tiledb.ArraySchema(
     domain=dom, sparse=True, attrs=[tiledb.Attr(name="", dtype=np.int32)]
)
tiledb.SparseArray.create(dir_path, schema)

# Write three non-zero entries; the context manager closes the array so the write is flushed.
with tiledb.SparseArray(dir_path, mode="w") as tdb:
    i, j = [1, 2, 2], [1, 4, 3]
    data = np.array([1, 2, 3], dtype=np.int32)
    tdb[i, j] = data

We can now represent this as a TileDbArray:

import delayedarray
import tiledbarray

arr = tiledbarray.TileDbArray(dir_path, attribute_name="")

# Rows 0-2 and columns 2 and 4.
slices = (slice(0, 3), [2, 4])
subset = delayedarray.extract_sparse_array(arr, slices)
print(subset)
# <3 x 2> SparseNdarray object of type 'int32'
# [[2, 0],
#  [0, 0],
#  [0, 0]]

Check out the delayedarray documentation for more details.

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.

Download files

Source Distribution

tiledbarray-0.2.0.tar.gz (25.2 kB)

Uploaded Source

Built Distribution

tiledbarray-0.2.0-py3-none-any.whl (6.9 kB)

Uploaded Python 3
