Column Classes

Deep Lake provides two column classes for different access levels:

Class	Description
Column	Full read-write access to column data
ColumnView	Read-only access to column data

Column Class

deeplake.Column

Bases: ColumnView

Provides read-write access to a column in a dataset. Column extends ColumnView with methods for modifying data, making it suitable for dataset creation and updates in ML workflows.

The Column class allows you to:

- Read and write data using integer indices, slices, or lists of indices
- Modify data asynchronously for better performance
- Access and modify column metadata
- Handle various data types common in ML: images, embeddings, labels, etc.

Examples:

Update training labels:

# Update single label
ds["labels"][0] = 1

# Update batch of labels
ds["labels"][0:32] = new_labels

# Async update for better performance
future = ds["labels"].set_async(slice(0, 32), new_labels)
future.wait()

Store image embeddings:

# Generate and store embeddings
embeddings = model.encode(images)
ds["embeddings"][0:len(embeddings)] = embeddings

Manage column metadata:

# Store preprocessing parameters
ds["images"].metadata["mean"] = [0.485, 0.456, 0.406]
ds["images"].metadata["std"] = [0.229, 0.224, 0.225]

__getitem__

__getitem__(index: int | slice | list | tuple) -> Any

Retrieve data from the column at the specified index or range.

Parameters:

Name	Type	Description	Default
`index`	`int | slice | list | tuple`	An int (single item index), a slice (range of indices, e.g., 0:10), or a list/tuple (multiple specific indices)	required

Returns:

Type	Description
`Any`	The data at the specified index/indices. Type depends on the column's data type.

Examples:

# Get single item
image = column[0]

# Get range
batch = column[0:32]

# Get specific indices
items = column[[1, 5, 10]]

__setitem__

__setitem__(index: int | slice, value: Any) -> None

Set data in the column at the specified index or range.

Parameters:

Name	Type	Description	Default
`index`	`int | slice`	An int (single item index) or a slice (range of indices, e.g., 0:10)	required
`value`	`Any`	The data to store. Must match the column's data type.	required

Examples:

# Update single item
column[0] = new_image

# Update range
column[0:32] = new_batch
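
When the new values do not fit in one slice assignment, the range form above can be applied batch by batch. The following is a minimal sketch, not part of the Deep Lake API: `write_in_batches` is a hypothetical helper, and a plain Python list stands in for the column so the snippet runs without a dataset. It assumes only the `column[start:stop] = values` form documented above.

```python
from typing import Any, Sequence


def write_in_batches(
    column: Any, values: Sequence[Any], batch_size: int = 32, offset: int = 0
) -> int:
    """Write `values` into `column` starting at `offset`, one slice per batch.

    Returns the number of slice assignments performed.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    writes = 0
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        # Same form as `column[0:32] = new_batch` in the examples above.
        column[offset + start:offset + start + len(chunk)] = chunk
        writes += 1
    return writes


# Stand-in for a column (assumption: a list accepts the same slice syntax).
fake_column = [0] * 10
n_writes = write_in_batches(fake_column, [1, 2, 3, 4, 5], batch_size=2, offset=3)
```

With a real dataset, `ds["labels"]` would take the place of `fake_column`. Each chunk must still match the column's data type.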

get_async

get_async(index: int | slice | list | tuple) -> Future

Asynchronously retrieve data from the column. Useful for large datasets or when loading multiple items in ML pipelines.

Parameters:

Name	Type	Description	Default
`index`	`int | slice | list | tuple`	An int (single item index), a slice (range of indices), or a list/tuple (multiple specific indices)	required

Returns:

Name	Type	Description
`Future`	`Future`	A Future object that resolves to the requested data.

Examples:

# Async batch load
future = column.get_async(slice(0, 32))
batch = future.result()

# Using with async/await
async def load_batch():
    batch = await column.get_async(slice(0, 32))
    return batch

set_async

set_async(index: int | slice, value: Any) -> FutureVoid

Asynchronously set data in the column. Useful for large updates or when modifying multiple items in ML pipelines.

Parameters:

Name	Type	Description	Default
`index`	`int | slice`	An int (single item index) or a slice (range of indices)	required
`value`	`Any`	The data to store. Must match the column's data type.	required

Returns:

Name	Type	Description
`FutureVoid`	`FutureVoid`	A FutureVoid that completes when the update is finished.

Examples:

# Async batch update
future = column.set_async(slice(0, 32), new_batch)
future.wait()

# Using with async/await
async def update_batch():
    await column.set_async(slice(0, 32), new_batch)
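
A larger update can be split into several `set_async` calls that are all started before any of them is waited on. The sketch below illustrates that pattern under stated assumptions: `set_many_async`, `_FakeColumn`, and `_DoneFuture` are hypothetical names, and the stand-ins model only `set_async(slice, value)` and `wait()` as shown above. Whether several outstanding writes to one column are safe or actually faster depends on the Deep Lake version and storage backend, so treat this as an illustration rather than a recommendation.

```python
from typing import Any, List, Sequence


class _DoneFuture:
    """Hypothetical stand-in for FutureVoid: only `wait()` is modelled."""

    def wait(self) -> None:
        return None


class _FakeColumn:
    """Hypothetical in-memory column exposing only `set_async`."""

    def __init__(self, size: int) -> None:
        self.data: List[Any] = [None] * size

    def set_async(self, index: slice, value: Sequence[Any]) -> _DoneFuture:
        # The stand-in writes immediately; a real column would do so later.
        self.data[index] = list(value)
        return _DoneFuture()


def set_many_async(column: Any, values: Sequence[Any], batch_size: int = 32) -> int:
    """Start one `set_async` per batch, then wait for all of them."""
    futures = []
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        futures.append(column.set_async(slice(start, start + len(chunk)), chunk))
    for future in futures:
        future.wait()  # Block until this batch has been written.
    return len(futures)


demo_column = _FakeColumn(5)
started = set_many_async(demo_column, [10, 20, 30, 40, 50], batch_size=2)
```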

metadata `property`

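metadata: Metadata

Access the column's metadata. Unlike ColumnView.metadata, the returned Metadata object can also be updated.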

name `property`

name: str

Get the name of the column.

Returns:

Name	Type	Description
`str`	`str`	The column name.

ColumnView Class

deeplake.ColumnView

Provides read-only access to a column in a dataset. ColumnView is designed for efficient data access in ML workflows, supporting both synchronous and asynchronous operations.

The ColumnView class allows you to:

- Access column data using integer indices, slices, or lists of indices
- Retrieve data asynchronously for better performance in ML pipelines
- Access column metadata and properties
- Get information about linked data if the column contains references

Examples:

Load image data from a column for training:

# Access a single image
image = ds["images"][0]

# Load a batch of images
batch = ds["images"][0:32]

# Async load for better performance
images_future = ds["images"].get_async(slice(0, 32))
images = images_future.result()

Access embeddings for similarity search:

# Get all embeddings
embeddings = ds["embeddings"][:]

# Get specific embeddings by indices
selected = ds["embeddings"][[1, 5, 10]]

Check column properties:

# Get column name
name = ds["images"].name

# Access metadata
if "mean" in ds["images"].metadata.keys():
    mean = ds["images"].metadata["mean"]

__getitem__

__getitem__(index: int | slice | list | tuple) -> Any

Retrieve data from the column at the specified index or range.

Parameters:

Name	Type	Description	Default
`index`	`int | slice | list | tuple`	An int (single item index), a slice (range of indices, e.g., 0:10), or a list/tuple (multiple specific indices)	required

Returns:

Type	Description
`Any`	The data at the specified index/indices. Type depends on the column's data type.

Examples:

# Get single item
image = column[0]

# Get range
batch = column[0:32]

# Get specific indices
items = column[[1, 5, 10]]
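
The range form can be repeated to walk a column in fixed-size batches. The following is a minimal sketch: `iter_batches` is a hypothetical helper, not a Deep Lake function, and the row count is passed in explicitly because this page does not document how to query a column's length. A plain list stands in for the column.

```python
from typing import Any, Iterator


def iter_batches(column: Any, total: int, batch_size: int = 32) -> Iterator[Any]:
    """Yield `column[start:stop]` for consecutive, non-overlapping ranges."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        stop = min(start + batch_size, total)  # The last batch may be shorter.
        yield column[start:stop]


# Stand-in for a column (assumption: a list accepts the same slice syntax).
fake_column = list(range(7))
batches = list(iter_batches(fake_column, total=len(fake_column), batch_size=3))
```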

get_async

get_async(index: int | slice | list | tuple) -> Future

Asynchronously retrieve data from the column. Useful for large datasets or when loading multiple items in ML pipelines.

Parameters:

Name	Type	Description	Default
`index`	`int | slice | list | tuple`	An int (single item index), a slice (range of indices), or a list/tuple (multiple specific indices)	required

Returns:

Name	Type	Description
`Future`	`Future`	A Future object that resolves to the requested data.

Examples:

# Async batch load
future = column.get_async(slice(0, 32))
batch = future.result()

# Using with async/await
async def load_batch():
    batch = await column.get_async(slice(0, 32))
    return batch
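
One way to use `get_async` in a loading loop is to request the next batch before consuming the current one, so the request overlaps with whatever the caller does per batch. The sketch below shows that pattern under stated assumptions: `prefetched_batches` and `_FakeColumnView` are hypothetical names. The stand-in returns an already-completed `concurrent.futures.Future`, and only the `result()` call shown above is relied on.

```python
from concurrent.futures import Future
from typing import Any, Iterable, Iterator, Optional


class _FakeColumnView:
    """Hypothetical stand-in exposing only `get_async`."""

    def __init__(self, data: Iterable[Any]) -> None:
        self._data = list(data)

    def get_async(self, index: slice) -> Future:
        future: Future = Future()
        future.set_result(self._data[index])  # Completed immediately.
        return future


def prefetched_batches(column: Any, total: int, batch_size: int = 32) -> Iterator[Any]:
    """Yield batches in order, keeping one request in flight ahead of the caller."""
    pending: Optional[Any] = None
    for start in range(0, total, batch_size):
        stop = min(start + batch_size, total)
        request = column.get_async(slice(start, stop))
        if pending is not None:
            yield pending.result()  # Hand over the previously requested batch.
        pending = request
    if pending is not None:
        yield pending.result()  # Final batch.


view = _FakeColumnView(range(5))
loaded = list(prefetched_batches(view, total=5, batch_size=2))
```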

metadata `property`

metadata: ReadOnlyMetadata

Access the column's metadata. Useful for storing statistics, preprocessing parameters, or other information about the column data.

Returns:

Name	Type	Description
`ReadOnlyMetadata`	`ReadOnlyMetadata`	A ReadOnlyMetadata object for reading metadata.

Examples:

# Access preprocessing parameters
mean = column.metadata["mean"]
std = column.metadata["std"]

# Check available metadata
for key in column.metadata.keys():
    print(f"{key}: {column.metadata[key]}")
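
Stored preprocessing parameters are typically read back and applied at load time. The sketch below is an illustration under stated assumptions: `metadata_get` and `normalize_pixel` are hypothetical helpers, and a plain dict stands in for the metadata object. Only `keys()` and `[key]` are used, which are the two operations shown above.

```python
from typing import Any, List, Sequence


def metadata_get(metadata: Any, key: str, default: Any = None) -> Any:
    """Return `metadata[key]` if the key exists, otherwise `default`."""
    return metadata[key] if key in metadata.keys() else default


def normalize_pixel(
    pixel: Sequence[float], mean: Sequence[float], std: Sequence[float]
) -> List[float]:
    """Apply per-channel (value - mean) / std normalization."""
    return [(value - m) / s for value, m, s in zip(pixel, mean, std)]


# Stand-in for column.metadata (assumption: dict-like keys() and [] access).
fake_metadata = {"mean": [0.5, 0.5, 0.5]}
mean = metadata_get(fake_metadata, "mean", [0.0, 0.0, 0.0])
std = metadata_get(fake_metadata, "std", [1.0, 1.0, 1.0])  # Missing, so default.
normalized = normalize_pixel([1.0, 0.5, 0.0], mean, std)
```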

name `property`

name: str

Get the name of the column.

Returns:

Name	Type	Description
`str`	`str`	The column name.

Class Comparison

Column

Provides read-write access
Can modify data
Can update metadata
Available in Dataset

# Get mutable column
ds = deeplake.open("s3://bucket/dataset")
column = ds["images"]

# Read data
image = column[0]
batch = column[0:100]

# Write data
column[0] = new_image
column[0:100] = new_batch

# Async operations
future = column.set_async(0, new_image)
future.wait()

ColumnView

Read-only access
Cannot modify data
Can read metadata
Available in ReadOnlyDataset and DatasetView

# Get read-only column
ds = deeplake.open_read_only("s3://bucket/dataset")
column = ds["images"]

# Read data
image = column[0]
batch = column[0:100]

# Async read
future = column.get_async(slice(0, 100))
batch = future.result()
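
Code that receives a column from either kind of dataset may want to check whether it can write before trying. The sketch below uses duck typing based on the method lists on this page, where `set_async` appears on Column but not on ColumnView. `is_writable` and both stand-in classes are hypothetical. With the library imported, `isinstance(column, deeplake.Column)` would be the more direct check.

```python
from typing import Any


def is_writable(column: Any) -> bool:
    """Return True if the object exposes a callable `set_async` method."""
    return callable(getattr(column, "set_async", None))


class _ReadOnlyStandIn:
    """Hypothetical stand-in for a ColumnView: read method only."""

    def get_async(self, index: Any) -> None:
        return None


class _ReadWriteStandIn(_ReadOnlyStandIn):
    """Hypothetical stand-in for a Column: adds a write method."""

    def set_async(self, index: Any, value: Any) -> None:
        return None


read_only_ok = is_writable(_ReadOnlyStandIn())
read_write_ok = is_writable(_ReadWriteStandIn())
```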

Examples

Data Access

# Direct indexing
single_item = column[0]
batch = column[0:100]
selected = column[[1, 5, 10]]

# Async data access 
future = column.get_async(slice(0, 1000))
data = future.result()

Metadata

# Read metadata from any column type
name = column.name
metadata = column.metadata

# Update metadata (Column only)
column.metadata["mean"] = [0.485, 0.456, 0.406]
column.metadata["std"] = [0.229, 0.224, 0.225]
