Miscellaneous
Metadata
Metadata provides key-value storage for datasets and columns.
Dataset Metadata
deeplake.Metadata
Bases: ReadOnlyMetadata
Writable access to dataset and column metadata for ML workflows.
Stores important information about datasets, such as:
- Model parameters and hyperparameters
- Preprocessing statistics
- Data splits and fold definitions
- Version and training information
Changes are persisted immediately without requiring commit().
Examples:
Storing model metadata:
dataset.metadata["model_name"] = "resnet50"
dataset.metadata["hyperparameters"] = {
    "learning_rate": 0.001,
    "batch_size": 32,
}
Setting preprocessing stats:
dataset.images.metadata["mean"] = [0.485, 0.456, 0.406]
dataset.images.metadata["std"] = [0.229, 0.224, 0.225]
__getitem__
__setitem__
# Set dataset metadata
ds.metadata["description"] = "Training dataset"
ds.metadata["version"] = "1.0"
ds.metadata["params"] = {
    "image_size": 224,
    "mean": [0.485, 0.456, 0.406],
    "std": [0.229, 0.224, 0.225],
}
# Read dataset metadata
description = ds.metadata["description"]
params = ds.metadata["params"]
# List all metadata keys
for key in ds.metadata.keys():
    print(f"{key}: {ds.metadata[key]}")
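Because the metadata object supports `keys()` and item access, it can be copied into an ordinary dictionary, for example to log it alongside a training run. The following is a minimal sketch rather than part of the Deep Lake API: `metadata_to_dict` is a hypothetical helper, and a plain `dict` stands in for `ds.metadata` so the example runs without a dataset.

```python
from typing import Any, Dict


def metadata_to_dict(metadata) -> Dict[str, Any]:
    """Copy every key/value pair out of a metadata-like object.

    Hypothetical helper: it relies only on keys() and item access,
    the two operations shown in the example above.
    """
    return {key: metadata[key] for key in metadata.keys()}


# A plain dict stands in for ds.metadata (assumption for illustration).
example_metadata = {"description": "Training dataset", "version": "1.0"}
snapshot = metadata_to_dict(example_metadata)
print(snapshot)
```

The returned dictionary is a separate object, so changing it does not write anything back to the dataset.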
Column Metadata
deeplake.ReadOnlyMetadata
Read-only access to dataset and column metadata for ML workflows.
Stores important information about datasets, such as:
- Model parameters and hyperparameters
- Preprocessing statistics (mean, std, etc.)
- Data splits and fold definitions
- Version and training information
Examples:
Accessing model metadata:
metadata = ds.metadata
model_name = metadata["model_name"]
model_params = metadata["hyperparameters"]
Reading preprocessing stats:
mean = ds["images"].metadata["mean"]
std = ds["images"].metadata["std"]
__getitem__
# Set column metadata
ds["images"].metadata["mean"] = [0.485, 0.456, 0.406]
ds["images"].metadata["std"] = [0.229, 0.224, 0.225]
ds["labels"].metadata["class_names"] = ["cat", "dog", "bird"]
# Read column metadata
mean = ds["images"].metadata["mean"]
class_names = ds["labels"].metadata["class_names"]
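Per-channel statistics stored in column metadata are typically read back at training time to normalize inputs. The sketch below illustrates that step under stated assumptions: `normalize_pixel` is a hypothetical helper, not a Deep Lake function, and a plain `dict` stands in for `ds["images"].metadata`.

```python
from typing import List, Sequence


def normalize_pixel(pixel: Sequence[float], metadata) -> List[float]:
    """Apply (value - mean) / std per channel, reading the stats from metadata."""
    mean = metadata["mean"]
    std = metadata["std"]
    if not (len(pixel) == len(mean) == len(std)):
        raise ValueError("pixel, mean, and std must have the same number of channels")
    return [(value - m) / s for value, m, s in zip(pixel, mean, std)]


# A plain dict stands in for ds["images"].metadata (assumption for illustration).
image_metadata = {"mean": [0.485, 0.456, 0.406], "std": [0.229, 0.224, 0.225]}

# A pixel equal to the channel means normalizes to zero in every channel.
normalized = normalize_pixel([0.485, 0.456, 0.406], image_metadata)
print(normalized)
```

Keeping the statistics next to the column means the preprocessing code does not need hard-coded constants.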
Version Control
Version
deeplake.Version
An atomic change within the history of a deeplake.Dataset.
client_timestamp
property
When the version was created, according to the writer's local clock.
This timestamp is not guaranteed to be accurate, and deeplake.Version.timestamp should generally be used instead.
timestamp
property
The version timestamp.
This is based on the storage provider's clock and is therefore generally more accurate than deeplake.Version.client_timestamp.
# Get current version
version_id = ds.version
# Access specific version
version = ds.history[version_id]
print(f"Version: {version.id}")
print(f"Message: {version.message}")
print(f"Timestamp: {version.timestamp}")
# Open dataset at specific version
old_ds = version.open()
History
deeplake.History
The version history of a deeplake.Dataset.
# View all versions
for version in ds.history:
    print(f"Version {version.id}: {version.message}")
    print(f"Created: {version.timestamp}")
# Get specific version
version = ds.history["version_id"]
# Get version by index
first_version = ds.history[0]
latest_version = ds.history[-1]
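Iterating over the history makes it possible to pick a version by time, for instance the last version written at or before a given cutoff. The sketch below is hypothetical: `latest_version_before` is not part of Deep Lake, it assumes `timestamp` is a comparable `datetime` value, and `SimpleNamespace` objects stand in for real versions.

```python
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import Iterable, Optional


def latest_version_before(history: Iterable, cutoff: datetime) -> Optional[object]:
    """Return the version with the greatest timestamp <= cutoff, or None."""
    candidates = [v for v in history if v.timestamp <= cutoff]
    return max(candidates, key=lambda v: v.timestamp, default=None)


# Stand-in version objects; a real call would pass ds.history instead.
fake_history = [
    SimpleNamespace(id="a1", message="initial import",
                    timestamp=datetime(2024, 1, 5, tzinfo=timezone.utc)),
    SimpleNamespace(id="b2", message="add labels",
                    timestamp=datetime(2024, 1, 20, tzinfo=timezone.utc)),
    SimpleNamespace(id="c3", message="fix labels",
                    timestamp=datetime(2024, 2, 10, tzinfo=timezone.utc)),
]

cutoff = datetime(2024, 2, 1, tzinfo=timezone.utc)
chosen = latest_version_before(fake_history, cutoff)
print(chosen.id, chosen.message)
```

With a real dataset, the selected version could then be opened with `version.open()`, as in the Version example.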
Tagging
Tag
deeplake.Tag
Describes a tag within the dataset.
Tags are created using deeplake.Dataset.tag.
open_async
Asynchronously fetches the dataset corresponding to the tag and returns a Future object.
# Create tag
ds.tag("v1.0")
# Access tagged version
tag = ds.tags["v1.0"]
print(f"Tag: {tag.name}")
print(f"Version: {tag.version}")
# Open dataset at tag
tagged_ds = tag.open()
# Rename tag
tag.rename("v1.0.0")
# Delete tag (do this last; the tag cannot be used afterwards)
tag.delete()
Tags
deeplake.Tags
Provides access to the tags within a dataset.
It is returned by the deeplake.Dataset.tags property.
# Create tag
ds.tag("v1.0") # Tag current version
ds.tag("v1.0", version="specific_version_id") # Tag specific version
# List all tags
for name in ds.tags.names():
    tag = ds.tags[name]
    print(f"Tag {tag.name} points to version {tag.version}")
# Check number of tags
num_tags = len(ds.tags)
# Access specific tag
tag = ds.tags["v1.0"]
# Common operations with tags
latest_ds = ds.tags["latest"].open() # Open dataset at tag
stable_future = ds.tags["stable"].open_async() # Async open; returns a Future
# Error handling
try:
    tag = ds.tags["non_existent"]
except deeplake.TagNotFoundError:
    print("Tag not found")
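The listing pattern above can be turned around to answer the reverse question: which tags point at a given version? The following sketch is hypothetical. `tag_names_for_version` is not a Deep Lake function, it assumes `tag.version` can be compared directly with a version ID string, and `FakeTags` is a small stand-in for `ds.tags` so the example runs on its own.

```python
from types import SimpleNamespace
from typing import Dict, List


def tag_names_for_version(tags, version_id: str) -> List[str]:
    """Return the sorted names of all tags whose version equals version_id."""
    return sorted(name for name in tags.names() if tags[name].version == version_id)


class FakeTags:
    """Minimal stand-in for ds.tags: supports names(), item access, and len()."""

    def __init__(self, mapping: Dict[str, str]):
        self._tags = {
            name: SimpleNamespace(name=name, version=version)
            for name, version in mapping.items()
        }

    def names(self) -> List[str]:
        return list(self._tags)

    def __getitem__(self, name: str):
        return self._tags[name]

    def __len__(self) -> int:
        return len(self._tags)


fake_tags = FakeTags({"v1.0": "abc123", "stable": "abc123", "latest": "def456"})
matching = tag_names_for_version(fake_tags, "abc123")
print(matching)
```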
TagView
deeplake.TagView
Describes a read-only tag within the dataset.
Tags are created using deeplake.Dataset.tag.
open_async
Asynchronously fetches the dataset corresponding to the tag and returns a Future object.
import deeplake

# Open read-only dataset
ds = deeplake.open_read_only("s3://bucket/dataset")
# Access tag view
tag_view = ds.tags["v1.0"]
print(f"Tag: {tag_view.name}")
print(f"Version: {tag_view.version}")
# Open dataset at tag
tagged_ds = tag_view.open()
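Since `open_async` returns a Future rather than the dataset itself, several tags can be requested up front and collected afterwards. The sketch below shows that pattern under stated assumptions: `open_tags_concurrently` is a hypothetical helper, it assumes the returned Future offers a blocking `result()` method, and `concurrent.futures.Future` is used only as a stand-in so the example runs without Deep Lake.

```python
from concurrent.futures import Future
from typing import Dict, Iterable


def open_tags_concurrently(tags, names: Iterable[str]) -> Dict[str, object]:
    """Start open_async() for every name first, then wait for all results."""
    pending = {name: tags[name].open_async() for name in names}
    return {name: future.result() for name, future in pending.items()}


class FakeTagView:
    """Stand-in tag whose open_async() returns an already completed Future."""

    def __init__(self, name: str):
        self.name = name

    def open_async(self) -> Future:
        future: Future = Future()
        future.set_result(f"dataset@{self.name}")  # placeholder for a dataset
        return future


fake_tags = {"v1.0": FakeTagView("v1.0"), "stable": FakeTagView("stable")}
opened = open_tags_concurrently(fake_tags, ["v1.0", "stable"])
print(opened)
```

Starting all requests before waiting on any of them lets the fetches overlap instead of running one after another.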
TagsView
deeplake.TagsView
Provides access to the tags within a dataset.
It is returned by the deeplake.Dataset.tags property on a deeplake.ReadOnlyDataset.