A unified Python interface for file storage, supporting local filesystem, Google Cloud Storage (GCS), and Amazon S3. Easily switch between storage backends using environment variables, and interact with files using a simple, consistent API.
- Unified Storage Interface: Use the same API to interact with Local Filesystem, Google Cloud Storage, and Amazon S3.
- File Operations: Save, read, and append to files as bytes or file-like objects.
- Efficient Append: Smart append operations that use native filesystem append for local storage and multi-part patterns for cloud storage.
- URL Generation: Get URLs for files stored in any of the supported storage systems.
- File Upload: Upload files directly from local file paths to the storage system.
- Existence Check: Check if a file exists in the storage system.
- Backend Flexibility: Seamlessly switch between local, GCS, and S3 storage by setting environment variables.
- Extensible: Add new storage backends by subclassing the `Storage` abstract base class (see the sketch after this list).
- Factory Pattern: Automatically selects the appropriate backend at runtime.
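To add a custom backend, implement the methods on `Storage`. Below is a minimal, hypothetical sketch: the `omni_storage.base` import path, the exact abstract method set, and the `AppendResult` constructor are assumptions based on the API reference further down, so treat it as an illustration rather than a drop-in implementation.

```python
# Hypothetical sketch of a custom backend. The `omni_storage.base` import
# path, the abstract method set, and the AppendResult constructor are
# assumptions based on the API reference below; check the package source
# for the real definitions.
from omni_storage.base import AppendResult, Storage


class InMemoryStorage(Storage):
    """Toy backend that keeps files in a dict, handy for unit tests."""

    def __init__(self):
        self._files: dict[str, bytes] = {}

    def save_file(self, file_data, destination_path):
        data = file_data if isinstance(file_data, bytes) else file_data.read()
        self._files[destination_path] = data
        return destination_path

    def read_file(self, file_path):
        return self._files[file_path]

    def get_file_url(self, file_path):
        return f"memory://{file_path}"

    def upload_file(self, local_path, destination_path):
        with open(local_path, "rb") as f:
            return self.save_file(f, destination_path)

    def exists(self, file_path):
        return file_path in self._files

    def append_file(self, content, filename, create_if_not_exists=True,
                    strategy="auto", part_size_mb=100):
        if isinstance(content, str):
            data = content.encode("utf-8")
        elif isinstance(content, bytes):
            data = content
        else:
            raw = content.read()
            data = raw.encode("utf-8") if isinstance(raw, str) else raw
        self._files[filename] = self._files.get(filename, b"") + data
        # Constructing AppendResult by keyword is an assumption; the field
        # names come from the API reference.
        return AppendResult(path=filename, bytes_written=len(data),
                            strategy_used="single", parts_count=1)
```

Registering a custom backend with `get_storage()` depends on the factory's internals, so the sketch only covers the interface itself.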
This package uses uv for dependency management. To install dependencies:
```bash
uv sync
```

Depending on the storage backend(s) you want to use, you can install optional dependencies:
- Google Cloud Storage support: `uv sync --extra gcs`
- Amazon S3 support: `uv sync --extra s3`
- All extras: `uv sync --all-extras`
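If you want to confirm which extras ended up in your environment, a quick import check works. This assumes the S3 extra provides `boto3` (referenced in the credentials notes below) and the GCS extra provides the `google-cloud-storage` client:

```python
# Check which optional cloud SDKs are importable. The module names are
# assumptions about what the extras install.
import importlib

for extra, module in [("s3", "boto3"), ("gcs", "google.cloud.storage")]:
    try:
        importlib.import_module(module)
        print(f"{extra} extra: available ({module})")
    except ImportError:
        print(f"{extra} extra: not installed ({module})")
```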
Local storage is the simplest option, ideal for development and testing.
Required Environment Variables:
- `DATADIR` (optional): Directory path for file storage. Defaults to `./data` if not set.
Example Setup:
```bash
# Optional: Set custom data directory
export DATADIR="/path/to/your/data"

# Or use default ./data directory (no setup needed)
```

Usage:
```python
from omni_storage.factory import get_storage

# Automatic detection (when only DATADIR is set)
storage = get_storage()

# Or explicit selection
storage = get_storage(storage_type="local")
```

Store files in Amazon S3 buckets with full AWS integration.
Required Environment Variables:
- `AWS_S3_BUCKET`: Your S3 bucket name
- `AWS_REGION` (optional): AWS region (e.g., "us-east-1")
AWS Credentials: Must be configured via one of these methods:
- Environment variables: `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
- AWS credentials file: `~/.aws/credentials`
- IAM roles (when running on AWS infrastructure)
- See boto3 credentials documentation for all options
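If you are unsure which of these methods boto3 will resolve, a small standalone check can help before wiring up the S3 backend (this uses boto3 directly, not the omni_storage API):

```python
# Sanity-check AWS credential resolution using boto3's standard chain.
import boto3

session = boto3.Session()
creds = session.get_credentials()
if creds is None:
    print("No AWS credentials found; configure env vars, ~/.aws/credentials, or an IAM role.")
else:
    print(f"Credentials resolved via: {creds.method}")
```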
Example Setup:
```bash
# Required: S3 bucket name
export AWS_S3_BUCKET="my-storage-bucket"

# Optional: AWS region
export AWS_REGION="us-west-2"

# AWS credentials (if not using IAM roles or credentials file)
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
```

Usage:
```python
from omni_storage.factory import get_storage

# Automatic detection (when AWS_S3_BUCKET is set)
storage = get_storage()

# Or explicit selection
storage = get_storage(storage_type="s3")
```

Store files in Google Cloud Storage buckets.
Required Environment Variables:
- `GCS_BUCKET`: Your GCS bucket name
GCS Authentication: Must be configured via one of these methods:
- Service account key file: Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable
- Application Default Credentials (ADC) when running on Google Cloud
- gcloud CLI authentication for local development
- See Google Cloud authentication documentation for details
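Likewise, you can verify that credentials resolve before wiring up the GCS backend. This uses the `google-auth` library directly (typically installed alongside the GCS client), not the omni_storage API:

```python
# Sanity-check Google credential resolution (Application Default Credentials).
import google.auth
from google.auth.exceptions import DefaultCredentialsError

try:
    credentials, project = google.auth.default()
    print(f"ADC resolved (project: {project})")
except DefaultCredentialsError:
    print("No Google credentials found; set GOOGLE_APPLICATION_CREDENTIALS "
          "or run `gcloud auth application-default login`.")
```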
Example Setup:
```bash
# Required: GCS bucket name
export GCS_BUCKET="my-gcs-bucket"

# Authentication via service account (most common)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

# Or authenticate via gcloud CLI for development
gcloud auth application-default login
```

Usage:
```python
from omni_storage.factory import get_storage

# Automatic detection (when GCS_BUCKET is set)
storage = get_storage()

# Or explicit selection
storage = get_storage(storage_type="gcs")
```

Omni Storage can determine the appropriate backend in two ways:
- Explicitly via the `storage_type` parameter: Pass `storage_type="s3"`, `storage_type="gcs"`, or `storage_type="local"` to `get_storage()`.
- Automatically via environment variables: If `storage_type` is not provided, the backend is chosen based on which environment variables are set:
  - If `AWS_S3_BUCKET` is set → S3 storage
  - If `GCS_BUCKET` is set → GCS storage
  - Otherwise → Local storage (using `DATADIR` or the default `./data`)
Note: Even when using explicit selection, the relevant environment variables for that backend must still be set.
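As a quick illustration of the auto-detection rules, setting the bucket variable in-process is enough for `get_storage()` to pick the S3 backend. This is a minimal sketch; the bucket name is hypothetical, and in practice you would export the variable in your deployment environment:

```python
# Minimal sketch of auto-detection: with AWS_S3_BUCKET set, get_storage()
# resolves to the S3 backend without an explicit storage_type.
import os

from omni_storage.factory import get_storage

os.environ["AWS_S3_BUCKET"] = "my-storage-bucket"  # hypothetical bucket name
storage = get_storage()  # selects S3 storage per the rules above
```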
```python
from omni_storage.factory import get_storage

# Get storage instance (auto-detect from environment)
storage = get_storage()
# Save a file from bytes
data = b"Hello, World!"
storage.save_file(data, 'hello.txt')
# Save a file from file-like object
with open('local_file.txt', 'rb') as f:
    storage.save_file(f, 'uploads/remote_file.txt')
# Read a file
content = storage.read_file('uploads/remote_file.txt')
print(content.decode('utf-8'))
# Upload a file directly from path
storage.upload_file('/path/to/local/file.pdf', 'documents/file.pdf')
# Check if file exists
if storage.exists('documents/file.pdf'):
    print("File exists!")
# Get file URL
url = storage.get_file_url('documents/file.pdf')
print(f"File URL: {url}")The append_file method allows you to efficiently add content to existing files:
```python
from omni_storage.factory import get_storage

storage = get_storage()
# Append text to a file
storage.append_file("Line 1\n", "log.txt")
storage.append_file("Line 2\n", "log.txt")
# Append binary data
binary_data = b"\x00\x01\x02\x03"
storage.append_file(binary_data, "data.bin")
# Append from file-like objects
from io import StringIO, BytesIO
text_buffer = StringIO("Buffered text content\n")
storage.append_file(text_buffer, "output.txt")
bytes_buffer = BytesIO(b"Binary buffer content")
storage.append_file(bytes_buffer, "binary_output.bin")
# Streaming large CSV data
import csv
from io import StringIO
# Simulate streaming data from a database
for batch in fetch_large_dataset():
    csv_buffer = StringIO()
    writer = csv.writer(csv_buffer)
    writer.writerows(batch)

    # Append CSV data efficiently
    csv_buffer.seek(0)
    storage.append_file(csv_buffer, "large_dataset.csv")
```

Cloud Storage Optimization: For S3 and GCS, append operations intelligently choose between:
- Single-file strategy: For small files, downloads existing content, appends new data, and re-uploads
- Multi-part strategy: For large files (>100MB by default), creates separate part files and a manifest for efficient streaming
The multi-part pattern is transparent to users: when you read a file, both single files and multi-part files are handled automatically.
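If you want to override the automatic choice, `append_file` accepts `strategy` and `part_size_mb` parameters (see the API reference below); a sketch:

```python
from omni_storage.factory import get_storage

storage = get_storage()

# Force the single-file strategy (download + append + re-upload on cloud backends)
storage.append_file(b"small update\n", "reports/summary.txt", strategy="single")

# Opt into the multi-part strategy at a lower threshold than the 100 MB default
large_chunk = b"x" * (30 * 1024 * 1024)  # ~30 MB of example data
result = storage.append_file(large_chunk, "exports/events.bin",
                             strategy="multipart", part_size_mb=25)
print(result.strategy_used, result.parts_count)
```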
```python
from omni_storage.factory import get_storage

# Force a specific storage backend
s3_storage = get_storage(storage_type="s3")        # Requires AWS_S3_BUCKET
gcs_storage = get_storage(storage_type="gcs")      # Requires GCS_BUCKET
local_storage = get_storage(storage_type="local")  # Uses DATADIR or ./data

# URLs differ by provider:
# - S3: https://bucket-name.s3.region.amazonaws.com/path/to/file
# - GCS: https://storage.googleapis.com/bucket-name/path/to/file
# - Local: file:///absolute/path/to/file
```

The `Storage` interface provides the following methods:

- `save_file(file_data: Union[bytes, BinaryIO], destination_path: str) -> str`: Save file data to storage.
- `read_file(file_path: str) -> bytes`: Read file data from storage.
- `get_file_url(file_path: str) -> str`: Get a URL or path to access the file.
- `upload_file(local_path: str, destination_path: str) -> str`: Upload a file from a local path to storage.
- `exists(file_path: str) -> bool`: Check if a file exists in storage.
- `append_file(content: Union[str, bytes, BinaryIO], filename: str, create_if_not_exists: bool = True, strategy: Literal["auto", "single", "multipart"] = "auto", part_size_mb: int = 100) -> AppendResult`: Append content to an existing file or create a new one. Returns an `AppendResult` with `path`, `bytes_written`, `strategy_used`, and `parts_count`.
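For example, the returned result can be inspected after each append (the field names come from the signature above; accessing them as attributes is an assumption):

```python
from omni_storage.factory import get_storage

storage = get_storage()

result = storage.append_file("audit entry\n", "audit.log")
print(result.path)           # destination path in storage
print(result.bytes_written)  # bytes appended by this call
print(result.strategy_used)  # "single" or "multipart"
print(result.parts_count)    # number of part files (relevant for multipart appends)
```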
The following backend implementations are available:

- `S3Storage(bucket_name: str, region_name: str | None = None)`: Stores files in an Amazon S3 bucket.
- `GCSStorage(bucket_name: str)`: Stores files in a Google Cloud Storage bucket.
- `LocalStorage(base_dir: str)`: Stores files on the local filesystem.
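If you prefer to skip environment-based selection, the backends can also be constructed directly with the arguments listed above. The import paths below (`omni_storage.local`, `omni_storage.s3`, `omni_storage.gcs`) are assumptions; adjust them to the actual package layout:

```python
# Direct construction, bypassing get_storage(). Module paths are assumptions.
from omni_storage.local import LocalStorage
from omni_storage.s3 import S3Storage
from omni_storage.gcs import GCSStorage

local = LocalStorage(base_dir="./data")
s3 = S3Storage(bucket_name="my-storage-bucket", region_name="us-west-2")
gcs = GCSStorage(bucket_name="my-gcs-bucket")
```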
- `get_storage(storage_type: Optional[Literal["s3", "gcs", "local"]] = None) -> Storage`: Returns a storage instance. If `storage_type` is provided (e.g., "s3", "gcs", "local"), it determines the backend. Otherwise, the choice is based on environment variables.
This project is licensed under the MIT License.
Contributions are welcome! Please open issues and pull requests for bug fixes or new features.
- Inspired by the need for flexible, pluggable storage solutions in modern Python applications.