# Quick Start
Get up and running with Cumulus in minutes.
## Installation

```bash
pip install cumulus-sdk
```
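If the install succeeded, the package should import cleanly. A quick sanity check (the `__version__` attribute is an assumption about the package, not something documented above):

```python
# Sanity-check the install by importing the SDK.
# NOTE: __version__ is assumed here; not every package exposes it.
import cumulus

print(cumulus.__version__)
```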
## Initialize Client

```python
from cumulus import CumulusClient

client = CumulusClient(api_key="YOUR_API_KEY")
```
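Hardcoding the key is fine for a quick test; in real projects you will usually pull it from the environment instead. A minimal sketch, assuming you export a `CUMULUS_API_KEY` variable yourself (the name is illustrative, not SDK-defined):

```python
import os

from cumulus import CumulusClient

# Read the key from the environment rather than committing it to source.
# CUMULUS_API_KEY is an illustrative variable name, not one the SDK requires.
client = CumulusClient(api_key=os.environ["CUMULUS_API_KEY"])
```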
## Upload a Model

Register your local model with Cumulus:

```python
model = client.upload_model(
    name="detector_model",
    path="./models/detector.pth"
)
```
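To fail fast on a bad path before any bytes move, a plain local check works; this uses only the standard library plus the `upload_model` call shown above:

```python
from pathlib import Path

# Verify the weights file exists locally before starting the upload.
model_path = Path("./models/detector.pth")
if not model_path.exists():
    raise FileNotFoundError(f"Model file not found: {model_path}")

model = client.upload_model(
    name="detector_model",
    path=str(model_path)
)
```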
## Deploy a Model

```python
endpoint = client.deploy(
    model="detector_model",
    workload_type="inference"
)

# Get your endpoint ID
print(endpoint.id)  # e.g. "ep_1a2b3c4d5e6f7g8h"
```
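The ID is how you would reattach to a running deployment from another process or session. The lookup call below is hypothetical, shown only to illustrate the pattern; substitute whatever retrieval method the SDK actually provides:

```python
# Reattach to an existing deployment by its endpoint ID.
# NOTE: client.get_endpoint is a hypothetical helper for illustration;
# check the SDK reference for the real lookup call.
endpoint = client.get_endpoint("ep_1a2b3c4d5e6f7g8h")
```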
## Run Inference

```python
result = endpoint("input_data_here")
print(result.output)
```
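For multiple inputs, a plain loop is the safe default; whether the endpoint also accepts a batched payload depends on your model, so this sketch calls it once per item using only the callable shown above:

```python
# Run inference over several inputs, one call per item.
inputs = ["input_one", "input_two", "input_three"]

for item in inputs:
    result = endpoint(item)
    print(item, "->", result.output)
```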
That's it. Cumulus automatically:
- Uploads and registers your model
- Analyzes your model size
- Finds the best GPU in the closest region
- Handles fractional GPU allocation if needed
- Routes requests optimally
No configuration required. Just upload, deploy, and run.
## Container Configuration & Dependencies
Cumulus handles container setup automatically, but you can specify custom dependencies:
```python
endpoint = client.deploy(
    model="detector_model",
    workload_type="inference",
    dependencies=[
        "torch==2.4.0",
        "transformers==4.40.0",
        "accelerate==0.30.0"
    ],
    env_vars={
        "HF_HUB_ENABLE_HF_TRANSFER": "1"  # faster model downloads from the Hugging Face Hub
    }
)
```
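If your project already pins versions in a `requirements.txt`, you can pass those pins straight through instead of duplicating them by hand; this uses only the standard library plus the `dependencies` parameter shown above:

```python
from pathlib import Path

# Reuse existing version pins, skipping blank lines and comments.
requirements = [
    line.strip()
    for line in Path("requirements.txt").read_text().splitlines()
    if line.strip() and not line.strip().startswith("#")
]

endpoint = client.deploy(
    model="detector_model",
    workload_type="inference",
    dependencies=requirements
)
```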
Cumulus automatically:
- Installs dependencies in the container
- Configures environment variables
- Optimizes the base image for your model type
### Common Dependencies
| Package | Description |
|---|---|
| `torch` | PyTorch for deep learning |
| `transformers` | Hugging Face Transformers models and tokenizers |
| `vllm` | High-performance LLM serving |
| `accelerate` | Distributed inference |
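As a concrete example, an LLM endpoint served with vLLM might pin its stack like this; the model name and version pins are illustrative placeholders, not values Cumulus defines:

```python
# Illustrative only: model name and versions are placeholders.
endpoint = client.deploy(
    model="my_llm_model",
    workload_type="inference",
    dependencies=[
        "vllm==0.4.2",
        "transformers==4.40.0"
    ]
)
```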