# Quick Start
Get up and running with Cumulus in minutes.
## Installation

```bash
pip install cumulus-sdk
```
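If the install succeeded, the package should import cleanly. A quick sanity check (the `__version__` attribute is an assumption about the package, not something documented above):

```python
# Sanity-check the install by importing the SDK.
# NOTE: __version__ is assumed here; not every package exposes it.
import cumulus

print(cumulus.__version__)
```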
## Initialize Client

```python
from cumulus import CumulusClient

client = CumulusClient(api_key="YOUR_API_KEY")
```
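Hardcoding the key is fine for a quick test; in real projects you will usually pull it from the environment instead. A minimal sketch, assuming you export a `CUMULUS_API_KEY` variable yourself (the name is illustrative, not SDK-defined):

```python
import os

from cumulus import CumulusClient

# Read the key from the environment rather than committing it to source.
# CUMULUS_API_KEY is an illustrative variable name, not one the SDK requires.
client = CumulusClient(api_key=os.environ["CUMULUS_API_KEY"])
```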
## Upload a Model

Register your local model with Cumulus:

```python
model = client.upload_model(
    name="detector_model",
    path="./models/detector.pth"
)
```
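To fail fast on a bad path before any bytes move, a plain local check works; this uses only the standard library plus the `upload_model` call shown above:

```python
from pathlib import Path

# Verify the weights file exists locally before starting the upload.
model_path = Path("./models/detector.pth")
if not model_path.exists():
    raise FileNotFoundError(f"Model file not found: {model_path}")

model = client.upload_model(
    name="detector_model",
    path=str(model_path)
)
```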
## Deploy a Model

```python
endpoint = client.deploy(
    model="detector_model",
    workload_type="inference"
)

# Get your endpoint ID
print(endpoint.id)  # e.g. "ep_1a2b3c4d5e6f7g8h"
```
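The ID is how you would reattach to a running deployment from another process or session. The lookup call below is hypothetical, shown only to illustrate the pattern; substitute whatever retrieval method the SDK actually provides:

```python
# Reattach to an existing deployment by its endpoint ID.
# NOTE: client.get_endpoint is a hypothetical helper for illustration;
# check the SDK reference for the real lookup call.
endpoint = client.get_endpoint("ep_1a2b3c4d5e6f7g8h")
```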
## Run Inference

```python
result = endpoint("input_data_here")
print(result.output)
```
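For multiple inputs, a plain loop is the safe default; whether the endpoint also accepts a batched payload depends on your model, so this sketch calls it once per item using only the callable shown above:

```python
# Run inference over several inputs, one call per item.
inputs = ["input_one", "input_two", "input_three"]

for item in inputs:
    result = endpoint(item)
    print(item, "->", result.output)
```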
That's it. Cumulus automatically:
- Uploads and registers your model
- Analyzes your model size
- Finds the best GPU in the closest region
- Handles fractional GPU allocation if needed
- Routes requests optimally
No configuration required. Just upload, deploy, and run.
## Container Configuration & Dependencies
Cumulus handles container setup automatically, but you can specify custom dependencies:
```python
endpoint = client.deploy(
    model="detector_model",
    workload_type="inference",
    dependencies=[
        "torch==2.4.0",
        "transformers==4.40.0",
        "accelerate==0.30.0"
    ],
    env_vars={
        "HF_HUB_ENABLE_HF_TRANSFER": "1"  # faster model downloads from the Hugging Face Hub
    }
)
```
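If your project already pins versions in a `requirements.txt`, you can pass those pins straight through instead of duplicating them by hand; this uses only the standard library plus the `dependencies` parameter shown above:

```python
from pathlib import Path

# Reuse existing version pins, skipping blank lines and comments.
requirements = [
    line.strip()
    for line in Path("requirements.txt").read_text().splitlines()
    if line.strip() and not line.strip().startswith("#")
]

endpoint = client.deploy(
    model="detector_model",
    workload_type="inference",
    dependencies=requirements
)
```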
Cumulus automatically:
- Installs dependencies in the container
- Configures environment variables
- Optimizes the base image for your model type
### Common Dependencies
| Package | Description |
|---|---|
| `torch` | PyTorch for deep learning |
| `transformers` | Hugging Face Transformers models and tokenizers |
| `vllm` | High-performance LLM serving |
| `accelerate` | Distributed inference |
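As a concrete example, an LLM endpoint served with vLLM might pin its stack like this; the model name and version pins are illustrative placeholders, not values Cumulus defines:

```python
# Illustrative only: model name and versions are placeholders.
endpoint = client.deploy(
    model="my_llm_model",
    workload_type="inference",
    dependencies=[
        "vllm==0.4.2",
        "transformers==4.40.0"
    ]
)
```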