
Configuration Reference

Complete reference for all Cumulus job configuration options.


Quick Reference

from cumulus import CumulusClient

client = CumulusClient()

job = client.submit(
    # Required
    script="train.py",

    # File handling
    include_patterns=["*.yaml", "data/*.csv"],
    exclude_patterns=["*.pyc", "__pycache__/*"],
    requirements=["torch", "transformers"],
    requirements_file="requirements.txt",

    # Resources
    gpu_count=1,
    memory_request="16Gi",
    memory_limit="32Gi",
    worker_image="pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime",

    # Job settings
    job_id="my-training-001",
    workload_type="training",
    priority=7,

    # Environment
    env={"WANDB_API_KEY": "...", "HF_TOKEN": "..."},

    # Optimization hints
    model_architecture={...},
    training_config={...},
    auto_detect=True
)

File Handling

script (required)

Path to the main Python script to execute.

job = client.submit(script="train.py")
job = client.submit(script="src/main.py")

include_patterns

Glob patterns for files to include with your job.

job = client.submit(
    script="train.py",
    include_patterns=[
        "*.yaml",          # All YAML files in root
        "data/*.csv",      # CSV files in data folder
        "models/**/*.pt",  # All .pt files in models (recursive)
        "configs/*.json"   # JSON files in configs folder
    ]
)

Pattern syntax:

| Pattern | Matches |
| --- | --- |
| *.yaml | All YAML files in current directory |
| **/*.yaml | All YAML files recursively |
| data/*.csv | CSV files in data/ directory |
| models/**/*.pt | All .pt files under models/ |
Automatic Detection

By default, the SDK automatically detects files referenced in your code. Use include_patterns when you need to include files that aren't directly referenced, or when auto-detection misses something.
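As a mental model, the pattern semantics above can be sketched with a small matcher. This is illustrative only — glob_to_regex and matches are hypothetical helpers, not part of the SDK, and the SDK's actual matching may differ in edge cases:

```python
import re

def glob_to_regex(pattern):
    # Translate the subset of glob syntax used here:
    # '**/' spans any number of directories, '*' stays within one path segment.
    parts = []
    i = 0
    while i < len(pattern):
        if pattern[i:i + 3] == '**/':
            parts.append(r'(?:[^/]+/)*')
            i += 3
        elif pattern[i] == '*':
            parts.append(r'[^/]*')
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile('^' + ''.join(parts) + '$')

def matches(path, patterns):
    # True if the path matches any of the include patterns.
    return any(glob_to_regex(p).match(path) for p in patterns)
```

For example, matches("data/x.yaml", ["*.yaml"]) is False (the pattern is rooted), while matches("data/x.yaml", ["**/*.yaml"]) is True.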

exclude_patterns

Glob patterns for files to exclude from your job.

job = client.submit(
    script="train.py",
    exclude_patterns=[
        "*.pyc",          # Compiled Python
        "__pycache__/*",  # Cache directories
        "*.log",          # Log files
        ".git/*",         # Git directory
        "*.egg-info/*",   # Package metadata
        "tests/*"         # Test files
    ]
)

requirements

List of pip packages to install.

job = client.submit(
    script="train.py",
    requirements=["torch", "transformers", "wandb"]
)

# With version constraints
job = client.submit(
    script="train.py",
    requirements=[
        "torch>=2.0.0",
        "transformers==4.35.0",
        "wandb~=0.16.0"
    ]
)

additional_files

Explicitly include files that aren't auto-detected.

job = client.submit(
    script="train.py",
    additional_files=["model.py", "utils.py", "../data/processed"]
)

Files Outside Script Directory

When including files outside your script's directory (paths with ../), the .. components are stripped and files are placed alongside your script. For example, ../data/processed/ becomes data/processed/ relative to your script.

Use relative paths in your script to access these files.
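The re-rooting described above can be sketched in a few lines. staged_path is a hypothetical helper for illustration, not an SDK function:

```python
from pathlib import PurePosixPath

def staged_path(path):
    # Drop '..' (and '.') components so out-of-tree files land
    # alongside the script, as described above.
    parts = [p for p in PurePosixPath(path).parts if p not in ('..', '.')]
    return str(PurePosixPath(*parts))
```

So staged_path("../data/processed") returns "data/processed", matching the example above.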

requirements_file

Path to a requirements.txt file.

job = client.submit(
    script="train.py",
    requirements_file="requirements.txt"
)

You can combine both:

job = client.submit(
    script="train.py",
    requirements_file="requirements.txt",
    requirements=["wandb"]  # Additional packages
)

Resource Configuration

gpu_count

Number of GPUs to request.

job = client.submit(
    script="train.py",
    gpu_count=1  # Default: 1
)

# Multi-GPU training
job = client.submit(
    script="distributed_train.py",
    gpu_count=4
)

memory_request

Minimum memory guaranteed for your job.

job = client.submit(
    script="train.py",
    memory_request="16Gi"  # Default: "8Gi"
)

Format: Use Kubernetes notation: "8Gi", "16Gi", "32Gi", etc.
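If you need to reason about these quantities in your own tooling, a minimal parser for the binary-suffix notation might look like this (parse_mem is a hypothetical helper, not an SDK function; it covers only the Ki/Mi/Gi/Ti suffixes used here):

```python
def parse_mem(quantity):
    # Convert a Kubernetes-style memory quantity ("16Gi", "512Mi", ...)
    # into a byte count. A bare number is treated as bytes.
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)
```

For example, parse_mem("8Gi") returns 8589934592 bytes.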

memory_limit

Maximum memory your job can use.

job = client.submit(
    script="train.py",
    memory_limit="32Gi"  # Default: "16Gi"
)

Guidelines:

| Model Size | memory_request | memory_limit |
| --- | --- | --- |
| Small (< 1B params) | 8Gi | 16Gi |
| Medium (1B - 7B params) | 16Gi | 32Gi |
| Large (7B+ params) | 32Gi | 64Gi |

worker_image

Docker image to run your job.

job = client.submit(
    script="train.py",
    worker_image="pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime"
)

Available images:

| Image | Description |
| --- | --- |
| pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime | PyTorch 2.5 with CUDA 12.4 (default) |
| nvcr.io/nvidia/pytorch:24.01-py3 | NVIDIA-optimized PyTorch |
| nvcr.io/nvidia/tensorflow:24.01-tf2-py3 | NVIDIA TensorFlow 2 |
| nvcr.io/nvidia/tritonserver:24.01-py3 | NVIDIA Triton for serving |

Job Settings

job_id

Custom identifier for your job.

job = client.submit(
    script="train.py",
    job_id="bert-finetune-exp-001"
)

Requirements:

  • Alphanumeric characters and hyphens only
  • Maximum 63 characters
  • Must be unique

If not provided, a unique ID is auto-generated: job-20240103-143052-a1b2c3
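A quick client-side check against the rules above might look like this (valid_job_id is a hypothetical helper for illustration; the server may enforce additional constraints, and uniqueness can only be checked server-side):

```python
import re

# Alphanumeric characters and hyphens only, at most 63 characters.
JOB_ID_RE = re.compile(r'^[A-Za-z0-9-]{1,63}$')

def valid_job_id(job_id):
    # True if the ID satisfies the documented character and length rules.
    return bool(JOB_ID_RE.match(job_id))
```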

workload_type

Type of workload, affects scheduling priority.

job = client.submit(
    script="train.py",
    workload_type="training"  # Default
)

| Type | Use Case | Eviction Priority |
| --- | --- | --- |
| training | Model training | Highest (rarely evicted) |
| finetuning | Fine-tuning pre-trained models | Medium |
| inference | Batch inference, evaluation | Lowest |

priority

Job priority (1-10), affects scheduling order and eviction.

job = client.submit(
    script="train.py",
    priority=8  # Default: 5
)

| Priority | Behavior |
| --- | --- |
| 1-3 | Low priority, evicted first for higher-priority jobs |
| 4-6 | Normal priority |
| 7-10 | High priority, rarely evicted |
tip

Use high priority (7-10) for important production jobs. Use low priority (1-3) for experimental or non-urgent work.


Environment Variables

env

Dictionary of environment variables for your script.

job = client.submit(
    script="train.py",
    env={
        "WANDB_API_KEY": "your-wandb-key",
        "HF_TOKEN": "your-huggingface-token",
        "DEBUG": "true",
        "LEARNING_RATE": "0.001",
        "BATCH_SIZE": "32"
    }
)

Access in your script:

import os

wandb_key = os.environ.get("WANDB_API_KEY")
lr = float(os.environ.get("LEARNING_RATE", "0.001"))
batch_size = int(os.environ.get("BATCH_SIZE", "32"))
debug = os.environ.get("DEBUG", "false").lower() == "true"

System-provided environment variables:

These are automatically set inside your job:

| Variable | Description | Example |
| --- | --- | --- |
| JOB_ID | Current job identifier | job-20240103-143052-abc123 |
| ORIGINAL_JOB_ID | Original job ID (for requeued jobs) | job-20240103-143052-abc123 |
| S3_BUCKET | Storage bucket | cumulus-jobs-... |
| AWS_REGION | Cloud region | us-east-2 |
| RESUME_FROM_CHECKPOINT | Whether to resume | true or false |
| CHECKPOINT_PATH | Path to checkpoint (if resuming) | s3://bucket/job/checkpoints/ |
| REQUEUE_COUNT | Times job has been requeued | 0, 1, 2, ... |
| CUDA_VISIBLE_DEVICES | Assigned GPU index | 0 |
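A requeue-tolerant training script typically reads the checkpoint-related variables at startup. A minimal sketch (resume_state is just an illustrative helper):

```python
import os

def resume_state():
    # Read the system-provided checkpoint variables inside a job.
    resume = os.environ.get("RESUME_FROM_CHECKPOINT", "false").lower() == "true"
    checkpoint = os.environ.get("CHECKPOINT_PATH") if resume else None
    requeues = int(os.environ.get("REQUEUE_COUNT", "0"))
    return resume, checkpoint, requeues
```

Your script can then branch on the result: load weights from checkpoint when resume is true, otherwise start fresh.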

Optimization Hints

model_architecture

Model architecture details for VRAM estimation.

job = client.submit(
    script="train.py",
    model_architecture={
        "architecture_type": "transformer",
        "num_layers": 12,
        "hidden_dim": 768,
        "num_heads": 12,
        "total_params": 110_000_000,
        "base_model": "llama-7b",  # For known model baselines
        "lora_rank": 32            # For LoRA fine-tuning
    }
)

| Field | Description |
| --- | --- |
| architecture_type | Model type: transformer, diffusion, cnn, unet, etc. |
| num_layers | Number of layers |
| hidden_dim | Hidden dimension size |
| num_heads | Number of attention heads (transformers) |
| total_params | Total parameter count |
| base_model | Base model name for known baselines (e.g., sdxl, llama-7b, flux) |
| lora_rank | LoRA rank for accurate trainable-parameter calculation |
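To see why these fields matter, here is a back-of-envelope VRAM calculation in the spirit of what an estimator might do. This is illustrative only — it is not the SDK's actual formula, and it ignores activation memory, which depends on batch size and sequence/image size:

```python
def estimate_vram_gb(total_params, trainable_params=None, precision="fp16"):
    # Rough VRAM lower bound: weights + gradients + Adam optimizer states.
    # With PEFT/LoRA, only the (much smaller) trainable subset needs
    # gradients and optimizer states, which is why lora_rank matters.
    bytes_per = {"fp32": 4, "fp16": 2, "bf16": 2}[precision]
    trainable = total_params if trainable_params is None else trainable_params
    weights = total_params * bytes_per
    grads = trainable * bytes_per
    optimizer = trainable * 8  # two fp32 Adam moment buffers per trainable param
    return (weights + grads + optimizer) / 1024**3
```

For a 110M-parameter model in fp16 this gives roughly 1.2GB before activations; for a 7B model, a small LoRA trainable subset cuts the estimate dramatically versus full fine-tuning.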
Diffusion Models

For diffusion models like SDXL or Flux, set architecture_type: "diffusion" and include base_model for the most accurate VRAM estimates.

training_config

Training configuration for optimization.

job = client.submit(
    script="train.py",
    training_config={
        "batch_size": 32,
        "precision": "fp16",
        "optimizer": "adamw",
        "sequence_length": 512,
        "peft": True,        # LoRA/PEFT fine-tuning
        "image_size": 1024,  # For diffusion models
        "gradient_checkpointing": True,
        "cfg_scale": 7.5     # Classifier-free guidance
    }
)

| Field | Description | Values |
| --- | --- | --- |
| batch_size | Training batch size | Integer |
| precision | Numerical precision | fp32, fp16, bf16 |
| optimizer | Optimizer type | adam, adamw, sgd |
| sequence_length | Sequence length (for NLP) | Integer |
| peft | Using PEFT/LoRA (reduces VRAM estimate) | True, False |
| image_size | Image resolution for vision/diffusion models | Integer (e.g., 512, 1024) |
| gradient_checkpointing | Enables gradient checkpointing (reduces VRAM ~30%) | True, False |
| cfg_scale | Classifier-free guidance scale (diffusion); >1 doubles effective batch | Float |
Diffusion Training

For SDXL LoRA fine-tuning, a typical configuration is:

training_config={
    "batch_size": 1,
    "precision": "fp16",
    "peft": True,
    "image_size": 1024,
    "gradient_checkpointing": True,
    "cfg_scale": 7.5
}

This yields an estimate of roughly 12-16GB of VRAM, depending on whether gradient checkpointing is enabled.

auto_detect

Enable automatic detection of model configuration from your script.

job = client.submit(
    script="train.py",
    auto_detect=True  # Default: True
)

When enabled, the SDK parses your script to detect:

  • Model architecture (PyTorch, HuggingFace, torchvision)
  • Batch sizes
  • Precision settings
  • Local file imports

Disable if auto-detection causes issues:

job = client.submit(
    script="train.py",
    auto_detect=False
)

Advanced Configuration

These options let you manually specify GPU resources when you know your exact requirements.

When to Use Manual Resource Settings

Use vram_gb and sm_percent when:

  • You know your exact requirements from profiling or documentation
  • You need guaranteed resources for consistent performance (inference servers)
  • Jobs need specific memory allocations based on your model size

sm_percent

Manually set the GPU compute percentage (1-100). This controls what portion of GPU compute resources your job receives.

job = client.submit(
    script="train.py",
    sm_percent=50  # 50% of GPU compute resources
)

| sm_percent | Use Case |
| --- | --- |
| 100 | Full GPU (maximum performance) |
| 50 | Half GPU, suitable for smaller models |
| 25 | Quarter GPU, good for inference or light workloads |

vram_gb

Manually set VRAM (GPU memory) allocation in GB. This is the guaranteed amount of GPU memory reserved for your job.

job = client.submit(
    script="train.py",
    vram_gb=20.0  # Guaranteed 20GB VRAM
)

| GPU | Total VRAM | Typical vram_gb values |
| --- | --- | --- |
| A100 | 40GB / 80GB | 10, 20, 40, 80 |
| H100 | 80GB | 20, 40, 80 |
Guaranteed resources

When you set both sm_percent and vram_gb, you get guaranteed GPU resources. This is ideal for:

  • Production inference servers needing consistent latency
  • Training jobs where you know exact memory requirements
  • Workloads that need predictable performance

queue_timeout_seconds

Maximum time (in seconds) to wait in the queue before triggering the autoscaler.

job = client.submit(
    script="train.py",
    queue_timeout_seconds=300  # Wait up to 5 minutes
)

service_port

Expose a port via public tunnel for running inference servers, APIs, or development endpoints. The tunnel provides a public URL accessible from anywhere.

job = client.submit(
    script="sglang_server.py",
    service_port=30000  # Expose SGLang server
)

# Wait for tunnel to be ready
tunnel_url = client.wait_for_tunnel(job.job_id, timeout=300)
print(f"Server available at: {tunnel_url}")
# Returns: http://tunnel.cumuluslabs.io:8443/12345

How it works:

  1. Your job starts and listens on the specified port (e.g., 30000)
  2. A tunnel sidecar connects your port to tunnel.cumuluslabs.io
  3. You get a unique public URL like http://tunnel.cumuluslabs.io:8443/12345
  4. Requests to this URL require API key authentication

Getting the tunnel URL:

# Method 1: Wait for tunnel (blocking)
tunnel_url = client.wait_for_tunnel(job.job_id, timeout=300)

# Method 2: Check immediately (non-blocking)
tunnel_url = client.get_tunnel_url(job.job_id) # Returns None if not ready

# Method 3: Poll in your own loop
import time

while True:
    tunnel_url = client.get_tunnel_url(job.job_id)
    if tunnel_url:
        break
    time.sleep(5)

Common service ports:

| Framework | Default Port |
| --- | --- |
| SGLang | 30000 |
| vLLM | 8000 |
| Ollama | 11434 |
| Gradio | 7860 |
| FastAPI | 8000 |

Example: Inference server with guaranteed resources

Authentication Required

Tunnel URLs require your Cumulus API key for access. Include it via:

  • X-API-Key: <your-api-key> header, or
  • Authorization: Bearer <your-api-key> header

import os

job = client.submit(
    script="vllm_server.py",
    service_port=8000,
    workload_type="inference",
    sm_percent=100,  # Full GPU compute
    vram_gb=40.0,    # 40GB guaranteed VRAM
    gpu_count=1
)

tunnel_url = client.wait_for_tunnel(job.job_id, timeout=300)
print(f"vLLM server ready at: {tunnel_url}")

# Make authenticated inference requests
import requests

api_key = os.environ.get("CUMULUS_API_KEY")

response = requests.post(
    f"{tunnel_url}/v1/completions",
    headers={"X-API-Key": api_key},  # Auth required!
    json={
        "prompt": "Hello, world!",
        "max_tokens": 100
    }
)

user_id

Override the owner ID for job filtering in the web dashboard.

job = client.submit(
    script="train.py",
    user_id="team-ml-research"  # Custom owner for dashboard filtering
)

Complete Example

from cumulus import CumulusClient
import os

client = CumulusClient()

job = client.submit(
    # Script and files
    script="src/train.py",
    include_patterns=["configs/*.yaml", "data/*.csv"],
    exclude_patterns=["*.pyc", "__pycache__/*", "*.log"],
    requirements_file="requirements.txt",
    requirements=["wandb"],

    # Resources
    gpu_count=1,
    memory_request="16Gi",
    memory_limit="32Gi",
    worker_image="pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime",

    # Job settings
    job_id="bert-finetune-v1",
    workload_type="training",
    priority=7,

    # Environment
    env={
        "WANDB_API_KEY": os.environ.get("WANDB_API_KEY"),
        "HF_TOKEN": os.environ.get("HF_TOKEN"),
        "EXPERIMENT_NAME": "bert-finetune-v1"
    },

    # Optimization hints
    model_architecture={
        "architecture_type": "transformer",
        "num_layers": 12,
        "hidden_dim": 768,
        "num_heads": 12,
        "total_params": 110_000_000
    },
    training_config={
        "batch_size": 32,
        "precision": "fp16",
        "optimizer": "adamw",
        "sequence_length": 512
    },
    auto_detect=True
)

print(f"Job submitted: {job.job_id}")
print(f"Storage path: {job.s3_path}")

# Wait for completion
status = client.wait_for_completion(job.job_id, timeout=7200)
print(f"Final status: {status}")

# Get results
if status == "SUCCEEDED":
    results = client.get_results(job.job_id)
    print(results)

Parameter Summary

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| script | str | required | Path to main Python script |
| include_patterns | list[str] | [] | Glob patterns to include |
| exclude_patterns | list[str] | [] | Glob patterns to exclude |
| requirements | list[str] | [] | pip packages to install |
| requirements_file | str | None | Path to requirements.txt |
| additional_files | list[str] | [] | Explicit extra files to include |
| gpu_count | int | 1 | Number of GPUs |
| memory_request | str | "8Gi" | Minimum memory |
| memory_limit | str | "16Gi" | Maximum memory |
| worker_image | str | PyTorch default | Docker image |
| job_id | str | auto-generated | Custom job ID |
| workload_type | str | "training" | Job type |
| priority | int | 5 | Priority (1-10) |
| env | dict | {} | Environment variables |
| model_architecture | dict | None | Model hints |
| training_config | dict | None | Training hints |
| auto_detect | bool | True | Auto-detect config |
| sm_percent | int | auto | GPU compute percentage (1-100) |
| vram_gb | float | auto | GPU memory allocation in GB |
| queue_timeout_seconds | int | 900 | Max queue wait time |
| service_port | int | None | Port to expose via tunnel |
| user_id | str | API key user | Owner ID for dashboard |