Create a new endpoint group

Usage

vastai create endpoint [OPTIONS]

Options

--min_load
number
default:"0.0"
minimum floor load in perf units/s (tokens/s for LLMs)
--min_cold_load
number
default:"0.0"
minimum floor load in perf units/s (tokens/s for LLMs), but allows the load to be handled by cold workers
--target_util
number
default:"0.9"
target capacity utilization (fraction, max 1.0, default 0.9)
--cold_mult
number
default:"2.5"
cold/stopped instance capacity target as multiple of hot capacity target (default 2.5)
--cold_workers
integer
default:"5"
minimum number of workers to keep 'cold' when there is no load (default 5)
--max_workers
integer
default:"20"
max number of workers your endpoint group can have (default 20)
--endpoint_name
string
deployment endpoint name (allows multiple autoscale groups to share same deployment endpoint)
--max_queue_time
number
maximum seconds requests may be queued on each worker (default 30.0)
--target_queue_time
number
target seconds for the queue to be cleared (default 10.0)
--inactivity_timeout
integer
seconds of no traffic before the endpoint can scale to zero active workers
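
As a rough illustration of how --target_util and --cold_mult might interact, the sketch below works through the arithmetic under one plausible reading of the option descriptions: the hot capacity target is the current load divided by the target utilization, and the cold capacity target is that figure multiplied by --cold_mult. The load of 900 tokens/s and both formulas are assumptions made for illustration, not a statement of how the autoscaler actually computes its targets.

```shell
# Assumed inputs: 900 tokens/s of load, plus the documented defaults.
load=900          # hypothetical current load, in perf units/s
target_util=0.9   # --target_util default
cold_mult=2.5     # --cold_mult default

# Assumed reading: hot capacity target = load / target utilization.
hot=$(awk -v l="$load" -v u="$target_util" 'BEGIN { printf "%.0f", l / u }')

# Assumed reading: cold capacity target = hot capacity target * cold_mult.
cold=$(awk -v h="$hot" -v m="$cold_mult" 'BEGIN { printf "%.0f", h * m }')

echo "hot capacity target:  $hot perf units/s"
echo "cold capacity target: $cold perf units/s"
```

Under these assumptions, a lower --target_util leaves more headroom per unit of load, and a higher --cold_mult keeps more stopped capacity in reserve.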

Description

Create a new endpoint group to manage many autoscaling groups.

Example: vastai create endpoint --target_util 0.9 --cold_mult 2.0 --endpoint_name "LLama"

Examples

vastai create endpoint
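
A fuller invocation might combine several of the options listed on this page. The sketch below wraps the call in a small dry-run helper so the command can be reviewed before it creates anything. The helper `run`, the `DRY_RUN` variable, the endpoint name, and all numeric values are hypothetical; only the flag names come from this page.

```shell
# run: hypothetical helper. Prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Illustrative values only; tune them for your own workload.
run vastai create endpoint \
  --endpoint_name "my-llm-endpoint" \
  --min_load 100 \
  --target_util 0.9 \
  --cold_mult 2.0 \
  --cold_workers 3 \
  --max_workers 10
```

Run as-is, the script only prints the command; with DRY_RUN=0 it executes it.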

Global Options

The following options are available for all commands:
Option          Description
--url URL       Server REST API URL
--retry N       Retry limit
--raw           Output machine-readable JSON
--explain       Verbose explanation of API calls
--api-key KEY   API key (defaults to ~/.config/vastai/vast_api_key)
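
Global options can be combined with the command-specific ones. The sketch below defines a small wrapper that requests JSON output and sets a retry limit. The function name `create_endpoint_json`, the `VASTAI` override variable, and the retry value of 3 are assumptions for illustration. Placing the global options after the subcommand is also assumed rather than confirmed by this page.

```shell
# VASTAI: hypothetical override for the CLI binary (handy for dry runs).
VASTAI="${VASTAI:-vastai}"

# create_endpoint_json NAME: hypothetical wrapper around `create endpoint`.
# Adds --raw for machine-readable JSON and --retry 3 (an assumed limit).
create_endpoint_json() {
  "$VASTAI" create endpoint --endpoint_name "$1" --raw --retry 3
}
```

Calling `create_endpoint_json my-llm-endpoint > endpoint.json` would then save the JSON response to a file. Setting VASTAI=echo first prints the arguments instead of contacting the API.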