vastai create endpoint

Create a new endpoint group

Usage

vastai create endpoint [OPTIONS]

Options

--min_load

number

default:"0.0"

minimum floor load in perf units/s (token/s for LLms)

--min_cold_load

number

default:"0.0"

minimum floor load in perf units/s (token/s for LLms), but allow handling with cold workers

--target_util

number

default:"0.9"

target capacity utilization (fraction, max 1.0, default 0.9)

--cold_mult

number

default:"2.5"

cold/stopped instance capacity target as multiple of hot capacity target (default 2.5)

--cold_workers

integer

default:"5"

min number of workers to keep ‘cold’ when you have no load (default 5)

--max_workers

integer

default:"20"

max number of workers your endpoint group can have (default 20)

--endpoint_name

string

deployment endpoint name (allows multiple autoscale groups to share same deployment endpoint)

--max_queue_time

number

maximum seconds requests may be queued on each worker (default 30.0)

--target_queue_time

number

target seconds for the queue to be cleared (default 10.0)

--inactivity_timeout

integer

seconds of no traffic before the endpoint can scale to zero active workers

Description

Create a new endpoint group to manage many autoscaling groups Example: vastai create endpoint --target_util 0.9 --cold_mult 2.0 --endpoint_name “LLama”

Examples

vastai create endpoint

Global Options

The following options are available for all commands:

Option	Description
`--url URL`	Server REST API URL
`--retry N`	Retry limit
`--raw`	Output machine-readable JSON
`--explain`	Verbose explanation of API calls
`--api-key KEY`	API key (defaults to `~/.config/vastai/vast_api_key`)

Get Started

Commands

Usage

Options

Description

Examples

Global Options

Get Started

Commands

​Usage

​Options

​Description

​Examples

​Global Options

Usage

Options

Description

Examples

Global Options