webAI CLI
The webAI CLI is a command-line interface tool designed to simplify the deployment and management of AI Flows in a headless environment. This tool enables users to create and manage clusters with multiple workers through an automated process, allowing for efficient execution of LLM Flows across distributed systems.
Key Capabilities
- Install Runtime on multiple machines within a local network
- Connect machines as workers to a dedicated Controller
- Form a Cluster to handle Deployments
- Execute LLM Flows:
  - Run an LLM on a single dedicated machine using webFrame
  - Run a distributed LLM across multiple machines using webFrame
  - Scale an LLM across multiple machines for load balancing (future feature)
Prerequisites
All machines must:
- Have internet access (to fetch model weights)
- Be able to ping each other
- Have remote login (SSH) enabled
- Have Xcode command line tools installed
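As a quick sanity check, you can verify these prerequisites from any one machine; the user and addresses below are placeholders for your own:
```
# Placeholder user/addresses: substitute your own machines
ping -c 1 192.168.1.101        # machines must be able to ping each other
ssh user@192.168.1.101 true    # remote login (SSH) must be enabled and reachable
xcode-select -p                # prints the developer directory if Xcode command line tools are installed
```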
Network Recommendations
1. Set up the cluster in a separate network/subnet/VLAN to avoid interference from unrelated traffic
2. There are two main setup options:
Wi-Fi + Thunderbolt
There are no restrictions on the Wi-Fi setup as long as it provides internet access. For Thunderbolt connections, we recommend setting an IP address with the 169.254.x.x prefix, either manually or via DHCP with a manual address.
Ethernet Only
First, connect all machines to a switch/router that provides internet access. Then assign IP addresses to the machines, either manually or via DHCP with a manual address.
These are only network setup recommendations: you can also combine Wi-Fi, Ethernet, and Thunderbolt between machines, but Thunderbolt will always be preferred when available. Choosing one option over the other does not restrict you from also leveraging the other.
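On macOS you can confirm which services and addresses are in play; the interface name en0 below is only an example and varies by machine:
```
networksetup -listallnetworkservices   # lists services such as Wi-Fi, Ethernet, Thunderbolt Bridge
ipconfig getifaddr en0                 # prints the IPv4 address assigned to interface en0 (example name)
```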
Each command and subcommand of the CLI has a dedicated `--help` option that displays information about the command and explains what options and arguments can be provided.
Invoke the CLI from the terminal only. If it is opened from within Finder, a quarantine error will occur.
If you are prompted for passwords while starting workers, use ssh-copy-id to set up key-based authentication and remove the need to enter them.
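For example, assuming user@192.168.1.1 is one of your workers, you can generate a key once and copy it to each worker:
```
ssh-keygen -t ed25519            # one-time: create a key pair if you do not already have one
ssh-copy-id user@192.168.1.1     # repeat for each worker listed in ips.yaml
```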
Configuration
The webAI CLI uses a YAML configuration file to define controller and worker relationships. The file structure looks like this:
```
controller: 192.168.1.100
workers:
- user@192.168.1.1
- user@192.168.1.2
- ...
```
Controller
Specify the IP address of the machine that will act as the controller
Workers
List the worker machines with their SSH user ID and IP address in the format userid@ip.address
Comment out entries with # to exclude specific workers
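For example, a configuration with one controller and two active workers, plus a third worker temporarily excluded, could look like this (all addresses are placeholders):
```
controller: 192.168.1.100
workers:
- user@192.168.1.1
- user@192.168.1.2
# - user@192.168.1.3   # commented out, so this worker is excluded
```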
Getting Started
If you are running the model locally and want to try it on a cluster that includes workers, you must stop the model, start the worker(s), and then start the model again.
Alternatively, if you know in advance that you will run on a cluster with workers, start the controller, start the workers, and then start the model.
1. Uncompress the zip file contents into your working directory. The folder will have the following structure:
```
├── ips.yaml
├── rtctl
└── runtime
    ├── agent
    └── setup
        └── wheels
            └── runner-0.6.0-cp310-cp310-macosx_11_0_arm64.whl
```
2. Navigate to that directory using `cd` (e.g., `cd Downloads/Headless`)
3. Configure your ips.yaml file with the appropriate controller and worker information
The provided ips.yaml file is a template that you should modify to fit your specific setup. While it is optional, since the CLI supports passing all values in as arguments, it can be much more convenient to use the file.
`./rtctl` can only be run from the directory that was unpacked: the runtime folder must exist at the same level as the CLI, with exactly the structure and contents shown above. If you receive the error below, check that the contents and structure of the runtime folder match exactly.
```
./rtctl start controller
2025/03/14 11:00:40 ERROR "runtime folder does not exist, please make sure that runtime directory is present and has the correct structure"
```
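A simple way to confirm the layout from the unpacked directory (using the example path from step 2):
```
cd Downloads/Headless   # example path
ls                      # expect: ips.yaml  rtctl  runtime
ls runtime              # expect: agent  setup
```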
Creating a Controller-Worker Setup
1. Start the controller (runs locally on the user's machine) with `./rtctl start controller`
2. Launch the workers (the CLI will transfer all needed software to each remote machine) with `./rtctl start workers`
3. Run a model
webFrame Model
Display the available models: `./rtctl run model --list`
Choose the model to be used: `./rtctl run model --model <model-from-list>`
Scaled
Run scaled: `./rtctl run scaled`
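Putting the steps together, a typical bring-up of a webFrame model looks like the following sketch; the controller IP is a placeholder value reused from the examples later in this document:
```
./rtctl start controller
./rtctl start workers --from-file ips.yaml --controller-ip 192.168.1.24   # placeholder controller IP
./rtctl run model --list                      # pick a model from this list
./rtctl run model --model <model-from-list>
```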
Adding different models
For example, if you launched a flow using `./rtctl run model --model <model-from-list>`, you can try a different model by running `./rtctl stop model` and then `./rtctl run model -m <model-from-list>` to start a different one.
The Runtime can sometimes expose a new preview port for the API element when you change models. The logs report the address to use:
```
Preview address localhost:10501
--------------------------------
Chat will be available using `rtctl run chat --addr localhost:10501 "<prompt>"`
http:// prefix is not required
Chat is ready
```
The default built into the CLI is localhost:10501, but if the logs show anything different, run the command recommended by the CLI output.
For teardown, follow the sequence below:
1. Stop the running flow with `./rtctl stop model` or `./rtctl stop scaled`
2. Stop all of the workers with `./rtctl stop workers -f ips.yaml` or `./rtctl stop workers user@192.168.1.1`
3. Stop the controller with `./rtctl stop controller`
Basic Commands
System Management
```
./rtctl status
# Check if the runtime controller is currently running

./rtctl status -f ips.yaml
# View status of all machines (controller and workers) defined in the YAML file

./rtctl start controller
# Install and launch headless runtime on a local machine

./rtctl stop controller
# Stop the runtime controller instance on a local machine
```
Managing Workers
```
./rtctl start workers user1@192.168.1.27,user2@192.168.1.28 --controller-ip 192.168.1.24
# Start specific machines as workers (requires controller IP)

./rtctl start workers --from-file ips.yaml --controller-ip 192.168.1.24
# Start all workers defined in the YAML file
```
Model Operations
```
./rtctl run model
# Create the WebFrame flow and run it

./rtctl run model --list
# List all available models

./rtctl run model --model <model>
# Run a specific model (e.g., --model mlx-community/codegemma-7b-it-8bit)

./rtctl run model --help
# Display all options available with the run model command
```
When running the plain `./rtctl run model` command, the default model is mlx-community/Meta-Llama-3.1-8B-Instruct-4bit.
Scaled Flow Operations
```
./rtctl run scaled -f ips.yaml --model <model>
# Run a scaled flow with a specified model using the YAML file

./rtctl run scaled --control-api-url user@192.168.1.100 --model <model>
# Run a scaled flow with a specified model using the control API URL

./rtctl stop scaled
# Stop the scaled flow

./rtctl run scaled --list
# List all supported models for the scaled flow
```
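As a sketch, a complete scaled session, assuming the controller and workers are already up, might look like:
```
./rtctl run scaled -f ips.yaml --model <model-from-list>   # start the scaled flow
./rtctl run chat "Tell me a long story"                    # chat targets the scaled flow since it is running
./rtctl stop scaled                                        # tear the flow down when finished
```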
Chat Interaction
The following command will run the prompt against either `model` or `scaled`, depending on which is running.
```
./rtctl run chat "[prompt]"
# Send a prompt to the model
```
Example
```
./rtctl run chat "Tell me a long story"
```
Each chat response includes performance metrics:
- ttft: Time To First Token (in seconds)
- tps: Tokens Per Second
The models do not maintain chat history. Each prompt is treated as a new query without context from previous interactions.
Example demonstrating lack of chat history:
./rtctl run chat "Who was the first president of the United States?"
# Response about George Washington
./rtctl run chat "And who was his vice-president?"
# Model responds with confusion due to lack of context
./rtctl run chat "Who was George Washington's vice-president?"
# Model now provides the correct information with proper context
Troubleshooting
If you are experiencing problems, we recommend resetting all machines to a clean state.
When performing this action, it is essential to apply it to all instances. Deleting only some .webai folders may lead to undesirable outcomes.
1. If workers are running, stop them first with `./rtctl stop workers -f ips.yaml`
2. Stop the controller with `./rtctl stop controller`
3. For each machine (worker and controller), delete the ~/.webai folder. To make this easier, the CLI has a hidden subcommand that performs all required cleanup steps on all machines: run `./rtctl clean -f ips.yaml` to wipe all .webai folders. If you omit `-f ips.yaml`, it will clean the controller only. Note that deleting the ~/.webai folder does not delete the model cache.
4. You can now return to step 1 of "Creating a Controller-Worker Setup"