Deploying a Headless Cluster Using the CLI
This guide walks you through setting up an 8-node distributed LLM cluster using the webAI CLI. You'll learn how to configure multiple machines to work together, enabling distributed processing for large language models.
Prerequisites
Before beginning, ensure you have:
• 8 machines (Mac or Linux preferred) on the same subnet/LAN
• All machines must have:
  • Internet access
  • SSH enabled
  • Xcode Command Line Tools installed (macOS): xcode-select --install
• One machine designated as the controller
• Seven machines designated as workers
Setup Process
1. Generate SSH Key on Controller

First, create an SSH key on your controller machine that will be used to connect to workers:

ssh-keygen -t ed25519 -C "webai@controller"

- Press Enter to accept the default file location (~/.ssh/id_ed25519)
- You can leave the passphrase empty for automated connections (a passphrase is optional, but recommended for production)
2. Copy SSH Key to Each Worker Node

Copy your public key to each worker machine to enable passwordless authentication:

ssh-copy-id user@192.168.1.101
ssh-copy-id user@192.168.1.102
ssh-copy-id user@192.168.1.103
ssh-copy-id user@192.168.1.104
ssh-copy-id user@192.168.1.105
ssh-copy-id user@192.168.1.106
ssh-copy-id user@192.168.1.107

Replace user and the IP addresses with your actual worker usernames and IPs.

If your system doesn't have ssh-copy-id, use this alternative method (note the >> so the key is appended to authorized_keys):

cat ~/.ssh/id_ed25519.pub | ssh user@192.168.1.101 "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
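The seven copy commands above can be collapsed into a loop. A minimal sketch, assuming your workers share one username and sit at 192.168.1.101-107 as in this guide; it prints each command as a dry run, so remove the `echo` to actually run them:

```shell
#!/bin/sh
# Dry-run key-distribution loop. WORKER_USER and the 101-107 range are
# this guide's example values; adjust them for your network.
WORKER_USER="user"
for i in $(seq 101 107); do
  echo ssh-copy-id "${WORKER_USER}@192.168.1.${i}"
done
```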
3. Verify SSH Connectivity
Test the SSH connection to each worker to ensure passwordless access works properly:
ssh user@192.168.1.101
If successful, you'll connect without being prompted for a password. Repeat this test for all worker nodes.
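Rather than logging in to each worker by hand, you can sweep all seven in one pass. A sketch using the same example username and IP range as above; BatchMode forbids password prompts, so any host that would still ask for one is reported as FAIL instead of hanging:

```shell
#!/bin/sh
# Non-interactive connectivity sweep over the example worker IPs.
# BatchMode=yes makes ssh fail instead of prompting for a password.
WORKER_USER="user"
for i in $(seq 101 107); do
  host="${WORKER_USER}@192.168.1.${i}"
  if ssh -o BatchMode=yes -o ConnectTimeout=3 "$host" true 2>/dev/null; then
    echo "OK   $host"
  else
    echo "FAIL $host"
  fi
done
```

Any FAIL line means that worker needs its key copied again or its SSH/network settings checked.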
4. Unpack webAI CLI

On the controller machine:

- Unzip the webAI CLI package:

unzip webai-cli.zip
cd Headless

- Verify the folder structure:

Headless/
├── rtctl
├── ips.yaml
└── runtime/
5. Configure ips.yaml

Edit the ips.yaml file to define your cluster layout:

controller: 192.168.1.100
workers:
  - user@192.168.1.101
  - user@192.168.1.102
  - user@192.168.1.103
  - user@192.168.1.104
  - user@192.168.1.105
  - user@192.168.1.106
  - user@192.168.1.107

- Replace 192.168.1.100 with your controller's IP address
- Replace user with the actual username on each worker node
- Use # at the beginning of a line to comment out any node you don't want to include
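To double-check which nodes a given ips.yaml will actually include, you can list its uncommented worker entries. A small sketch that writes the example layout to a temporary file and extracts the active hosts (the file path and layout here are illustrative):

```shell
#!/bin/sh
# Write the example layout to a temp file, then print only the active
# (uncommented) worker entries.
cat > /tmp/ips-example.yaml <<'EOF'
controller: 192.168.1.100
workers:
  - user@192.168.1.101
  - user@192.168.1.102
  # - user@192.168.1.103   # commented out, so excluded from the cluster
EOF
grep -E '^[[:space:]]*-[[:space:]]' /tmp/ips-example.yaml \
  | sed -E 's/^[[:space:]]*-[[:space:]]*//'
```

This prints the two uncommented hosts; the commented-out third worker is skipped, matching how a `#` line excludes a node.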
Cluster Management
Starting the Cluster
From within the Headless/ directory:
1. Start the controller:
./rtctl start controller
2. Start all workers:
./rtctl start workers --from-file ips.yaml --controller-ip 192.168.1.100
The --controller-ip value should match the controller IP in your ips.yaml file.
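The two start commands, plus a status check, can be wrapped into a single script. A sketch, printed as a dry run (remove the `echo`s to execute); the one-second pause between starting the controller and attaching workers is an assumption, not a documented requirement:

```shell
#!/bin/sh
# Dry-run start sequence; CONTROLLER_IP is this guide's example value.
CONTROLLER_IP="192.168.1.100"
echo ./rtctl start controller
sleep 1   # brief pause so the controller is up before workers attach (assumption)
echo ./rtctl start workers --from-file ips.yaml --controller-ip "$CONTROLLER_IP"
echo ./rtctl status -f ips.yaml
```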
Running a Model
1. To see all available models:
./rtctl run model --list
2. To run a distributed model across your cluster:
./rtctl run scaled -f ips.yaml --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit
Interacting with the Model
Once the model is running, you can start a chat session:
./rtctl run chat "What's the fastest land animal?"
The system will respond with the answer and performance metrics:
- ttft (Time to First Token): how quickly the first token of the response appeared
- tps (Tokens Per Second): how fast the model generates output
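To make the tps figure concrete: if a response contains 256 tokens and generation takes 3.2 seconds, throughput is 256 / 3.2 = 80 tokens per second. The same arithmetic with your own numbers (256 and 3.2 here are illustrative, not rtctl output):

```shell
#!/bin/sh
# tps = tokens generated / generation time in seconds
tokens=256
seconds=3.2
awk -v t="$tokens" -v s="$seconds" 'BEGIN { printf "%.1f tps\n", t / s }'
# prints: 80.0 tps
```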
Checking Cluster Status
To verify the health of your cluster:
./rtctl status -f ips.yaml
Stopping the Cluster
- Stop the running model:
./rtctl stop scaled
- Stop all worker nodes:
./rtctl stop workers -f ips.yaml
- Stop the controller:
./rtctl stop controller
- (Optional) For a complete cleanup:
./rtctl clean -f ips.yaml
Best Practices
- Network Configuration: Use a private network or VLAN for optimal performance and security
- Connection Types: You can mix Wi-Fi, Ethernet, and Thunderbolt connections in your cluster
- Thunderbolt: Provides the highest performance (up to 40Gbps) and lowest latency, preferred when available between nodes
- Ethernet: Offers reliable, stable connectivity (1-10Gbps) and is preferred over Wi-Fi for consistent performance
- Wi-Fi: Acceptable for internet access and basic connectivity, but not recommended for primary inter-node communication
- Resource Allocation: Distribute models based on the available memory and compute power of each node
- Error Handling: If a node fails to connect, check its SSH configuration and network connectivity
Troubleshooting
- Check that SSH is enabled on all machines and that firewalls allow SSH connections
- Ensure all nodes have sufficient RAM for the selected model
- Check network bandwidth between nodes; bandwidth limitations can impact distributed processing
- Ensure you're running commands from within the Headless/ directory
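One way to act on the bandwidth check above is iperf3, which is not bundled with webAI; assuming it is installed on both machines (for example via Homebrew on macOS), the measurement runs a server on one node and a client on another. Shown here as a dry run:

```shell
#!/bin/sh
# Dry-run iperf3 bandwidth test between two cluster nodes.
echo "iperf3 -s"               # run this on the worker under test (server)
echo "iperf3 -c 192.168.1.101" # run this on the controller (client)
```

A sustained result well below the link's rated speed (see the connection types under Best Practices) points to a cabling, switch, or Wi-Fi problem.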