LLM API
The LLM API lets external applications interact with the LLM workflows you create in Navigator, so you can build applications that leverage your custom LLM flows.
The API element is currently in Beta and can only handle one API request at a time.
Getting Started
Setting Up the API Element
Add the API element to your LLM flow and connect it as follows:
Required Connections
- Top Left Input (Initiator): Always connect to the "Initiator" element, which maintains the API connection
- Second Input (Content): Connect to your LLM flow's output
- Output: Connect back to your LLM flow's input to create the response loop
Configuring API Settings
In the API element settings panel:
- API Key: Create a custom alphanumeric key for authentication
- Port Number: Default is 5050, but you can specify any available port
When using load balancers, each deployment must have a unique port number.
Using the API
Endpoint Details
- URL: http://<host>:<port> (for example, http://localhost:5050)
- Method: POST
- Route: /prompt
- Authentication: Include your API key in the X-API-Key header
- Response Format: Streamed as server-sent events (SSE); a minimal streamed request is sketched below
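Before wiring up a full client, you can sanity-check the endpoint with a minimal streamed request. The sketch below assumes the placeholder host, port, and API key used throughout this page; adjust them to match your deployment.

import requests

# Minimal smoke test: send one user message and print the raw SSE chunks.
response = requests.post(
    "http://localhost:5050/prompt",
    headers={"Content-Type": "application/json", "X-API-Key": "QWERTY123"},
    json={"message": [{"role": "user", "content": "Hello"}]},
    stream=True,  # keep the connection open so the server-sent events stream in
)
response.raise_for_status()
for chunk in response.iter_content(chunk_size=None):
    print(chunk.decode("utf-8"), end="")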
Request Format
{ "message": [ { "role": "string", "content": "string" } ] }
Parameters:
- role: Must be one of "system", "user", or "assistant"
- content: The message text to send to your LLM
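As an illustration, a payload carrying a short conversation could be built like this (the message text is made up for the example; only the structure and the role values matter):

# Illustrative request body: a system instruction followed by a user turn.
payload = {
    "message": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a one-sentence bedtime story."},
    ]
}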
Code Examples
cURL Example
curl -N --location 'http://localhost:5050/prompt' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: QWERTY123' \
  --data '{ "message": [{"role": "user", "content": "Once upon a time,"}] }'
Python Example
import json
import traceback

import requests


def fetch_llm_response(chat_history) -> str:
    url = "http://localhost:5050/prompt"
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": "QWERTY123",
    }
    data = {"message": chat_history}

    # Stream the response so tokens can be printed as they arrive.
    response = requests.post(url, headers=headers, json=data, stream=True)
    if response.status_code != 200:
        print(f"Error: Failed to fetch response from server. Status code: {response.status_code}")
        return ""

    output = ""
    for chunk in response.iter_content(chunk_size=None):
        try:
            # A chunk may hold several server-sent events; split on the
            # '{"choices":' marker and parse each event separately.
            responses = [
                ('{"choices": ' + f"{x}")
                for x in str(chunk.decode("utf-8")).split('{"choices":')[1:]
            ]
            for response in responses:
                response_json = json.loads(response.strip())
                print(response_json["choices"][0]["message"]["content"], flush=True, end="")
                output += response_json["choices"][0]["message"]["content"]
        except json.JSONDecodeError:
            print(traceback.format_exc())
            print("Invalid JSON:", chunk.decode("utf-8"))
    print("\n")
    return output


# Example usage
chat_history = [
    {
        "role": "system",
        "content": "You are a helpful assistant. Please help the user with their questions and answer as truthfully as you can.",
    },
]

if __name__ == "__main__":
    while True:
        prompt = input("Enter your LLM prompt (or 'exit' to quit): ")
        if prompt == "exit":
            break
        chat_history.append({"role": "user", "content": prompt})
        msg = fetch_llm_response(chat_history)
        chat_history.append({"role": "assistant", "content": msg})
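Note that a single streamed chunk can contain more than one server-sent event, which is why the example splits each decoded chunk on the '{"choices":' marker and parses every piece separately before concatenating the message content.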
Advanced: Load Balancing
For handling increased traffic, you can set up a load balancer such as NGINX with multiple API deployments.
NGINX Configuration
This configuration allows two simultaneous connections, distributed across two API instances:
events {
    worker_connections 1024;
}

http {
    upstream backend {
        server host.docker.internal:5050 max_conns=1;
        server host.docker.internal:5051 max_conns=1;
    }

    server {
        listen 8080;
        server_name host.docker.internal;

        location / {
            proxy_pass http://backend;
            proxy_set_header Connection '';
            proxy_http_version 1.1;
            proxy_buffering off;
            proxy_cache off;
            proxy_read_timeout 86400s;
            proxy_send_timeout 86400s;

            # SSE-specific headers
            proxy_set_header X-Accel-Buffering no;
            proxy_set_header Cache-Control no-cache;
        }
    }
}
Dockerfile for NGINX
FROM nginx:latest
COPY nginx.conf /etc/nginx/nginx.conf
With this configuration, your API would be accessible at http://localhost:8080/prompt.
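Assuming the NGINX container is running in front of both API instances, clients target port 8080 instead of an individual deployment port; for example, the fetch_llm_response helper from the Python example above would only need its URL changed:

# Point the client at the load balancer instead of a single API instance.
url = "http://localhost:8080/prompt"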