LLM API
The LLM API is designed to integrate seamlessly with your LLM flow creations, enabling external applications to interact with your workflows. This guide provides step-by-step instructions on setting up, configuring, and using the API element effectively.
The API element is currently in Beta and can only handle one API request at a time.
When running multiple deployments behind a load balancer, increment the API "port" setting for each deployment so that every instance listens on its own port.
API Element Setup
To set up the API element in your workflow:
Inputs
- Top Left Input: Always connect this to the "Initiator" element, which serves as a keep-alive for the API.
- Second Input: Connect this to your LLM flow's output.
Output
- API Output: The output of the API element should feed back into your LLM flow's input.
API Element Settings
The API element has customizable settings to suit your needs:
- API Key: A user-defined alphanumeric key used for authentication.
- Port Number: The default port is 5050, but it can be customized to any available port.
API Endpoint
To interact with the API, use the following endpoint details:
- URL: http://<machine-ip>:<port-number> (default: http://localhost:5050)
- Method: POST
- Route: /prompt
- Authentication: Use the X-API-Key header to authenticate.
- Response: The response is streamed back as server-sent events (SSE).
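As a minimal sketch, assuming the API runs locally on the default port and that QWERTY123 is only a placeholder key, the endpoint URL and required headers could be assembled as follows; the payload format is described in the next section.
# Placeholder values; replace with your machine's IP, the configured port,
# and the API key set on the API element.
API_HOST = "localhost"
API_PORT = 5050
API_KEY = "QWERTY123"

# POST requests go to the /prompt route; responses are streamed back as SSE.
PROMPT_URL = f"http://{API_HOST}:{API_PORT}/prompt"
HEADERS = {
    "Content-Type": "application/json",
    "X-API-Key": API_KEY,
}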
API Request Payload
When making a POST request to the API, the payload should be structured as follows:
{ "message": [ { "role": "string", "content": "string" } ] }
- role: One of "system", "user", or "assistant".
- content: The chat/message content to be sent to the LLM.
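As an illustration, a payload for a short multi-turn conversation might look like the sketch below; the message wording is an example only.
# Example request payload, written as a Python dict that serializes to the
# JSON structure shown above.
payload = {
    "message": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the plot of Hamlet in one sentence."},
    ]
}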
Usage Examples
Here are examples of how to use the API with cURL and Python.
cURL Example:
curl -N --location 'http://localhost:5050/prompt' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: QWERTY123' \
--data '{
    "message": [{"role": "user", "content": "Once upon a time,"}]
}'
Python Example (using requests):
import json
import traceback

import requests


def fetch_llm_response(chat_history) -> str:
    url = "http://localhost:5050/prompt"
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": "QWERTY123",
    }
    data = {"message": chat_history}
    response = requests.post(url, headers=headers, json=data, stream=True)
    if response.status_code != 200:
        print(f"Error: Failed to fetch response from server. Status code: {response.status_code}")
        return ""

    output = ""
    for chunk in response.iter_content(chunk_size=None):
        try:
            # A chunk may contain several SSE events; split on the
            # '{"choices":' marker and rebuild each event into valid JSON.
            responses = [
                ('{"choices": ' + f"{x}")
                for x in str(chunk.decode("utf-8")).split('{"choices":')[1:]
            ]
            for response_part in responses:
                response_json = json.loads(response_part.strip())
                print(response_json["choices"][0]["message"]["content"], flush=True, end="")
                output += response_json["choices"][0]["message"]["content"]
        except json.JSONDecodeError:
            print(traceback.format_exc())
            print("Invalid JSON:", chunk.decode("utf-8"))
    print("\n")
    return output


# Example usage
chat_history = [
    {
        "role": "system",
        "content": "You are a helpful assistant. Please help the user with their questions and answer as truthfully as you can."
    },
]

if __name__ == "__main__":
    while True:
        prompt = input("Enter your LLM prompt (or 'exit' to quit): ")
        if prompt == "exit":
            break
        chat_history.append({"role": "user", "content": prompt})
        msg = fetch_llm_response(chat_history)
        chat_history.append({"role": "assistant", "content": msg})
Advanced Usage: Load Balancing with NGINX
For handling increased load, consider using a load balancer like NGINX with multiple API deployments across different network ports.
NGINX Configuration Example:
In the example below, the API is exposed at localhost:8080/prompt, and NGINX distributes up to two simultaneous connections across the two API deployments on ports 5050 and 5051.
events {
    worker_connections 1024;
}

http {
    upstream backend {
        server host.docker.internal:5050 max_conns=1;
        server host.docker.internal:5051 max_conns=1;
    }

    server {
        listen 8080;
        server_name host.docker.internal;

        location / {
            proxy_pass http://backend;
            proxy_set_header Connection '';
            proxy_http_version 1.1;
            proxy_buffering off;
            proxy_cache off;
            proxy_read_timeout 86400s;
            proxy_send_timeout 86400s;

            # SSE-specific headers
            proxy_set_header X-Accel-Buffering no;
            proxy_set_header Cache-Control no-cache;
        }
    }
}
Dockerfile Example:
FROM nginx:latest
COPY nginx.conf /etc/nginx/nginx.conf
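Once the NGINX container is running, you can verify that requests are spread across both deployments by sending two prompts concurrently to the load-balanced port. The sketch below assumes both API deployments use the placeholder key QWERTY123 and that NGINX is listening on localhost:8080.
import concurrent.futures

import requests


def ask(prompt: str) -> int:
    """Send one prompt through the NGINX load balancer and return the HTTP status code."""
    response = requests.post(
        "http://localhost:8080/prompt",
        headers={"Content-Type": "application/json", "X-API-Key": "QWERTY123"},
        json={"message": [{"role": "user", "content": prompt}]},
        stream=True,
    )
    # Drain the SSE stream without parsing it; see the Python example above
    # for how to extract the generated text.
    for _ in response.iter_content(chunk_size=None):
        pass
    return response.status_code


if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        results = pool.map(ask, ["Tell me a joke.", "Write a haiku about rain."])
        print(list(results))  # e.g. [200, 200] if both deployments responded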