Overview
The LLM API is designed to integrate seamlessly with your LLM flow creations, enabling external applications to interact with your workflows. This guide provides step-by-step instructions on setting up, configuring, and using the API element effectively.
API Element Setup
To set up the API element in your workflow:
- Inputs:
  - Top Left Input: Always connect this to the "Initiator" element, which serves as a keep-alive for the API.
  - Second Input: Connect this to your LLM flow's output.
- Output: The output of the API element should feed back into your LLM flow's input.
API Element Settings
The API element has customizable settings to suit your needs:
- API Key: This is a user-defined alphanumeric key used for authentication.
- Port Number: The default port is 5050, but it can be customized to any available port.
API Endpoint
To interact with the API, use the following endpoint details:
- URL: http://<machine-ip>:<port-number> (default: http://localhost:5050)
- Method: POST
- Route: /prompt
- Authentication: Use the X-API-Key header to authenticate.
- Response: The response is streamed back as server-sent events (SSE).
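For reference, here is a minimal Python sketch of a request against this endpoint, assuming the default deployment at localhost:5050 and a placeholder key QWERTY123 (the payload structure is specified in the next section):
import requests

# Minimal sketch: POST to the /prompt route, authenticating via the
# X-API-Key header. The URL and key below are placeholders; substitute
# your own machine IP, port, and API key.
url = "http://localhost:5050/prompt"
headers = {
    "Content-Type": "application/json",
    "X-API-Key": "QWERTY123",
}
payload = {"message": [{"role": "user", "content": "Hello"}]}

# stream=True keeps the connection open so the server-sent events can be
# read incrementally instead of waiting for the full response.
with requests.post(url, headers=headers, json=payload, stream=True) as response:
    for chunk in response.iter_content(chunk_size=None):
        print(chunk.decode("utf-8"), end="", flush=True)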
POST Payload Specification
When making a POST request to the API, the payload should be structured as follows:
{
    "message": [
        {
            "role": "string",
            "content": "string"
        }
    ]
}
- role: One of "system", "user", or "assistant".
- content: The chat/message content to be sent to the LLM.
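As a concrete (hypothetical) illustration, a multi-turn payload combining all three roles might look like this; the list is the chat history, in order, that the LLM will see:
{
    "message": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
        {"role": "user", "content": "And of Germany?"}
    ]
}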
Usage Examples
Here are examples of how to use the API with cURL and Python.
cURL Example:
curl -N --location 'http://localhost:5050/prompt' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: QWERTY123' \
--data '{
"message": [{"role": "user", "content": "Once upon a time,"}]
}'
Python Example (using requests):
import json
import traceback

import requests


def fetch_llm_response(chat_history) -> str:
    url = "http://localhost:5050/prompt"
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": "QWERTY123",
    }
    data = {"message": chat_history}
    response = requests.post(url, headers=headers, json=data, stream=True)
    if response.status_code != 200:
        print(f"Error: Failed to fetch response from server. Status code: {response.status_code}")
        return ""
    output = ""
    for chunk in response.iter_content(chunk_size=None):
        try:
            # A single chunk may carry several JSON objects; split on the
            # '{"choices":' marker and re-prefix each piece so it parses on its own.
            parts = [
                ('{"choices": ' + f"{x}")
                for x in str(chunk.decode("utf-8")).split('{"choices":')[1:]
            ]
            for part in parts:
                response_json = json.loads(part.strip())
                print(response_json["choices"][0]["message"]["content"], flush=True, end="")
                output += response_json["choices"][0]["message"]["content"]
        except json.JSONDecodeError:
            print(traceback.format_exc())
            print("Invalid JSON:", chunk.decode("utf-8"))
    print("\n")
    return output


# Example usage
chat_history = [
    {
        "role": "system",
        "content": "You are a helpful assistant. Please help the user with their questions and answer as truthfully as you can."
    },
]

if __name__ == "__main__":
    while True:
        prompt = input("Enter your LLM prompt (or 'exit' to quit): ")
        if prompt == "exit":
            break
        chat_history.append({"role": "user", "content": prompt})
        msg = fetch_llm_response(chat_history)
        chat_history.append({"role": "assistant", "content": msg})
Advanced Usage: Load Balancing with NGINX
For handling increased load, consider using a load balancer like NGINX with multiple API deployments across different network ports.
NGINX Configuration Example:
events {
    worker_connections 1024;
}

http {
    upstream backend {
        server host.docker.internal:5050 max_conns=1;
        server host.docker.internal:5051 max_conns=1;
    }

    server {
        listen 8080;
        server_name host.docker.internal;

        location / {
            proxy_pass http://backend;
            proxy_set_header Connection '';
            proxy_http_version 1.1;
            proxy_buffering off;
            proxy_cache off;
            proxy_read_timeout 86400s;
            proxy_send_timeout 86400s;

            # SSE-specific headers
            proxy_set_header X-Accel-Buffering no;
            proxy_set_header Cache-Control no-cache;
        }
    }
}
Dockerfile Example:
FROM nginx:latest
COPY nginx.conf /etc/nginx/nginx.conf
Notes
- The API element is currently in Beta and can only handle one API request at a time.
- When using load balancers, ensure that each deployment uses an incremented API "port" setting (e.g., 5050, 5051).
- In the NGINX example, the API would be available at localhost:8080/prompt and allow two simultaneous connections.
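As a rough sketch of the two-connection behaviour, the snippet below issues two prompts concurrently through the NGINX front end; it assumes the configuration above (front end on localhost:8080, backends on ports 5050 and 5051) and the placeholder key QWERTY123:
import concurrent.futures

import requests

URL = "http://localhost:8080/prompt"  # NGINX front end from the example above
HEADERS = {"Content-Type": "application/json", "X-API-Key": "QWERTY123"}


def stream_prompt(prompt: str) -> str:
    # Send one prompt and collect the raw streamed chunks as text.
    payload = {"message": [{"role": "user", "content": prompt}]}
    with requests.post(URL, headers=HEADERS, json=payload, stream=True) as resp:
        resp.raise_for_status()
        return "".join(
            chunk.decode("utf-8") for chunk in resp.iter_content(chunk_size=None)
        )


# Two requests in flight at once; NGINX routes one to each backend because
# each upstream server is limited to max_conns=1.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    for raw in pool.map(stream_prompt, ["Tell me a joke.", "Write a haiku."]):
        print(raw[:80], "...")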