LLM API

The LLM API allows external applications to interact with your LLM workflows created in Navigator. This integration enables you to build applications that leverage your custom LLM flows.

The API element is currently in Beta and can only handle one API request at a time.

Getting Started

Setting Up the API Element

Add the API element to your LLM flow and connect it as follows:

Required Connections

  • Top Left Input (Initiator): Always connect this input to the "Initiator" element, which maintains the API connection
  • Second Input (Content): Connect this input to your LLM flow's output
  • Output: Connect the output back to your LLM flow's input to create the response loop

Configuring API Settings

In the API element settings panel:

  • API Key: Create a custom alphanumeric key for authentication
  • Port Number: Default is 5050, but you can specify any available port

When using load balancers, each deployment must have a unique port number.

Using the API

Endpoint Details

  • URL: http://<host>:<port> (for example, http://localhost:5050)
  • Method: POST
  • Route: /prompt
  • Authentication: Include your API key in the X-API-Key header
  • Response Format: Streamed as server-sent events (SSE)
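
As a quick check of these details, the minimal sketch below posts a single prompt and prints the streamed response line by line as it arrives. It assumes a local deployment on the default port 5050 and the example key QWERTY123; the request body follows the format described in the next section.

import requests

# Minimal connectivity check (assumes localhost:5050 and the example key).
url = "http://localhost:5050/prompt"
headers = {"Content-Type": "application/json", "X-API-Key": "QWERTY123"}
payload = {"message": [{"role": "user", "content": "Hello!"}]}

with requests.post(url, headers=headers, json=payload, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        if line:  # skip blank keep-alive lines in the SSE stream
            print(line)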

Request Format

{
  "message": [
    {
      "role": "string",
      "content": "string"
    }
  ]
}

Parameters:

  • role: Must be one of "system", "user", or "assistant"
  • content: The message text to send to your LLM

Code Examples

cURL Example

curl -N --location 'http://localhost:5050/prompt' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: QWERTY123' \
--data '{
  "message": [{"role": "user", "content": "Once upon a time,"}]
}'

Python Example

import json
import traceback

import requests

def fetch_llm_response(chat_history) -> str:
    url = "http://localhost:5050/prompt"
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": "QWERTY123",
    }
    data = {"message": chat_history}

    response = requests.post(url, headers=headers, json=data, stream=True)
    if response.status_code != 200:
        print(f"Error: Failed to fetch response from server. Status code: {response.status_code}")
        return ""

    output = ""
    for chunk in response.iter_content(chunk_size=None):
        try:
            # A chunk may contain several JSON objects; split on the
            # '{"choices":' marker and re-assemble each object before parsing.
            responses = [
                ('{"choices": ' + f"{x}")
                for x in str(chunk.decode("utf-8")).split('{"choices":')[1:]
            ]
            for response_text in responses:
                response_json = json.loads(response_text.strip())
                print(response_json["choices"][0]["message"]["content"], flush=True, end="")
                output += response_json["choices"][0]["message"]["content"]
        except json.JSONDecodeError:
            print(traceback.format_exc())
            print("Invalid JSON:", chunk.decode("utf-8"))

    print("\n")
    return output

# Example usage
chat_history = [
    {
        "role": "system",
        "content": "You are a helpful assistant. Please help the user with their questions and answer as truthfully as you can."
    },
]

if __name__ == "__main__":
    while True:
        prompt = input("Enter your LLM prompt (or 'exit' to quit): ")
        if prompt == "exit":
            break
        chat_history.append({"role": "user", "content": prompt})
        msg = fetch_llm_response(chat_history)
        chat_history.append({"role": "assistant", "content": msg})

Advanced: Load Balancing

Because each API element handles only one request at a time, you can set up a load balancer such as NGINX in front of multiple API deployments to handle increased traffic.

NGINX Configuration

This configuration allows two simultaneous connections, distributed across two API instances:

events {
    worker_connections 1024;
}

http {
    upstream backend {
        server host.docker.internal:5050 max_conns=1;
        server host.docker.internal:5051 max_conns=1;
    }

    server {
        listen 8080;
        server_name host.docker.internal;

        location / {
            proxy_pass http://backend;
            proxy_set_header Connection '';
            proxy_http_version 1.1;
            proxy_buffering off;
            proxy_cache off;
            proxy_read_timeout 86400s;
            proxy_send_timeout 86400s;

            # SSE-specific headers
            proxy_set_header X-Accel-Buffering no;
            proxy_set_header Cache-Control no-cache;
        }
    }
}

Dockerfile for NGINX

FROM nginx:latest
COPY nginx.conf /etc/nginx/nginx.conf
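
To build and run the proxy, commands along these lines should work (the image name llm-api-nginx is just an example). On Docker Desktop, host.docker.internal resolves automatically; on Linux you may need the --add-host flag shown below.

docker build -t llm-api-nginx .
docker run --rm -p 8080:8080 --add-host=host.docker.internal:host-gateway llm-api-nginx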

With this configuration, your API would be accessible at http://localhost:8080/prompt.
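
To see the load balancer spread work across both instances, the sketch below sends two prompts at the same time through the NGINX front end. It assumes two API deployments are running on ports 5050 and 5051 behind the container above, and reuses the example key QWERTY123.

import concurrent.futures

import requests

URL = "http://localhost:8080/prompt"  # NGINX front end from the configuration above
HEADERS = {"Content-Type": "application/json", "X-API-Key": "QWERTY123"}

def send_prompt(text: str) -> str:
    # Each call opens its own connection; NGINX routes it to a free upstream
    # (max_conns=1 keeps one in-flight request per API instance).
    payload = {"message": [{"role": "user", "content": text}]}
    response = requests.post(URL, headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.text  # full streamed body, buffered into a single string

# Two concurrent requests, one per API instance.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(send_prompt, ["Tell me a joke.", "Write a haiku."]))

for result in results:
    print(result)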