RAG
Document Ingestion Pipeline
The Ingestion Pipeline flow processes your documents and turns them into a format that AI models can understand, search, and interact with. The output of this flow is a Vector Database that you will use when chatting with your documents in the Inference Flow.
Turning Documents into Editable and Searchable Data
- Add the OCR element to Canvas.
- Click the ... to open OCR settings, select the folder of documents you want to interact with, and upload it to the Data Path.
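Conceptually, this step extracts plain, searchable text from each file in the folder. Here is a minimal sketch of the idea, using pytesseract and pdf2image as stand-ins (assumptions for illustration; the element's actual OCR engine is not specified here):

```python
from pathlib import Path

import pytesseract
from pdf2image import convert_from_path

def ocr_folder(data_path: str) -> dict[str, str]:
    """Return {filename: extracted_text} for every PDF in data_path."""
    texts = {}
    for pdf in Path(data_path).glob("*.pdf"):
        pages = convert_from_path(pdf)  # render each page to an image
        text = "\n".join(pytesseract.image_to_string(p) for p in pages)
        texts[pdf.name] = text
    return texts
```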
Chunking the Information
Chunking divides large documents or datasets into smaller, manageable pieces, or "chunks," before indexing and retrieval. Chunking improves retrieval, enhances context, and reduces redundancy.
- Add the Chunking element to Canvas.
- Click the ... to open Chunking settings.
These settings have been set to good defaults, so no adjustment is necessary to run the flow. A code sketch showing how they interact follows the list below.
- Chunk size: The maximum size of a chunk.
- Chunk overlap: The number of overlapping tokens between consecutive chunks. This ensures continuity and coherence when retrieving multiple chunks. A maximum of 200 is reasonable for most applications.
- Minimum Characters per Sentence: The smallest number of characters a sentence must have to be considered valid and included in the chunking process. This filters out overly short or incomplete sentences, which helps maintain the quality and context of chunks by avoiding noise from trivial or irrelevant text.
- Minimum Sentences per Chunk: The smallest number of sentences that must be grouped together to form a chunk. This ensures each chunk contains enough contextual information to be meaningful and useful for retrieval, and prevents chunks that are too small to make sense on their own or that lack sufficient context for downstream tasks.
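To make these four settings concrete, here is a minimal pure-Python chunker that applies them in order. The sentence splitter and token counting (whitespace tokens) are simplifications for illustration, not the element's actual implementation:

```python
import re

def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 200,
               min_chars_per_sentence: int = 12, min_sentences_per_chunk: int = 2):
    """Split text into overlapping chunks; sizes counted in whitespace tokens."""
    # Naive sentence split; drop sentences below the character minimum.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if len(s.strip()) >= min_chars_per_sentence]

    chunks, current, current_tokens = [], [], 0
    for sentence in sentences:
        n_tokens = len(sentence.split())
        if current and current_tokens + n_tokens > chunk_size:
            if len(current) >= min_sentences_per_chunk:
                chunks.append(" ".join(current))
            # Carry roughly chunk_overlap tokens into the next chunk
            # so retrieved chunks stay continuous and coherent.
            overlap, kept = [], 0
            for s in reversed(current):
                kept += len(s.split())
                overlap.insert(0, s)
                if kept >= chunk_overlap:
                    break
            current, current_tokens = overlap, kept
        current.append(sentence)
        current_tokens += n_tokens
    if len(current) >= min_sentences_per_chunk:
        chunks.append(" ".join(current))
    return chunks
```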
Applying Embedding Models
This process organizes and stores numerical representations of data (vectors) in a structured format that allows for efficient searching, retrieval, and comparison in a high-dimensional space. Embedding models identify relevant information by looking at the "meaning" behind a user's query.
- Add the Embedding element to Canvas.
- Click the ... to open the embedding element settings. From here you can choose your embedding model.
You will need to make sure this is the same model you use at the Embedding stage of the Inference Flow.
We support a range of the best open source embedding models and are adding to the list as more are published. We've selected a great choice as a default, but you can choose from any in the dropdown.
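Under the hood, an embedding model maps each chunk to a fixed-length vector. Here is a sketch using the sentence-transformers library; the model name is an example, not necessarily the element's default:

```python
from sentence_transformers import SentenceTransformer

# Example model; substitute whichever model you select in the dropdown.
model = SentenceTransformer("BAAI/bge-small-en-v1.5")

chunks = ["First chunk of text...", "Second chunk of text..."]
embeddings = model.encode(chunks, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384): one 384-dimensional vector per chunk
```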
Vector Indexing
This step vectorizes the files and saves the result under the folder path you specify. This is the folder you will point the Vector Retrieval element to in the Inference Flow.
- Add the Vector element to Canvas.
- Click the ... to open the Vector element settings. From here you can choose your Save folder path.
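In code terms, indexing amounts to storing the vectors (plus the chunk text) on disk so the Inference Flow can load them later. A sketch using FAISS, which is an assumption for illustration; the element's actual storage format is not documented here:

```python
import json
from pathlib import Path

import faiss
import numpy as np

def save_index(embeddings: np.ndarray, chunks: list[str], save_folder: str):
    """Write a FAISS index and the chunk texts under save_folder."""
    folder = Path(save_folder)
    folder.mkdir(parents=True, exist_ok=True)
    # Inner product on normalized vectors is equivalent to cosine similarity.
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings.astype(np.float32))
    faiss.write_index(index, str(folder / "index.faiss"))
    (folder / "chunks.json").write_text(json.dumps(chunks))
```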
RAG Inference Pipeline
The Inference Pipeline flow allows you to chat with your documents using the Vector Database created in the Ingestion Pipeline. This flow processes user queries and returns relevant responses from your documents.
API Element
This element adds an API on top of your flow, exposing an endpoint that can be used by apps like Companion or your own applications.
- Add the API element to Canvas.
- Click the ... to open API settings. Here you will be able to add any string of characters as an API key to be used by other apps. You also have the option to add timeout settings.
You can alternatively use the Prompt API and Response API elements as your input and output.
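Once the flow is running, other applications call it like any HTTP endpoint, passing the key you configured. A hypothetical request is shown below; the URL, header name, and payload shape are assumptions, so check your API element settings for the actual values:

```python
import requests

# Hypothetical endpoint and schema; the actual URL, header name,
# and payload shape come from your API element's configuration.
response = requests.post(
    "http://localhost:8080/chat",
    headers={"Authorization": "Bearer my-api-key"},
    json={"prompt": "What does the contract say about renewals?"},
    timeout=30,  # mirrors the timeout setting in the element
)
print(response.json())
```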
Configuring the Embedding Model
- Add the Embedding element to Canvas.
- Click the ... to open the embedding element settings.
This must be the same embedding model used in your Ingestion Pipeline flow.
We support a range of the best open source embedding models and are adding to the list as more are published. We've selected a great choice as a default, but you can choose from any in the dropdown.
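The reason the models must match: the query is embedded at inference time and compared against the stored chunk vectors, and vectors produced by different models live in incompatible spaces. A short sketch, using the same example model as above:

```python
from sentence_transformers import SentenceTransformer

# Must be the exact model used during ingestion, or similarity scores
# against the stored vectors will be meaningless.
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
query_vector = model.encode(["What does the contract say about renewals?"],
                            normalize_embeddings=True)
```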
Setting Up Vector Retrieval
- Add the Vector Retrieval element to Canvas.
- Click the ... to open Vector Retrieval settings and upload the folder created by the Vector Indexing element in your Ingestion Pipeline flow.
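Conceptually, retrieval loads the saved index and returns the top-k chunks most similar to the query vector. A sketch matching the FAISS example above (again an assumption about the storage format):

```python
import json
from pathlib import Path

import faiss
import numpy as np

def retrieve(save_folder: str, query_vector: np.ndarray, k: int = 4) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query."""
    folder = Path(save_folder)
    index = faiss.read_index(str(folder / "index.faiss"))
    chunks = json.loads((folder / "chunks.json").read_text())
    _scores, ids = index.search(query_vector.astype(np.float32), k)
    return [chunks[i] for i in ids[0]]
```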
Configuring Prompt Templates
This element takes the retrieved vectors and formats them for use by the LLM. There are no settings to change, but it is a required step in the pipeline to make sure your LLM gives a coherent response.
- Add the Prompt Templating element to Canvas.
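In effect, the element wraps the retrieved chunks and the user's question into a single prompt string. The template wording below is illustrative, not the element's actual template:

```python
def build_prompt(chunks: list[str], question: str) -> str:
    """Illustrative template; the element's actual wording may differ."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```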
Setting Up the Language Model
- Add either the LLM Chat or LLM element to Canvas.
Use the Large Language Model Chat element to add custom models and control the system prompt or other settings. Use the LLM element to use webFrame and intelligently distribute the model across your cluster.
- Click the ... to configure your model settings. From here you can change response token limits and system prompts.
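For reference, these two settings map onto standard chat-completion parameters. A sketch against a generic OpenAI-compatible endpoint; the URL, model name, and payload are placeholders, not webFrame's actual API:

```python
import requests

# Placeholder endpoint, model name, and payload; this only illustrates
# how a system prompt and a response token limit apply to a chat request.
prompt = "Context:\n...\n\nQuestion: ...\nAnswer:"  # output of the templating step

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "my-local-model",
        "messages": [
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 512,  # the response token limit setting
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```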