Creating a Local Expert LLM


If you want to create your own subject matter expert to solve your use case, training your own local LLM is your best bet. Follow the steps below to begin training an LLM that lives locally, on your device.

LLM Dataset Generation

Before you can generate your dataset, gather all the relevant documents you want your model to be built from.
All documents must be in a single folder and in one of the following formats: PDF, plain text, or docx.

Start by creating a new Canvas. Then open the Elements Drawer and drag the LLM Dataset Generator element onto the Canvas.

Open the LLM Dataset Generator Element settings and adjust the following settings:

  1. Topic: This can be anything you would like.

  2. References folder path: Using the Select Directory button, choose the folder where your documents are located.

  3. Output folder path: Using the Select Directory button, choose the folder where you would like to save the output of the dataset generation.

  4. Dataset size: Enter the number of topics you want your dataset to cover.

    We recommend starting with 5 for testing and getting familiar with the dataset generation process. This generates a list of five topics and is quicker for training, but it will not produce as accurate a model as a larger dataset size.
    The higher the dataset size, the more accurate your dataset and trained model will be. However, the larger the dataset size, the longer it will take to generate. Generating a large dataset can take several hours, so be patient.

  5. Next, enter your GPT, Claude, and Gemini API keys.

  6. You can now hit Run.

Dependencies will be installed the first time this flow is run, so it may take a while for them to install.
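Under the hood, a dataset generator of this kind typically turns your reference documents into a list of topic-based training records. The sketch below is purely illustrative: the file name, JSON fields, and function are assumptions for intuition, not the element's actual output schema.

```python
import json
import os
import tempfile


def build_dataset(reference_texts, topics, out_dir):
    """Toy sketch: pair each topic with an excerpt from the reference
    documents and write the result as JSONL (fields are illustrative)."""
    os.makedirs(out_dir, exist_ok=True)
    records = []
    for i, topic in enumerate(topics):
        excerpt = reference_texts[i % len(reference_texts)]
        records.append({
            "instruction": f"Explain {topic}.",
            "context": excerpt[:200],   # trimmed reference text
            "response": "",             # filled in by the generator LLMs
        })
    path = os.path.join(out_dir, "dataset.jsonl")
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


# A dataset size of 5 yields five topic entries, one JSON record per line.
out_path = build_dataset(
    ["Solar panels convert sunlight into electricity..."],
    [f"topic {n}" for n in range(5)],
    tempfile.mkdtemp(),
)
```

This also shows why a larger dataset size takes longer: each additional topic is another record the generator LLMs must research and fill in.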

LLM Model Training

Now that you have generated your LLM Dataset, you can train your LLM Model.

Start by creating a new Canvas. Drag the LLM Trainer Element onto the Canvas.

Open the LLM Trainer Element settings and make the following adjustments:

  1. Dataset Folder Path: Using the Select Directory button, choose the folder where you saved your LLM dataset during LLM Dataset Generation.
  2. Artifact Save Path: Using the Select Directory button, choose the folder where you would like to save your trained adapter.
  3. Base Model Assets Path: Using the Select Directory button, choose the folder where you would like to save your base model.
  4. Evaluator API Key: Add a Groq, OpenAI, Claude, or Gemini API key to enable the Faithfulness and Relevancy benchmarks in your training metrics. If you need a free API key, you can generate one for Groq here.
  5. Batch Size: A batch size of 4 is recommended for testing.

    Leave all other settings as the default.

  6. You can now hit Run.

This process may take a while, so be patient.
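Note that the trainer saves an adapter rather than a full model: a small set of weight updates applied on top of the frozen base model, in the style of low-rank adaptation (LoRA) methods commonly used for this kind of fine-tuning. Assuming a LoRA-style adapter (an assumption; the element does not document its exact method), the idea can be sketched in plain Python:

```python
def matmul(A, B):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def apply_adapter(W, A, B, alpha=1.0):
    """Effective weight = frozen base weight + scaled low-rank update B @ A.

    W: d_out x d_in (frozen base model weight)
    B: d_out x r, A: r x d_in (the small trained adapter matrices)
    """
    delta = matmul(B, A)
    return [[w + alpha * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Rank-1 adapter on a 2x2 base weight: only B and A were trained,
# yet together they shift every entry of the effective weight.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]           # d_out x r
A = [[0.5, 0.5]]             # r x d_in
W_eff = apply_adapter(W, A, B)   # [[1.5, 0.5], [1.0, 2.0]]
```

This is why the Artifact Save Path (adapter) and Base Model Assets Path (frozen base) are configured separately: the adapter is tiny compared to the base model, and inference later recombines the two.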

LLM Model Inference

You have generated your LLM Dataset and trained your LLM Model. Now you can run inference and interact with your trained expert.

Start by creating a new Canvas and drag the LLM Chat element onto the Canvas.

Open the LLM Chat Element settings and make the following adjustments:

  1. Max tokens: 256 is recommended for testing.
  2. Model Storage Path: Using the Select Directory button, choose the folder where you saved your base model during LLM Model training.
  3. Model Adapter Folder Path: Using the Select Directory button, choose the folder where you saved your trained adapter during LLM Model Training.

    Leave all other settings as the default.

  4. Drag the Prompt API and Response API elements to the canvas.
  5. Connect the Prompt API to the LLM chat input.
  6. Connect the output of the LLM chat to the Response API.
  7. Verify that the flow on your canvas looks correct.
  8. Hit Run to start the process.
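Conceptually, the wiring in steps 4-6 gives the flow a simple request/response shape: the Prompt API feeds your question into the LLM Chat element, and the Response API returns the model's answer. The sketch below is a stand-in for intuition only; the function names and dictionary fields are illustrative assumptions, not the product's API.

```python
def llm_chat(prompt, max_tokens=256):
    """Stand-in for the LLM Chat element. Here it just echoes a
    truncated reply; the real element runs your trained expert model
    (base model + adapter). Truncation by characters is a crude proxy
    for the element's max-tokens limit."""
    reply = f"Expert answer to: {prompt}"
    return reply[:max_tokens]


def run_flow(prompt, max_tokens=256):
    """Prompt API -> LLM Chat -> Response API, as wired on the canvas."""
    request = {"prompt": prompt}                    # Prompt API input
    answer = llm_chat(request["prompt"], max_tokens)
    return {"response": answer}                     # Response API output


result = run_flow("What do my documents say about maintenance schedules?")
```

Lowering max tokens shortens each reply (useful while testing); raise it once the flow works and you want fuller answers.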

Dependencies will be installed the first time this flow is run and may take a while.