Deploy a RAG Pipeline and Chatbot with App Platform for LKE

Beta Notice
The Akamai App Platform is now available as a limited beta. It is not recommended for production workloads. To register for the beta, visit the Betas page in the Cloud Manager and click the Sign Up button next to the Akamai App Platform Beta.

This guide builds on the LLM (Large Language Model) architecture built in our Deploy an LLM for AI Inferencing with App Platform for LKE guide by deploying a RAG (Retrieval-Augmented Generation) pipeline that indexes a custom data set. RAG is a particular method of context augmentation that attaches relevant data as context when users send queries to an LLM.

Follow the steps in this tutorial to install Kubeflow Pipelines and deploy a RAG pipeline using Akamai App Platform for LKE. The deployment in this guide uses the previously deployed Open WebUI chatbot to respond to queries using a custom data set. The data set you use may vary depending on your use case. For example purposes, this guide uses a sample data set from Linode Docs in Markdown format.

If you prefer a manual installation rather than one using App Platform for LKE, see our Deploy a Chatbot and RAG Pipeline for AI Inferencing on LKE guide.

Diagram

Components

Infrastructure

  • Linode GPUs (NVIDIA RTX 4000): Akamai has several high-performance GPU virtual machines available, including NVIDIA RTX 4000 (used in this tutorial) and Quadro RTX 6000. The RTX 4000 VMs use NVIDIA’s Ada Lovelace architecture, which is well suited to many AI tasks, including inferencing and image generation.

  • Linode Kubernetes Engine (LKE): LKE is Akamai’s managed Kubernetes service, enabling you to deploy containerized applications without needing to build out and maintain your own Kubernetes cluster.

  • App Platform for LKE: A Kubernetes-based platform that combines developer and operations-centric tools, automation, self-service, and management of containerized application workloads. App Platform for LKE streamlines the application lifecycle from development to delivery and connects numerous CNCF (Cloud Native Computing Foundation) technologies in a single environment, allowing you to construct a bespoke Kubernetes architecture.

Additional Software

  • Open WebUI: A self-hosted AI chatbot application that’s compatible with LLMs like Llama 3 and includes a built-in inference engine for RAG (Retrieval-Augmented Generation) solutions. Users interact with this interface to query the LLM.

  • Milvus: An open-source vector database designed for generative AI workloads. This tutorial uses Milvus to store embeddings generated by LlamaIndex and make them available to queries sent to the Llama 3 LLM.

  • Kubeflow: An open-source software platform designed for Kubernetes that includes a suite of applications used for machine learning tasks. This tutorial installs all default applications and makes specific use of the following:

    • KServe: Serves machine learning models. The architecture in this guide installs the Llama 3 LLM on KServe, which serves it to other applications, including the chatbot UI.

    • Kubeflow Pipelines: Used to deploy pipelines, which are reusable machine learning workflows built with the Kubeflow Pipelines SDK. In this tutorial, a pipeline runs LlamaIndex to process the data set and store the resulting embeddings.

Prerequisites

This guide assumes you have completed the Deploy an LLM for AI Inferencing with App Platform for LKE guide, and that the Llama 3 LLM deployed in that guide is running and accessible on your App Platform for LKE cluster.

Set Up Infrastructure

Once your LLM has been deployed and is accessible, complete the following steps to continue setting up your infrastructure.

Sign in to the App Platform web UI using the platform-admin account, or another account assigned the platform-admin role.

Add the milvus Helm Chart to the Catalog

  1. Select view > team and team > admin in the top bar.

  2. Click on Catalog in the left menu.

  3. Select Add Helm Chart.

  4. Under Git Repository URL, add the URL to the milvus Helm chart:

    https://github.com/zilliztech/milvus-helm/blob/milvus-4.2.40/charts/milvus/Chart.yaml
  5. Click Get Details to populate the Helm chart details.

  6. Deselect Allow teams to use this chart.

  7. Click Add Chart.

Create an Object Storage Bucket and Access Key for Milvus

  1. In Cloud Manager, navigate to Object Storage.

  2. Click Create Bucket.

  3. Enter a name for your bucket, and select a Region close to, or the same as, your App Platform LKE cluster.

  4. While on the Object Storage page, select the Access Keys tab, and then click Create Access Key.

  5. Enter a name for your access key, select the same Region as your Milvus bucket, and make sure your access key has “Read/Write” access enabled for your bucket.

  6. Save your access key information.

Create a Workload for the Milvus Helm Chart

  1. Select view > team and team > admin in the top bar.

  2. Select Workloads.

  3. Click on Create Workload.

  4. Select the Milvus Helm chart from the Catalog.

  5. Click on Values.

  6. Provide a name for the Workload. This guide uses the Workload name milvus.

  7. Add milvus as the namespace.

  8. Select Create a new namespace.

  9. Set the following values. Make sure to replace externalS3 values with those of your Milvus bucket and access key. You may also need to add lines for the resources requests and limits under standalone:

    Tip: Use Command + F
    While navigating the Values configuration window, use the cmd + F keyboard search feature to locate each value.
    cluster:
      enabled: false
    pulsarv3:
      enabled: false
    minio:
      enabled: false
    externalS3:
      enabled: true
      host: <your-region>.linodeobjects.com
      port: "443"
      accessKey: <your-accesskey>
      secretKey: <your-secretkey>
      useSSL: true
      bucketName: <your-bucket-name>
      cloudProvider: aws
      region: <your-bucket-region-id>
    standalone:
      resources:
        requests:
          nvidia.com/gpu: "1"
        limits:
          nvidia.com/gpu: "1"
    
    Unencrypted Secret Keys
    The Milvus Helm chart does not support the use of a secretKeyRef. Using unencrypted Secret Keys in chart values is not considered a Kubernetes security best-practice.
  10. Click Submit.
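
Once the Workload reports ready, you can optionally confirm that the Milvus service is reachable before moving on. The following is a minimal sketch and not part of the App Platform workflow: it assumes you have kubectl access to the cluster, have port-forwarded the Milvus service locally (for example, kubectl port-forward -n milvus svc/milvus 19530:19530), and have pymilvus installed on your workstation.

    from pymilvus import connections, utility

    # Connect to the port-forwarded Milvus endpoint.
    connections.connect(alias="default", host="localhost", port="19530")

    # Print the server version and any existing collections to confirm connectivity.
    print(utility.get_server_version())
    print(utility.list_collections())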

Create an Object Storage Bucket and Access Key for kubeflow-pipelines

  1. In Cloud Manager, navigate to Object Storage.

  2. Click Create Bucket.

  3. Enter a name for your bucket, and select a Region close to, or the same as, your App Platform LKE cluster.

  4. While on the Object Storage page, select the Access Keys tab, and then click Create Access Key.

  5. Enter a name for your access key, select the same Region as your kubeflow-pipelines bucket, and make sure your access key has “Read/Write” access enabled for your bucket.

  6. Save your access key information.

Make Sealed Secrets

Create a Sealed Secret for mlpipeline-minio-artifact

Make a Sealed Secret named mlpipeline-minio-artifact granting access to your kubeflow-pipelines bucket.

  1. Select view > team and team > demo in the top bar.

  2. Select Sealed Secrets from the menu, and click Create SealedSecret.

  3. Add a name for your SealedSecret, mlpipeline-minio-artifact.

  4. Select type kubernetes.io/opaque from the type dropdown menu.

  5. Add the Key and Value details below. Replace YOUR_ACCESS_KEY and YOUR_SECRET_KEY with your kubeflow-pipelines access key information.

    To add a second key for your secret key, click the Add Item button after entering your access key information:

    • Type: kubernetes.io/opaque
    • Key=accesskey, Value=YOUR_ACCESS_KEY
    • Key=secretkey, Value=YOUR_SECRET_KEY
  6. Click Submit.

Create a Sealed Secret for mysql-credentials

Make another Sealed Secret named mysql-credentials to establish root user credentials. Choose a strong root password, and save it somewhere secure.

  1. Select view > team and team > demo in the top bar.

  2. Select Sealed Secrets from the menu, and click Create SealedSecret.

  3. Add a name for your SealedSecret, mysql-credentials.

  4. Select type kubernetes.io/opaque from the type dropdown menu.

  5. Add the Key and Value details, replacing YOUR_ROOT_PASSWORD with a strong root password you’ve created and saved:

    • Type: kubernetes.io/opaque
    • Key=username, Value=root
    • Key=password, Value=YOUR_ROOT_PASSWORD
  6. Click Submit.

Create a Network Policy

Create a Network Policy in the Team where the kubeflow-pipelines Helm chart will be installed (Team name demo in this guide). This allows communication between all Kubeflow Pipelines Pods.

  1. Select view > team and team > demo in the top bar.

  2. Select Network Policies from the menu.

  3. Click Create Netpol.

  4. Add a name for the Network Policy.

  5. Select Rule type ingress using the following values, where kfp is the name of the Workload created in the next step:

  6. Click Submit.

Create a Workload and Install the kfp-cluster-resources Helm Chart

  1. Select view > team and team > admin in the top bar.

  2. Select Workloads.

  3. Click on Create Workload.

  4. Select the Kfp-Cluster-Resources Helm chart from the Catalog.

  5. Click on Values.

  6. Provide a name for the Workload. This guide uses the Workload name kfp-cluster-resources.

  7. Add kubeflow as the namespace.

  8. Select Create a new namespace.

  9. Continue with the default values, and click Submit. The Workload may take a few minutes to become ready.

Create a Workload for the kubeflow-pipelines Helm Chart

  1. Select view > team and team > admin in the top bar.

  2. Select Workloads.

  3. Click on Create Workload.

  4. Select the Kubeflow-Pipelines Helm chart from the Catalog.

  5. Click on Values.

  6. Provide a name for the Workload. This guide uses the Workload name kfp.

  7. Add team-demo as the namespace.

  8. Select Create a new namespace.

  9. Set the following values. Replace <your-bucket-region> and <your-bucket-name> with those of your kubeflow-pipelines bucket:

    objectStorage:
      region: <your-bucket-region>
      bucket: <your-bucket-name>
    mysql:
      secret: mysql-credentials
    
  10. Click Submit. It may take a few minutes for the Workload to be ready.

Expose the Kubeflow Pipelines UI

  1. Select view > team and team > demo in the top bar.

  2. Select Services.

  3. Click Create Service.

  4. In the Name dropdown menu, select the ml-pipeline-ui service.

  5. Under Exposure, select External.

  6. Click Submit.

Kubeflow Pipelines is now ready to be used by members of the Team demo.
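
Optionally, you can confirm that the Kubeflow Pipelines API responds by using the Kubeflow Pipelines SDK from your workstation. This is a minimal sketch, assuming you have kubectl access and are port-forwarding the ml-pipeline-ui service locally (for example, kubectl port-forward -n team-demo svc/ml-pipeline-ui 8080:80; adjust the namespace and port to match your deployment).

    from kfp import Client

    # Point the SDK at the port-forwarded ml-pipeline-ui endpoint.
    client = Client(host="http://localhost:8080")

    # List registered pipelines to confirm the API responds.
    print(client.list_pipelines())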

Set Up Kubeflow Pipeline to Ingest Data

Generate the Pipeline YAML File

The steps below use a Python script to generate a Kubeflow pipeline file. This YAML file describes each step of the pipeline workflow.

  1. On your local machine, create a virtual environment for Python:

    python3 -m venv .
    source bin/activate
  2. Install the Kubeflow Pipelines package in the virtual environment:

    pip install kfp
  3. Create a file named doc-ingest-pipeline.py with the following contents.

    Replace <cluster-domain> with the domain of your App Platform instance. You can find it in the console URL in your browser: if the URL is console.lke123456.akamai-apl.net, the <cluster-domain> is lke123456.akamai-apl.net.

    This script configures the pipeline that downloads the Markdown data set to be ingested, reads the content using LlamaIndex, generates embeddings of the content, and stores the embeddings in the milvus database:

    from kfp import dsl
    
    @dsl.component(
            base_image='nvcr.io/nvidia/ai-workbench/python-cuda117:1.0.3',
            packages_to_install=['pymilvus>=2.4.2', 'llama-index', 'llama-index-vector-stores-milvus', 'llama-index-embeddings-huggingface', 'llama-index-llms-openai-like']
            )
    def doc_ingest_component(url: str, collection: str) -> None:
        print(">>> doc_ingest_component")
    
        from urllib.request import urlopen
        from io import BytesIO
        from zipfile import ZipFile
    
        http_response = urlopen(url)
        zipfile = ZipFile(BytesIO(http_response.read()))
        zipfile.extractall(path='./md_docs')
    
        from llama_index.core import SimpleDirectoryReader
    
        # load documents
        documents = SimpleDirectoryReader("./md_docs/", recursive=True, required_exts=[".md"]).load_data()
    
        from llama_index.embeddings.huggingface import HuggingFaceEmbedding
        from llama_index.core import Settings
    
        Settings.embed_model = HuggingFaceEmbedding(
            model_name="sentence-transformers/all-MiniLM-L6-v2"
        )
    
        from llama_index.llms.openai_like import OpenAILike
    
        llm = OpenAILike(
            model="llama3",
            api_base="https://llama3-model-predictor-team-demo.<cluster-domain>/openai/v1",
            api_key = "EMPTY",
            max_tokens = 512)
    
        Settings.llm = llm
    
        from llama_index.core import VectorStoreIndex, StorageContext
        from llama_index.vector_stores.milvus import MilvusVectorStore
    
        vector_store = MilvusVectorStore(uri="http://milvus.milvus.svc.cluster.local:19530", collection=collection, dim=384, overwrite=True)
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        index = VectorStoreIndex.from_documents(
            documents, storage_context=storage_context
        )
    
    @dsl.pipeline
    def doc_ingest_pipeline(url: str, collection: str) -> None:
        comp = doc_ingest_component(url=url, collection=collection)
    
    from kfp import compiler
    
    compiler.Compiler().compile(doc_ingest_pipeline, 'pipeline.yaml')
  4. Run the script to generate a pipeline YAML file called pipeline.yaml:

    python3 doc-ingest-pipeline.py

    This file is uploaded to Kubeflow in the following section.

  5. Exit the Python virtual environment:

    deactivate
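
The next section uploads and runs pipeline.yaml through the Kubeflow Pipelines web UI. If you prefer to do the same from the SDK, the following is a minimal sketch that assumes the port-forwarded ml-pipeline-ui endpoint described earlier and the sample Linode Docs data set; adjust the url and collection arguments for your own data.

    from kfp import Client

    client = Client(host="http://localhost:8080")

    # Register the compiled pipeline so it also appears in the Pipelines UI.
    client.upload_pipeline("pipeline.yaml", pipeline_name="doc-ingest-pipeline")

    # Start a one-off run with the data set URL and Milvus collection name.
    run = client.create_run_from_pipeline_package(
        "pipeline.yaml",
        arguments={
            "url": "https://github.com/linode/docs/archive/refs/tags/v1.360.0.zip",
            "collection": "linode_docs",
        },
    )
    print(run.run_id)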

Run the Pipeline Workflow

  1. Select view > team and team > demo in the top bar.

  2. Select Services.

  3. Click on the URL of the service ml-pipeline-ui.

  4. Navigate to the Pipelines section, click Upload pipeline.

  5. Under Upload a file, select the pipeline.yaml file created in the previous section, and click Create.

  6. Select Experiments from the left menu, and click Create experiment. Enter a name and description for the experiment, and click Next.

    When complete, you should be brought to the Runs > Start a new run page.

  7. Complete the following steps to start a new run:

    • Under Pipeline, choose the pipeline.yaml pipeline you just uploaded.

    • For Run Type choose One-off.

    • Provide the collection name and URL of the data set to be processed. This is the zip file with the documents you wish to process.

      To use the sample Linode Docs data set in this guide, use the name linode_docs for collection-string and the following GitHub URL for url-string:

      https://github.com/linode/docs/archive/refs/tags/v1.360.0.zip
  8. Click Start to run the pipeline. When completed, the run is shown with a green checkmark to the left of the run title.
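
After the run completes, you can optionally confirm that embeddings were written to Milvus. This is a minimal sketch, assuming the Milvus service is port-forwarded locally as in the earlier verification step and that the run stored its embeddings in a collection named linode_docs; if the collection appears under a different name, substitute the name reported by list_collections().

    from pymilvus import connections, Collection, utility

    connections.connect(alias="default", host="localhost", port="19530")

    # Confirm the collection created by the pipeline run exists.
    print(utility.list_collections())

    # Count the stored embeddings (substitute the collection name if it differs).
    collection = Collection("linode_docs")
    print(f"Stored embeddings: {collection.num_entities}")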

Deploy the Chatbot

The next step is to install the Open WebUI pipeline and web interface and configure them to connect the data generated by the Kubeflow pipeline with the LLM deployed in KServe.

The Open WebUI pipeline uses the Milvus database to load context related to each query, then sends both the context and the query to the Llama 3 LLM instance running in KServe. The LLM sends a response back to the chatbot, and your browser displays an answer informed by the custom data set.

Create a configmap with the RAG Pipeline Files

The RAG pipeline files in this section are not related to the Kubeflow pipeline created in the previous section. Rather, the RAG pipeline defines how the chatbot interacts with each component created thus far, including the Milvus data store and the Llama 3 LLM.

  1. Select view > team and team > demo in the top bar.

  2. Navigate to the Apps section, and click on Gitea.

  3. In Gitea, navigate to the team-demo-argocd repository on the right.

  4. Click the Add File dropdown, and select New File. Create a file with the name pipeline-files.yaml with the following contents. Replace <cluster-domain> with the domain of your App Platform instance:

    apiVersion: v1
    data:
        pipeline-requirements.txt: |
          requests
          pymilvus
          llama-index
          llama-index-vector-stores-milvus
          llama-index-embeddings-huggingface
          llama-index-llms-openai-like
          opencv-python-headless
        rag-pipeline.py: |
          """
          title: RAG Pipeline
          version: 1.0
          description: RAG Pipeline
          """
          from typing import List, Optional, Union, Generator, Iterator
    
          class Pipeline:
    
            def __init__(self):
                self.name = "RAG Pipeline"
                self.index = None
                pass
    
    
            async def on_startup(self):
                from llama_index.embeddings.huggingface import HuggingFaceEmbedding
                from llama_index.core import Settings, VectorStoreIndex
                from llama_index.llms.openai_like import OpenAILike
                from llama_index.vector_stores.milvus import MilvusVectorStore
    
                print(f"on_startup:{__name__}")
    
                Settings.embed_model = HuggingFaceEmbedding(
                        model_name="sentence-transformers/all-MiniLM-L6-v2"
                )
    
                llm = OpenAILike(
                    model="llama3",
                    api_base="https://llama3-model-predictor-team-demo.<cluster-domain>/openai/v1",
                    api_key = "EMPTY",
                    max_tokens = 512)
    
                Settings.llm = llm
    
                vector_store = MilvusVectorStore(uri="http://milvus.milvus.svc.cluster.local:19530", collection="linode_docs", dim=384, overwrite=False)
                self.index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
    
            async def on_shutdown(self):
                print(f"on_shutdown:{__name__}")
                pass
    
    
            def pipe(
                self, user_message: str, model_id: str, messages: List[dict], body: dict
            ) -> Union[str, Generator, Iterator]:
                print(f"pipe:{__name__}")
    
                query_engine = self.index.as_query_engine(streaming=True, similarity_top_k=5)
                response = query_engine.query(user_message)
                print(f"rag_response:{response}")
                return f"{response}"
    kind: ConfigMap
    metadata:
      name: pipelines-files

    Optionally add a title and any notes to the change history, and click Commit Changes.

  5. Go to Apps, and open the Argocd application. Navigate to the team-demo application to see if the configmap has been created. If it is not ready yet, click Refresh as needed.
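
If you prefer to check from the command line instead of the Argo CD UI, the following is a minimal sketch using the Kubernetes Python client; it assumes your kubeconfig points at the App Platform LKE cluster and that the Team namespace is team-demo.

    from kubernetes import client, config

    # Load credentials from your local kubeconfig.
    config.load_kube_config()

    v1 = client.CoreV1Api()
    cm = v1.read_namespaced_config_map(name="pipelines-files", namespace="team-demo")

    # Expect pipeline-requirements.txt and rag-pipeline.py in the ConfigMap data.
    print(list(cm.data.keys()))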

Deploy the open-webui Pipeline and Web Interface

Update the Kyverno Policy open-webui-policy.yaml created in the previous tutorial (Deploy an LLM for AI Inferencing with App Platform for LKE) to mutate the open-webui pods that will be deployed.

  1. Open the Gitea app, navigate to the team-demo-argocd repository, and open the open-webui-policy.yaml file.

  2. Add the following resources so that the open-webui pods are deployed with the sidecar.istio.io/inject: "false" label that prevents Istio sidecar injection:

          - resources:
              kinds:
              - StatefulSet
              - Deployment
              selector:
                matchLabels:
                  ## change the value to match the name of the Workload
                  app.kubernetes.io/instance: "linode-docs-chatbot"
          - resources:
              kinds:
              - StatefulSet
              - Deployment
              selector:
                matchLabels:
                  ## change the value to match the name of the Workload
                  app.kubernetes.io/instance: "open-webui-pipelines"

    Be mindful of indentation when editing the YAML file. Both - resources sections must live under the match > any block of the rule in rules.

Add the open-webui-pipelines Helm Chart to the Catalog

  1. Select view > team and team > admin in the top bar.

  2. Click on Catalog in the left menu.

  3. Select Add Helm Chart.

  4. Under Github URL, add the URL to the open-webui-pipelines Helm chart:

    https://github.com/open-webui/helm-charts/blob/pipelines-0.4.0/charts/pipelines/Chart.yaml
  5. Click Get Details to populate the open-webui-pipelines Helm chart details. If preferred, rename the Target Directory Name from pipelines to open-webui-pipelines for reference later on.

  6. Leave Allow teams to use this chart selected.

  7. Click Add Chart.

Create a Workload for the open-webui-pipelines Helm Chart

  1. Select view > team and team > demo in the top bar.

  2. Select Workloads.

  3. Click on Create Workload.

  4. Select the Open-Webui-Pipelines Helm chart from the Catalog.

  5. Click on Values.

  6. Provide a name for the Workload. This guide uses the Workload name open-webui-pipelines.

  7. Add in or change the following chart values. Make sure to set the name of the Workload in the nameOverride field.

    You may need to uncomment some fields by removing the # sign in order to make them active. Remember to be mindful of indentations:

    nameOverride: linode-docs-pipeline
    resources:
      requests:
        cpu: "1"
        memory: 512Mi
      limits:
        cpu: "3"
        memory: 2Gi
    ingress:
      enabled: false
    extraEnvVars:
      - name: PIPELINES_REQUIREMENTS_PATH
        value: /opt/pipeline-requirements.txt
      - name: PIPELINES_URLS
        value: file:///opt/rag-pipeline.py
    volumeMounts:
      - name: config-volume
        mountPath: /opt
    volumes:
      - name: config-volume
        configMap:
          name: pipelines-files
    
  8. Click Submit.

Expose the linode-docs-pipeline Service

  1. Select view > team and team > demo in the top bar.

  2. Select Services.

  3. Click Create Service.

  4. In the Name dropdown menu, select the linode-docs-pipeline service.

  5. In the Port dropdown, select port 9099.

  6. Under Exposure, select External.

  7. Click Submit.

  8. Once submitted, copy the URL of the linode-docs-pipeline service to your clipboard.
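
Open WebUI Pipelines serves an OpenAI-compatible API, which is how Open WebUI connects to it in the next section. As an optional check that the service is up, you can list the pipelines it has loaded. This is a minimal sketch that assumes the default Pipelines API key (0p3n-w3bu!, reused below) and the service URL pattern shown later in this guide; replace <cluster-domain> with your App Platform cluster domain.

    import requests

    # URL of the exposed linode-docs-pipeline service copied above.
    base_url = "https://linode-docs-pipeline-demo.<cluster-domain>"

    response = requests.get(
        f"{base_url}/v1/models",
        headers={"Authorization": "Bearer 0p3n-w3bu!"},
        timeout=30,
    )
    response.raise_for_status()

    # The RAG Pipeline should appear in the returned model list once it has started.
    print(response.json())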

Create a Workload to Install the open-webui Helm Chart

  1. Select view > team and team > demo in the top bar.

  2. Select Workloads.

  3. Click on Create Workload.

  4. Select the Open-Webui Helm chart from the Catalog. This Helm chart should have been added in the previous Deploy an LLM for AI Inferencing with App Platform for LKE guide.

  5. Click on Values.

  6. Provide a name for the Workload. This guide uses the name linode-docs-chatbot.

  7. Edit the chart to include the below values, and set the name of the Workload in the nameOverride field. Replace <cluster-domain> with your App Platform cluster domain.

    You may need to add new lines for the additional names and values under extraEnvVars (extra environment variables):

    nameOverride: linode-docs-chatbot
    ollama:
      enabled: false
    pipelines:
      enabled: false
    persistence:
      enabled: false
    replicaCount: 1
    extraEnvVars:
      - name: WEBUI_AUTH
        value: "false"
      - name: OPENAI_API_BASE_URLS
        value: https://llama3-model-predictor-team-demo.<cluster-domain>/openai/v1;https://linode-docs-pipeline-demo.<cluster-domain>
      - name: OPENAI_API_KEYS
        value: EMPTY;0p3n-w3bu!
    
  8. Click Submit.

Expose the linode-docs-chatbot Service

  1. Select Services.

  2. Click Create Service.

  3. In the Name dropdown menu, select the linode-docs-chatbot service.

  4. Under Exposure, select External.

  5. Click Submit.

Access the Open Web User Interface

In your list of available Services, click on the URL of the linode-docs-chatbot to navigate to the Open WebUI chatbot interface. Select the model you wish to use in the top left dropdown menu (llama3-model or RAG Pipeline).

The Llama 3 model answers using only the data it was pre-trained on, not your custom data set. If you send this model a query, it answers from that pre-trained knowledge.

The RAG Pipeline model defined in this guide uses data from the custom data set with which it was provided. The example data set used in this guide is sourced from Linode Docs. If you give this model a query relevant to your custom data, the chatbot should respond with an answer informed by that data set.
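
You can also compare the two models outside of the chatbot by calling the OpenAI-compatible endpoints directly. The following is a minimal sketch that queries the Llama 3 endpoint served by KServe with the openai Python package; it assumes the same api_base, model name, and EMPTY placeholder key used in the pipeline configuration above, with <cluster-domain> replaced by your App Platform cluster domain.

    from openai import OpenAI

    # Same OpenAI-compatible KServe endpoint and placeholder key used earlier in this guide.
    client = OpenAI(
        base_url="https://llama3-model-predictor-team-demo.<cluster-domain>/openai/v1",
        api_key="EMPTY",
    )

    completion = client.chat.completions.create(
        model="llama3",
        messages=[{"role": "user", "content": "What is the Linode Kubernetes Engine?"}],
        max_tokens=256,
    )
    print(completion.choices[0].message.content)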

