Deploy a RAG Pipeline and Chatbot with App Platform for LKE
This guide builds on the LLM (Large Language Model) architecture from our Deploy an LLM for AI Inferencing with App Platform for LKE guide by deploying a RAG (Retrieval-Augmented Generation) pipeline that indexes a custom data set. RAG is a method of context augmentation that attaches relevant data as context when users send queries to an LLM.
Follow the steps in this tutorial to install Kubeflow Pipelines and deploy a RAG pipeline using Akamai App Platform for LKE. The deployment in this guide uses the previously deployed Open WebUI chatbot to respond to queries using a custom data set. The data set you use may vary depending on your use case. For example purposes, this guide uses a sample data set from Linode Docs in Markdown format.
If you prefer a manual installation rather than one using App Platform for LKE, see our Deploy a Chatbot and RAG Pipeline for AI Inferencing on LKE guide.
Diagram
Components
Infrastructure
Linode GPUs (NVIDIA RTX 4000): Akamai offers several high-performance GPU virtual machines, including NVIDIA RTX 4000 (used in this tutorial) and Quadro RTX 6000. The NVIDIA Ada Lovelace architecture in the RTX 4000 VMs is well suited to many AI tasks, including inferencing and image generation.
Linode Kubernetes Engine (LKE): LKE is Akamai’s managed Kubernetes service, enabling you to deploy containerized applications without needing to build out and maintain your own Kubernetes cluster.
App Platform for LKE: A Kubernetes-based platform that combines developer and operations-centric tools, automation, self-service, and management of containerized application workloads. App Platform for LKE streamlines the application lifecycle from development to delivery and connects numerous CNCF (Cloud Native Computing Foundation) technologies in a single environment, allowing you to construct a bespoke Kubernetes architecture.
Additional Software
Open WebUI: A self-hosted AI chatbot application that’s compatible with LLMs like Llama 3 and includes a built-in inference engine for RAG (Retrieval-Augmented Generation) solutions. Users interact with this interface to query the LLM.
Milvus: Milvus is an open-source vector database and is used for generative AI workloads. This tutorial uses Milvus to store embeddings generated by LlamaIndex and make them available to queries sent to the Llama 3 LLM.
Kubeflow: An open-source software platform designed for Kubernetes that includes a suite of applications used for machine learning tasks. This tutorial installs all default applications and makes specific use of the following:
KServe: Serves machine learning models. The architecture in this guide uses the Llama 3 LLM installed on KServe, which then serves it to other applications, including the chatbot UI.
Kubeflow Pipelines: Used to deploy pipelines, which are reusable machine learning workflows built with the Kubeflow Pipelines SDK. In this tutorial, a pipeline runs LlamaIndex to process the data set and store embeddings.
Prerequisites
Complete the deployment in the Deploy an LLM for AI Inferencing with App Platform for LKE guide. An LKE cluster consisting of at least 3 RTX4000 Ada x1 Medium GPU nodes is recommended for AI inference workloads.
Python3 and the venv Python module installed on your local machine.
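To quickly confirm these prerequisites on your workstation, you can run the following commands. This is a simple sanity check, not part of the deployment itself:

```bash
# Verify Python 3 is installed and the venv module is importable
python3 --version
python3 -c "import venv; print('venv module available')"
```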
Set Up Infrastructure
Once your LLM has been deployed and is accessible, complete the following steps to continue setting up your infrastructure.
Sign into the App Platform web UI using the platform-admin account, or another account that uses the platform-admin role.
Add the milvus Helm Chart to the Catalog
Select view > team and team > admin in the top bar.
Click on Catalog in the left menu.
Select Add Helm Chart.
Under Git Repository URL, add the URL to the milvus Helm chart: https://github.com/zilliztech/milvus-helm/blob/milvus-4.2.40/charts/milvus/Chart.yaml
Click Get Details to populate the Helm chart details.
Deselect Allow teams to use this chart.
Click Add Chart.
Create an Object Storage Bucket and Access Key for Milvus
In Cloud Manager, navigate to Object Storage.
Click Create Bucket.
Enter a name for your bucket, and select a Region close to, or the same as, your App Platform LKE cluster.
While on the Object Storage page, select the Access Keys tab, and then click Create Access Key.
Enter a name for your access key, select the same Region as your Milvus bucket, and make sure your access key has “Read/Write” access enabled for your bucket.
Save your access key information.
Create a Workload for the Milvus Helm Chart
Select view > team and team > admin in the top bar.
Select Workloads.
Click on Create Workload.
Select the Milvus Helm chart from the Catalog.
Click on Values.
Provide a name for the Workload. This guide uses the Workload name milvus.
Add milvus as the namespace. Select Create a new namespace.
Set the following values. Make sure to replace the externalS3 values with those of your Milvus bucket and access key. You may also need to add lines for the resources requests and limits under standalone.

Tip: Use Command + F. While navigating the Values configuration window, use the cmd + F keyboard search feature to locate each value.

```yaml
cluster:
  enabled: false
pulsarv3:
  enabled: false
minio:
  enabled: false
externalS3:
  enabled: true
  host: <your-region>.linodeobjects.com
  port: "443"
  accessKey: <your-accesskey>
  secretKey: <your-secretkey>
  useSSL: true
  bucketName: <your-bucket-name>
  cloudProvider: aws
  region: <your-bucket-region-id>
standalone:
  resources:
    requests:
      nvidia.com/gpu: "1"
    limits:
      nvidia.com/gpu: "1"
```
Note: Unencrypted Secret Keys. The Milvus Helm chart does not support the use of a secretKeyRef. Using unencrypted Secret Keys in chart values is not considered a Kubernetes security best practice.

Click Submit.
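If you want to confirm the Milvus deployment outside of the App Platform UI, a quick check with kubectl is shown below. This assumes you have downloaded the kubeconfig for your LKE cluster; the milvus namespace matches the namespace set for the Workload above:

```bash
# List the Milvus pods created by the Workload; the standalone pod should reach Running
kubectl get pods -n milvus

# Inspect recent events if a pod stays in Pending (for example, if no GPU is schedulable)
kubectl get events -n milvus --sort-by=.lastTimestamp | tail
```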
Create an Object Storage Bucket and Access Key for kubeflow-pipelines
In Cloud Manager, navigate to Object Storage.
Click Create Bucket.
Enter a name for your bucket, and select a Region close to, or the same as, your App Platform LKE cluster.
While on the Object Storage page, select the Access Keys tab, and then click Create Access Key.
Enter a name for your access key, select the same Region as your kubeflow-pipelines bucket, and make sure your access key has “Read/Write” access enabled for your bucket.
Save your access key information.
Make Sealed Secrets
Create a Sealed Secret for mlpipeline-minio-artifact
Make a Sealed Secret named mlpipeline-minio-artifact granting access to your kubeflow-pipelines bucket.
Select view > team and team > demo in the top bar.
Select Sealed Secrets from the menu, and click Create SealedSecret.
Add a name for your SealedSecret: mlpipeline-minio-artifact.
Select type kubernetes.io/opaque from the type dropdown menu.
Add the Key and Value details below. Replace YOUR_ACCESS_KEY and YOUR_SECRET_KEY with your kubeflow-pipelines access key information. To add a second key for your secret key, click the Add Item button after entering your access key information:
- Type: kubernetes.io/opaque
- Key=accesskey, Value=YOUR_ACCESS_KEY
- Key=secretkey, Value=YOUR_SECRET_KEY
Click Submit.
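For reference, the SealedSecret created above unseals into an ordinary Kubernetes Secret roughly equivalent to the sketch below. This is illustrative only; the team-demo namespace is an assumption based on the Team selected in the top bar:

```yaml
# Hypothetical plain-text equivalent of the mlpipeline-minio-artifact SealedSecret
apiVersion: v1
kind: Secret
metadata:
  name: mlpipeline-minio-artifact
  namespace: team-demo   # assumed namespace for Team demo
type: Opaque
stringData:
  accesskey: YOUR_ACCESS_KEY
  secretkey: YOUR_SECRET_KEY
```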
Create a Sealed Secret for mysql-credentials
Make another Sealed Secret named mysql-credentials to establish root user credentials. Create a strong root password, and save it somewhere secure.
Select view > team and team > demo in the top bar.
Select Sealed Secrets from the menu, and click Create SealedSecret.
Add a name for your SealedSecret: mysql-credentials.
Select type kubernetes.io/opaque from the type dropdown menu.
Add the Key and Value details, replacing YOUR_ROOT_PASSWORD with the strong root password you created and saved:
- Type: kubernetes.io/opaque
- Key=username, Value=root
- Key=password, Value=YOUR_ROOT_PASSWORD
Click Submit.
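Optionally, you can confirm that both SealedSecrets have been unsealed into regular Secrets by the sealed-secrets controller. The team-demo namespace is an assumption based on the Team name used in this guide:

```bash
# Both SealedSecrets should have a corresponding Secret once unsealing succeeds
kubectl get sealedsecrets,secrets -n team-demo | grep -E "mlpipeline-minio-artifact|mysql-credentials"
```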
Create a Network Policy
Create a Network Policy in the Team where the kubeflow-pipelines Helm chart will be installed (Team name demo in this guide). This allows communication between all Kubeflow Pipelines Pods.
Select view > team and team > demo in the top bar.
Select Network Policies from the menu.
Click Create Netpol.
Add a name for the Network Policy.
Select Rule type ingress using the following values, where kfp is the name of the Workload created in the next step:
- Selector label name: app.kubernetes.io/instance
- Selector label value: kfp
Click Submit.
Create a Workload and Install the kfp-cluster-resources Helm Chart
Select view > team and team > admin in the top bar.
Select Workloads.
Click on Create Workload.
Select the Kfp-Cluster-Resources Helm chart from the Catalog.
Click on Values.
Provide a name for the Workload. This guide uses the Workload name kfp-cluster-resources.
Add kubeflow as the namespace. Select Create a new namespace.
Continue with the default values, and click Submit. The Workload may take a few minutes to become ready.
Create a Workload for the kubeflow-pipelines Helm Chart
Select view > team and team > admin in the top bar.
Select Workloads.
Click on Create Workload.
Select the Kubeflow-Pipelines Helm chart from the Catalog.
Click on Values.
Provide a name for the Workload. This guide uses the Workload name kfp.
Add team-demo as the namespace. Select Create a new namespace.
Set the following values. Replace <your-bucket-region> and <your-bucket-name> with those of your kubeflow-pipelines bucket:

```yaml
objectStorage:
  region: <your-bucket-region>
  bucket: <your-bucket-name>
mysql:
  secret: mysql-credentials
```
Click Submit. It may take a few minutes for the Workload to be ready.
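To watch the Kubeflow Pipelines components start up outside the App Platform UI, you can optionally check the pods in the Team namespace. The team-demo namespace matches the namespace set for the kfp Workload above:

```bash
# Kubeflow Pipelines components (ml-pipeline, ml-pipeline-ui, workflow controller, MySQL) should reach Running
kubectl get pods -n team-demo
```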
Expose the Kubeflow Pipelines UI
Select view > team and team > demo in the top bar.
Select Services.
Click Create Service.
In the Name dropdown menu, select the ml-pipeline-ui service.
Under Exposure, select External.
Click Submit.
Kubeflow Pipelines is now ready to be used by members of the Team demo.
Set Up Kubeflow Pipeline to Ingest Data
Generate the Pipeline YAML File
The steps below create and use a Python script to create a Kubeflow pipeline file. This YAML file describes each step of the pipeline workflow.
On your local machine, create and activate a virtual environment for Python:

```bash
python3 -m venv .
source bin/activate
```
Install the Kubeflow Pipelines package in the virtual environment:

```bash
pip install kfp
```
Create a file named doc-ingest-pipeline.py with the following contents. Replace <cluster-domain> with the domain of your App Platform instance. The <cluster-domain> is contained in the console URL in your browser: if console.lke123456.akamai-apl.net is the URL, then lke123456.akamai-apl.net is the <cluster-domain>.

This script configures the pipeline that downloads the Markdown data set to be ingested, reads the content using LlamaIndex, generates embeddings of the content, and stores the embeddings in the Milvus database:
```python
from kfp import dsl

@dsl.component(
    base_image='nvcr.io/nvidia/ai-workbench/python-cuda117:1.0.3',
    packages_to_install=['pymilvus>=2.4.2', 'llama-index', 'llama-index-vector-stores-milvus',
                         'llama-index-embeddings-huggingface', 'llama-index-llms-openai-like']
)
def doc_ingest_component(url: str, collection: str) -> None:
    print(">>> doc_ingest_component")

    from urllib.request import urlopen
    from io import BytesIO
    from zipfile import ZipFile

    http_response = urlopen(url)
    zipfile = ZipFile(BytesIO(http_response.read()))
    zipfile.extractall(path='./md_docs')

    from llama_index.core import SimpleDirectoryReader

    # load documents
    documents = SimpleDirectoryReader("./md_docs/", recursive=True, required_exts=[".md"]).load_data()

    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.core import Settings

    Settings.embed_model = HuggingFaceEmbedding(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )

    from llama_index.llms.openai_like import OpenAILike

    llm = OpenAILike(
        model="llama3",
        api_base="https://llama3-model-predictor-team-demo.<cluster-domain>/openai/v1",
        api_key="EMPTY",
        max_tokens=512)

    Settings.llm = llm

    from llama_index.core import VectorStoreIndex, StorageContext
    from llama_index.vector_stores.milvus import MilvusVectorStore

    vector_store = MilvusVectorStore(uri="http://milvus.milvus.svc.cluster.local:19530",
                                     collection=collection, dim=384, overwrite=True)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context
    )

@dsl.pipeline
def doc_ingest_pipeline(url: str, collection: str) -> None:
    comp = doc_ingest_component(url=url, collection=collection)

from kfp import compiler
compiler.Compiler().compile(doc_ingest_pipeline, 'pipeline.yaml')
```
Run the script to generate a pipeline YAML file called pipeline.yaml:

```bash
python3 doc-ingest-pipeline.py
```
This file is uploaded to Kubeflow in the following section.
Exit the Python virtual environment:

```bash
deactivate
```
Run the Pipeline Workflow
Select view > team and team > demo in the top bar.
Select Services.
Click on the URL of the ml-pipeline-ui service.
Navigate to the Pipelines section, and click Upload pipeline.
Under Upload a file, select the pipeline.yaml file created in the previous section, and click Create.
Select Experiments from the left menu, and click Create experiment. Enter a name and description for the experiment, and click Next.
When complete, you should be brought to the Runs > Start a new run page.
Complete the following steps to start a new run:
Under Pipeline, choose the pipeline pipeline.yaml you just created.
For Run Type, choose One-off.
Provide the collection name and URL of the data set to be processed. This is the zip file with the documents you wish to process. To use the sample Linode Docs data set in this guide, use the name linode_docs for collection-string and the following GitHub URL for url-string: https://github.com/linode/docs/archive/refs/tags/v1.360.0.zip
Click Start to run the pipeline. When completed, the run is shown with a green checkmark to the left of the run title.
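As an alternative to the UI-driven upload and run above, the same pipeline.yaml can be submitted programmatically with the Kubeflow Pipelines SDK. This is a minimal sketch: the host URL is a placeholder for the external URL of your ml-pipeline-ui service, and any authentication in front of that endpoint is not handled here:

```python
from kfp import Client

# Placeholder: use the external URL of the ml-pipeline-ui service from the Services page
client = Client(host="https://<ml-pipeline-ui-url>")

# Upload and start a one-off run of the compiled pipeline with the sample data set
run = client.create_run_from_pipeline_package(
    "pipeline.yaml",
    arguments={
        "url": "https://github.com/linode/docs/archive/refs/tags/v1.360.0.zip",
        "collection": "linode_docs",
    },
)
print(f"Started run: {run.run_id}")
```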
Deploy the Chatbot
The next step is to install the Open WebUI pipeline and web interface and configure them to connect the data generated by the Kubeflow pipeline with the LLM deployed in KServe.
The Open WebUI pipeline uses the Milvus database to load context related to the query, then sends both the context and the query to the Llama 3 LLM instance within KServe. The LLM sends back a response to the chatbot, and your browser displays an answer informed by the custom data set.
Create a configmap with the RAG Pipeline Files
The RAG pipeline files in this section are not related to the Kubeflow pipeline created in the previous section. Rather, the RAG pipeline instructs the chatbot how to interact with each component created thus far, including the Milvus data store and the Llama 3 LLM.
Select view > team and team > demo in the top bar.
Navigate to the Apps section, and click on Gitea.
In Gitea, navigate to the team-demo-argocd repository on the right.
Click the Add File dropdown, and select New File. Create a file with the name pipeline-files.yaml and the following contents. Replace <cluster-domain> with the domain of your App Platform instance:
```yaml
apiVersion: v1
data:
  pipeline-requirements.txt: |
    requests
    pymilvus
    llama-index
    llama-index-vector-stores-milvus
    llama-index-embeddings-huggingface
    llama-index-llms-openai-like
    opencv-python-headless
  rag-pipeline.py: |
    """
    title: RAG Pipeline
    version: 1.0
    description: RAG Pipeline
    """
    from typing import List, Optional, Union, Generator, Iterator

    class Pipeline:
        def __init__(self):
            self.name = "RAG Pipeline"
            self.index = None
            pass

        async def on_startup(self):
            from llama_index.embeddings.huggingface import HuggingFaceEmbedding
            from llama_index.core import Settings, VectorStoreIndex
            from llama_index.llms.openai_like import OpenAILike
            from llama_index.vector_stores.milvus import MilvusVectorStore

            print(f"on_startup:{__name__}")

            Settings.embed_model = HuggingFaceEmbedding(
                model_name="sentence-transformers/all-MiniLM-L6-v2"
            )

            llm = OpenAILike(
                model="llama3",
                api_base="https://llama3-model-predictor-team-demo.<cluster-domain>/openai/v1",
                api_key="EMPTY",
                max_tokens=512)

            Settings.llm = llm

            vector_store = MilvusVectorStore(uri="http://milvus.milvus.svc.cluster.local:19530",
                                             collection="linode_docs", dim=384, overwrite=False)
            self.index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

        async def on_shutdown(self):
            print(f"on_shutdown:{__name__}")
            pass

        def pipe(
            self, user_message: str, model_id: str, messages: List[dict], body: dict
        ) -> Union[str, Generator, Iterator]:
            print(f"pipe:{__name__}")

            query_engine = self.index.as_query_engine(streaming=True, similarity_top_k=5)
            response = query_engine.query(user_message)
            print(f"rag_response:{response}")
            return f"{response}"
kind: ConfigMap
metadata:
  name: pipelines-files
```
Optionally add a title and any notes to the change history, and click Commit Changes.
Go to Apps, and open the Argocd application. Navigate to the team-demo application to see if the ConfigMap has been created. If it is not ready yet, click Refresh as needed.
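You can also verify the ConfigMap directly with kubectl once Argo CD has synced it. The team-demo namespace is assumed from the Team used in this guide:

```bash
# The ConfigMap should list both pipeline-requirements.txt and rag-pipeline.py under Data
kubectl describe configmap pipelines-files -n team-demo
```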
Deploy the open-webui Pipeline and Web Interface
Update the Kyverno Policy open-webui-policy.yaml created in the previous tutorial (Deploy an LLM for AI Inferencing with App Platform for LKE) to mutate the open-webui pods that will be deployed.
Open the Gitea app, navigate to the team-demo-argocd repository, and open the open-webui-policy.yaml file.
Add the following resources so that the open-webui pods are deployed with the sidecar.istio.io/inject: "false" label that prevents Istio sidecar injection:
```yaml
  - resources:
      kinds:
        - StatefulSet
        - Deployment
      selector:
        matchLabels:
          ## change the value to match the name of the Workload
          app.kubernetes.io/instance: "linode-docs-chatbot"
  - resources:
      kinds:
        - StatefulSet
        - Deployment
      selector:
        matchLabels:
          ## change the value to match the name of the Workload
          app.kubernetes.io/instance: "open-webui-pipelines"
```
Be mindful of indentation when editing the YAML file. Both - resources sections should live under the - name > match > any block in rules.
Add the open-webui-pipelines Helm Chart to the Catalog
Select view > team and team > admin in the top bar.
Click on Catalog in the left menu.
Select Add Helm Chart.
Under Git Repository URL, add the URL to the open-webui-pipelines Helm chart: https://github.com/open-webui/helm-charts/blob/pipelines-0.4.0/charts/pipelines/Chart.yaml
Click Get Details to populate the open-webui-pipelines Helm chart details. If preferred, rename the Target Directory Name from pipelines to open-webui-pipelines for reference later on.
Leave Allow teams to use this chart selected.
Click Add Chart.
Create a Workload for the open-webui-pipelines Helm Chart
Select view > team and team > demo in the top bar.
Select Workloads.
Click on Create Workload.
Select the Open-Webui-Pipelines Helm chart from the Catalog.
Click on Values.
Provide a name for the Workload. This guide uses the Workload name open-webui-pipelines.
Add in or change the following chart values. Make sure to set the name of the Workload in the nameOverride field. You may need to uncomment some fields by removing the # sign in order to make them active. Remember to be mindful of indentation:

```yaml
nameOverride: linode-docs-pipeline
resources:
  requests:
    cpu: "1"
    memory: 512Mi
  limits:
    cpu: "3"
    memory: 2Gi
ingress:
  enabled: false
extraEnvVars:
  - name: PIPELINES_REQUIREMENTS_PATH
    value: /opt/pipeline-requirements.txt
  - name: PIPELINES_URLS
    value: file:///opt/rag-pipeline.py
volumeMounts:
  - name: config-volume
    mountPath: /opt
volumes:
  - name: config-volume
    configMap:
      name: pipelines-files
```
Click Submit.
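After the Workload syncs, you can optionally confirm that the pipelines pod started and loaded rag-pipeline.py from the mounted ConfigMap. The label selector below assumes the Workload name open-webui-pipelines used in this guide:

```bash
# The pipelines pod should be Running, without an Istio sidecar
kubectl get pods -n team-demo -l app.kubernetes.io/instance=open-webui-pipelines

# Startup logs should mention the RAG Pipeline defined in rag-pipeline.py
kubectl logs -n team-demo -l app.kubernetes.io/instance=open-webui-pipelines | grep -i "rag"
```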
Expose the linode-docs-pipeline Service
Select view > team and team > demo in the top bar.
Select Services.
Click Create Service.
In the Name dropdown menu, select the linode-docs-pipeline service.
In the Port dropdown, select port 9099.
Under Exposure, select External.
Click Submit.
Once submitted, copy the URL of the linode-docs-pipeline service to your clipboard.
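To check that the exposed pipelines service is reachable before wiring it into Open WebUI, you can send a request to its OpenAI-compatible models endpoint. The /v1/models path and the default pipelines API key (0p3n-w3bu!) are assumptions based on Open WebUI Pipelines defaults; replace the hostname with the URL you just copied:

```bash
# List the pipelines exposed by the service; "RAG Pipeline" should appear in the response
curl -H "Authorization: Bearer 0p3n-w3bu!" \
  https://linode-docs-pipeline-demo.<cluster-domain>/v1/models
```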
Create a Workload to Install the open-webui Helm Chart
Select view > team and team > demo in the top bar.
Select Workloads.
Click on Create Workload.
Select the Open-Webui Helm chart from the Catalog. This Helm chart should have been added in the previous Deploy an LLM for AI Inferencing with App Platform for LKE guide.
Click on Values.
Provide a name for the Workload. This guide uses the name linode-docs-chatbot.
Edit the chart to include the values below, and set the name of the Workload in the nameOverride field. Replace <cluster-domain> with your App Platform cluster domain. You may need to add new lines for the additional names and values under extraEnvVars (extra environment variables):

```yaml
nameOverride: linode-docs-chatbot
ollama:
  enabled: false
pipelines:
  enabled: false
persistence:
  enabled: false
replicaCount: 1
extraEnvVars:
  - name: WEBUI_AUTH
    value: "false"
  - name: OPENAI_API_BASE_URLS
    value: https://llama3-model-predictor-team-demo.<cluster-domain>/openai/v1;https://linode-docs-pipeline-demo.<cluster-domain>
  - name: OPENAI_API_KEYS
    value: EMPTY;0p3n-w3bu!
```
Click Submit.
Expose the linode-docs-chatbot Service
Select Services.
Click Create Service.
In the Name dropdown menu, select the linode-docs-chatbot service.
Under Exposure, select External.
Click Submit.
Access the Open Web User Interface
In your list of available Services, click on the URL of the linode-docs-chatbot service to navigate to the Open WebUI chatbot interface. Select the model you wish to use in the top left dropdown menu (llama3-model or RAG Pipeline).
The Llama 3 AI model uses information from its pre-training data sources, not your custom data set. If you give this model a query, it uses that pre-trained knowledge to answer your question in real time.
The RAG Pipeline model defined in this guide uses data from the custom data set with which it was provided. The example data set used in this guide is sourced from Linode Docs. If you give this model a query relevant to your custom data, the chatbot should respond with an answer informed by that data set.
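To compare answers outside the chatbot, you can also query the Llama 3 model's OpenAI-compatible endpoint directly. This is an illustrative request using the same api_base URL referenced in the chart values above; the exact response format depends on the KServe runtime serving the model:

```bash
# Ask the base Llama 3 model a question without RAG context
curl https://llama3-model-predictor-team-demo.<cluster-domain>/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3",
        "messages": [{"role": "user", "content": "What is Linode Kubernetes Engine?"}],
        "max_tokens": 256
      }'
```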
More Information
You may wish to consult the following resources for additional information on this topic. While these are provided in the hope that they will be useful, please note that we cannot vouch for the accuracy or timeliness of externally hosted materials.