This tutorial requires you to have access to a few different services and local software tools:
- You should have both [kubectl](https://kubernetes.io/docs/reference/kubectl/) and [Helm](https://helm.sh/) installed on your local machine. These tools are used to manage your LKE cluster and install applications on it.
- A **custom dataset** is needed, preferably in Markdown format, though you can use other types of data if you modify the LlamaIndex configuration provided in this tutorial. This dataset should contain all of the information you want the Llama 3 LLM to use. This tutorial uses a Markdown dataset containing all of the Linode Docs.
{{< note >}}
These instructions are intended as a proof of concept for testing and demonstration purposes. They are not designed as a complete production reference architecture.

The configuration in this document does not expose any services to the internet. Instead, the services run on the Kubernetes cluster's internal network, and you must forward their ports locally before you can access them. This restriction is by design, to avoid accidentally exposing services before they can be properly secured. Additionally, some services run with no authentication or with default credentials configured.

Securing this configuration for a production deployment is outside the scope of this document.
{{< /note >}}
# Set up infrastructure
The first step is to provision the infrastructure needed for this tutorial and configure it with kubectl so that you can manage your cluster locally and install software through Helm. As part of this process, we also install the NVIDIA GPU operator so that Kubernetes can use the NVIDIA cards within the GPU worker nodes.
1. **Provision an LKE cluster.** We recommend using at least two **RTX4000 Ada x2 Medium** GPU plans (plan ID: `g2-gpu-rtx4000a2-m`), though you can adjust this as needed. For reference, Kubeflow recommends 32 GB of RAM and 16 CPU cores. This tutorial has been tested using Kubernetes v1.31, though other versions should also work. To learn more about provisioning a cluster, see the [Create a cluster](https://techdocs.akamai.com/cloud-computing/docs/create-a-cluster) guide. A sample CLI invocation is shown below.
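    If you prefer the command line over Cloud Manager, a `linode-cli` invocation along these lines should work. The label and region here are placeholder assumptions; pick a region that offers these GPU plans.

    ```command
    linode-cli lke cluster-create \
      --label rag-pipeline \
      --region us-ord \
      --k8s_version 1.31 \
      --node_pools.type g2-gpu-rtx4000a2-m \
      --node_pools.count 2
    ```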
Next, let's deploy Kubeflow on the LKE cluster.
After Kubeflow has been installed, we can now deploy the Llama 3 LLM to KServe. This tutorial uses Hugging Face (a platform that provides pre-trained AI models) to deploy Llama 3 to the LKE cluster. Specifically, these instructions use the [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model.
1. Create a Hugging Face token with **READ** access to use for this project. See the Hugging Face user documentation on [User access tokens](https://huggingface.co/docs/hub/en/security-tokens) for instructions.
1. Create the manifest file for the [Kubernetes secret](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-config-file/). You can use the following as a template:
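    The template below is a minimal sketch rather than the guide's exact manifest: it assumes the secret is named `hf-secret` (to match the `kubectl apply` step that follows) and that the token is stored under an `HF_TOKEN` key, which is the convention used by KServe's Hugging Face runtime examples.

    ```yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: hf-secret
    type: Opaque
    stringData:
      # Assumed key name; use whatever key your model deployment references.
      HF_TOKEN: <your Hugging Face token>
    ```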
1. Then, create the secret on your cluster by applying the manifest file:
```command
kubectl apply -f ./hf-secret.yaml
```
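    To confirm that the secret now exists, you can optionally query for it (the name `hf-secret` assumes the template above):

    ```command
    kubectl get secret hf-secret
    ```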
1. Create a config file for deploying the Llama 3 model on your cluster.
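    The exact configuration depends on your cluster, but a minimal `model.yaml` sketch in the style of KServe's Hugging Face runtime examples looks like the following. The service name, argument values, and GPU count are assumptions to adjust as needed:

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: huggingface-llama3
    spec:
      predictor:
        model:
          modelFormat:
            name: huggingface
          args:
            - --model_name=llama3
            - --model_id=meta-llama/Meta-Llama-3-8B
          env:
            # Reads the Hugging Face token from the secret created earlier.
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-secret
                  key: HF_TOKEN
          resources:
            limits:
              nvidia.com/gpu: "1"
    ```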
1. Apply the configuration to your cluster:

```command
kubectl apply -f model.yaml
```
1. Verify that the new Llama 3 pod is ready before continuing.
```command
kubectl get pods -A
```
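    The pod's exact name and namespace depend on your configuration; wait until its status shows `Running` and all of its containers are ready before moving on.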
### Install Milvus
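If you install Milvus with Helm, the project's standard chart is a reasonable starting point. The release name and namespace below are assumptions, and values tuning for production is omitted:

```command
helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm repo update
helm install milvus milvus/milvus --namespace milvus --create-namespace
```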
This tutorial employs a Python script to create the YAML file used within Kubeflow.
This creates a file called pipeline.yaml, which you will upload to Kubeflow in the following section.
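For reference, the compile step in a script like this typically reduces to a single Kubeflow Pipelines SDK (kfp v2) call. The pipeline below is a hypothetical stand-in, not the guide's actual pipeline definition:

```python
from kfp import dsl, compiler

@dsl.component
def ingest() -> str:
    # Placeholder step; the real pipeline's components do the RAG data processing.
    return "ok"

@dsl.pipeline(name="rag-pipeline")
def rag_pipeline():
    ingest()

# Writes the compiled definition to pipeline.yaml for upload to Kubeflow.
compiler.Compiler().compile(pipeline_func=rag_pipeline, package_path="pipeline.yaml")
```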
1. Run `deactivate` to exit the Python virtual environment.
### Run the pipeline workflow
1. Configure port forwarding on your cluster through kubectl so that you can access the Kubeflow interface from your local computer.
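    In a default Kubeflow installation, the dashboard is reached through the Istio ingress gateway; assuming that layout, a command like the following maps it to local port 8080:

    ```command
    kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
    ```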
1. Open a web browser and navigate to the Kubeflow interface at http://localhost:8080. A login screen should appear.
{{< note type="warning" noTitle=true >}}
If the browser instead shows the error `Jwks doesn't have key to match kid or alg from Jwt`, there may be a previous JWT session that is interfering. Opening this URL in your browser's private or incognito mode should resolve this.
{{< /note >}}
1. Log in with the username `user@example.com` and use the password that you created in a previous step.
1. Navigate to the **Pipelines > Experiments** page and click the button to create a new experiment. Enter a name and description for the experiment and click **Next**.
Despite the naming, these RAG pipeline files are not related to the Kubeflow pipeline.
```python
class Pipeline:
    def __init__(self):
        self.name = "RAG Pipeline"
        self.index = None
        pass

    async def on_startup(self):
        # This function is called when the server is started.
        from llama_index.embeddings.huggingface import HuggingFaceEmbedding
        from llama_index.core import Settings, VectorStoreIndex
        from llama_index.llms.openai_like import OpenAILike
        from llama_index.vector_stores.milvus import MilvusVectorStore
```