Commit 518a0bd

format files
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
1 parent 9af0d33 commit 518a0bd

107 files changed

Lines changed: 24808 additions & 11049 deletions


.pre-commit-config.yaml

Lines changed: 2 additions & 1 deletion

```diff
@@ -89,7 +89,7 @@ repos:
     hooks:
       - id: prettier
         args: [--print-width=120]
-        exclude: (.*\.svelte)$
+        types_or: [yaml, markdown, html, css, scss, javascript, json]
         additional_dependencies:
           - prettier@3.2.5

@@ -111,6 +111,7 @@ repos:
     rev: v2.2.6
     hooks:
       - id: codespell
+        args: [-w]
         additional_dependencies:
           - tomli
```
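
A quick way to exercise the two hooks touched here is to run them directly. The helper below is a sketch, not part of the commit, and assumes `pre-commit` is installed and invoked from the repository root.

```python
# Sketch: run the prettier and codespell hooks updated by this commit over the
# whole tree. Assumes `pre-commit` is installed; not part of the commit itself.
import subprocess

for hook_id in ("prettier", "codespell"):
    # pre-commit exits non-zero when a hook rewrites files, so don't raise on that.
    subprocess.run(["pre-commit", "run", hook_id, "--all-files"], check=False)
```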

CODE_OF_CONDUCT.md

Lines changed: 10 additions & 10 deletions

```diff
@@ -17,23 +17,23 @@ diverse, inclusive, and healthy community.
 Examples of behavior that contributes to a positive environment for our
 community include:

-* Demonstrating empathy and kindness toward other people
-* Being respectful of differing opinions, viewpoints, and experiences
-* Giving and gracefully accepting constructive feedback
-* Accepting responsibility and apologizing to those affected by our mistakes,
+- Demonstrating empathy and kindness toward other people
+- Being respectful of differing opinions, viewpoints, and experiences
+- Giving and gracefully accepting constructive feedback
+- Accepting responsibility and apologizing to those affected by our mistakes,
   and learning from the experience
-* Focusing on what is best not just for us as individuals, but for the overall
+- Focusing on what is best not just for us as individuals, but for the overall
   community

 Examples of unacceptable behavior include:

-* The use of sexualized language or imagery, and sexual attention or advances of
+- The use of sexualized language or imagery, and sexual attention or advances of
   any kind
-* Trolling, insulting or derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or email address,
+- Trolling, insulting or derogatory comments, and personal or political attacks
+- Public or private harassment
+- Publishing others' private information, such as a physical or email address,
   without their explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
+- Other conduct which could reasonably be considered inappropriate in a
   professional setting

 ## Enforcement Responsibilities
```

ChatQnA/README.md

Lines changed: 14 additions & 4 deletions

````diff
@@ -1,6 +1,7 @@
 This ChatQnA use case performs RAG using LangChain, Redis vectordb and Text Generation Inference on Intel Gaudi2. The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Please visit [Habana AI products](https://habana.ai/products) for more details.

 # Environment Setup
+
 To use [🤗 text-generation-inference](https://github.com/huggingface/text-generation-inference) on Habana Gaudi/Gaudi2, please follow these steps:

 ## Prepare Docker
@@ -20,36 +21,41 @@ bash ./serving/tgi_gaudi/build_docker.sh
 ## Launch TGI Gaudi Service

 ### Launch a local server instance on 1 Gaudi card:
+
 ```bash
 bash ./serving/tgi_gaudi/launch_tgi_service.sh
 ```

 For gated models such as `LLAMA-2`, you will have to pass -e HUGGING_FACE_HUB_TOKEN=\<token\> to the docker run command above with a valid Hugging Face Hub read token.

-Please follow this link [huggingface token](https://huggingface.co/docs/hub/security-tokens) to get the access token ans export `HUGGINGFACEHUB_API_TOKEN` environment with the token.
+Please follow this link [huggingface token](https://huggingface.co/docs/hub/security-tokens) to get the access token and export `HUGGINGFACEHUB_API_TOKEN` environment with the token.

 ```bash
 export HUGGINGFACEHUB_API_TOKEN=<token>
 ```

 ### Launch a local server instance on 8 Gaudi cards:
+
 ```bash
 bash ./serving/tgi_gaudi/launch_tgi_service.sh 8
 ```

 ### Customize TGI Gaudi Service

 The ./serving/tgi_gaudi/launch_tgi_service.sh script accepts three parameters:
+
 - num_cards: The number of Gaudi cards to be utilized, ranging from 1 to 8. The default is set to 1.
 - port_number: The port number assigned to the TGI Gaudi endpoint, with the default being 8080.
 - model_name: The model name utilized for LLM, with the default set to "Intel/neural-chat-7b-v3-3".

 You have the flexibility to customize these parameters according to your specific needs. Additionally, you can set the TGI Gaudi endpoint by exporting the environment variable `TGI_ENDPOINT`:
+
 ```bash
 export TGI_ENDPOINT="http://xxx.xxx.xxx.xxx:8080"
 ```

 ## Enable TGI Gaudi FP8 for higher throughput
+
 The TGI Gaudi utilizes BFLOAT16 optimization as the default setting. If you aim to achieve higher throughput, you can enable FP8 quantization on the TGI Gaudi. According to our test results, FP8 quantization yields approximately a 1.8x performance gain compared to BFLOAT16. Please follow the below steps to enable FP8 quantization.

 ### Prepare Metadata for FP8 Quantization
@@ -83,8 +89,8 @@ model-id meta-llama/Llama-2-7b-hf

 Now the TGI Gaudi will launch the FP8 model by default. Please note that currently only Llama2 and Mistral models support FP8 quantization.

-
 ## Launch Redis
+
 ```bash
 docker pull redis/redis-stack:latest
 docker compose -f langchain/docker/docker-compose-redis.yml up -d
@@ -99,7 +105,7 @@ cd langchain/docker/
 bash ./build_docker.sh
 ```

-### Lanuch LangChain Docker
+### Launch LangChain Docker

 Update the `HUGGINGFACEHUB_API_TOKEN` environment variable with your huggingface token in the `docker-compose-langchain.yml`

@@ -134,6 +140,7 @@ export SAFETY_GUARD_ENDPOINT="http://xxx.xxx.xxx.xxx:8088"
 ```

 ## Start the Backend Service
+
 Make sure TGI-Gaudi service is running and also make sure data is populated into Redis. Launch the backend service:

 ```bash
@@ -143,7 +150,8 @@ nohup python app/server.py &

 ## Start the Frontend Service

-Navigate to the "ui" folder and execute the following commands to start the fronend GUI:
+Navigate to the "ui" folder and execute the following commands to start the frontend GUI:
+
 ```bash
 cd ui
 sudo apt-get install npm && \
@@ -163,11 +171,13 @@ sudo yum install -y nodejs
 Update the `DOC_BASE_URL` environment variable in the `.env` file by replacing the IP address '127.0.0.1' with the actual IP address.

 Run the following command to install the required dependencies:
+
 ```bash
 npm install
 ```

 Start the development server by executing the following command:
+
 ```bash
 nohup npm run dev &
 ```
````
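
Once the TGI Gaudi service described in this README is running, a single request is enough to confirm the endpoint responds. The sketch below mirrors the request shape used by the benchmarking client in this commit; `TGI_ENDPOINT`, the fallback URL, and the prompt are placeholder assumptions, not part of the commit.

```python
# Smoke-test sketch: one POST to the TGI Gaudi endpoint set up above.
# TGI_ENDPOINT and the prompt are assumptions; adjust to your deployment.
import json
import os

import requests

endpoint = os.environ.get("TGI_ENDPOINT", "http://localhost:8080")
payload = {"inputs": "What is Intel Gaudi2?", "parameters": {"do_sample": True}}
headers = {"Content-Type": "application/json"}
response = requests.post(endpoint, data=json.dumps(payload), headers=headers, timeout=60)
print(response.status_code, response.text)
```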

ChatQnA/benchmarking/README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1 +1 @@
-Will update soon.
+Will update soon.
```

ChatQnA/benchmarking/client.py

Lines changed: 32 additions & 6 deletions

```diff
@@ -1,34 +1,60 @@
-import requests
-import json
+# Copyright (c) 2024 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import argparse
 import concurrent.futures
+import json
 import random

+import requests
+
+
 def extract_qText(json_data):
     try:
-        file = open('devtest.json')
+        file = open("devtest.json")
         data = json.load(file)
         json_data = json.loads(json_data)
         json_data["inputs"] = data[random.randint(0, len(data) - 1)]["qText"]
         return json.dumps(json_data)
     except (json.JSONDecodeError, KeyError, IndexError):
         return None

+
 def send_request(url, json_data):
-    headers = {'Content-Type': 'application/json'}
+    headers = {"Content-Type": "application/json"}
     response = requests.post(url, data=json_data, headers=headers)
     print(f"Question: {json_data} Response: {response.status_code} - {response.text}")

+
 def main(url, json_data, concurrency):
     with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as executor:
-        future_to_url = {executor.submit(send_request, url, extract_qText(json_data)): url for _ in range(concurrency*2)}
+        future_to_url = {
+            executor.submit(send_request, url, extract_qText(json_data)): url for _ in range(concurrency * 2)
+        }
         for future in concurrent.futures.as_completed(future_to_url):
             _ = future_to_url[future]

+
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description="Concurrent client to send POST requests")
     parser.add_argument("--url", type=str, default="http://localhost:12345", help="URL to send requests to")
-    parser.add_argument("--json_data", type=str, default='{"inputs":"Which NFL team won the Super Bowl in the 2010 season?","parameters":{"do_sample": true}}', help="JSON data to send")
+    parser.add_argument(
+        "--json_data",
+        type=str,
+        default='{"inputs":"Which NFL team won the Super Bowl in the 2010 season?","parameters":{"do_sample": true}}',
+        help="JSON data to send",
+    )
     parser.add_argument("--concurrency", type=int, default=100, help="Concurrency level")
     args = parser.parse_args()
     main(args.url, args.json_data, args.concurrency)
```
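
For context, the reformatted client can be driven from the command line (`python client.py --url <url> --concurrency <n>`) or imported. The snippet below is a usage sketch with illustrative values; it assumes `devtest.json` sits next to `client.py` and that a TGI endpoint is listening at the target URL.

```python
# Usage sketch for the reformatted benchmarking client (illustrative values).
# Assumes devtest.json is in the working directory and a TGI endpoint is up.
from client import main

payload = '{"inputs": "placeholder", "parameters": {"do_sample": true}}'
# Submits concurrency * 2 POST requests, each carrying a random question from devtest.json.
main("http://localhost:12345", payload, 4)
```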
