
Commit baafa40

Authored by MSCetin37, pre-commit-ci[bot], ashahba, and ZePan110
Add support for Audio and Video summarization to Docsum (#865)
Squashed commit history:

* v2a services
* add a2t - llm
* update whisper serve
* updates
* add data service
* gateway
* clean gateway & orchestrator
* updates
* updates
* adding functional tests
* updates
* updates
* update readme file
* name changes
* update readme file
* update readme file
* update readme file
* update readme file
* update readme file
* update max token option
* update the test files
* readme updates
* readme updates
* clean code
* update dataprep-compose-cd.yaml file
* merge and sync
* merge and sync gateway
* adding the copyright header
* update the end of file char
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* update gateway
* update gateway-docsum
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix test files
* fix test files
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* merge & sync
* Update dataprep-compose.yaml
* update test
* update dataprep-compose

---------

Signed-off-by: Mustafa <mustafa.cetin@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Co-authored-by: ZePan110 <ze.pan@intel.com>
1 parent 3b106c8 commit baafa40

File tree

23 files changed: +1350 additions, −35 deletions

.github/workflows/docker/compose/dataprep-compose.yaml

Lines changed: 12 additions & 0 deletions

```diff
@@ -51,3 +51,15 @@ services:
     build:
       dockerfile: comps/dataprep/neo4j/llama_index/Dockerfile
     image: ${REGISTRY:-opea}/dataprep-neo4j-llamaindex:${TAG:-latest}
+  dataprep-multimedia2text:
+    build:
+      dockerfile: comps/dataprep/multimedia2text/Dockerfile
+    image: ${REGISTRY:-opea}/dataprep-multimedia2text:${TAG:-latest}
+  dataprep-video2audio:
+    build:
+      dockerfile: comps/dataprep/multimedia2text/video2audio/Dockerfile
+    image: ${REGISTRY:-opea}/dataprep-video2audio:${TAG:-latest}
+  dataprep-audio2text:
+    build:
+      dockerfile: comps/dataprep/multimedia2text/audio2text/Dockerfile
+    image: ${REGISTRY:-opea}/dataprep-audio2text:${TAG:-latest}
```
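The image names in these compose entries use shell-style parameter fallbacks that resolve at build time. A minimal sketch of how `${REGISTRY:-opea}` and `${TAG:-latest}` expand when neither variable is set (values illustrative):

```shell
# Expand the same ${VAR:-default} fallbacks the compose entries rely on
unset REGISTRY TAG
IMAGE="${REGISTRY:-opea}/dataprep-multimedia2text:${TAG:-latest}"
echo "$IMAGE"
```

Setting `REGISTRY` or `TAG` in the environment overrides the defaults without editing the compose file.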

comps/__init__.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -36,6 +36,8 @@
     ScoreDoc,
     PIIRequestDoc,
     PIIResponseDoc,
+    Audio2text,
+    DocSumDoc,
 )

 # Constants
```

comps/asr/whisper/dependency/whisper_model.py

Lines changed: 14 additions & 4 deletions

```diff
@@ -14,7 +14,14 @@
 class WhisperModel:
     """Convert audio to text."""

-    def __init__(self, model_name_or_path="openai/whisper-small", language="english", device="cpu", hpu_max_len=8192):
+    def __init__(
+        self,
+        model_name_or_path="openai/whisper-small",
+        language="english",
+        device="cpu",
+        hpu_max_len=8192,
+        return_timestamps=False,
+    ):
         if device == "hpu":
             # Explicitly link HPU with Torch
             from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi
@@ -31,6 +38,7 @@ def __init__(self, model_name_or_path="openai/whisper-small", language="english"

         self.language = language
         self.hpu_max_len = hpu_max_len
+        self.return_timestamps = return_timestamps

         if device == "hpu":
             self._warmup_whisper_hpu_graph("https://github.com/Spycsh/assets/raw/main/ljspeech_60s_audio.wav")
@@ -104,7 +112,7 @@ def _warmup_whisper_hpu_graph(self, url):
                 )
             ),
             language=self.language,
-            return_timestamps=True,
+            return_timestamps=self.return_timestamps,
         )

     def audio2text(self, audio_path):
@@ -167,7 +175,7 @@ def audio2text(self, audio_path):
                 )
             ),
             language=self.language,
-            return_timestamps=True,
+            return_timestamps=self.return_timestamps,
         )
         # pylint: disable=E1101
         result = self.processor.tokenizer.batch_decode(predicted_ids, skip_special_tokens=True, normalize=True)[0]
@@ -180,7 +188,9 @@ def audio2text(self, audio_path):


 if __name__ == "__main__":
-    asr = WhisperModel(model_name_or_path="openai/whisper-small", language="english", device="cpu")
+    asr = WhisperModel(
+        model_name_or_path="openai/whisper-small", language="english", device="cpu", return_timestamps=True
+    )

     # Test multilanguage asr
     asr.language = "chinese"
```
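These hunks thread a new `return_timestamps` flag (default `False`) from the constructor into both generation calls, which previously hard-coded `return_timestamps=True`. A reduced stand-in sketch of the pattern (not the real model code; `WhisperModelSketch` and `generate_kwargs` are illustrative names):

```python
class WhisperModelSketch:
    """Stand-in mirroring only the new flag plumbing, not the ASR logic."""

    def __init__(self, model_name_or_path="openai/whisper-small", language="english",
                 device="cpu", hpu_max_len=8192, return_timestamps=False):
        self.language = language
        self.return_timestamps = return_timestamps

    def generate_kwargs(self):
        # What used to be a hard-coded return_timestamps=True is now
        # driven by the constructor argument
        return {"language": self.language, "return_timestamps": self.return_timestamps}


# Existing call sites keep working; timestamps are off unless requested
default_model = WhisperModelSketch()
timestamped = WhisperModelSketch(return_timestamps=True)
```

Because the new parameter has a default, the change is backward compatible for positional and keyword callers alike; only the effective default behavior (timestamps off) changes.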

comps/asr/whisper/dependency/whisper_server.py

Lines changed: 8 additions & 1 deletion

```diff
@@ -39,6 +39,7 @@ async def audio_to_text(request: Request):

     audio = AudioSegment.from_file(file_name)
     audio = audio.set_frame_rate(16000)
+
     audio.export(f"{file_name}", format="wav")
     try:
         asr_result = asr.audio2text(file_name)
@@ -57,8 +58,14 @@ async def audio_to_text(request: Request):
     parser.add_argument("--model_name_or_path", type=str, default="openai/whisper-small")
     parser.add_argument("--language", type=str, default="english")
     parser.add_argument("--device", type=str, default="cpu")
+    parser.add_argument("--return_timestamps", type=str, default=True)

     args = parser.parse_args()
-    asr = WhisperModel(model_name_or_path=args.model_name_or_path, language=args.language, device=args.device)
+    asr = WhisperModel(
+        model_name_or_path=args.model_name_or_path,
+        language=args.language,
+        device=args.device,
+        return_timestamps=args.return_timestamps,
+    )

     uvicorn.run(app, host=args.host, port=args.port)
```
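One caveat in the hunk above: `--return_timestamps` is declared with `type=str`, so any non-empty command-line value, including `"false"`, is truthy when later used as a boolean. A common workaround is an explicit converter function; a sketch (the `str2bool` helper is illustrative, not part of this commit):

```python
import argparse


def str2bool(value: str) -> bool:
    # Map common textual spellings onto a real boolean
    return value.strip().lower() in ("1", "true", "yes", "y")


parser = argparse.ArgumentParser()
parser.add_argument("--return_timestamps", type=str2bool, default=True)

args = parser.parse_args(["--return_timestamps", "false"])
print(args.return_timestamps)  # False
```

With `type=str`, the same invocation would yield the truthy string `"false"` instead of `False`.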

comps/cores/mega/gateway.py

Lines changed: 13 additions & 27 deletions

```diff
@@ -17,10 +17,11 @@
     ChatCompletionResponse,
     ChatCompletionResponseChoice,
     ChatMessage,
+    DocSumChatCompletionRequest,
     EmbeddingRequest,
     UsageInfo,
 )
-from ..proto.docarray import LLMParams, LLMParamsDoc, RerankedDoc, RerankerParms, RetrieverParms, TextDoc
+from ..proto.docarray import DocSumDoc, LLMParams, LLMParamsDoc, RerankedDoc, RerankerParms, RetrieverParms, TextDoc
 from .constants import MegaServiceEndpoint, ServiceRoleType, ServiceType
 from .micro_service import MicroService
@@ -409,34 +410,20 @@ async def handle_request(self, request: Request):
 class DocSumGateway(Gateway):
     def __init__(self, megaservice, host="0.0.0.0", port=8888):
         super().__init__(
-            megaservice, host, port, str(MegaServiceEndpoint.DOC_SUMMARY), ChatCompletionRequest, ChatCompletionResponse
+            megaservice,
+            host,
+            port,
+            str(MegaServiceEndpoint.DOC_SUMMARY),
+            input_datatype=DocSumChatCompletionRequest,
+            output_datatype=ChatCompletionResponse,
         )

-    async def handle_request(self, request: Request, files: List[UploadFile] = File(default=None)):
-        data = await request.form()
+    async def handle_request(self, request: Request):
+        data = await request.json()
         stream_opt = data.get("stream", True)
-        chat_request = ChatCompletionRequest.parse_obj(data)
-        file_summaries = []
-        if files:
-            for file in files:
-                file_path = f"/tmp/{file.filename}"
-
-                import aiofiles
-
-                async with aiofiles.open(file_path, "wb") as f:
-                    await f.write(await file.read())
-                docs = read_text_from_file(file, file_path)
-                os.remove(file_path)
-                if isinstance(docs, list):
-                    file_summaries.extend(docs)
-                else:
-                    file_summaries.append(docs)
-
-        if file_summaries:
-            prompt = self._handle_message(chat_request.messages) + "\n".join(file_summaries)
-        else:
-            prompt = self._handle_message(chat_request.messages)
+        chat_request = ChatCompletionRequest.model_validate(data)

+        prompt = self._handle_message(chat_request.messages)
         parameters = LLMParams(
             max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024,
             top_k=chat_request.top_k if chat_request.top_k else 10,
@@ -446,10 +433,9 @@ async def handle_request(self, request: Request, files: List[UploadFile] = File(
             presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0,
             repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03,
             streaming=stream_opt,
-            language=chat_request.language if chat_request.language else "auto",
         )
         result_dict, runtime_graph = await self.megaservice.schedule(
-            initial_inputs={"query": prompt}, llm_parameters=parameters
+            initial_inputs={data["type"]: prompt}, llm_parameters=parameters
         )
         for node, response in result_dict.items():
             # Here it suppose the last microservice in the megaservice is LLM.
```
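The rewritten handler consumes a JSON body instead of multipart form uploads, and keys the megaservice's initial input on the request's `type` field. A minimal sketch of that dispatch (payload values illustrative; note that a body missing the `type` key would raise `KeyError`):

```python
# JSON body shaped the way the new DocSumGateway.handle_request expects it
data = {
    "type": "audio",
    "audio": "<base64 wav>",
    "messages": "summarize this recording",
    "stream": False,
}

stream_opt = data.get("stream", True)
prompt = data["messages"]  # stands in for self._handle_message(chat_request.messages)

# The scheduler input is keyed by the declared media type: "text", "audio", or "video"
initial_inputs = {data["type"]: prompt}
```

Keying `initial_inputs` on `data["type"]` is what lets one gateway route text, audio, and video requests to different entry nodes of the same megaservice graph.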

comps/cores/proto/api_protocol.py

Lines changed: 8 additions & 0 deletions

```diff
@@ -269,6 +269,14 @@ class ChatCompletionRequest(BaseModel):
     request_type: Literal["chat"] = "chat"


+class DocSumChatCompletionRequest(BaseModel):
+    llm_params: Optional[ChatCompletionRequest] = None
+    text: Optional[str] = None
+    audio: Optional[str] = None
+    video: Optional[str] = None
+    type: Optional[str] = None
+
+
 class AudioChatCompletionRequest(BaseModel):
     audio: str
     messages: Optional[
```
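The new request model is a thin wrapper whose fields are all optional. A self-contained sketch of constructing it (the class is re-declared here without the nested `llm_params: ChatCompletionRequest` field, purely for illustration; pydantic v2 is assumed, matching the `model_validate` call in the gateway hunk):

```python
from typing import Optional

from pydantic import BaseModel


class DocSumChatCompletionRequest(BaseModel):
    # llm_params omitted in this sketch; the real model nests a ChatCompletionRequest
    text: Optional[str] = None
    audio: Optional[str] = None
    video: Optional[str] = None
    type: Optional[str] = None


# Only the fields relevant to the chosen modality need to be supplied
req = DocSumChatCompletionRequest(type="video", video="<base64 mp4>")
print(req.model_dump(exclude_none=True))
```

Because every field defaults to `None`, validation succeeds for any subset of modalities; the gateway relies on `type` to pick which one to forward.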

comps/cores/proto/docarray.py

Lines changed: 10 additions & 0 deletions

```diff
@@ -20,6 +20,10 @@ class TextDoc(BaseDoc, TopologyInfo):
     text: str = None


+class Audio2text(BaseDoc, TopologyInfo):
+    query: str = None
+
+
 class FactualityDoc(BaseDoc):
     reference: str
     text: str
@@ -74,6 +78,12 @@ class Base64ByteStrDoc(BaseDoc):
     byte_str: str


+class DocSumDoc(BaseDoc):
+    text: Optional[str] = None
+    audio: Optional[str] = None
+    video: Optional[str] = None
+
+
 class DocPath(BaseDoc):
     path: str
     chunk_size: int = 1500
```
comps/dataprep/multimedia2text/Dockerfile (new file)

Lines changed: 30 additions & 0 deletions

```diff
@@ -0,0 +1,30 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+# Use the official Python 3.11 slim image as the base image
+FROM python:3.11-slim
+
+# Set environment variables
+ENV LANG=C.UTF-8
+
+# Install necessary packages and clean up to reduce image size
+RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
+    build-essential \
+    libgl1-mesa-glx \
+    libjemalloc-dev && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# Create a directory for the user and set it as the working directory
+WORKDIR /home/user
+
+# Copy the application code and requirements file to the container
+COPY comps /home/user/comps
+COPY requirements.txt /home/user/requirements.txt
+COPY ./comps/dataprep/multimedia2text/multimedia2text.py /home/user/multimedia2text.py
+
+# Install Python dependencies
+RUN python -m pip install --no-cache-dir -r requirements.txt
+
+# Define the entry point for the container
+ENTRYPOINT ["python", "multimedia2text.py"]
```
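Because this Dockerfile copies `comps` and `requirements.txt` from the top of the repository, the build context must be the repo root rather than the Dockerfile's own directory. A sketch of the corresponding manual build command (image name taken from the compose entry above; the command itself is illustrative, not part of the commit):

```shell
# The Dockerfile COPYs from the repo top level, so build with the repo root as context
unset REGISTRY TAG
BUILD_CMD="docker build -t ${REGISTRY:-opea}/dataprep-multimedia2text:${TAG:-latest} -f comps/dataprep/multimedia2text/Dockerfile ."
echo "$BUILD_CMD"
```

Running the build from inside `comps/dataprep/multimedia2text/` would fail at the `COPY comps` step, since that path only exists relative to the repo root.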
