IN ORDER TO ASSIST YOU, PLEASE PROVIDE THE FOLLOWING:
- Speech SDK log taken from a run that exhibits the reported issue. See instructions on how to take logs (a minimal sketch of enabling file logging is included after this checklist).
- A stripped down, simplified version of your source code that exhibits the issue. Or, preferably, try to reproduce the problem with one of the public samples in this repository (or a minimally modified version of it), and share the code.
import json
import os

import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech import PropertyId
from dotenv import load_dotenv

load_dotenv()
speech_key = os.getenv("AZURE_SPEECH_KEY")
speech_region = os.getenv("AZURE_SPEECH_REGION")
if not speech_key or not speech_region:
    raise ValueError("Missing AZURE_SPEECH_KEY or AZURE_SPEECH_REGION")

speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
# Write Speech SDK diagnostic logs to a file
speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, "./log.out")

# Text to speech to generate the sample audio
REFERENCE_TEXT = "香港"
FILE_NAME = "tts_text_zh-HK.wav"
speech_config.speech_synthesis_voice_name = "zh-HK-HiuMaanNeural"
speech_config.speech_synthesis_language = "zh-HK"
speech_config.set_speech_synthesis_output_format(
    speechsdk.SpeechSynthesisOutputFormat.Riff16Khz16BitMonoPcm
)
audio_output_config = speechsdk.audio.AudioOutputConfig(filename=FILE_NAME)
synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config, audio_config=audio_output_config
)
result = synthesizer.speak_text_async(REFERENCE_TEXT).get()

# Speech to text with pronunciation assessment
audio_config = speechsdk.audio.AudioConfig(filename=FILE_NAME)
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    language="zh-HK",
    audio_config=audio_config,
)
pronunciation_config = speechsdk.PronunciationAssessmentConfig(
    reference_text=REFERENCE_TEXT,
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
    granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    enable_miscue=True,
)
pronunciation_config.apply_to(recognizer)
result: speechsdk.SpeechRecognitionResult | None = (
    recognizer.recognize_once_async().get()
)
if result is None:
    raise RuntimeError("Speech recognition failed")

# Print each word (with its phoneme-level details) from the detailed JSON result
for word in (
    json.loads(result.properties[PropertyId.SpeechServiceResponse_JsonResult])
    .get("NBest", [{}])[0]
    .get("Words", [])
):
    print(word)
- If relevant, a WAV file of your input audio.
- Additional information as shown below
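Regarding the first checklist item: the SDK file log is already enabled in the repro script via the Speech_LogFilename property. For clarity, a minimal stand-alone sketch of just that part (the key/region placeholders and log path are illustrative, not taken from my environment):

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<AZURE_SPEECH_KEY>", region="<AZURE_SPEECH_REGION>")
# Route Speech SDK diagnostic logging to a file that can be attached to this issue.
speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, "./log.out")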
Describe the bug
All Phonemes are empty strings when using zh-HK, as seen from the output of my script.
{'Word': '香港', 'Offset': 700000, 'Duration': 8000000,
 'PronunciationAssessment': {'AccuracyScore': 100.0, 'ErrorType': 'None'},
 'Phonemes': [
   {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 700000, 'Duration': 2000000},
   {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 2800000, 'Duration': 900000},
   {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 3800000, 'Duration': 900000},
   {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 4800000, 'Duration': 300000},
   {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 5200000, 'Duration': 1700000},
   {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 7000000, 'Duration': 1700000}]}
To Reproduce
Steps to reproduce the behavior:
- Run the script
Expected behavior
The Phoneme fields should contain the recognized phoneme symbols instead of empty strings.
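One diagnostic variation worth noting (a sketch, not a confirmed fix): requesting an explicit phoneme alphabet before applying the config. The phoneme_alphabet property exists on PronunciationAssessmentConfig, but whether "IPA" or "SAPI" is actually supported for zh-HK is an assumption here; the snippet reuses the names from the script above.

# Hypothetical variation of the repro script: request an explicit phoneme alphabet.
# Assumption: "IPA" (or "SAPI") may not be supported for zh-HK; this only tests
# whether the empty Phoneme strings depend on the alphabet setting.
pronunciation_config = speechsdk.PronunciationAssessmentConfig(
    reference_text=REFERENCE_TEXT,
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
    granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    enable_miscue=True,
)
pronunciation_config.phoneme_alphabet = "IPA"  # or "SAPI"
pronunciation_config.apply_to(recognizer)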
Version of the Cognitive Services Speech SDK
1.47.0
Platform, Operating System, and Programming Language
- OS: Dev container mcr.microsoft.com/devcontainers/python:1-3.13 in WSL
- Hardware - x64, x86, ARM, ...
- Programming language: Python
- Browser [e.g. Chrome, Safari] (if applicable) - N/A
Additional context
- Error messages, stack trace, ...
- Any additional information.