IN ORDER TO ASSIST YOU, PLEASE PROVIDE THE FOLLOWING:
- Speech SDK log taken from a run that exhibits the reported issue. See instructions on how to take logs (a minimal sketch of enabling file logging is included after this checklist).
- A stripped down, simplified version of your source code that exhibits the issue. Or, preferably, try to reproduce the problem with one of the public samples in this repository (or a minimally modified version of it), and share the code.
import json
import os

import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech import PropertyId
from dotenv import load_dotenv

load_dotenv()
speech_key = os.getenv("AZURE_SPEECH_KEY")
speech_region = os.getenv("AZURE_SPEECH_REGION")
if not speech_key or not speech_region:
    raise ValueError("Missing AZURE_SPEECH_KEY or AZURE_SPEECH_REGION")

speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
# Write Speech SDK diagnostic logs to a file
speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, "./log.out")

# Text to speech to generate the sample audio
REFERENCE_TEXT = "香港"
FILE_NAME = "tts_text_zh-HK.wav"
speech_config.speech_synthesis_voice_name = "zh-HK-HiuMaanNeural"
speech_config.speech_synthesis_language = "zh-HK"
speech_config.set_speech_synthesis_output_format(
    speechsdk.SpeechSynthesisOutputFormat.Riff16Khz16BitMonoPcm
)
audio_output_config = speechsdk.audio.AudioOutputConfig(filename=FILE_NAME)
synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config, audio_config=audio_output_config
)
result = synthesizer.speak_text_async(REFERENCE_TEXT).get()

# Speech to text with pronunciation assessment
audio_config = speechsdk.audio.AudioConfig(filename=FILE_NAME)
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    language="zh-HK",
    audio_config=audio_config,
)
pronunciation_config = speechsdk.PronunciationAssessmentConfig(
    reference_text=REFERENCE_TEXT,
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
    granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    enable_miscue=True,
)
pronunciation_config.apply_to(recognizer)
result: speechsdk.SpeechRecognitionResult | None = (
    recognizer.recognize_once_async().get()
)
if result is None:
    raise RuntimeError("Speech recognition failed")

# Print each word (with its phoneme-level details) from the detailed JSON result
for word in (
    json.loads(result.properties[PropertyId.SpeechServiceResponse_JsonResult])
    .get("NBest", [{}])[0]
    .get("Words", [])
):
    print(word)
- If relevant, a WAV file of your input audio.
- Additional information as shown below
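Regarding the first checklist item: the SDK file log is already enabled in the repro script via the Speech_LogFilename property. For clarity, a minimal stand-alone sketch of just that part (the key/region placeholders and log path are illustrative, not taken from my environment):

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<AZURE_SPEECH_KEY>", region="<AZURE_SPEECH_REGION>")
# Route Speech SDK diagnostic logging to a file that can be attached to this issue.
speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, "./log.out")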
Describe the bug
All Phonemes are empty strings when using zh-HK, as seen from the output of my script.
{'Word': '香港', 'Offset': 700000, 'Duration': 8000000,
 'PronunciationAssessment': {'AccuracyScore': 100.0, 'ErrorType': 'None'},
 'Phonemes': [
   {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 700000, 'Duration': 2000000},
   {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 2800000, 'Duration': 900000},
   {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 3800000, 'Duration': 900000},
   {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 4800000, 'Duration': 300000},
   {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 5200000, 'Duration': 1700000},
   {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 7000000, 'Duration': 1700000}]}
To Reproduce
Steps to reproduce the behavior:
- Run the script
Expected behavior
The Phoneme fields should contain the recognized phoneme symbols instead of empty strings.
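One diagnostic variation worth noting (a sketch, not a confirmed fix): requesting an explicit phoneme alphabet before applying the config. The phoneme_alphabet property exists on PronunciationAssessmentConfig, but whether "IPA" or "SAPI" is actually supported for zh-HK is an assumption here; the snippet reuses the names from the script above.

# Hypothetical variation of the repro script: request an explicit phoneme alphabet.
# Assumption: "IPA" (or "SAPI") may not be supported for zh-HK; this only tests
# whether the empty Phoneme strings depend on the alphabet setting.
pronunciation_config = speechsdk.PronunciationAssessmentConfig(
    reference_text=REFERENCE_TEXT,
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
    granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    enable_miscue=True,
)
pronunciation_config.phoneme_alphabet = "IPA"  # or "SAPI"
pronunciation_config.apply_to(recognizer)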
Version of the Cognitive Services Speech SDK
1.47.0
Platform, Operating System, and Programming Language
- OS: Dev container mcr.microsoft.com/devcontainers/python:1-3.13 in WSL
- Hardware - x64, x86, ARM, ...
- Programming language: Python
- Browser [e.g. Chrome, Safari] (if applicable) - N/A
Additional context
- Error messages, stack trace, ...
- Any additional information.