Cantonese Pronunciation Assessment: Phonemes are all empty strings #2990

@henrymcl

Description

IN ORDER TO ASSIST YOU, PLEASE PROVIDE THE FOLLOWING:

log.txt

  • A stripped down, simplified version of your source code that exhibits the issue. Or, preferably, try to reproduce the problem with one of the public samples in this repository (or a minimally modified version of it), and share the code.
import json
import os

import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech import PropertyId
from dotenv import load_dotenv

load_dotenv()

speech_key = os.getenv("AZURE_SPEECH_KEY")
speech_region = os.getenv("AZURE_SPEECH_REGION")
if not speech_key or not speech_region:
    raise ValueError("Missing AZURE_SPEECH_KEY or AZURE_SPEECH_REGION")

speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, "./log.out")

# Text to Speech for sample audio

REFERENCE_TEXT = "香港"
FILE_NAME = "tts_text_zh-HK.wav"

speech_config.speech_synthesis_voice_name = "zh-HK-HiuMaanNeural"
speech_config.speech_synthesis_language = "zh-HK"
speech_config.set_speech_synthesis_output_format(
    speechsdk.SpeechSynthesisOutputFormat.Riff16Khz16BitMonoPcm
)
audio_output_config = speechsdk.audio.AudioOutputConfig(filename=FILE_NAME)
synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config, audio_config=audio_output_config
)
result = synthesizer.speak_text_async(REFERENCE_TEXT).get()

# Speech to text with pronunciation assessment

audio_config = speechsdk.audio.AudioConfig(filename=FILE_NAME)
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    language="zh-HK",
    audio_config=audio_config,
)

pronunciation_config = speechsdk.PronunciationAssessmentConfig(
    reference_text=REFERENCE_TEXT,
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
    granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    enable_miscue=True,
)
pronunciation_config.apply_to(recognizer)

result = recognizer.recognize_once_async().get()

# Check the result reason rather than testing for None
if result.reason != speechsdk.ResultReason.RecognizedSpeech:
    raise RuntimeError(f"Speech recognition failed: {result.reason}")

for word in (
    json.loads(result.properties[PropertyId.SpeechServiceResponse_JsonResult])
    .get("NBest", [{}])[0]
    .get("Words", [])
):
    print(word)
  • If relevant, a WAV file of your input audio.

  • Additional information as shown below

Describe the bug

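With zh-HK, every Phoneme field in the pronunciation assessment result is an empty string, even though the offsets, durations, and accuracy scores are populated. Output of the script above: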

{'Word': '香港', 'Offset': 700000, 'Duration': 8000000, 'PronunciationAssessment': {'AccuracyScore': 100.0, 'ErrorType': 'None'}, 'Phonemes': [{'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 700000, 'Duration': 2000000}, {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 2800000, 'Duration': 900000}, {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 3800000, 'Duration': 900000}, {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 4800000, 'Duration': 300000}, {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 5200000, 'Duration': 1700000}, {'Phoneme': '', 'PronunciationAssessment': {'AccuracyScore': 100.0}, 'Offset': 7000000, 'Duration': 1700000}]}
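
To make the symptom easier to check across a longer result, here is a minimal sketch that counts empty phoneme labels in the Words list. The helper name count_empty_phonemes is hypothetical (not part of the Speech SDK), and SAMPLE_WORD is copied from the output above, shortened to its first two phonemes.

```python
# Sample word entry copied from the reported output (first two phonemes only).
SAMPLE_WORD = {
    "Word": "香港",
    "Offset": 700000,
    "Duration": 8000000,
    "PronunciationAssessment": {"AccuracyScore": 100.0, "ErrorType": "None"},
    "Phonemes": [
        {
            "Phoneme": "",
            "PronunciationAssessment": {"AccuracyScore": 100.0},
            "Offset": 700000,
            "Duration": 2000000,
        },
        {
            "Phoneme": "",
            "PronunciationAssessment": {"AccuracyScore": 100.0},
            "Offset": 2800000,
            "Duration": 900000,
        },
    ],
}


def count_empty_phonemes(words):
    """Hypothetical helper: return (empty, total) phoneme-label counts."""
    empty = 0
    total = 0
    for word in words:
        for phoneme in word.get("Phonemes", []):
            total += 1
            # A missing key or an empty string both count as "no label".
            if not phoneme.get("Phoneme"):
                empty += 1
    return empty, total


empty, total = count_empty_phonemes([SAMPLE_WORD])
print(f"{empty}/{total} phoneme labels are empty")
```

In the script above, the same helper could be called on the Words list parsed from the JSON result instead of printing each word.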

To Reproduce

Steps to reproduce the behavior:

  1. Run the script above with AZURE_SPEECH_KEY and AZURE_SPEECH_REGION set in the environment (or in a .env file).

Expected behavior

Phonemes should not be empty strings.

Version of the Cognitive Services Speech SDK

1.47.0

Platform, Operating System, and Programming Language

  • OS: Dev container mcr.microsoft.com/devcontainers/python:1-3.13 in WSL
  • Hardware - x64, x86, ARM, ...
  • Programming language: Python
  • Browser [e.g. Chrome, Safari] (if applicable) - N/A

Additional context

  • Error messages, stack trace, ...
  • Any additional information.
