Description
Hello, I managed to compile whisper.cpp with OpenCL for Adreno support, but when I try to transcribe audio I get this error:
GGML_ASSERT(M % 4 == 0) failed.
It might be because OpenCL isn't officially supported, but I'm not sure.
Logs:
./whisper-cli -m /sdcard/Download/ggml-medium-q8_0.bin -f '/sdcard/Music/Audio Media Converter/Argentina_sMemecoinDisasterIsWorsethanYouthink.wav'
", the input path should be absolutevalid input: "
" (ignoring)ker: Warning: unable to normalize "
whisper_init_from_file_with_params_no_state: loading model from '/sdcard/Download/ggml-medium-q8_0.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 1
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
ggml_opencl: selected platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: device: 'QUALCOMM Adreno(TM) 830 (OpenCL 3.0 Adreno(TM) 830)'
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: 0800.56.1 Compiler E031.47.18.42
ggml_opencl: vector subgroup broadcast support: true
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 128
ggml_opencl: max mem alloc size: 1024 MB
ggml_opencl: device max image buffer size (pixels): 134217728
ggml_opencl: device max workgroup size: 1024
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: true
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: true
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
ggml_opencl: loading OpenCL kernels...........................................................................................
ggml_opencl: default device: 'QUALCOMM Adreno(TM) 830 (OpenCL 3.0 Adreno(TM) 830)'
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 2
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 7
whisper_model_load: qntvr = 2
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: OpenCL total size = 822.75 MB
/data/data/com.termux/files/home/whisper.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:4450: GGML_ASSERT(M % 4 == 0) failed
0: 0x73668965ac
1: 0x736689656c ggml_print_backtrace
2: 0x73668a9698 ggml_abort
3: 0x736118c1c0
4: 0x736b4b0904
5: 0x736b4aebdc whisper_init_with_params_no_state
6: 0x736b4ae98c whisper_init_from_file_with_params_no_state
7: 0x736b4b0c88 whisper_init_from_file_with_params
8: 0x5bad0682f0
9: 0x735ff48d64 __libc_init
Aborted ./whisper-cli -m /sdcard/Download/ggml-medium-q8_0.bin -f '/sdcard/Music/Audio Media Converter/Argentina_sMemecoinDisasterIsWorsethanYouthink.wav'
clinfo:
~ $ clinfo
", the input path should be absolutevalid input: "
" (ignoring)ker: Warning: unable to normalize "
Number of platforms 1
Platform Name QUALCOMM Snapdragon(TM)
Platform Vendor QUALCOMM
Platform Version OpenCL 3.0 QUALCOMM build: 0800.56.1
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions with Version cl_khr_icd 0xc00000 (3.0.0)
Platform Numeric Version 0xc00000 (3.0.0)
Platform Extensions function suffix QCOM
Platform Host timer resolution 0ns
Platform Name QUALCOMM Snapdragon(TM)
Number of devices 1
Device Name QUALCOMM Adreno(TM) 830
Device Vendor QUALCOMM
Device Vendor ID 0x5143
Device Version OpenCL 3.0 Adreno(TM) 830
Device Numeric Version 0xc00000 (3.0.0)
Driver Version OpenCL 3.0 QUALCOMM build: 0800.56.1 Compiler E031.47.18.42
Device OpenCL C Version OpenCL C 3.0 Adreno(TM) 830
Device OpenCL C all versions OpenCL C 0xc00000 (3.0.0)
OpenCL C 0x800000 (2.0.0)
OpenCL C 0x402000 (1.2.0)
OpenCL C 0x401000 (1.1.0)
OpenCL C 0x400000 (1.0.0)
Device OpenCL C features __opencl_c_3d_image_writes 0xc00000 (3.0.0)
__opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0)
__opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0)
__opencl_c_atomic_scope_device 0xc00000 (3.0.0)
__opencl_c_atomic_scope_all_devices 0xc00000 (3.0.0)
__opencl_c_device_enqueue 0xc00000 (3.0.0)
__opencl_c_generic_address_space 0xc00000 (3.0.0)
__opencl_c_images 0xc00000 (3.0.0)
__opencl_c_int64 0xc00000 (3.0.0)
__opencl_c_integer_dot_product_input_4x8bit 0xc00000 (3.0.0)
__opencl_c_integer_dot_product_input_4x8bit_packed 0xc00000 (3.0.0)
__opencl_c_kernel_clock_scope_device 0xc00000 (3.0.0)
__opencl_c_kernel_clock_scope_sub_group 0xc00000 (3.0.0)
__opencl_c_kernel_clock_scope_work_group 0xc00000 (3.0.0)
__opencl_c_pipes 0xc00000 (3.0.0)
__opencl_c_program_scope_global_variables 0xc00000 (3.0.0)
__opencl_c_read_write_images 0xc00000 (3.0.0)
__opencl_c_subgroups 0xc00000 (3.0.0)
__opencl_c_work_group_collective_functions 0xc00000 (3.0.0)
Latest conformance test passed v2024-05-29-01
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 12
Max clock frequency 1MHz
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 1024
Preferred work group size multiple (device) 8
Preferred work group size multiple (kernel) 8
Max sub-groups per work group 16
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 8 / 8 (cl_khr_fp16)
float 4 / 4
double 0 / 0 (n/a)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity Yes
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity Yes
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 64, Little-Endian
External memory handle types Opaque FD, DMA buffer
External memory assume linear img handle types Opaque FD, DMA buffer
Semaphore types Binary
External semaphore import types Sync FD
External semaphore export types Sync FD
Global memory size 7920769024 (7.377GiB)
Error Correction support No
Max memory allocation 1073741824 (1024MiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Page size (QCOM) 4096 bytes
External memory padding (QCOM) 0 bytes
Preferred alignment for atomics SVM 128 bytes
Global 0 bytes
Local 0 bytes
Atomic memory capabilities relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
Atomic fence capabilities relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 1048576 (1024KiB)
Global Memory cache type Read/Write
Global Memory cache size 1048576 (1024KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 64 bytes
Pitch alignment for 2D image buffers 64 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 16384x16384x2048 pixels
Max number of read image args 128
Max number of write image args 64
Max number of read/write image args 64
Pipe support Yes
Max number of pipe args 16
Max active pipe reservations 24576
Max pipe packet size 1024
Local memory type Local
Local memory size 32768 (32KiB)
Max number of constant args 8
Max constant buffer size 65536 (64KiB)
Generic address space support Yes
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Device enqueue capabilities supported
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 655376 (640KiB)
Max size 655376 (640KiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop No
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Non-uniform work-groups Yes
Work-group collective functions Yes
Sub-group independent forward progress Yes
IL version (n/a)
ILs with version (n/a)
printf() buffer size 1048576 (1024KiB)
Built-in kernels (n/a)
Built-in kernels with version (n/a)
Device Extensions cl_ext_image_from_buffer cl_ext_image_requirements_info cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_egl_event cl_khr_egl_image cl_khr_extended_bit_ops cl_khr_external_memory cl_khr_external_memory_dma_buf cl_khr_external_memory_opaque_fd cl_khr_external_semaphore cl_khr_external_semaphore_sync_fd cl_khr_fp16 cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_image2d_from_buffer cl_khr_integer_dot_product cl_khr_kernel_clock cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_mipmap_image cl_khr_semaphore cl_khr_srgb_image_writes cl_khr_subgroups cl_khr_subgroup_ballot cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_rotate cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_suggested_local_work_size cl_qcom_accelerated_image_ops cl_qcom_android_ahardwarebuffer_host_ptr cl_qcom_android_native_buffer_host_ptr cl_qcom_bfloat16_product cl_qcom_bitreverse cl_qcom_compressed_image cl_qcom_create_buffer_from_image cl_qcom_dmabuf_host_ptr cl_qcom_dot_product8 cl_qcom_ext_host_ptr cl_qcom_ext_host_ptr_iocoherent cl_qcom_extended_query_image_info cl_qcom_extract_image_plane cl_qcom_filter_bicubic cl_qcom_ml_ops cl_qcom_other_image cl_qcom_perf_hint cl_qcom_priority_hint cl_qcom_protected_context cl_qcom_recordable_queues cl_qcom_subgroup_shuffle cl_qcom_vector_image_ops cl_qcom_onchip_global_memory cl_qcom_large_buffer cl_qcom_ahardwarebuffer_direct_import cl_qcom_local_memory_control
Device Extensions with Version cl_ext_image_from_buffer 0x400000 (1.0.0)
cl_ext_image_requirements_info 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_egl_event 0x400000 (1.0.0)
cl_khr_egl_image 0x400000 (1.0.0)
cl_khr_extended_bit_ops 0x400000 (1.0.0)
cl_khr_external_memory 0x400000 (1.0.0)
cl_khr_external_memory_dma_buf 0x400000 (1.0.0)
cl_khr_external_memory_opaque_fd 0x400000 (1.0.0)
cl_khr_external_semaphore 0x400000 (1.0.0)
cl_khr_external_semaphore_sync_fd 0x400000 (1.0.0)
cl_khr_fp16 0x400000 (1.0.0)
cl_khr_gl_sharing 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
cl_khr_integer_dot_product 0x800000 (2.0.0)
cl_khr_kernel_clock 0x9000 (0.9.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_mipmap_image 0x400000 (1.0.0)
cl_khr_semaphore 0x400009 (1.0.9)
cl_khr_srgb_image_writes 0x400000 (1.0.0)
cl_khr_subgroups 0x400000 (1.0.0)
cl_khr_subgroup_ballot 0x400000 (1.0.0)
cl_khr_subgroup_extended_types 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0)
cl_khr_subgroup_rotate 0x400000 (1.0.0)
cl_khr_subgroup_shuffle 0x400000 (1.0.0)
cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0)
cl_khr_suggested_local_work_size 0x400000 (1.0.0)
cl_qcom_accelerated_image_ops 0x400008 (1.0.8)
cl_qcom_android_ahardwarebuffer_host_ptr 0x400001 (1.0.1)
cl_qcom_android_native_buffer_host_ptr 0x400006 (1.0.6)
cl_qcom_bfloat16_product 0x400000 (1.0.0)
cl_qcom_bitreverse 0x400001 (1.0.1)
cl_qcom_compressed_image 0x400003 (1.0.3)
cl_qcom_create_buffer_from_image 0x400007 (1.0.7)
cl_qcom_dmabuf_host_ptr 0x400000 (1.0.0)
cl_qcom_dot_product8 0x401001 (1.1.1)
cl_qcom_ext_host_ptr 0x400005 (1.0.5)
cl_qcom_ext_host_ptr_iocoherent 0x400004 (1.0.4)
cl_qcom_extended_query_image_info 0x400000 (1.0.0)
cl_qcom_extract_image_plane 0x400005 (1.0.5)
cl_qcom_filter_bicubic 0x400000 (1.0.0)
cl_qcom_ml_ops 0x400001 (1.0.1)
cl_qcom_other_image 0x400006 (1.0.6)
cl_qcom_perf_hint 0x400004 (1.0.4)
cl_qcom_priority_hint 0x400005 (1.0.5)
cl_qcom_protected_context 0x400005 (1.0.5)
cl_qcom_recordable_queues 0x400001 (1.0.1)
cl_qcom_subgroup_shuffle 0x400006 (1.0.6)
cl_qcom_vector_image_ops 0x400009 (1.0.9)
cl_qcom_onchip_global_memory 0x400000 (1.0.0)
cl_qcom_large_buffer 0x400000 (1.0.0)
cl_qcom_ahardwarebuffer_direct_import 0x400000 (1.0.0)
cl_qcom_local_memory_control 0x400000 (1.0.0)
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [QCOM]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name QUALCOMM Snapdragon(TM)
Device Name QUALCOMM Adreno(TM) 830
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name QUALCOMM Snapdragon(TM)
Device Name QUALCOMM Adreno(TM) 830
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) Invalid device type for platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name QUALCOMM Snapdragon(TM)
Device Name QUALCOMM Adreno(TM) 830
ICD loader properties
ICD loader Name Khronos OpenCL ICD Loader
ICD loader Vendor Khronos Group
ICD loader Version OPENCL_ICD_LOADER_VERSION_MAJOR.OPENCL_ICD_LOADER_VERSION_MINOR.OPENCL_ICD_LOADER_VERSION_REV
ICD loader Profile OpenCL 3.0
~ $