
bug/feat: tool call from llama.cpp fails #1408

@inqode-lars

Description

  • I have looked for existing issues (including closed) about this

Bug Report

A tool call returned by llama.cpp cannot be deserialized.

CompletionError: JsonError: data did not match any variant of untagged enum ApiResponse

Reproduction

let client: openai::Client = openai::Client::builder()
    .base_url("llama.cpp url")
    .build()
    .expect("Failed to build client");
let agent = client
    .completions_api()
    .agent(&config.model.name)
    .preamble("You are a helpful assistant with access to tools.")
    .tool(HelloWorld)
    .build();

match agent.prompt("Call the hello_world tool and give me the output.").await {
    Ok(response) => {
        println!("\nResponse from model:");
        println!("  {}", response);
    }
    Err(e) => {
        eprintln!("Error: {}", e);
    }
}

The model responds with:

{
  "choices": [{
      "finish_reason": "tool_calls",
      "index": 0,
      "message": {
          "role": "assistant",
          "content": "",
          "tool_calls": [{ "type": "function", "function": { "name": "hello_world", "arguments": {} }, "id": "xxx" }]
      }
  }],
  "created": 0,
  "model": "unsloth/Qwen3-Coder-Next-GGUF:Q8_0",
  "system_fingerprint": "b8113-xxxx",
  "object": "chat.completion",
  "usage": { "completion_tokens": 13, "prompt_tokens": 255, "total_tokens": 268 },
  "id": "xxx",
  "timings": {
      "cache_n": 0,
      "prompt_n": 255,
      "prompt_ms": 670,
      "prompt_per_token_ms": 2.63,
      "prompt_per_second": 380,
      "predicted_n": 13,
      "predicted_ms": 367,
      "predicted_per_token_ms": 28,
      "predicted_per_second": 35
  }
}

and I get the error:

CompletionError: JsonError: data did not match any variant of untagged enum ApiResponse

Expected behavior

It would be nice if llama.cpp tool calling were supported, so that I could get the output of the hello_world tool.

Additional context

The problem is probably that llama.cpp is not exactly compatible with the OpenAI API: it returns the tool call arguments as a JSON object (a map), whereas the OpenAI API returns them as a JSON string.

However, I think the way llama.cpp embeds the arguments is more natural (though sadly not fully compatible with the OpenAI API), so it would be nice to support both. The Hugging Face provider already has a deserialize_arguments helper that handles both JSON objects and stringified JSON objects.
