
bug/feat: tool call from llama.cpp fails #1408

@inqode-lars

Description

  • I have looked for existing issues (including closed) about this

Bug Report

A tool call returned by llama.cpp cannot be deserialized.

CompletionError: JsonError: data did not match any variant of untagged enum ApiResponse

Reproduction

let client: openai::Client = openai::Client::builder()
    .base_url("llama.cpp url")
    .build()
    .expect("Failed to build client");
let agent = client
    .completions_api()
    .agent(&config.model.name)
    .preamble("You are a helpful assistant with access to tools.")
    .tool(HelloWorld)
    .build();

match agent.prompt("Call the hello_world tool and give me the output.").await {
    Ok(response) => {
        println!("\nResponse from model:");
        println!("  {}", response);
    }
    Err(e) => {
        eprintln!("Error: {}", e);
    }
}

The model responds with:

{
  "choices": [{
      "finish_reason": "tool_calls",
      "index": 0,
      "message": {
          "role": "assistant",
          "content": "",
          "tool_calls": [{ "type": "function", "function": { "name": "hello_world", "arguments": {} }, "id": "xxx" }]
      }
  }],
  "created": 0,
  "model": "unsloth/Qwen3-Coder-Next-GGUF:Q8_0",
  "system_fingerprint": "b8113-xxxx",
  "object": "chat.completion",
  "usage": { "completion_tokens": 13, "prompt_tokens": 255, "total_tokens": 268 },
  "id": "xxx",
  "timings": {
      "cache_n": 0,
      "prompt_n": 255,
      "prompt_ms": 670,
      "prompt_per_token_ms": 2.63,
      "prompt_per_second": 380,
      "predicted_n": 13,
      "predicted_ms": 367,
      "predicted_per_token_ms": 28,
      "predicted_per_second": 35
  }
}

and I get the error:

CompletionError: JsonError: data did not match any variant of untagged enum ApiResponse

Expected behavior

It would be nice if llama.cpp tool calling were supported, so that I could get the output of the hello_world tool.

Additional context

The problem is probably that llama.cpp is not exactly compatible with the OpenAI API: it returns the tool call arguments as a JSON object (a map), whereas the OpenAI API returns them as a JSON string.

However, I think the way llama.cpp embeds the arguments is more natural (though sadly not fully compatible with the OpenAI API), so it would be nice to support both. The Hugging Face provider already has a deserialize_arguments helper that handles both JSON objects and stringified JSON objects.
