Fix: JSON schema union types (type arrays) fail with 'type must be a string' error #1675
…reation
This commit adds a Python-side workaround for issue dottxt-ai#1383, where dynamic JSON schema creation fails when using union types (type arrays like ["string", "null"]) with nested optionals. The fix introduces a preprocessing step that converts JSON schema type arrays into the anyOf format that outlines-core 0.1.26 can handle.
Key changes:
- Add json_schema_utils.py with preprocessing function
- Update JsonSchema class to preprocess schemas before passing to Rust
- Add comprehensive tests for various union type scenarios
Fixes dottxt-ai#1383
RobinPicard
left a comment
Thanks a lot for both the detailed issue and this excellent contribution! I have just a little comment, but looks good to me otherwise. Upgrading outlines-core to v0.2+ is something we want to do in the coming weeks fyi.
    schema_dict = schema

    preprocessed = _preprocess_schema_dict(schema_dict)
    return json.dumps(preprocessed)
We need to include the ensure_ascii argument here. Otherwise the value provided by the user will be ignored if it was False.
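To illustrate the concern, here is a minimal sketch of how the standard library's json.dumps behaves with and without ensure_ascii. The helper name dump_schema is hypothetical and is not the code from this PR; it only shows why the caller's value has to be forwarded explicitly.

```python
import json


def dump_schema(schema: dict, ensure_ascii: bool = True) -> str:
    """Hypothetical helper: serialize a schema, honoring the caller's ensure_ascii."""
    # json.dumps defaults to ensure_ascii=True, so omitting the argument
    # silently overrides a caller who asked for ensure_ascii=False.
    return json.dumps(schema, ensure_ascii=ensure_ascii)


schema = {"title": "café"}
escaped = dump_schema(schema)                  # non-ASCII characters are \u-escaped
raw = dump_schema(schema, ensure_ascii=False)  # non-ASCII characters are kept as-is
```

With the default, the accented character is emitted as an escape sequence; with ensure_ascii=False it survives unchanged, which is the behavior a user who passed False would expect.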
…reation
- Add preprocessing to convert type arrays like ["string", "null"] to anyOf format
- Implement thread-safe LRU cache with compression and performance optimizations
- Add comprehensive test suite with 24 test cases covering edge cases
- Include DoS protection, graceful fallback, and performance metrics
- Support all JSON schema keywords and nested structures
- Add benchmark scripts and reproduction examples
Fixes dottxt-ai#1383
I don't think we need all that for such a small change (especially considering it'll be removed in a few weeks). I think what you had in your 1st commit was sufficient. There's still the issue of the …
- Simplified json_schema_utils.py from 700+ lines to ~120 lines
- Added ensure_ascii parameter to preprocessing function
- Updated JsonSchema class to pass ensure_ascii to preprocessing
- Simplified tests to match minimal implementation
- Removed benchmark scripts and complex features
- Fixed edge case where type-specific properties were not properly isolated
This addresses reviewer feedback requesting a simpler implementation that preserves the ensure_ascii parameter.
… compatibility
- Replaced match/case syntax with if/elif statements in to_regex function
- Fixed JSON schema preprocessing to preserve original format when unchanged
- All tests now pass, style checks pass
- Maintains backward compatibility with Python 3.9
This fixes CI failures caused by Python 3.10+ syntax in Python 3.9 environment.
- Added try-catch in preprocess_schema_for_union_types to handle invalid JSON
- Return original string unchanged for invalid JSON to preserve error handling
- Fixed pre-commit formatting issues (trailing whitespace, end-of-file)
- All tests pass including union type tests and DSL tests
This fixes the failing CI test and pre-commit hook failures.
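The fallback behavior described in these commits (return invalid JSON unchanged, and keep the original string when preprocessing changes nothing) could look roughly like the sketch below. The function name preprocess_schema_string and the transform parameter are illustrative assumptions, not the PR's actual signature; the sketch also assumes transform returns a new object rather than mutating its input.

```python
import json
from typing import Any, Callable


def preprocess_schema_string(schema_str: str, transform: Callable[[Any], Any]) -> str:
    """Hypothetical wrapper: apply `transform` to a JSON schema string, with fallbacks."""
    try:
        schema = json.loads(schema_str)
    except json.JSONDecodeError:
        # Invalid JSON: hand the string back untouched so that downstream
        # validation reports the error the same way it did before.
        return schema_str

    transformed = transform(schema)
    if transformed == schema:
        # Nothing was rewritten: preserve the caller's original formatting.
        return schema_str
    return json.dumps(transformed)
```

Returning the input string verbatim in both fallback cases means schemas that never needed preprocessing are passed along byte for byte.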
Merging is blocked by failing coverage check
989b447 to 848f8ca
I fixed the test coverage issue and pushed to your branch. Thanks again for the contribution @brightlikethelight!
Summary
… ["string", "null"] to anyOf format that outlines-core 0.1.26 supports
Background
JSON schemas with union types specified as arrays (e.g., {"type": ["string", "null"]}) currently fail with a ValueError: "'type' must be a string". This prevents the use of optional fields and other union type patterns in dynamic JSON schema generation.
The root cause is that the current version of outlines-core (0.1.26) does not support type arrays. This was fixed in outlines-core PR dottxt-ai/outlines-core#138, but that fix is only available in outlines-core v0.2+. Upgrading to v0.2+ requires significant changes due to breaking API changes (see #1380).
Solution
This PR implements a Python-side preprocessing step that:
… anyOf format
This is a temporary workaround until outlines can be migrated to outlines-core v0.2+.
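As a rough illustration of the preprocessing idea, the sketch below walks a schema and rewrites every list-valued "type" into an anyOf of single-type subschemas. The name convert_type_arrays is hypothetical and this is not the code in json_schema_utils.py. It is deliberately simplified: keywords that sit next to "type" stay on the parent instead of being distributed to the matching branch, and it assumes the node does not already carry its own anyOf.

```python
from typing import Any


def convert_type_arrays(node: Any) -> Any:
    """Return a copy of `node` in which each list-valued "type" becomes anyOf."""
    if isinstance(node, list):
        return [convert_type_arrays(item) for item in node]
    if not isinstance(node, dict):
        return node

    # Convert children first so nested optionals are handled bottom-up.
    converted = {key: convert_type_arrays(value) for key, value in node.items()}

    type_value = converted.get("type")
    if isinstance(type_value, list):
        # Simplification: sibling keywords are kept on the parent node.
        # Assumption: the node has no existing "anyOf" (it would be overwritten).
        rest = {key: value for key, value in converted.items() if key != "type"}
        return {**rest, "anyOf": [{"type": name} for name in type_value]}
    return converted


before = {"type": "object", "properties": {"age": {"type": ["integer", "null"]}}}
after = convert_type_arrays(before)
```

Because the function builds new dictionaries rather than mutating its input, the original schema can still be compared against the result to detect whether anything changed.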
Test plan
… test_json_schema_union_types.py
Example
Before preprocessing:
{ "type": "object", "properties": { "age": {"type": ["integer", "null"]} } }
After preprocessing:
{ "type": "object", "properties": { "age": {"anyOf": [{"type": "integer"}, {"type": "null"}]} } }
Related Issues
… outlines to use outlines-core v0.2 #1380 (tracking outlines-core v0.2 migration)