feat: add ArcadeDB document store integration by lvca · Pull Request #2898 · deepset-ai/haystack-core-integrations

lvca · 2026-02-28T23:10:15Z

Summary

- Add ArcadeDBDocumentStore implementing the full Haystack DocumentStore protocol (count, filter, write/upsert/skip, delete) with automatic database, schema, and HNSW vector index initialization
- Add ArcadeDBEmbeddingRetriever pipeline component for vector similarity retrieval with FilterPolicy support (REPLACE/MERGE)
- Pure HTTP/JSON API integration via requests; no special database drivers needed
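
The REPLACE/MERGE distinction mentioned for the retriever can be sketched as follows. This is a simplified stand-in, not the Haystack implementation: the `FilterPolicy` enum below is redefined locally so the snippet runs without Haystack installed, `resolve_filters` is a hypothetical helper name, and the merge rule (wrap both filter sets in a single AND) is an assumption made for illustration.

```python
from enum import Enum
from typing import Any, Optional


class FilterPolicy(Enum):
    """Local, simplified stand-in for Haystack's FilterPolicy enum."""

    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: FilterPolicy,
    init_filters: Optional[dict[str, Any]],
    runtime_filters: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Decide which filters a retriever would hand to the document store."""
    if runtime_filters is None:
        # Nothing supplied at run time: fall back to the init-time filters.
        return init_filters
    if policy is FilterPolicy.MERGE and init_filters is not None:
        # Assumed merge rule: a document must satisfy both filter sets.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # REPLACE (or nothing to merge with): run-time filters win.
    return runtime_filters
```

With REPLACE, filters passed at run time override the ones configured on the component; with MERGE, both sets are combined.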

ArcadeDB is an open-source multi-model database (~4k GitHub stars) that combines document storage, HNSW vector search (LSM_VECTOR), full-text search, and graph capabilities in a single engine. For Haystack users, this means a single database can replace separate backends for document storage and vector retrieval.

Key design decisions

- HTTP-only: All operations use POST /api/v1/command/{database} with SQL; no Bolt protocol, no neo4j driver, no custom binary protocol
- LSM_VECTOR index: Uses ArcadeDB's ACID-compliant HNSW implementation with configurable dimension and similarity metric (cosine, euclidean, dot product)
- ARRAY_OF_FLOATS property type: Required by ArcadeDB's vector index (not generic LIST)
- Post-filtering for vector search: vectorNeighbors() returns {record, distance} maps; metadata filters are applied as a post-filter step
- Automatic schema setup: Creates database, vertex type, properties, unique index, and vector index on first use; zero manual setup required
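
The post-filtering decision above can be illustrated with a minimal sketch. It assumes the neighbor list has already been fetched and has the {record, distance} shape described in the list; `post_filter_neighbors` and the `predicate` callback are hypothetical names, and the integration's actual code may be organised differently.

```python
from typing import Any, Callable


def post_filter_neighbors(
    neighbors: list[dict[str, Any]],
    predicate: Callable[[dict[str, Any]], bool],
    top_k: int,
) -> list[dict[str, Any]]:
    """Keep the closest `top_k` neighbors whose record passes `predicate`."""
    # Metadata filtering happens after the vector search has returned.
    kept = [n for n in neighbors if predicate(n["record"])]
    # Smaller distance means closer; sort so the best matches come first.
    kept.sort(key=lambda n: n["distance"])
    return kept[:top_k]
```

A general property of post-filtering is that the result can contain fewer than `top_k` documents when many neighbors fail the metadata filter, unless more candidates are requested from the index up front.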

How did you test it?

- 14 unit tests for filter conversion (no ArcadeDB required)
- 12 integration tests against a live ArcadeDB Docker container covering: count, write (overwrite/skip/duplicate), delete, filter (equality/comparison/AND), embedding retrieval, and serialization round-trip
- End-to-end pipeline example with ArcadeDBEmbeddingRetriever demonstrating filtered vector search
- All 26 tests pass against arcadedata/arcadedb:latest

Checklist

- I have read the contributors guidelines and the code of conduct
- I added unit tests and updated the docstrings
- I've used one of the conventional commit types for my PR title: feat:
- CI workflow with ArcadeDB Docker service included
- Labeler configuration updated
- README inventory table updated

Add ArcadeDBDocumentStore and ArcadeDBEmbeddingRetriever for Haystack 2.x.

ArcadeDB is an open-source multi-model database that combines document storage, HNSW vector search (LSM_VECTOR), and SQL metadata filtering in a single engine. This integration connects via the HTTP/JSON API using only the requests library, with no special drivers needed.

Components:
- ArcadeDBDocumentStore: full DocumentStore protocol (count, filter, write, delete) with automatic schema/index initialization
- ArcadeDBEmbeddingRetriever: pipeline component for vector similarity retrieval with FilterPolicy support
- Filter conversion: Haystack filter dicts → ArcadeDB SQL WHERE clauses
- Document converters: Haystack Document ↔ ArcadeDB record mapping

Includes CI workflow with ArcadeDB Docker service, unit tests for filter conversion, and integration tests for all DocumentStore operations.
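
As an illustration of the "Haystack filter dicts → ArcadeDB SQL WHERE clauses" step, here is a minimal sketch covering only the equality, comparison, and AND/OR cases. It is not the converter from this PR: `to_where` and `sql_literal` are hypothetical names, field names are used as given (how the integration maps `meta.` fields is not shown here), and the backslash escaping of quotes is an assumption that should be checked against ArcadeDB's SQL reference.

```python
import re
from typing import Any

# Haystack comparison operators mapped to SQL operators (subset only).
COMPARISON_OPERATORS = {"==": "=", "!=": "<>", ">": ">", ">=": ">=", "<": "<", "<=": "<="}
# Only plain dotted identifiers are accepted as field names.
FIELD_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def sql_literal(value: Any) -> str:
    """Render a Python value as a SQL literal (assumed escaping rules)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Assumption: backslash-escape backslashes and single quotes.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    raise TypeError(f"Unsupported value type: {type(value).__name__}")


def to_where(filters: dict[str, Any]) -> str:
    """Convert a Haystack-style filter dict into a SQL WHERE expression."""
    operator = filters["operator"]
    if operator in ("AND", "OR"):
        # Logical node: convert each child and join with the operator.
        parts = [to_where(condition) for condition in filters["conditions"]]
        return "(" + f" {operator} ".join(parts) + ")"
    if operator not in COMPARISON_OPERATORS:
        raise ValueError(f"Unsupported operator: {operator}")
    field = filters["field"]
    if not FIELD_PATTERN.match(field):
        raise ValueError(f"Invalid field name: {field}")
    return f"{field} {COMPARISON_OPERATORS[operator]} {sql_literal(filters['value'])}"
```

Because the clause is built by string construction rather than bound parameters, both the field name and the value are validated or escaped before they reach the SQL text.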

CLAassistant · 2026-02-28T23:10:23Z

All committers have signed the CLA.

- Remove unused noqa: B008 directives (B008 already in ignore list)
- Use HTTPStatus.BAD_REQUEST instead of magic value 400 (PLR2004)
- Add S608 to ruff ignore (SQL string construction is intentional for ArcadeDB HTTP/JSON API with proper value escaping)
- Set requests>=2.28.0 minimum to ensure Python 3.13 compatibility (older versions use removed cgi module)
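
A minimal sketch of the two HTTP-related points in this commit: building a command for POST /api/v1/command/{database} and comparing the status against HTTPStatus.BAD_REQUEST rather than a bare 400. The function names are hypothetical, the JSON body shape (`language` plus `command`) is an assumption based on ArcadeDB's HTTP API, and raising ValueError on a 400 is an illustrative choice rather than the integration's confirmed behaviour. Nothing is sent over the network here.

```python
import json
from http import HTTPStatus
from urllib.parse import quote


def build_command(base_url: str, database: str, sql: str) -> tuple[str, bytes]:
    """Return the URL and JSON body for an ArcadeDB SQL command request."""
    url = f"{base_url.rstrip('/')}/api/v1/command/{quote(database, safe='')}"
    # Assumed payload shape: the query language plus the command text.
    body = json.dumps({"language": "sql", "command": sql}).encode("utf-8")
    return url, body


def raise_for_bad_request(status_code: int, detail: str) -> None:
    """Raise when the server reports a malformed or rejected command."""
    # HTTPStatus is an IntEnum, so it compares equal to the integer 400.
    if status_code == HTTPStatus.BAD_REQUEST:
        raise ValueError(f"ArcadeDB rejected the command: {detail}")
```

In the integration itself the returned URL and body would be handed to requests together with credentials; that part is left out so the sketch stays free of third-party imports.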

julian-risch

Thank you for opening this pull request, @lvca! It looks quite good to me already, but there are a couple of things we need to change before we can merge it. I'll make most of the changes myself. What you could do is open a PR in https://github.com/deepset-ai/haystack-integrations with a page to highlight this new integration using this structure. Feel free to reuse what you had in mind for the README.md there.

.github/labeler.yml

integrations/arcadedb/pyproject.toml

integrations/arcadedb/README.md

...ons/arcadedb/src/haystack_integrations/components/retrievers/arcadedb/embedding_retriever.py

integrations/arcadedb/src/haystack_integrations/document_stores/arcadedb/document_store.py

integrations/arcadedb/tests/test_document_store.py

julian-risch · 2026-03-02T12:17:05Z

I ran the example code successfully on my local machine and applied suggestions from my code review. The main remaining task is to use the mixin DocumentStore tests based on https://github.com/deepset-ai/haystack/blob/main/haystack/testing/document_store.py#L252. I'll do that next.

…rievers/arcadedb/embedding_retriever.py

…s/arcadedb/document_store.py

julian-risch · 2026-03-02T13:03:25Z

I updated the tests to use DocumentStore mixin tests. Both unit and integration tests passed locally.

lvca requested a review from a team as a code owner February 28, 2026 23:10

lvca requested review from julian-risch and removed request for a team February 28, 2026 23:10

github-actions bot added topic:CI type:documentation Improvements or additions to documentation labels Feb 28, 2026

lvca added 3 commits February 28, 2026 18:33

style: apply ruff format to all source files

e031eb4

fix: add type annotation to resolve mypy assignment error

2f1f2a6

julian-risch requested changes Mar 2, 2026


Apply suggestions from code review

cd3c8a4

julian-risch self-assigned this Mar 2, 2026

julian-risch and others added 8 commits March 2, 2026 13:21

format and fix docstrings

9922db9

Update integrations/arcadedb/src/haystack_integrations/components/ret…

6827d0f

…rievers/arcadedb/embedding_retriever.py

Update .github/labeler.yml

fc8a7a8

Update integrations/arcadedb/src/haystack_integrations/document_store…

031932f

…s/arcadedb/document_store.py

update license for consistency

5b29fe1

use mixin DocumentStore tests and unify error handling

3498a7a

reuse variable in raise ValueError calls

a535440

add conftest

36d08dd

use action secret for ArcadeDB

4d8b624

julian-risch self-requested a review March 2, 2026 13:09

julian-risch approved these changes Mar 2, 2026


julian-risch mentioned this pull request Mar 2, 2026

Add missing operations to ArcadeDBDocumentstore #2906

Open

julian-risch added 2 commits March 2, 2026 14:19

wait for ArcadeDB service to start

b379a14

use default ARCADEDB_PASSWORD in forks

dea7a7e

julian-risch merged commit d65b2fd into deepset-ai:main Mar 2, 2026
8 checks passed

julian-risch mentioned this pull request Mar 2, 2026

Follow up tasks for new ArcadeDB integration #2907

Open

9 tasks

julian-risch added the integration:arcadedb label Mar 2, 2026

julian-risch mentioned this pull request Mar 2, 2026

Add page for new ArcadeDB integration deepset-ai/haystack-integrations#408

Open

julian-risch merged 16 commits into deepset-ai:main from lvca:feat/arcadedb-integration


3 participants
