Is your feature request related to a problem or challenge?
A Wikipedia article would be useful for Apache DataFusion to make the project easier to discover, easier to explain, and easier to cite from a neutral source.
The main benefit is not “marketing copy”; it is legitimacy and referenceability.
This matters even more now that Wikipedia is a core source for LLM training corpora and for search engine results.
It makes the project easier for journalists, analysts, conference organizers, students, and procurement people to cite quickly.
It strengthens search visibility and entity recognition. In practice, Wikipedia pages often feed search summaries, knowledge panels, mirrors, and LLM retrieval.
It signals that the project is notable beyond its own community because the article must be supported by independent reliable sources.
It gives a durable place to document ecosystem facts like history, governance, and adoption that do not fit cleanly into product docs.
Describe the solution you'd like
I would like a neutral Wikipedia page for Apache DataFusion.
Here are some similar pages:
DuckDB’s page shows the pattern clearly: a short neutral definition, history, architecture, language bindings, commercial use, and foundation/governance in one place, with references to papers and third-party coverage.
Describe alternatives you've considered
I think a strong article will include many citations. Here are a number of them that I found with the help of Codex:
Some third-party citations that are probably useful for this article:
SIGMOD 2024 technical paper (https://2024.sigmod.org/industrial-list.shtml) and SIGMOD session listing (https://2024.sigmod.org/program_sigmod.shtml).
Citations for technical importance
crates.io: 17,668,287 all-time downloads (https://crates.io/crates/datafusion)
CRN: “The 10 Coolest Open-Source Software Tools Of 2024”
(https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3)
It explicitly includes Apache DataFusion, describes it as a fast, extensible query engine, notes its Rust/Arrow basis, and mentions its 2024 milestone of becoming an Apache top-level project. This is probably the strongest source in this list for general notability.
Datanami: “How the FDAP Stack Gives InfluxDB 3.0 Real-Time Speed, Efficiency”
(https://www.datanami.com/2024/03/15/how-the-fdap-stack-gives-influxdb-3-0-real-time-speed-efficiency/)
It quotes Paul Dix saying that DataFusion had matured substantially and had best-in-class performance on a number of queries compared with other columnar query engines. It is not a ranking article, but it is meaningful third-party validation of DataFusion's technical importance.
Third-party citations for usage in products
SiliconANGLE: “Enterprise DB begins rolling AI features into PostgreSQL”
(https://siliconangle.com/2024/05/23/enterprise-db-begins-rolling-ai-features-postgresql/)
Independent coverage stating EDB combined Apache DataFusion, Arrow, and Delta Lake in its analytics/lakehouse capability.
Spice AI: “How we use Apache DataFusion at Spice AI” (https://spice.ai/blog/how-we-use-apache-datafusion-at-spice-ai)
It says Spice uses DataFusion as its SQL query engine and extends it with custom TableProviders, optimizer rules, and UDFs for federated SQL workloads.
Cloudflare Log Explorer GA announcement (https://blog.cloudflare.com/logexplorer-ga/), June 10, 2025.
It says queriers fetch matching files from R2 and “process SQL queries using Apache DataFusion.”
InfluxData: “Flight, DataFusion, Arrow, and Parquet: Using the FDAP Architecture to build InfluxDB 3.0”
(https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/)
Clearly states InfluxDB 3.0 chose DataFusion as its query engine foundation and explains why.
Pydantic Logfire issue: “We’re changing database” (pydantic/logfire#408)
Usable as a primary source for adoption only. It says Logfire is moving from Timescale to a custom database built on DataFusion and explains the reasons.
Palantir Foundry announcements for July 2025 (https://www.palantir.com/docs/foundry/announcements/2025-07)
It says lightweight pipelines are “powered by DataFusion.”
Cube: “Query pushdown in Cube’s semantic layer” (https://cube.dev/blog/query-push-down-in-cubes-semantic-layer)
Good third-party primary source for “used in production by Cube” and for describing how Cube uses DataFusion internally.
Kamu: “100X faster ingestion, and FlightSQL support for connecting BI tools” (https://www.kamu.dev/blog/2023-09-datafusion-flightsql/)
Good third-party primary source for ecosystem adoption. It explicitly says Kamu added support for Apache DataFusion and reports performance claims for its own product.
LanceDB: “Columnar File Readers in Depth: APIs and Fusion” (https://lancedb.com/blog/columnar-file-readers-in-depth-apis-and-fusion/)
Usable for ecosystem context. It says Lance uses DataFusion extensively and demonstrates integration with it.
Bauplan Labs: “Duck Hunt: moving Bauplan from DuckDB to DataFusion”
(https://www.bauplanlabs.com/post/duck-hunt-moving-bauplan-from-duckdb-to-datafusion)
Bauplan explains the migration as driven by DataFusion’s Arrow-first architecture, extensibility, and community-driven development.