Write a Wikipedia article for Apache DataFusion

### Is your feature request related to a problem or challenge?

A Wikipedia article would be useful for Apache DataFusion to make the project easier to discover, easier to explain, and easier to cite from a neutral source. 

The main benefit is not “marketing copy”; it is legitimacy and referenceability.

This is even more important now that Wikipedia is a core part of LLM training corpora and a frequent source for search engine results.

  - It gives newcomers a neutral landing page distinct from https://datafusion.apache.org.
  - It makes the project easier for journalists, analysts, conference organizers, students, and procurement teams to cite quickly.
  - It strengthens search visibility and entity recognition. In practice, Wikipedia pages often feed search summaries, knowledge panels, mirrors, and LLM retrieval.
  - It signals that the project is notable beyond its own community because the article must be supported by independent reliable sources.
  - It gives a durable place to document ecosystem facts like history, governance, and adoption that do not fit cleanly into product docs.


### Describe the solution you'd like

I would like a neutral Wikipedia page for Apache DataFusion.

Here are some similar pages:
- https://en.wikipedia.org/wiki/DuckDB
- https://en.wikipedia.org/wiki/Apache_Spark
- https://en.wikipedia.org/wiki/Polars_(software)

DuckDB’s page shows the pattern clearly: a short neutral definition, history, architecture, language bindings, commercial use, and foundation/governance in one place, with references to papers and third-party coverage.

### Describe alternatives you've considered

I think a strong article will include many citations. Here are a bunch I found with the help of Codex:

Some third-party citations that are probably useful for this article:

- Top-level project status: DataFusion has been a standalone Apache top-level project as of April 16, 2024, as announced publicly by the Apache Arrow PMC and the ASF (Apache Arrow blog (https://arrow.apache.org/blog/2024/05/07/datafusion-tlp/), ASF announcement (https://news.apache.org/foundation/entry/apache-software-foundation-announces-new-top-level-project-apache-datafusion)).

SIGMOD 2024 technical paper

  - The paper appears in the SIGMOD 2024 program as an accepted industry-track paper: SIGMOD accepted papers
    (https://2024.sigmod.org/industrial-list.shtml), SIGMOD session listing (https://2024.sigmod.org/program_sigmod.shtml).
  - The DOI is 10.1145/3626246.3653368 (https://dl.acm.org/doi/10.1145/3626246.3653368).

Citations for technical importance
  - crates.io: 17,668,287 all-time downloads at the time of writing (https://crates.io/crates/datafusion)

  - CRN: “The 10 Coolest Open-Source Software Tools Of 2024”
    (https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3)
    It explicitly includes Apache DataFusion, describes it as a fast, extensible query engine, notes
    its Rust/Arrow basis, and mentions its 2024 top-level-project milestone. This is likely the strongest of these sources for
    establishing general notability.

  - Datanami: “How the FDAP Stack Gives InfluxDB 3.0 Real-Time Speed, Efficiency”
    (https://www.datanami.com/2024/03/15/how-the-fdap-stack-gives-influxdb-3-0-real-time-speed-efficiency/)
    This quotes Paul Dix saying DataFusion had matured substantially and had best-in-class performance on a number of queries versus other
    columnar query engines. It is not a ranking article, but it is meaningful third-party validation of technical importance.


Third-party citations for usage in products

  - SiliconANGLE: “Enterprise DB begins rolling AI features into PostgreSQL”
    (https://siliconangle.com/2024/05/23/enterprise-db-begins-rolling-ai-features-postgresql/)
    Independent coverage stating EDB combined Apache DataFusion, Arrow, and Delta Lake in its analytics/lakehouse capability.

  - Spice AI: “How we use Apache DataFusion at Spice AI” (https://spice.ai/blog/how-we-use-apache-datafusion-at-spice-ai)
    This says Spice uses DataFusion as its SQL query engine and extends it with custom TableProviders, optimizer rules, and UDFs for
    federated SQL workloads.

  - Cloudflare Log Explorer GA announcement (https://blog.cloudflare.com/logexplorer-ga/) from June 10, 2025.
    It states that queriers fetch matching files from R2 and “process SQL queries using Apache DataFusion.”

  - InfluxData: “Flight, DataFusion, Arrow, and Parquet: Using the FDAP Architecture to build InfluxDB 3.0”
    (https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/)
    Clearly states InfluxDB 3.0 chose DataFusion as its query engine foundation and explains why.

  - Pydantic Logfire issue: “We’re changing database” (https://github.com/pydantic/logfire/issues/408)
    Usable as a primary source for adoption only. It says Logfire is moving from Timescale to a custom database built on DataFusion and
    gives reasons. 
  
  - Palantir Foundry announcements for July 2025 (https://www.palantir.com/docs/foundry/announcements/2025-07)
    This says lightweight pipelines are “powered by DataFusion.”

  - Cube: “Query pushdown in Cube’s semantic layer” (https://cube.dev/blog/query-push-down-in-cubes-semantic-layer)
    Good third-party primary source for “used in production by Cube” and for describing how Cube uses DataFusion internally.
  
  - Kamu: “100X faster ingestion, and FlightSQL support for connecting BI tools” (https://www.kamu.dev/blog/2023-09-datafusion-flightsql/)
    Good third-party primary source for ecosystem adoption. It explicitly says Kamu added support for Apache DataFusion and reports
    performance claims in its own product.
  
  - LanceDB: “Columnar File Readers in Depth: APIs and Fusion” (https://lancedb.com/blog/columnar-file-readers-in-depth-apis-and-fusion/)
    Usable for ecosystem context. It says Lance uses DataFusion extensively and demonstrates integration with it.

  - Bauplan Labs: “Duck Hunt: moving Bauplan from DuckDB to DataFusion”
    (https://www.bauplanlabs.com/post/duck-hunt-moving-bauplan-from-duckdb-to-datafusion)
    Bauplan explains the migration as driven by DataFusion’s Arrow-first architecture, extensibility, and community-driven development.


### Additional context

_No response_

Write a wikipedia article for Apache DataFusion #21076
