@Imbruced
Member

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

  • Yes, the URL of the associated JIRA ticket is https://issues.apache.org/jira/browse/SEDONA-XXX. The PR name follows the format [SEDONA-XXX] my subject.

  • No:

    • this is a documentation update. The PR name follows the format [DOCS] my subject
    • this is a CI update. The PR name follows the format [CI] my subject

What changes were proposed in this PR?

How was this patch tested?

Did this PR include necessary documentation updates?

  • Yes, I am adding a new API. I am using the current SNAPSHOT version number in vX.Y.Z format.
  • Yes, I have updated the documentation.
  • No, this PR does not affect any public API so no need to change the documentation.

@jiayuasu
Member

@paleolimbot

@Imbruced
Member Author

I'll fix the missing function issue.

@Imbruced
Member Author

Starting from Spark 4.0, we can pass the whole arrow table to Spark.createDataFrame. I don't know when the release will be.
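
To illustrate the point above, here is a minimal sketch of gating on the Spark version. The helper name `arrow_to_spark` is hypothetical, and the pandas fallback for older versions is only an assumption for illustration; it is not necessarily what this PR does, and it may not preserve geometry columns.

```python
def arrow_to_spark(spark, table):
    """Create a Spark DataFrame from an Arrow table.

    Assumes `spark` is a SparkSession and `table` is a pyarrow.Table.
    """
    # SparkSession.version is a string such as "3.5.1" or "4.0.0".
    major = int(spark.version.split(".")[0])
    if major >= 4:
        # Spark 4.0+ accepts a pyarrow.Table directly.
        return spark.createDataFrame(table)
    # Older Spark: illustrative fallback that goes through pandas.
    return spark.createDataFrame(table.to_pandas())
```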

Member

@paleolimbot paleolimbot left a comment

This is awesome! I'm new to this code base, so consider my comments optional nits 🙂

> Starting from Spark 4.0, we can pass the whole arrow table to Spark.createDataFrame

Based on this PR, I'm happy to attempt, as a follow-up, backporting GeoArrow import of anything implementing __arrow_c_stream__, which would avoid materializing the GeoPandas data frame 🙂

from pyspark.sql import SparkSession
from pyspark.sql import DataFrame
from pyspark.sql.types import StructType, StructField, DataType, ArrayType, MapType
import pyarrow as pa
Member

I am not sure what the dependency situation is like for Spark, but it may be worth making this a lazy import (e.g., as in dataframe_to_arrow) so that importing sedona.utils.geoarrow from sedona/spark/__init__.py doesn't necessarily require pyarrow to be installed. Alternatively, we could add pyarrow to the apache-sedona[spark] extras to match the runtime requirement.
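
A minimal sketch of the lazy-import idea, for illustration only. The names `import_optional` and `rows_to_arrow` are hypothetical and are not taken from the Sedona code base; the point is simply that pyarrow is imported when the function is called, not when the module is imported.

```python
import importlib


def import_optional(module_name: str):
    """Import a module on first use, with a clearer error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature but is not installed"
        ) from exc


def rows_to_arrow(rows):
    # Hypothetical helper: pyarrow is imported here, at call time, so that
    # importing the enclosing module does not itself require pyarrow.
    pa = import_optional("pyarrow")
    return pa.Table.from_pylist(rows)
```

With this shape, a missing pyarrow only surfaces as an error for users who actually call the Arrow-based conversion.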

Member Author

Good idea. I'll make all the changes later today. Thank you for the review!

return [gen_new_name[name]() for name in names]


def _deduplicate_field_names(dt: DataType) -> DataType:
Member

Suggested change
- def _deduplicate_field_names(dt: DataType) -> DataType:
+ # Backport from Spark 4.0
+ # https://github.com/apache/spark/blob/3515b207c41d78194d11933cd04bddc21f8418dd/python/pyspark/sql/pandas/types.py#L1385
+ def _deduplicate_field_names(dt: DataType) -> DataType:
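
For context, the line `return [gen_new_name[name]() for name in names]` shown above looks like the tail of a name-deduplication helper. The following is a rough sketch of how such a helper could work, loosely modeled on the Spark code linked in the suggestion. The function name `dedup_names` and the `_0`, `_1` suffix scheme are assumptions for illustration, not a copy of the PR's code.

```python
import itertools
from typing import Callable, List


def dedup_names(names: List[str]) -> List[str]:
    """Make names unique by suffixing repeated ones with a running index."""
    if len(set(names)) == len(names):
        # Nothing to do when every name is already unique.
        return names

    def gen_dedup(name: str) -> Callable[[], str]:
        counter = itertools.count()
        return lambda: f"{name}_{next(counter)}"

    def gen_identity(name: str) -> Callable[[], str]:
        return lambda: name

    # One generator per distinct name: repeated names get a counter suffix,
    # unique names are passed through unchanged.
    gen_new_name = {
        name: gen_dedup(name) if len(list(group)) > 1 else gen_identity(name)
        for name, group in itertools.groupby(sorted(names))
    }
    return [gen_new_name[name]() for name in names]
```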

@Imbruced
Member Author

> This is awesome! I'm new to this code base, so consider my comments optional nits 🙂
>
> Starting from Spark 4.0, we can pass the whole arrow table to Spark.createDataFrame
>
> Based on this PR, I'm happy to attempt backporting GeoArrow import of anything implementing __arrow_c_stream__, circumventing a materialize of the GeoPandas data frame as a follow-up 🙂

@paleolimbot
That would be great. When writing Chapter 6, I included examples using the code you provided, which significantly reduces conversion time when creating a GeoPandas data frame from Sedona. Having something similar for the GeoPandas-to-Sedona direction would be great, so I opened this PR. I am happy to apply all the changes you mentioned.

@jiayuasu jiayuasu changed the title SEDONA-714 Add geopandas to spark arrow conversion. [SEDONA-714] Add geopandas to spark arrow conversion. Feb 24, 2025
@Imbruced Imbruced marked this pull request as ready for review February 25, 2025 10:18
@Imbruced Imbruced requested a review from jiayuasu as a code owner February 25, 2025 10:18
Member

@jiayuasu jiayuasu left a comment

Can you add documentation to this page? https://sedona.apache.org/latest/tutorial/geopandas-shapely/

@Imbruced
Member Author

> Can you add documentation to this page? https://sedona.apache.org/latest/tutorial/geopandas-shapely/

Sure.

@Imbruced Imbruced force-pushed the SEDONA-714-add-geopandas-to-spark-arrow-conversion branch from f328661 to 1c96da0 Compare February 25, 2025 22:17
@jiayuasu jiayuasu added this to the sedona-1.7.1 milestone Feb 26, 2025
@jiayuasu jiayuasu merged commit ebd6f67 into master Feb 26, 2025
28 checks passed
Member

@paleolimbot paleolimbot left a comment

Apologies for the late review...this is awesome! Thank you!

@jiayuasu jiayuasu deleted the SEDONA-714-add-geopandas-to-spark-arrow-conversion branch February 28, 2025 17:33
Kontinuation added a commit to Kontinuation/sedona that referenced this pull request Jan 21, 2026
…#386)

* [DOCS] Run Python black on Markdown code blocks (apache#1797)

* [CI] pre-commit autoupdate; configure `bandit[toml]` dependency (apache#1799)

Under bandit settings it lists the additional dependency for toml files

https://bandit.readthedocs.io/en/latest/config.html#bandit-settings

* [DOCS] Fix spelling (apache#1800)

* [CI] pre-commit: auto add license headers to `.c` and `.h` files (apache#1802)

* [CI] Update asf.yml (apache#1803)

* Commit

* Add john too

* [DOCS] Add Pranav Toggi to the Committers list (apache#1806)

* .asf.yaml: remove committer jbampton from collaborators (apache#1805)

https://github.com/apache/infrastructure-asfyaml?tab=readme-ov-file#assigning-the-github-triage-role-to-external-collaborators

"Projects may assign external (non-committer) collaborators the triage role for their repository."

* [DOCS] Improve Makefile by Using requirements-docs.txt for Documentation Dependencies (apache#1808)

* Update Makefile

* Create requirements-docs.txt

* Update Makefile

* Update Makefile

* Update Makefile

* [CI] pre-commit: auto add license check for Java files (apache#1807)

* [DOCS] Fix spelling (apache#1804)

* [DOCS] Add geojson docs (apache#1814)

* use dashes not underscores

* fix whitespace

* update based on pr comments

* [DOCS] Add Matomo to Sedona website (apache#1820)

* [DOC] Update ST_KNN documentation for left inner join support and inner kNN join details (apache#1821)

* [DOCS] Correct the document for ST_MakeValid (apache#1822)

* [DOCS] add geoparquet docs page (apache#1818)

* add geoparquet docs page

* use linter

* centralize content on geoparquet page

* lint file

---------

Co-authored-by: Jia Yu

* [DOCS] add docs on csv files (apache#1824)

* [DOCS] add spatial joins (apache#1829)

* [DOCS] add spatial joins page

* add alt text to images

* update spatial joins based on pr comments

* Update docs/tutorial/concepts/spatial-joins.md

---------

Co-authored-by: Jia Yu

* Add several frequent contributors (apache#1833)

* [SEDONA-714] Add geopandas to spark arrow conversion. (apache#1825)

* SEDONA-714 Add geopandas to spark arrow conversion.

* SEDONA-714 Add geopandas to spark arrow conversion.

* SEDONA-714 Add geopandas to spark arrow conversion.

* SEDONA-714 Add geopandas to spark arrow conversion.

* SEDONA-714 Add geopandas to spark arrow conversion.

* Update python/sedona/utils/geoarrow.py

Co-authored-by: Dewey Dunnington

* SEDONA-714 Add geopandas to spark arrow conversion.

* SEDONA-714 Add docs.

* SEDONA-714 Add docs.

---------

Co-authored-by: Dewey Dunnington

* [SEDONA-713] add OSM PBF reader (apache#1823)

* Add OSM PBF reader.

Add documentation.

Add documentation.

Add documentation.

Add documentation.

Add documentation.

SEDONA-713 moving to common.

* SEDONA-713 Add docs.

* SEDONA-713 Add docs.

* SEDONA-713 Add docs.

* SEDONA-714 Add docs.

* [DOCS] add geopackage docs (apache#1835)

* [DOCS] add shapefiles documentation page (apache#1837)

* build(deps): bump com.google.protobuf:protobuf-java in /shade-proto (apache#1834)

Bumps [com.google.protobuf:protobuf-java](https://github.com/protocolbuffers/protobuf) from 4.28.0 to 4.28.2.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: com.google.protobuf:protobuf-java
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [SEDONA-717] Fix `dataframe_to_arrow()` for zero-row results (apache#1840)

* fix zero-row case

* typo

* fix lint

* [SEDONA-718] Auto Detect geometry column in GeoJSON writer (apache#1841)

* [SEDONA-719] Support reading Shapefile with Z/M ordinates (apache#1842)

* [DOCS] Fix lint issue

* Fix shade-proto pom file name

---------

Signed-off-by: dependabot[bot]
Co-authored-by: John Bampton
Co-authored-by: Max Base
Co-authored-by: Matthew Powers
Co-authored-by: Feng Zhang
Co-authored-by: ruanqizhen
Co-authored-by: Paweł Tokaj
Co-authored-by: Dewey Dunnington
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kristin Cowalcijk
Kontinuation pushed a commit to Kontinuation/sedona that referenced this pull request Jan 21, 2026
* SEDONA-714 Add geopandas to spark arrow conversion.

* SEDONA-714 Add geopandas to spark arrow conversion.

* SEDONA-714 Add geopandas to spark arrow conversion.

* SEDONA-714 Add geopandas to spark arrow conversion.

* SEDONA-714 Add geopandas to spark arrow conversion.

* Update python/sedona/utils/geoarrow.py

Co-authored-by: Dewey Dunnington

* SEDONA-714 Add geopandas to spark arrow conversion.

* SEDONA-714 Add docs.

* SEDONA-714 Add docs.

---------

Co-authored-by: Dewey Dunnington