Commit 74a1566

stanmart, Copilot, and claude authored
Support polars, etc. through narwhals (#957)
* Start working on narwhalizing glum again
* Some fixes
* Make category expansion work
* Delete temporary file
* Some additional tests and test updates
* Add narwhals as a dependency
* `feature_dtypes_` backwards compatibility
* I don't think this was doing anything
* Add back the check for non-contiguous arrays
* Use the same default
* Disclaimer (should we just deprecate/ignore this argument?)
* Bump minimum tabmat (and, by necessity, sklearn and python) version
* Bump minimum versions
* Avoid large diff
* Make sure manifests are consistent
* Adjust tests
* Polars golden master tests
* Additional polars tests
* Add support for unpickling old models
* Polar's global string cache bites again
* Enums are fine, though
* Version bumps
* Different runner
* Different runners again
* Review comment

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Remove CV pickle compatibility test

  The GeneralizedLinearRegressorCV fitting behavior changed intentionally on main (alpha path computation + test weight normalization), so the v3.0 pickle predictions no longer match a fresh fit. Regenerate the non-CV pickle with actual glum 3.0.0 and drop the CV test case.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Regenerate v3.0 pickle using glum 3.0.0 + pandas 1.4.4

  The previous pickle was generated with pandas 3.0.1, causing a StringDtype constructor mismatch when loaded in the oldies environment (pandas 1.4.4). Regenerate using matching versions.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* pre-commit
* oops
* Add changelog entry for Polars support

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 9c0c095 commit 74a1566

20 files changed

Lines changed: 6083 additions & 7562 deletions

.github/workflows/package.yml

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ jobs:
 include:
 - { variant-file: linux_64_numpy2.0python3.10.____cpython, target-platform: linux-64, os: ubuntu-latest, rattler-build-args: '' }
 - { variant-file: linux_64_numpy2python3.13.____cp313, target-platform: linux-64, os: ubuntu-latest, rattler-build-args: '' }
-- { variant-file: osx_64_numpy2.0python3.10.____cpython, target-platform: osx-64, os: macos-latest, rattler-build-args: '' }
+- { variant-file: osx_64_numpy2.0python3.10.____cpython, target-platform: osx-64, os: macos-15-intel, rattler-build-args: '' }
 - { variant-file: osx_arm64_numpy2.0python3.10.____cpython, target-platform: osx-arm64, os: macos-latest, rattler-build-args: '' }
 - { variant-file: osx_arm64_numpy2python3.13.____cp313, target-platform: osx-arm64, os: macos-latest, rattler-build-args: '' }
 - { variant-file: win_64_numpy2.0python3.10.____cpython, target-platform: win-64, os: windows-latest, rattler-build-args: '' }

CHANGELOG.rst

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ Changelog
 **New features:**

 - Add ``solver="closed-form"`` for Gaussian identity-link models, using an analytical normal-equations solution for ridge/OLS, auto-selecting it under ``solver="auto"`` for unconstrained no-L1 cases, and falling back to least-squares for singular or ill-conditioned systems.
+- :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV` now accept `Polars <https://pola.rs>`_ DataFrames as input, in addition to pandas DataFrames and numpy arrays.

 3.1.3 - 2025-02-18
 ------------------
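
The new changelog entry could be exercised roughly as follows. This is a minimal sketch, not taken from the commit: the column names and data are made up, the choice of `family="poisson"` and `alpha=0.1` is arbitrary, and it assumes a glum build that includes this change together with an installed `polars`. The imports sit inside the function so that the snippet can be loaded without those packages present.

```python
def fit_poisson_on_polars():
    """Fit a small Poisson GLM directly on a Polars DataFrame (illustrative only)."""
    # Lazy imports: these packages are only needed when the function is called.
    import numpy as np
    import polars as pl
    from glum import GeneralizedLinearRegressor

    # Hypothetical toy data; the column names are assumptions for this sketch.
    X = pl.DataFrame(
        {
            "age": [23.0, 35.0, 41.0, 52.0, 60.0, 29.0],
            "exposure_years": [1.0, 2.0, 1.5, 3.0, 2.5, 1.0],
        }
    )
    y = np.array([0.0, 1.0, 1.0, 3.0, 2.0, 0.0])

    # The Polars frame is passed straight to fit/predict, with no manual
    # conversion to pandas or numpy beforehand.
    model = GeneralizedLinearRegressor(family="poisson", alpha=0.1)
    model.fit(X, y)
    return model.predict(X)
```

Calling `fit_poisson_on_polars()` on an older glum release, which predates this entry, is not expected to behave the same way.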

conda.recipe/recipe.yaml

Lines changed: 11 additions & 7 deletions
@@ -36,21 +36,25 @@ requirements:
 - if: osx
 then:
 - llvm-openmp
-- numpy
+- numpy >=1.24
 - pip
-- scikit-learn >=0.23
+- scikit-learn >=1.1.0
 - setuptools
 - setuptools-scm
 run:
 - python
-- formulaic >=0.6
+- formulaic >=1.2.0
 - joblib
+- narwhals >=2.0.0
 - numexpr
+- numpy >=1.24
 - packaging
-- pandas
-- scikit-learn >=0.23
-- scipy
-- tabmat >=4.0.0
+- pandas >=1.4
+- pyarrow
+- scikit-learn >=1.1.0
+- scipy >=1.8.0
+- tabmat >=4.2.0
+- tqdm

 tests:
 - python:

pixi.lock

Lines changed: 5406 additions & 7344 deletions
Some generated files are not rendered by default.

pixi.toml

Lines changed: 16 additions & 14 deletions
@@ -48,18 +48,20 @@ cmd = """
 c-compiler = "*"
 cxx-compiler = "*"
 cython = "*"
-formulaic = "*"
+formulaic = ">=1.2.0"
+narwhals = ">=2.0.0"
 numexpr = "*"
+numpy = ">=1.24"
 packaging = "*"
 pandas = ">=1.4"
 pip = "*"
 pyarrow = "*"
-python = ">=3.9"
-scikit-learn = ">=0.23"
-scipy = "*"
+python = ">=3.10"
+scikit-learn = ">=1.1.0"
+scipy = ">=1.8.0"
 setuptools = ">=61"
 setuptools-scm = ">=8.1"
-tabmat = ">=4.0.0"
+tabmat = ">=4.2.0"
 tqdm = "*"
 wheel = "*"

@@ -77,6 +79,7 @@ attrs = "*"
 click = "*"
 git_root = "*"
 mypy = "*"
+polars = "*"
 psutil = "*"
 pytest = "*"
 pytest-xdist = "*"
@@ -146,15 +149,13 @@ cxx-compiler = "*"
 cython = "!=3.0.4"
 make = "*"
 mako = "*"
-narwhals = ">=1.4.1"
+narwhals = ">=2.0.0"
 pip = "*"
 setuptools-scm = "*"
 xsimd = "<11|>12.1"
 [feature.build-tabmat.target.unix.dependencies]
 jemalloc-local = "*"

-[feature.py39.dependencies]
-python = "3.9.*"
 [feature.py310.dependencies]
 python = "3.10.*"
 [feature.py311.dependencies]
@@ -165,12 +166,14 @@ python = "3.12.*"
 python = "3.13.*"

 [feature.oldies.dependencies]
-formulaic = "0.6.*"
+formulaic = "1.2.*"
+narwhals = "2.0.*"
+numpy = "1.24.*"
 pandas = "1.4.*"
-python = "3.9.*"
-scikit-learn = "0.24.*"
-scipy = "1.7.*"
-tabmat = "4.0.*"
+python = "3.10.*"
+scikit-learn = "==1.1.0"
+scipy = "1.8.*"
+tabmat = "==4.2.0"

 [environments]
 benchmark = ["benchmark"]
@@ -189,4 +192,3 @@ py310 = ["py310", "test"]
 py311 = ["py311", "test"]
 py312 = ["py312", "test"]
 py313 = ["py313", "test"]
-py39 = ["py39", "test"]

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ select = [
 known-first-party = ["glum", "glum_benchmarks"]

 [tool.mypy]
-python_version = "3.9"
+python_version = '3.10'
 exclude = [
 "^tests/",
 "^\\.pixi/",

setup.py

Lines changed: 11 additions & 8 deletions
@@ -63,24 +63,27 @@
 license="BSD",
 classifiers=[ # Optional
 "Programming Language :: Python :: 3",
-"Programming Language :: Python :: 3.9",
 "Programming Language :: Python :: 3.10",
 "Programming Language :: Python :: 3.11",
 "Programming Language :: Python :: 3.12",
 "Programming Language :: Python :: 3.13",
 ],
 package_dir={"": "src"},
 packages=find_packages(where="src", include=["glum"]),
-python_requires=">=3.9",
+python_requires=">=3.10",
 install_requires=[
+"formulaic>=1.2.0",
 "joblib",
+"narwhals>=2.0.0",
 "numexpr",
-"numpy",
-"pandas",
-"scikit-learn>=0.23",
-"scipy",
-"formulaic>=0.6",
-"tabmat>=4.0.0",
+"numpy>=1.24",
+"packaging",
+"pandas>=1.4",
+"pyarrow",
+"scikit-learn>=1.1.0",
+"scipy>=1.8.0",
+"tabmat>=4.1.5",
+"tqdm",
 ],
 entry_points=None,
 ext_modules=cythonize(
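
To see whether an existing environment already satisfies the raised minimums, something like the following standard-library sketch could be used. It is not part of the commit: the `MINIMUMS` table is copied by hand from the `install_requires` list in this hunk, the helper names are hypothetical, and the version comparison is deliberately simplified (leading numeric components only, no pre-release ordering). A real check would more likely rely on `packaging.version`.

```python
from importlib import metadata

# Lower bounds copied by hand from the install_requires list in this diff.
MINIMUMS = {
    "formulaic": "1.2.0",
    "narwhals": "2.0.0",
    "numpy": "1.24",
    "pandas": "1.4",
    "scikit-learn": "1.1.0",
    "scipy": "1.8.0",
    "tabmat": "4.1.5",
}


def release_tuple(version):
    """Return the leading numeric components, e.g. '2.0.0rc1' -> (2, 0, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two versions after padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Return {package: problem} for packages that are missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems
```

An empty dictionary from `check_environment()` means every listed package is present at or above its lower bound.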

src/glum/_distribution.py

Lines changed: 2 additions & 2 deletions
@@ -1574,7 +1574,7 @@ def guess_intercept(
 if (not isinstance(link, IdentityLink)) and (len(np.unique(y)) == 1):
 raise ValueError("No variation in `y`. Coefficients can't be estimated.")

-avg_y: float = np.average(y, weights=sample_weight)
+avg_y = np.average(y, weights=sample_weight)

 if isinstance(link, IdentityLink):
 # This is only correct for the normal. For other distributions, the
@@ -1585,7 +1585,7 @@ def guess_intercept(

 avg_eta = eta if np.isscalar(eta) else np.average(eta, weights=sample_weight)

-return avg_y - avg_eta
+return avg_y - avg_eta # type: ignore[operator]

 elif isinstance(link, LogLink):
 # This is only correct for Tweedie
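
The arithmetic of the identity-link branch touched above can be illustrated without numpy. This is a plain-Python sketch with hypothetical helper names; it mirrors only the two visible lines (`avg_y`, `avg_eta`, and their difference) and assumes nothing about how `eta` is computed, since the hunk does not show that.

```python
def weighted_average(values, weights):
    """Weighted mean, the same quantity np.average(values, weights=weights) returns."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def guess_identity_intercept(y, sample_weight, eta=0.0):
    """Weighted mean of y minus the (weighted mean of) eta.

    `eta` may be a scalar or a sequence, matching the np.isscalar branch
    in the diff above.
    """
    avg_y = weighted_average(y, sample_weight)
    if isinstance(eta, (int, float)):
        avg_eta = eta
    else:
        avg_eta = weighted_average(eta, sample_weight)
    return avg_y - avg_eta


# Worked example: weights [1, 1, 2] give avg_y = (1 + 2 + 6) / 4 = 2.25.
print(guess_identity_intercept([1.0, 2.0, 3.0], [1.0, 1.0, 2.0], eta=0.25))  # 2.0
```

With a sequence `eta = [0.0, 1.0, 1.0]` and the same weights, the weighted mean of `eta` is 0.75, so the result is 1.5.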
