Generalized linear models (GLM) are a core statistical tool that include many common methods like least-squares regression, Poisson regression, and logistic regression as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction, and more. We have developed `glum`, a fast Python-first GLM library. The development was based on [a fork of scikit-learn](https://github.com/scikit-learn/scikit-learn/pull/9405), so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!
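To make the idea concrete, here is a minimal, self-contained sketch of Fisher scoring (IRLS), the classical algorithm behind GLM fitting, applied to a Poisson regression with a log link. This is an illustration only, not `glum`'s implementation; the toy data and variable names are invented:

```python
import math

# Toy data: counts that grow roughly like exp(b0 + b1 * x).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 2, 4, 8, 16]

# Start from the intercept-only fit: b0 = log(mean(y)), b1 = 0.
b0, b1 = math.log(sum(y) / len(y)), 0.0
for _ in range(25):
    mu = [math.exp(b0 + b1 * xi) for xi in x]  # inverse of the log link
    # Score vector X'(y - mu) and Fisher information X' diag(mu) X
    s0 = sum(yi - mi for yi, mi in zip(y, mu))
    s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
    i00 = sum(mu)
    i01 = sum(xi * mi for xi, mi in zip(x, mu))
    i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
    det = i00 * i11 - i01 * i01
    # Fisher-scoring (Newton) step: beta += I^{-1} * score
    b0 += (i11 * s0 - i01 * s1) / det
    b1 += (i00 * s1 - i01 * s0) / det

# y doubles with each unit of x, so b1 converges to log(2) and b0 to 0.
print(f"{b0:.4f} {b1:.4f}")
```

Each iteration solves a weighted least-squares problem; with the log link, the weights are just the fitted means.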
We believe that for GLM development, broad support for distributions, regularization, and statistical inference, along with fast formula-based specification, is key. `glum` supports
* Built-in cross-validation for optimal regularization, efficiently exploiting a “regularization path”
* L1 regularization, which produces sparse and easily interpretable solutions
* L2 regularization, including variable matrix-valued (Tikhonov) penalties, which are useful in modeling correlated effects
* Elastic net regularization
* Normal, Poisson, binomial, gamma, inverse Gaussian, negative binomial, and Tweedie distributions, plus varied and customizable link functions
* Built-in formula-based model specification using `formulaic`
* Classical statistical inference for unregularized models
* Box constraints, linear inequality constraints, sample weights, offsets
* Support for multiple dataframe backends (pandas, polars, and more) via `narwhals`
Performance also matters, so we conducted extensive benchmarks against other modern libraries. Although performance depends on the specific problem, we find that when N >> K (many more observations than predictors), `glum` is consistently much faster across a wide range of problems. This repo includes the benchmarking tools in the `glum_benchmarks` module. For details, [see here](glum_benchmarks/README.md).
For more information on `glum`, including tutorials and API reference, please see [the documentation](https://glum.readthedocs.io/en/latest/).
Why did we choose the name `glum`? We wanted a name that had the letters GLM and wasn't easily confused with any existing implementation. And we thought glum sounded like a funny name (and not glum at all!). If you need a more professional-sounding name, feel free to pronounce it as G-L-um. Or maybe it stands for "Generalized linear... ummm... modeling?"
# A classic example predicting housing prices
```python
>>> import pandas as pd
>>> from sklearn.datasets import fetch_openml
>>> from glum import GeneralizedLinearRegressor
>>>
>>> # This dataset contains house sale prices for King County, which includes
>>> # Seattle. It includes homes sold between May 2014 and May 2015.
>>> # The full version of this dataset can be found at: