Commit f8bf11d

Merge pull request #69 from scikit-learn-contrib/joss-paper
Joss paper
2 parents f41318b + b70616b commit f8bf11d

17 files changed

Lines changed: 18395 additions & 6 deletions

File tree

.github/workflows/draft-pdf.yml

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+name: Draft PDF
+on:
+  push:
+    paths:
+      - paper/**
+      - .github/workflows/draft-pdf.yml
+
+jobs:
+  paper:
+    runs-on: ubuntu-latest
+    name: Paper Draft
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Build draft PDF
+        uses: openjournals/openjournals-draft-action@master
+        with:
+          journal: joss
+          paper-path: paper/paper.md
+      - name: Upload
+        uses: actions/upload-artifact@v4
+        with:
+          name: paper
+          path: paper/paper.pdf

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -76,3 +76,7 @@ target/
 
 # auto-generated files
 bde/_version.py
+
+# Ide
+
+.vscode

README.md

Lines changed: 3 additions & 3 deletions
@@ -13,8 +13,8 @@ Introduction
 ------------
 
 **bde** is a user-friendly implementation of Bayesian Deep Ensembles compatible with
-scikit-learn with a particular focus on tabular data. It exposes estimators that plug
-into scikit-learn pipelines while leveraging JAX for accelerator-backed training,
+scikit-learn with a particular focus on tabular data. It exposes estimators that plug
+into scikit-learn pipelines while leveraging JAX for accelerator-backed training,
 sampling, and uncertainty quantification.
 
 In particular, **bde** implements **Microcanonical Langevin Ensembles (MILE)** as
@@ -27,7 +27,7 @@ A conceptual overview of MILE is shown below:
 
 
 **Scope:** As of right now this package supports full-batch MILE for fully connected
-feedforward networks, covering classification and regression on tabular data.
+feedforward networks, covering classification and regression on tabular data.
 The method can however also be applied to other
 architectures and data modalities, but these are not yet in scope of this
 particular implementation.

bde/bde.py

Lines changed: 0 additions & 2 deletions
@@ -747,8 +747,6 @@ def predict(
         x: ArrayLike,
         mean_and_std: bool = False,
         credible_intervals: list[float] | None = None,
-        # Docstring necessary to explain this parameter which
-        # actually lists quantiles not the intervals
         raw: bool = False,
     ):
         """Predict regression targets with optional uncertainty summaries.
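The comment removed in this hunk says that `credible_intervals` lists quantiles rather than intervals. To illustrate that distinction only, here is a minimal sketch that converts a central interval mass into the two quantile levels that bound it and evaluates them on a set of samples. It is not bde's implementation: the helper names `quantile` and `interval_to_quantiles` and the sample values are assumptions made up for this example.

```python
import math


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted, non-empty list."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    pos = q * (len(sorted_values) - 1)        # fractional index into the list
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)  # clamp at the last element
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def interval_to_quantiles(mass):
    """Map a central interval mass (e.g. 0.8) to its (lower, upper) quantile levels."""
    tail = (1.0 - mass) / 2.0
    return tail, 1.0 - tail


# Made-up predictive samples for a single input point (hypothetical values).
samples = sorted([1.0, 2.0, 3.0, 4.0, 5.0])

# An 80% central interval corresponds to the quantile levels 0.1 and 0.9.
lower_q, upper_q = interval_to_quantiles(0.8)
bounds = (quantile(samples, lower_q), quantile(samples, upper_q))
print(lower_q, upper_q, bounds)
```

If the parameter does expect quantile levels, as the removed comment suggests, a caller who wants an 80% interval would pass the two levels rather than the single value 0.8.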
Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+==========================================
+Bike Sharing Dataset
+==========================================
+
+Hadi Fanaee-T
+
+Laboratory of Artificial Intelligence and Decision Support (LIAAD), University of Porto
+INESC Porto, Campus da FEUP
+Rua Dr. Roberto Frias, 378
+4200 - 465 Porto, Portugal
+
+
+=========================================
+Background
+=========================================
+
+Bike sharing systems are a new generation of traditional bike rentals where the whole process, from membership to rental and return,
+has become automatic. Through these systems, a user can easily rent a bike from one location and return
+it at another. Currently, there are over 500 bike-sharing programs around the world, comprising
+over 500 thousand bicycles. Today, there is great interest in these systems due to their important role in traffic,
+environmental, and health issues.
+
+Apart from the interesting real-world applications of bike sharing systems, the characteristics of the data generated by
+these systems make them attractive for research. Unlike other transport services such as bus or subway, the duration
+of travel and the departure and arrival positions are explicitly recorded in these systems. This feature turns a bike sharing system into
+a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most important
+events in the city could be detected by monitoring these data.
+
+=========================================
+Data Set
+=========================================
+The bike-sharing rental process is highly correlated with environmental and seasonal settings. For instance, weather conditions,
+precipitation, day of week, season, hour of the day, etc. can affect rental behavior. The core data set is
+the two-year historical log for the years 2011 and 2012 from the Capital Bikeshare system, Washington D.C., USA, which is
+publicly available at http://capitalbikeshare.com/system-data. We aggregated the data at two levels, hourly and daily, and then
+extracted and added the corresponding weather and seasonal information. Weather information was extracted from http://www.freemeteo.com.
+
+=========================================
+Associated tasks
+=========================================
+
+- Regression:
+Prediction of hourly or daily bike rental counts based on the environmental and seasonal settings.
+
+- Event and Anomaly Detection:
+The count of rented bikes is also correlated with events in the town that are easily traceable via search engines.
+For instance, a query like "2012-10-30 washington d.c." in Google returns results related to Hurricane Sandy. Some of the important events are
+identified in [1]. Therefore, the data can also be used to validate anomaly or event detection algorithms.
+
+
+=========================================
+Files
+=========================================
+
+- Readme.txt
+- hour.csv : bike sharing counts aggregated on an hourly basis. Records: 17379 hours
+- day.csv : bike sharing counts aggregated on a daily basis. Records: 731 days
+
+
+=========================================
+Dataset characteristics
+=========================================
+Both hour.csv and day.csv have the following fields, except hr, which is not available in day.csv
+
+- instant: record index
+- dteday : date
+- season : season (1: spring, 2: summer, 3: fall, 4: winter)
+- yr : year (0: 2011, 1: 2012)
+- mnth : month (1 to 12)
+- hr : hour (0 to 23)
+- holiday : whether the day is a holiday or not (extracted from http://dchr.dc.gov/page/holiday-schedule)
+- weekday : day of the week
+- workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0.
+- weathersit :
+    - 1: Clear, Few clouds, Partly cloudy
+    - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
+    - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
+    - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
+- temp : Normalized temperature in Celsius. The values are divided by 41 (max)
+- atemp: Normalized feeling temperature in Celsius. The values are divided by 50 (max)
+- hum: Normalized humidity. The values are divided by 100 (max)
+- windspeed: Normalized wind speed. The values are divided by 67 (max)
+- casual: count of casual users
+- registered: count of registered users
+- cnt: count of total rental bikes including both casual and registered
+
+=========================================
+License
+=========================================
+Use of this dataset in publications must cite the following publication:
+
+[1] Fanaee-T, Hadi, and Gama, Joao, "Event labeling combining ensemble detectors and background knowledge", Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg, doi:10.1007/s13748-013-0040-3.
+
+@article{
+    year={2013},
+    issn={2192-6352},
+    journal={Progress in Artificial Intelligence},
+    doi={10.1007/s13748-013-0040-3},
+    title={Event labeling combining ensemble detectors and background knowledge},
+    url={http://dx.doi.org/10.1007/s13748-013-0040-3},
+    publisher={Springer Berlin Heidelberg},
+    keywords={Event labeling; Event detection; Ensemble learning; Background knowledge},
+    author={Fanaee-T, Hadi and Gama, Joao},
+    pages={1-15}
+}
+
+=========================================
+Contact
+=========================================
+
+For further information about this dataset, please contact Hadi Fanaee-T (hadi.fanaee@fe.up.pt)
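
As a reading aid for the field list in the dataset readme above, the following minimal Python sketch undoes the stated normalizations (temp, atemp, hum, and windspeed divided by 41, 50, 100, and 67) and checks that `cnt` equals `casual + registered`. The helper names `denormalize` and `check_counts` are hypothetical, and the single CSV row is made up for illustration; it is not taken from hour.csv.

```python
import csv
import io

# Divisors stated in the readme: each normalized field was divided by its maximum.
SCALE = {"temp": 41.0, "atemp": 50.0, "hum": 100.0, "windspeed": 67.0}

# Made-up row for illustration only; NOT taken from hour.csv.
SAMPLE_CSV = """instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
1,2011-01-01,1,0,1,0,0,6,0,1,0.5,0.5,0.75,0.25,3,13,16
"""


def denormalize(row):
    """Return a copy of a row dict with the normalized fields mapped back to raw units."""
    out = dict(row)
    for field, max_value in SCALE.items():
        out[field] = float(row[field]) * max_value
    return out


def check_counts(row):
    """True if the total count equals casual plus registered users."""
    return int(row["casual"]) + int(row["registered"]) == int(row["cnt"])


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
raw = denormalize(rows[0])
print(raw["temp"], raw["atemp"], raw["hum"], raw["windspeed"])
print(check_counts(rows[0]))
```

To run the same helpers on the real files, replace `io.StringIO(SAMPLE_CSV)` with an open handle to hour.csv or day.csv. The `hr` column is absent from day.csv, and neither helper relies on it.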