Commit 7ec1414

Copilot and thinkall authored
Clarify period parameter and automatic label lagging in time series forecasting (#1495)
* Initial plan
* Add comprehensive documentation for period parameter and automatic label lagging
  Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Address code review feedback on docstring clarity
  Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Clarify period vs prediction output length per @thinkall's feedback
  Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Refine terminology per code review feedback
  Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Run pre-commit formatting fixes
  Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
1 parent 9233a52 commit 7ec1414

3 files changed

Lines changed: 59 additions & 7 deletions

flaml/automl/automl.py

Lines changed: 16 additions & 0 deletions
@@ -1013,6 +1013,14 @@ def retrain_from_log(
 the searched learners, such as sample_weight. Below are a few examples of
 estimator-specific parameters:
 period: int | forecast horizon for all time series forecast tasks.
+This is the number of time steps ahead to forecast (e.g., period=12 means
+forecasting 12 steps into the future). This represents the forecast horizon
+used during model training. Note: during prediction, the output length
+equals the length of X_test. FLAML automatically handles feature
+engineering for you - sklearn-based models (lgbm, rf, xgboost, etc.) will have
+lagged features created automatically, while time series native models (prophet,
+arima, sarimax) use their built-in forecasting capabilities. You do NOT need
+to manually create lagged features of the target variable.
 gpu_per_trial: float, default = 0 | A float of the number of gpus per trial,
 only used by TransformersEstimator, XGBoostSklearnEstimator, and
 TemporalFusionTransformerEstimator.
@@ -2107,6 +2115,14 @@ def cv_score_agg_func(val_loss_folds, log_metrics_folds):
 the searched learners, such as sample_weight. Below are a few examples of
 estimator-specific parameters:
 period: int | forecast horizon for all time series forecast tasks.
+This is the number of time steps ahead to forecast (e.g., period=12 means
+forecasting 12 steps into the future). This represents the forecast horizon
+used during model training. Note: during prediction, the output length
+equals the length of X_test. FLAML automatically handles feature
+engineering for you - sklearn-based models (lgbm, rf, xgboost, etc.) will have
+lagged features created automatically, while time series native models (prophet,
+arima, sarimax) use their built-in forecasting capabilities. You do NOT need
+to manually create lagged features of the target variable.
 gpu_per_trial: float, default = 0 | A float of the number of gpus per trial,
 only used by TransformersEstimator, XGBoostSklearnEstimator, and
 TemporalFusionTransformerEstimator.
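
The `period` docstring separates two quantities: the horizon the model is trained for, and the number of rows in `X_test`, which alone determines the prediction length. The following is a minimal sketch of one way to prepare data so the two line up, assuming a synthetic monthly series. `split_last_period` is a hypothetical helper written for this illustration and is not part of FLAML. The commented-out `AutoML` calls follow the pattern of FLAML's documented NumPy example and are not executed here.

```python
import numpy as np


def split_last_period(timestamps, values, period):
    """Hold out the final `period` observations as a test set.

    Hypothetical helper for illustration; not part of FLAML.
    """
    if not 0 < period < len(values):
        raise ValueError("period must be between 1 and len(values) - 1")
    X_train, X_test = timestamps[:-period], timestamps[-period:]
    y_train, y_test = values[:-period], values[-period:]
    return X_train, y_train, X_test, y_test


# Seven years of synthetic monthly data (84 points).
timestamps = np.arange("2014-01", "2021-01", dtype="datetime64[M]")
values = np.random.default_rng(0).random(len(timestamps))

period = 12  # forecast horizon used during training
X_train, y_train, X_test, y_test = split_last_period(timestamps, values, period)

# With FLAML installed, training and prediction would look roughly like this:
#   from flaml import AutoML
#   automl = AutoML()
#   automl.fit(X_train=X_train, y_train=y_train, period=period,
#              task="ts_forecast", time_budget=15, eval_method="holdout")
#   y_pred = automl.predict(X_test)  # per the docstring, len(y_pred) == len(X_test)
```

Here `X_test` happens to contain exactly `period` timestamps, but per the docstring it could contain a different number, and the prediction length would follow `X_test`.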

flaml/automl/time_series/sklearn.py

Lines changed: 24 additions & 7 deletions
@@ -17,24 +17,30 @@ class PD:


 def make_lag_features(X: pd.DataFrame, y: pd.Series, lags: int):
-    """Transform input data X, y into autoregressive form - shift
-    them appropriately based on horizon and create `lags` columns.
+    """Transform input data X, y into autoregressive form by creating `lags` columns.
+
+    This function is called automatically by FLAML during the training process
+    to convert time series data into a format suitable for sklearn-based regression
+    models (e.g., lgbm, rf, xgboost). Users do NOT need to manually call this function
+    or create lagged features themselves.

     Parameters
     ----------
     X : pandas.DataFrame
-        Input features.
+        Input feature DataFrame, which may contain temporal features and/or exogenous variables.

     y : array_like, (1d)
-        Target vector.
+        Target vector (time series values to forecast).

-    horizon : int
-        length of X for `predict` method
+    lags : int
+        Number of lagged time steps to use as features.

     Returns
     -------
     pandas.DataFrame
-        shifted dataframe with `lags` columns
+        Shifted dataframe with `lags` columns for each original feature.
+        The target variable y is also lagged to prevent data leakage
+        (i.e., we use y(t-1), y(t-2), ..., y(t-lags) to predict y(t)).
     """
     lag_features = []

@@ -55,6 +61,17 @@ def make_lag_features(X: pd.DataFrame, y: pd.Series, lags: int):


 class SklearnWrapper:
+    """Wrapper class for using sklearn-based models for time series forecasting.
+
+    This wrapper automatically handles the transformation of time series data into
+    a supervised learning format by creating lagged features. It trains separate
+    models for each step in the forecast horizon.
+
+    Users typically don't interact with this class directly - it's used internally
+    by FLAML when sklearn-based estimators (lgbm, rf, xgboost, etc.) are selected
+    for time series forecasting tasks.
+    """
+
     def __init__(
         self,
         model_class: type,
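
The two docstrings above describe lagging both the features and the target, then training one model per forecast step. The sketch below illustrates that idea with pandas and NumPy. It is not FLAML's implementation: `build_lagged_frame` and `fit_direct_models` are hypothetical names, the column naming is an assumption, and ordinary least squares stands in for the lgbm, rf, or xgboost regressors FLAML would actually search over.

```python
import numpy as np
import pandas as pd


def build_lagged_frame(X: pd.DataFrame, y: pd.Series, lags: int) -> pd.DataFrame:
    """Return one row per time step t holding the values from t-1 ... t-lags.

    Illustrative only; FLAML's make_lag_features may differ in column
    naming and row handling.
    """
    target_name = y.name if y.name is not None else "y"
    data = X.assign(**{target_name: y})
    shifted = [data.shift(k).add_suffix(f"_lag_{k}") for k in range(1, lags + 1)]
    # The first `lags` rows lack a full history, so they are dropped.
    return pd.concat(shifted, axis=1).dropna()


def fit_direct_models(features: pd.DataFrame, y: pd.Series, horizon: int) -> dict:
    """Fit one linear least-squares model per forecast step."""
    models = {}
    for step in range(1, horizon + 1):
        # Row t only sees values up to t-1, so step 1 targets y(t),
        # step 2 targets y(t+1), and so on. No current or future target leaks in.
        target = y.shift(-(step - 1)).reindex(features.index).dropna()
        rows = features.loc[target.index].to_numpy()
        design = np.column_stack([rows, np.ones(len(target))])  # add intercept
        coef, *_ = np.linalg.lstsq(design, target.to_numpy(), rcond=None)
        models[step] = coef
    return models


# Synthetic daily data: one exogenous column and a linear-trend target.
index = pd.date_range("2024-01-01", periods=10, freq="D")
X = pd.DataFrame({"temp": np.linspace(10.0, 19.0, 10)}, index=index)
y = pd.Series(np.arange(10, dtype=float) * 2.0, index=index, name="sales")

features = build_lagged_frame(X, y, lags=2)
models = fit_direct_models(features, y, horizon=3)
```

Each row of `features` contains only past values of `temp` and `sales`, which is the leakage guard the `make_lag_features` docstring refers to. `models[h]` holds the coefficients for predicting `h` steps past the last lagged observation.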

website/docs/Examples/AutoML-Time series forecast.md

Lines changed: 19 additions & 0 deletions
@@ -8,6 +8,25 @@ Install the [automl,ts_forecast] option.
 pip install "flaml[automl,ts_forecast]"
 ```

+### Understanding the `period` Parameter
+
+The `period` parameter (also called **horizon** in the code) specifies the **forecast horizon** - the number of future time steps the model is trained to predict. For example:
+
+- `period=12` means you want to forecast 12 time steps ahead (e.g., 12 months, 12 days)
+- `period=7` means you want to forecast 7 time steps ahead
+
+**Important Note on Prediction**: During the prediction stage, the output length equals the length of `X_test`. This means you can generate predictions for any number of time steps by providing the corresponding timestamps in `X_test`, regardless of the `period` value used during training.
+
+#### Automatic Feature Engineering
+
+**Important**: You do NOT need to manually lag the target variable before training. FLAML handles this automatically:
+
+- **For sklearn-based models** (lgbm, rf, xgboost, extra_tree, catboost): FLAML automatically creates lagged features of both the target variable and any exogenous variables. This transforms the time series forecasting problem into a supervised learning regression problem.
+
+- **For time series native models** (prophet, arima, sarimax, holt-winters): These models have built-in time series forecasting capabilities and handle temporal dependencies natively.
+
+The automatic lagging is implemented internally when you call `automl.fit()` with `task="ts_forecast"` or `task="ts_forecast_classification"`, so you can focus on providing clean input data without worrying about feature engineering.
+
 ### Simple NumPy Example

 ```python
