All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Extend tensor conversion to numpy arrays to work with more device types (#1132)
- Add sklearn metadata routing support: `NeuralNet` is now a metadata router and consumer, enabling `groups` and other metadata to flow through `Pipeline`/`GridSearchCV` (#1139)
- Compatibility with sklearn v1.8: `__sklearn_is_fitted__` returns a boolean (#1128)
- Compatibility with sklearn v1.8: `SkorchDoctor` is now an sklearn `BaseEstimator` instance (#1128)
- Implement `__sklearn_is_fitted__` for skorch models, following sklearn's custom model protocol (#1119); see the sketch at the end of this section
- Add Contributing Guidelines for skorch. (#1097)
- Add an example of hyper-parameter optimization using Optuna (#1098)
- Add an example for a streaming dataset (#1105)
- Add pyproject.toml to improve CI/CD and tooling (#1108)
- Loading of skorch nets using pickle: When unpickling a skorch net, you may come across a PyTorch warning that goes: "FutureWarning: You are using torch.load with weights_only=False [...]"; to avoid this warning, pickle the net again and use the new pickle file (#1092)
- Added a notebook that shows how to use a learning rate scheduler in skorch (#1074)
- All neural net classes now inherit from sklearn's `BaseEstimator` to support compatibility with sklearn 1.6.0 and above. Classification models additionally inherit from `ClassifierMixin` and regressors from `RegressorMixin` (#1078)
- When using the `ReduceLROnPlateau` learning rate scheduler, we now record the learning rate in the net history (`net.history[:, 'event_lr']` by default). It is now also possible to step per batch, not only per epoch (#1075)
- The learning rate scheduler's `.simulate()` method now supports additional step args, which is useful for simulating policies such as `ReduceLROnPlateau` that expect metrics to base their schedule on (#1077)
- Removed deprecated `skorch.callbacks.scoring.cache_net_infer` (#1088)
- Fix an issue with using `NeuralNetBinaryClassifier` with `torch.compile` (#1058)
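With `__sklearn_is_fitted__` implemented, sklearn's `check_is_fitted` works on skorch nets; a minimal sketch (`MyModule`, `X`, and `y` are illustrative placeholders):

```python
import numpy as np
import torch
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted
from skorch import NeuralNetClassifier

class MyModule(torch.nn.Module):  # placeholder module
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(20, 2)

    def forward(self, X):
        return self.lin(X)

net = NeuralNetClassifier(MyModule, criterion=torch.nn.CrossEntropyLoss, max_epochs=1)
try:
    check_is_fitted(net)  # consults __sklearn_is_fitted__ under the hood
except NotFittedError:
    print("not fitted yet")

X = np.random.randn(100, 20).astype(np.float32)
y = np.random.randint(0, 2, size=100)
net.fit(X, y)
check_is_fitted(net)  # passes silently after fitting
```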
1.0.0 - 2024-05-27
The 1.0.0 release of skorch is here. We think that skorch is at a very stable point, which is why a 1.0.0 release is appropriate. There are no plans to add any breaking changes or major revisions in the future. Instead, our focus now is to keep skorch up-to-date with the latest versions of PyTorch and scikit-learn, and to fix any bugs that may arise.
0.15.0 - 2023-09-04
- Add the option to globally override the use of caching in scoring callbacks by setting the `use_caching` argument on the net (this overrides the settings of individual callbacks) (#971)
- Add support for saving and loading parameters with safetensors; use `net.save_params(..., use_safetensors=True)` and `net.load_params(..., use_safetensors=True)` (requires installing the `safetensors` library) (#970); see the sketch at the end of this section
- Nets pickled with skorch version 0.11 can no longer be loaded in version 0.15 (see #880); to transition these nets, pickle them in a skorch version between 0.12 and 0.14, then load them in 0.15
- Fixed a couple of issues when saving and loading parameters while using accelerate (via `AccelerateMixin`) in a multi-GPU setting, and some other minor accelerate issues (#1008, #1009)
- Installing skorch with the `[testing]` option now installs all dev requirements (#1015)
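A minimal sketch of the safetensors round trip described above; the file name and the `net`/`new_net` instances are illustrative placeholders:

```python
# save only the module weights in the safetensors format
net.save_params(f_params="model.safetensors", use_safetensors=True)

# load them back into a freshly initialized net of the same architecture
new_net.initialize()
new_net.load_params(f_params="model.safetensors", use_safetensors=True)
```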
0.14.0 - 2023-06-24
- Add version logging to the `NeptuneLogger` callback (#964)
- Add support for zero-shot and few-shot classification with the help of Large Language Models and the Hugging Face transformers library
- Moved from `pkg_resources` to `importlib` and dropped support for Python 3.7, as PyTorch dropped its support and the version itself hit EOL (#928 and #983)
- `NeuralNetRegressor` can now be fitted with a 1-dimensional `y`, which is necessary in some specific circumstances (e.g. in conjunction with sklearn's `BaggingRegressor`, see #972); for this to work correctly, the output of the PyTorch module should also be 1-dimensional; the existing default, i.e. having `y` and `y_pred` be 2-dimensional, remains the recommended way of using `NeuralNetRegressor` (see the sketch at the end of this section)
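A minimal sketch of fitting with a 1-dimensional `y`; the module is an illustrative placeholder whose output is likewise 1-dimensional:

```python
import numpy as np
import torch
from skorch import NeuralNetRegressor

class FlatRegressor(torch.nn.Module):  # placeholder module with 1-dim output
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(20, 1)

    def forward(self, X):
        # squeeze the trailing dim so y_pred matches the 1-dimensional y
        return self.lin(X).squeeze(-1)

X = np.random.randn(100, 20).astype(np.float32)
y = np.random.randn(100).astype(np.float32)  # 1-dimensional y

net = NeuralNetRegressor(FlatRegressor, max_epochs=1)
net.fit(X, y)
```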
0.13.0 - 2023-05-17
- Add support for compiled PyTorch modules using the `torch.compile` function, introduced in the PyTorch 2.0 release, which can greatly improve performance on new GPU architectures; to use it, initialize your net with the `compile=True` argument; further compilation arguments can be specified using the dunder notation, e.g. `compile__dynamic=True` (see the sketch at the end of this section)
- Add a class `DistributedHistory` which should be used when training in a multi-GPU setting (#955)
- `SkorchDoctor`: a helper class that assists in understanding and debugging the neural net training (#912)
- When using `AccelerateMixin`, it is now possible to prevent unwrapping of the modules by setting `unwrap_after_train=False` (#963)
- Fixed install command to work with recent changes in Google Colab (#928)
- Fixed a couple of bugs related to using non-default modules and criteria (#927)
- Fixed a bug when using `AccelerateMixin` in a multi-GPU setup (#947)
- `_get_param_names` now returns a list instead of a generator so that subsequent error messages contain useful information instead of a generator `repr` string (#925)
- Fixed a bug that caused modules to not be sufficiently unwrapped at the end of training when using `AccelerateMixin`, which could prevent them from being pickleable (#963)
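A minimal sketch of the `compile` support described above, assuming PyTorch >= 2.0 (`MyModule`, `X`, and `y` are illustrative placeholders):

```python
from skorch import NeuralNetClassifier

# compile the module on initialization; dunder notation forwards
# further arguments to torch.compile
net = NeuralNetClassifier(
    MyModule,
    compile=True,
    compile__dynamic=True,  # passed through as torch.compile(..., dynamic=True)
)
net.fit(X, y)
```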
0.12.1 - 2022-11-18
- `NeptuneLogger` was updated to work with recent versions of the Neptune client (v0.14.3 or higher); it now logs some additional data, including the model summary, configuration, and learning rate (when available) (#906)
- Fixed an error that could occur with specific combinations of gpytorch and PyTorch versions (#913)
0.12.0 - 2022-10-07
- Added the `load_best` attribute to the `EarlyStopping` callback to automatically load the module weights of the best result at the end of training
- Added a method, `trim_for_prediction`, on the net classes, which trims the net of everything not required for prediction; call this after fitting to reduce the size of the net (see the sketch at the end of this section)
- Added experimental support for huggingface accelerate; use the provided mixin class to add advanced training capabilities provided by the accelerate library to skorch
- Add integration for Huggingface tokenizers; use `skorch.hf.HuggingfaceTokenizer` to train a Huggingface tokenizer on your custom data; use `skorch.hf.HuggingfacePretrainedTokenizer` to load a pre-trained Huggingface tokenizer
- Added support for creating model checkpoints on the Hugging Face Hub using `HfHubStorage`
- Added a notebook that shows how to use skorch with PyTorch Geometric (#863)
- The minimum required scikit-learn version has been bumped to 0.22.0
- Initialize data loaders for the training and validation datasets once per fit call instead of once per epoch (see the migration guide)
- It is now possible to call `np.asarray` with `SliceDataset`s (#858)
- Fixed a bug in `SliceDataset` that prevented it from being used with `to_numpy` (#858)
- Fixed a bug that occurred when loading a net whose device is set to None (#876)
- Fixed a bug that in some cases could prevent loading a CUDA-trained net on a machine without CUDA
- Enable skorch to work on M1/M2 Apple MacBooks (#884)
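A minimal sketch combining `EarlyStopping(load_best=True)` and `trim_for_prediction` from above (`MyModule`, `X`, and `y` are illustrative placeholders):

```python
from skorch import NeuralNetClassifier
from skorch.callbacks import EarlyStopping

net = NeuralNetClassifier(
    MyModule,
    callbacks=[EarlyStopping(load_best=True)],  # restore best weights at train end
)
net.fit(X, y)

# strip everything not needed for inference to shrink the net, e.g. before pickling
net.trim_for_prediction()
y_pred = net.predict(X)
```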
0.11.0 - 2021-10-11
- Added the `load_best` attribute to the `Checkpoint` callback to automatically load the state of the best result at the end of training
- Added a `get_all_learnable_params` method to retrieve the named parameters of all PyTorch modules defined on the net, including those of criteria if applicable
- Added the `MlflowLogger` callback for logging to Mlflow (#769)
- Added the `InputShapeSetter` callback for automatically setting the input dimension of the PyTorch module (see the sketch at the end of this section)
- Added a new module to support Gaussian Processes through GPyTorch. To learn more about it, read the GP documentation or take a look at the GP notebook. This feature is experimental, i.e. the API could be changed in the future in a backwards-incompatible way (#782)
- Changed the signature of `validation_step`, `train_step_single`, `train_step`, `evaluation_step`, `on_batch_begin`, and `on_batch_end` such that instead of receiving `X` and `y`, they receive the whole batch; this makes it easier to deal with datasets that don't strictly return an `(X, y)` tuple, which is true for quite a few PyTorch datasets; please refer to the migration guide if you encounter problems (#699)
- Checking of arguments to `NeuralNet` now happens during `.initialize()`, not during `__init__`, to avoid raising false positives for yet unknown module or optimizer attributes
- Modules, criteria, and optimizers that are added to a net by the user are now first class: skorch takes care of setting train/eval mode, moving them to the indicated device, and updating all learnable parameters during training (check the docs for more details, #751)
- `CVSplit` is renamed to `ValidSplit` to avoid confusion (#752)
- Fixed a few bugs in the `net.history` implementation (#776)
- Fixed a bug in `TrainEndCheckpoint` that prevented it from being unpickled (#773)
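A minimal sketch of `InputShapeSetter`, assuming the module takes the callback's default parameter name `input_dim` (`X` and `y` are illustrative placeholders):

```python
import torch
from skorch import NeuralNetClassifier
from skorch.callbacks import InputShapeSetter

class MyModule(torch.nn.Module):  # placeholder module
    def __init__(self, input_dim=1):
        super().__init__()
        self.lin = torch.nn.Linear(input_dim, 2)

    def forward(self, X):
        return self.lin(X)

# no need to hard-code module__input_dim; it is inferred from X at fit time
net = NeuralNetClassifier(
    MyModule,
    criterion=torch.nn.CrossEntropyLoss,
    callbacks=[InputShapeSetter()],
)
net.fit(X, y)
```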
0.10.0 - 2021-03-23
- Added the `SacredLogger` callback for logging to Sacred (#725)
- The CLI helper function now also supports normal (i.e. non-skorch) sklearn estimators
- Disabling all callbacks is now supported, which reduces overhead and is especially relevant for small models (see the sketch at the end of this section)
- `LRScheduler` now correctly passes the value being monitored to `ReduceLROnPlateau` (#738)
- We no longer pass the `epoch` parameter to LR schedulers, since that parameter has been deprecated. We now rely on the scheduler to keep track of the epoch itself.
- Changed the implementation of `net.history` access to make it faster; this should result in a nice speedup when dealing with very small models/data but otherwise have no noticeable effects; if you encounter bugs, though, please create an issue
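A minimal sketch of disabling all callbacks (`MyModule`, `X`, and `y` are illustrative placeholders):

```python
from skorch import NeuralNetClassifier

# "disable" switches off all callbacks, including the default ones,
# which reduces per-batch/per-epoch overhead for small models
net = NeuralNetClassifier(MyModule, callbacks="disable")
net.fit(X, y)
```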
0.9.0 - 2020-08-30
- Added the `event_name` argument for `LRScheduler` for optional recording of LR changes inside `net.history`. NOTE: supported only in PyTorch >= 1.4
- Make it easier to add custom modules or optimizers to a neural net class by automatically registering them where necessary and by making them available to `set_params`
- Added the `step_every` argument for `LRScheduler` to set whether the scheduler step should be taken on every epoch or on every batch
- Added the `scoring` module with a `loss_scoring` function, which computes the net's loss (using `get_loss`) on provided input data (see the sketch at the end of this section)
- Added a parameter `predict_nonlinearity` to `NeuralNet` which allows users to control the nonlinearity to be applied to the module output when calling `predict` and `predict_proba` (#637, #661)
- Added the possibility to save the criterion with `save_params` and with checkpoint callbacks
- Added the possibility to save custom modules with `save_params` and with checkpoint callbacks
- Removed support for schedulers with a `batch_step()` method in `LRScheduler`
- Raise `FutureWarning` in `CVSplit` when `random_state` is not used. Will raise an exception in a future release (#620)
- The behavior of method `net.get_params` changed to make it more consistent with sklearn: it will no longer return "learned" attributes like `module_`; therefore, functions like `sklearn.base.clone`, when called with a fitted net, will no longer return a fitted net but instead an uninitialized net; if you want a copy of a fitted net, use `copy.deepcopy` instead; `net.get_params` is used under the hood by many sklearn functions and classes, such as `GridSearchCV`, whose behavior may thus be affected by the change (#521, #527)
- Raise `FutureWarning` when using the `CyclicLR` scheduler, because the default behavior has changed from taking a step every batch to taking a step every epoch (#626)
- Set train/validation mode on the criterion if it's a PyTorch module (#621)
- Don't pass `y=None` to `NeuralNet.train_split`, to enable the direct use of split functions without a positional `y` in their signatures. This is useful when working with unsupervised data (#605)
- `to_numpy` is now able to unpack dicts and lists/tuples (#657, #658)
- When using `CrossEntropyLoss`, softmax is now automatically applied to the output when calling `predict` or `predict_proba`
- Fixed a bug where the `CyclicLR` scheduler would update during both training and validation rather than just during training
- Fixed a bug introduced by moving the `optimizer.zero_grad()` call outside of the train step function, which made it incompatible with LBFGS and other optimizers that call the train step several times per batch (#636)
- Fixed pickling of the `ProgressBar` callback (#656)
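A minimal sketch of `loss_scoring`, assuming a fitted `net` and data `X`, `y`:

```python
from skorch.scoring import loss_scoring

# computes the criterion loss of the fitted net on the given data,
# using the same get_loss machinery as during training
score = loss_scoring(net, X, y)
print(score)
```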
0.8.0 - 2020-04-11
- Added the `NeptuneLogger` callback for logging experiment metadata to neptune.ai (#586)
- Add `DataFrameTransformer`, an sklearn-compatible transformer that helps working with pandas DataFrames by transforming the DataFrame into a representation that works well with neural networks (#507) (see the sketch at the end of this section)
- Added the `WandbLogger` callback for logging to Weights & Biases (#607)
- Added a `None` option to `device` which leaves the device(s) unmodified (#600)
- Add `PassthroughScoring`, a scoring callback that just calculates the average score of a metric determined at batch level and then writes it to the epoch level (#595)
- When using caching in scoring callbacks, no longer uselessly iterate over the data; this can save time if iteration is slow (#552, #557)
- Cleaned up duplicate code in the `fit_loop` (#564)
- WARNING: In release 0.10.0 of skorch, Python 3.5 support will be officially dropped (#634)
- Make skorch compatible with sklearn 0.22 (#571, #573, #575)
- Fixed a bug that could occur when a new "settable" (via `set_params`) attribute was added to `NeuralNet` whose name starts the same as an existing attribute's name (#590)
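A minimal sketch of `DataFrameTransformer` in a pipeline; `df`, `y`, and `net` are illustrative placeholders, and the net's module receives the resulting dict entries as keyword arguments:

```python
from sklearn.pipeline import Pipeline
from skorch.helper import DataFrameTransformer

pipe = Pipeline([
    # turns a pandas DataFrame into a dict of arrays, with float columns
    # and int/categorical columns handled separately
    ("transform", DataFrameTransformer()),
    ("net", net),
])
pipe.fit(df, y)
```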
0.7.0 - 2019-11-29
- More careful check for wrong parameter names being passed to `NeuralNet` (#500)
- More helpful error messages when trying to predict using an uninitialized model
- Add the `TensorBoard` callback for automatic logging to tensorboard (see the sketch at the end of this section)
- Make `NeuralNetBinaryClassifier` work with `sklearn.calibration.CalibratedClassifierCV`
- Improve `NeuralNetBinaryClassifier` compatibility with certain sklearn metrics (#515)
- `NeuralNetBinaryClassifier` automatically squeezes module output if necessary (#515)
- `NeuralNetClassifier` now has a `classes_` attribute after fit is called, which is inferred from y by default (#465, #486)
- `NeuralNet.load_params` with a checkpoint now initializes when needed (#497)
- Improve numerical stability when using `NLLLoss` in `NeuralNetClassifier` (#491)
- Refactor code to make gradient accumulation easier to implement (#506)
- `NeuralNetBinaryClassifier.predict_proba` now returns a 2-dim array; to access the "old" `y_proba`, take `y_proba[:, 1]` (#515)
- `net.history` is now a property that accesses `net.history_`, which stores the `History` object (#527)
- Remove deprecated `skorch.callbacks.CyclicLR`, use `torch.optim.lr_scheduler.CyclicLR` instead
- WARNING: In a future release, the behavior of method `net.get_params` will change to make it more consistent with sklearn: it will no longer return "learned" attributes like `module_`. Therefore, functions like `sklearn.base.clone`, when called with a fitted net, will no longer return a fitted net but instead an uninitialized net. If you want a copy of a fitted net, use `copy.deepcopy` instead. Note that `net.get_params` is used under the hood by many sklearn functions and classes, such as `GridSearchCV`, whose behavior may thus be affected by the change. (#521, #527)
- Fixed a bug that caused `LoadInitState` not to work with `TrainEndCheckpoint` (#528)
- Fixed `NeuralNetBinaryClassifier` wrongly squeezing the batch dimension when using `batch_size = 1` (#558)
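A minimal sketch of the `TensorBoard` callback, using PyTorch's bundled `SummaryWriter` (`MyModule`, `X`, and `y` are illustrative placeholders):

```python
from torch.utils.tensorboard import SummaryWriter
from skorch import NeuralNetClassifier
from skorch.callbacks import TensorBoard

writer = SummaryWriter("runs/skorch-example")
net = NeuralNetClassifier(MyModule, callbacks=[TensorBoard(writer)])
net.fit(X, y)  # losses and other history entries are logged automatically
```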
0.6.0 - 2019-07-19
- Adds a FAQ entry regarding the initialization behavior of `NeuralNet` when passed instantiated models (#409)
- Added a CUDA pickle test, including an artifact that supports testing on CUDA-less CI machines
- Adds `train_batch_count` and `valid_batch_count` to the history in the training loop (#445)
- Adds a score method for `NeuralNetClassifier`, `NeuralNetBinaryClassifier`, and `NeuralNetRegressor` (#469)
- Wrapper class for torch Datasets to make them work with some sklearn features, e.g. grid search (#443) (see the sketch at the end of this section)
- Repository moved to https://github.com/skorch-dev/skorch/, please change your git remotes
- Treat CUDA-dependent attributes as prefixes to cover values set using `set_params`, since previously `"criterion_"` would not match `net.criterion__weight` as set by `net.set_params(criterion__weight=w)`
- The skorch pickle format changed in order to improve CUDA compatibility; if you have pickled models, please re-pickle them to be able to load them in the future
- `net.criterion_` and its parameters are now moved to the target device when using criteria that inherit from `torch.nn.Module`. Previously, the user had to make sure that parameters such as class weights are on the compute device
- skorch now assumes PyTorch >= 1.1.0. This mainly affects learning rate schedulers, whose inner workings were changed with version 1.1.0. This update will also invalidate pickled skorch models after a change introduced in PyTorch optimizers.
- Include requirements in MANIFEST.in
- Add `criterion_` to `NeuralNet.cuda_dependent_attributes_` to avoid issues with criterion weight tensors from, e.g., `NLLLoss` (#426)
- `TrainEndCheckpoint` can be cloned by `sklearn.base.clone` (#459)
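The wrapper class mentioned above is `SliceDataset` from `skorch.helper`; a minimal sketch with grid search, where `dataset` and `net` are illustrative placeholders:

```python
from sklearn.model_selection import GridSearchCV
from skorch.helper import SliceDataset

# expose the torch Dataset as sliceable "X" and "y" views for sklearn
X_sl = SliceDataset(dataset, idx=0)
y_sl = SliceDataset(dataset, idx=1)

search = GridSearchCV(net, {"lr": [0.01, 0.1]}, cv=3)
search.fit(X_sl, y_sl)
```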
0.5.0 - 2018-12-13
- Basic usage notebook now runs on Google Colab
- Advanced usage notebook now runs on Google Colab
- MNIST with scikit-learn and skorch now runs on Google Colab
- Better user-facing messages when module or optimizer are re-initialized
- Added an experimental API (`net._register_virtual_param`) to register "virtual" parameters on the network with custom setter functions (#369)
- Setting parameters `lr`, `momentum`, `optimizer__lr`, etc. no longer resets the optimizer. As of now you can do `net.set_params(lr=0.03)` or `net.set_params(optimizer__param_group__0__momentum=0.86)` without triggering a re-initialization of the optimizer (#369) (see the sketch at the end of this section)
- Support for scipy sparse CSR matrices as input (as, e.g., returned by sklearn's `CountVectorizer`); note that they are cast to dense matrices during batching
- Helper functions to build command line interfaces with almost no boilerplate
- Reduce overhead of `BatchScoring` when using `train_loss_score` or `valid_loss_score` by skipping a superfluous inference step (#381)
- The `on_grad_computed` callback function will yield an iterable for `named_parameters` only when it is used, to reduce the run-time overhead of the call (#379)
- Default `fn_prefix` in `TrainEndCheckpoint` is now `train_end_` (#391)
- Issues a warning when `Checkpoint`'s `monitor` parameter is set to `monitor` and the history contains `<monitor>_best` (#399)
- Re-initialize the optimizer when `set_params` is called with the `lr` argument (#372)
- Copying a `SliceDict` now returns a `SliceDict` instead of a `dict` (#388)
- Calling `==` on `SliceDict`s now works as expected when values are numpy arrays and torch tensors
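A minimal sketch of updating optimizer parameters without a reset, assuming a fitted `net` with an SGD optimizer:

```python
# update the learning rate without re-initializing the optimizer,
# so optimizer state such as momentum buffers is preserved
net.set_params(lr=0.03)

# individual param groups can be addressed with dunder notation
net.set_params(optimizer__param_group__0__momentum=0.86)
```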
0.4.0 - 2018-10-24
- Support for PyTorch 0.4.1
- There is no need to explicitly name callbacks anymore (names are assigned automatically, name conflicts are resolved).
- You can now access the training data in the `on_grad_computed` event
- There is a new image segmentation example
- Easily create toy network instances for quick experiments using `skorch.toy.make_classifier` and friends (see the sketch at the end of this section)
- New `ParamMapper` callback to modify/freeze/unfreeze parameters at certain points in time during training:
```python
>>> from skorch.callbacks import Freezer, Unfreezer
>>> net = Net(module, callbacks=[Freezer('layer*.weight'), Unfreezer('layer*.weight', at=10)])
```

- Refactored `EpochScoring` for easier sub-classing
- The `Checkpoint` callback now supports saving the optimizer; this avoids problems with stateful optimizers such as `Adam` or `RMSprop` (#360)
- Added the `LoadInitState` callback for easy continued training from checkpoints (#360)
- `NeuralNet.load_params` now supports loading from `Checkpoint` instances
- Added documentation for saving and loading
- The `ProgressBar` callback now determines the batches per epoch automatically by default (`batches_per_epoch=auto`)
- The `on_grad_computed` event now has access to the current training data batch
- Deprecated `filtered_optimizer` in favor of the `Freezer` callback (#346)
- `NeuralNet.load_params` and `NeuralNet.save_params` deprecate the `f` parameter in favor of `f_optimizer`, `f_params`, and `f_history` (#360)
- `uses_placeholder_y` should not require the existence of the `y` field (#311)
- The LR scheduler creates `batch_idx` on first run (#314)
- Use `OrderedDict` for callbacks to fix Python 3.5 compatibility issues (#331)
- Make `to_tensor` work correctly with `PackedSequence` (#335)
- Rewrite `History` to not use any recursion, to avoid memory leaks during exceptions (#312)
- Use `flaky` in some neural network tests to hide platform differences
- Fixes `ReduceLROnPlateau` when `mode == max` (#363)
- Fix disconnected weights between net and optimizer after copying the net with `copy.deepcopy` (#318)
- Fix a bug that interfered with loading CUDA models when the model was a CUDA tensor but the net was configured to use the CPU (#354, #358)
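A minimal sketch of the toy-network helpers, assuming the `input_units`, `hidden_units`, and `num_hidden` keyword arguments (`X` and `y` are illustrative placeholders):

```python
from skorch import NeuralNetClassifier
from skorch.toy import make_classifier

# build a throwaway MLP classifier module class for quick experiments
MyModule = make_classifier(input_units=20, hidden_units=10, num_hidden=2)
net = NeuralNetClassifier(MyModule, max_epochs=5)
net.fit(X, y)
```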