mloptimizer is a Python library for optimizing the hyperparameters of machine learning algorithms using genetic algorithms. With mloptimizer, you can find the optimal set of hyperparameters for a given machine learning model and dataset, which can significantly improve the model's performance. The library supports several popular machine learning algorithms, including decision trees, random forests, and gradient boosting classifiers. The genetic algorithm used in mloptimizer provides an efficient and flexible way to explore large hyperparameter search spaces.
Features:
- Easy to use
- DEAP-based genetic algorithm ready to use with several machine learning algorithms
- Adaptable to use with any machine learning algorithm that complies with the Scikit-Learn API
- Default hyperparameter ranges
- Default score functions for evaluating the performance of the model
- Reproducibility of results
- Early stopping to prevent overfitting
- Population seeding with known good configurations
- Performance tracking (trials count, optimization time)
- Zero file output mode for cleaner workflows (enabled by default)
- Extensible with more machine learning algorithms that comply with the Scikit-Learn API
- Customizable hyperparameter ranges
- Customizable score functions
- Optional MLflow compatibility for tracking the optimization process
- Generation-level MLflow tracking with nested runs
- Responsive Plotly visualizations with WebGL acceleration
- Joblib-based parallelization for better compatibility
It is recommended to create a virtual environment using the venv package. To learn more about how to use venv, check out the official Python documentation at https://docs.python.org/3/library/venv.html.
# Create the virtual environment
python -m venv myenv
# Activate the virtual environment
source myenv/bin/activate

To install mloptimizer, run:

pip install mloptimizer

You can get more information about the package installation at https://pypi.org/project/mloptimizer/.
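To verify that the installation succeeded, you can try importing the package from the command line (a quick sanity check; it exits silently on success):

python -c "import mloptimizer"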
Here's a simple example of how to optimize the hyperparameters of a decision tree classifier on the iris dataset:
from mloptimizer.interfaces import GeneticSearch, HyperparameterSpaceBuilder
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
# 1) Load the dataset and get the features and target
X, y = load_iris(return_X_y=True)
# 2) Define the hyperparameter space (a default space is provided for some algorithms)
hyperparameter_space = HyperparameterSpaceBuilder.get_default_space(DecisionTreeClassifier)
# 3) Create the optimizer and optimize the classifier
opt = GeneticSearch(
    estimator_class=DecisionTreeClassifier,
    hyperparam_space=hyperparameter_space,
    generations=10,
    population_size=20,
    early_stopping=True,  # Stop early if no improvement
    patience=3,           # Wait 3 generations
    cv=5,                 # 5-fold cross-validation
    seed=42               # Reproducibility
)
# 4) Optimize (no files created by default)
opt.fit(X, y)
# Access results
print(f"Best score: {opt.best_estimator_.score(X, y)}")
print(f"Trials evaluated: {opt.n_trials_}")
print(f"Time taken: {opt.optimization_time_:.2f}s")Other algorithms can be used, such as RandomForestClassifier or XGBClassifier which have a
default hyperparameter space defined in the library.
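For example, reusing X, y and the imports from the quickstart above, a random forest can be optimized with its bundled default space (a minimal sketch; the GeneticSearch arguments mirror the example above):

from sklearn.ensemble import RandomForestClassifier

# Reuse the default hyperparameter space shipped for RandomForestClassifier
rf_space = HyperparameterSpaceBuilder.get_default_space(RandomForestClassifier)

rf_opt = GeneticSearch(
    estimator_class=RandomForestClassifier,
    hyperparam_space=rf_space,
    generations=10,
    population_size=20,
    seed=42
)
rf_opt.fit(X, y)
print(f"Best score: {rf_opt.best_estimator_.score(X, y)}")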
Even if an algorithm does not have a default hyperparameter space in the library, you can define your own hyperparameter space following the documentation, as sketched below.
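As an illustration, a custom space for a support vector classifier might look like the following. Note that the builder method names used here (add_float_param, add_categorical_param, build) are assumptions about the builder API, not confirmed signatures; check the documentation for the exact methods:

from mloptimizer.interfaces import GeneticSearch, HyperparameterSpaceBuilder
from sklearn.svm import SVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# NOTE: hypothetical builder calls -- verify names/signatures against the docs
builder = HyperparameterSpaceBuilder()
builder.add_float_param("C", 0.1, 100.0)                    # regularization strength
builder.add_categorical_param("kernel", ["linear", "rbf"])  # kernel choice
custom_space = builder.build()

opt = GeneticSearch(
    estimator_class=SVC,
    hyperparam_space=custom_space,
    generations=10,
    population_size=20,
    seed=42
)
opt.fit(X, y)
print(f"Best score: {opt.best_estimator_.score(X, y)}")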
More details are available in the documentation. Examples can be found in the examples section on readthedocs.io.
The following dependencies are used in mloptimizer:
- DEAP - Genetic algorithms
- XGBoost - Gradient boosting framework
- CatBoost - Gradient boosting framework
- LightGBM - Gradient boosting framework
- Scikit-Learn - Machine learning algorithms and utilities
- Plotly - Interactive visualizations
- Seaborn - Statistical visualizations
- joblib - Parallel processing
- tqdm - Progress bars
Optional:
- MLflow - Tracking the optimization process
The documentation for mloptimizer, with examples and a reference of classes and methods, can be found in the project's wiki.
This project is licensed under the MIT License.
