Do you want to visualize missing values in your data? There are plenty of amazing methods (check out missingno, for example), but they all look bulky when your data has too many columns. nafig will help you build a perfect NA figure!
$ pip install -U nafig
or install with Poetry:
$ poetry add nafig
Here are some examples of usage for both simulated and real-world data. Check this notebook to play with the code yourself!
First, let's import the core function and other useful things:
>>> from nafig.plots import na_text_barplot # The core function
>>> from nafig.utils import create_example_data # To simulate data
>>> import pandas as pd # To work with tables
>>> df, feature_types = create_example_data()
df is just a pandas DataFrame with missing values. feature_types is an array containing a data type description for each column. This is just an example, so the labels don't correspond to the actual data types.
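If you want a comparable toy dataset without nafig's helper, one can be simulated with NumPy and pandas. This is a minimal sketch under stated assumptions: simulate_na_data is a hypothetical helper (it is not nafig's create_example_data), and the shape, seed and type labels are arbitrary choices. Each column gets its own probability of missing values, so the columns end up spread across many missingness levels.

```python
import numpy as np
import pandas as pd


def simulate_na_data(n_rows=100, n_cols=300, seed=0):
    """Hypothetical helper: a DataFrame with a different share of NAs per column.

    Returns the DataFrame and an array of random feature-type labels,
    mirroring the (df, feature_types) pair described above.
    """
    rng = np.random.default_rng(seed)
    data = rng.normal(size=(n_rows, n_cols))
    # Each column gets its own missing-value probability between 0 and 1
    na_probs = rng.uniform(0, 1, size=n_cols)
    mask = rng.uniform(size=(n_rows, n_cols)) < na_probs
    data[mask] = np.nan
    df = pd.DataFrame(data, columns=[f"feature_{i}" for i in range(n_cols)])
    # Labels are random, so they don't describe the real column dtypes
    feature_types = rng.choice(["Binary", "Categorical", "Continuous"], size=n_cols)
    return df, feature_types


df, feature_types = simulate_na_data()
print(df.shape)  # (100, 300)
```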
>>> feature_types[:10]
array(['Categorical', 'Categorical', 'Binary', 'Continuous', 'Continuous',
      'Continuous', 'Binary', 'Continuous', 'Continuous', 'Binary'],
      dtype='<U11')
This toy dataframe contains 300 columns. Unfortunately, a heatmap of the missing data would be too bulky. How can you explore the distribution of missing data in such a dataset? Try the NA text barplot!
>>> na_text_barplot(df, hue=feature_types, line_height=1.5)
Columns of the dataset are binned by the percentage of missing values they contain. Colouring by feature type helps you see which types of data are missing. The Y-axis shows the number of features in each group.
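The binning described above can be approximated with plain pandas to inspect the numbers behind the bars. This is a sketch, not nafig's implementation: count_features_per_na_bin is a hypothetical helper, and equal-width bins from 0 to 100% are an assumption about how the groups are formed.

```python
import numpy as np
import pandas as pd


def count_features_per_na_bin(df, hue, num_bins=10):
    """Count columns per (missing-percentage bin, hue label) pair.

    Hypothetical helper illustrating the idea behind the plot.
    """
    # Percentage of missing values in every column
    na_percent = df.isna().mean() * 100
    # Equal-width bins over 0-100 %; include_lowest keeps fully complete columns
    edges = np.linspace(0, 100, num_bins + 1)
    bins = pd.cut(na_percent, bins=edges, include_lowest=True)
    summary = pd.DataFrame({"bin": bins, "hue": np.asarray(hue)}, index=df.columns)
    # observed=False keeps every bin, including empty ones; missing pairs become 0
    counts = summary.groupby(["bin", "hue"], observed=False).size()
    # Rows: bins; columns: hue labels
    return counts.unstack(fill_value=0)
```

For the toy data, count_features_per_na_bin(df, feature_types, num_bins=20) returns one row per bin and one column per feature type; each row sum is the number of features in that bin, which is what the Y-axis reports.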
You can vary the number of bins using the num_bins parameter:
>>> na_text_barplot(df, hue=feature_types, line_height=1.5, num_bins=20)
>>> na_text_barplot(df, hue=feature_types, line_height=2, num_bins=2, fig_width=8, font_size=3)
Now let's see some real data examples!
Data source: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data?select=train.csv
>>> DATA_PATH = "data/house-prices/train.csv"
>>> house_prices_df = pd.read_csv(DATA_PATH, index_col=0)
This is a reasonably clean dataset, with most of the values present. But thanks to this plot, we can see which features are the bad guys!
>>> na_text_barplot(house_prices_df, fig_width=17, num_bins=20, line_height=1.5)
Note that if you don't pass the hue parameter, features will be colored by the data type of the column. If you don't want to colorize features at all, set hue to False.
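The three hue cases in the note above can be mimicked to see which labels each setting produces. This sketch assumes that "data type of the column" means the pandas dtype (int64, float64, object, ...), which matches the dtype names used for the Airbnb example; default_hue is a hypothetical helper, and the shared "all" label for hue=False is an arbitrary choice.

```python
import pandas as pd


def default_hue(df, hue=None):
    """Hypothetical helper mirroring the documented behaviour of `hue`.

    None  -> one label per column, taken from its pandas dtype (assumption)
    False -> one shared label for every column, i.e. no colouring
    other -> used as-is, one label per column
    """
    if hue is None:
        return df.dtypes.astype(str).tolist()
    if hue is False:
        return ["all"] * df.shape[1]
    return list(hue)


example = pd.DataFrame({"rooms": [3, 4], "area": [51.5, None]})
print(default_hue(example))  # ['int64', 'float64']
```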
By setting remove_empty_bins to True, you can remove the empty bins. This requires readers to pay more attention to the X-axis but saves some space.
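Conceptually, removing empty bins means dropping every bin that holds no features, so the remaining bins are no longer evenly spaced along the X-axis. A minimal sketch on a table of per-bin counts (rows are bins, columns are hue labels); drop_empty_bins is a hypothetical helper and the counts are made up for illustration.

```python
import pandas as pd


def drop_empty_bins(counts):
    """Keep only the bins (rows) that contain at least one feature."""
    return counts[counts.sum(axis=1) > 0]


# Made-up counts: the 10-20% and 20-30% bins are empty
counts = pd.DataFrame(
    {"Binary": [3, 0, 0, 1], "Continuous": [5, 0, 0, 2]},
    index=["0-10%", "10-20%", "20-30%", "30-40%"],
)
print(drop_empty_bins(counts).index.tolist())  # ['0-10%', '30-40%']
```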
>>> na_text_barplot(house_prices_df, fig_width=10, num_bins=20,
...                 line_height=1.5, remove_empty_bins=True)
Data source: https://www.kaggle.com/datasets/airbnb/seattle
>>> airbnb_df = pd.read_csv("data/airbnb/listings.csv")
This dataset has a bit more missing data. On the plot, we can see that all integer features are almost complete, while some object and floating-point columns contain missing values.
>>> na_text_barplot(airbnb_df, fig_width=18, line_height=1.8, font_size=9, remove_empty_bins=True)
Feel free to explore other parameters! There are more of them to help you create a perfect missing values visualization.
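An observation like the one about the Airbnb data can be double-checked numerically by averaging the missing fraction within each pandas dtype. This is a sketch with a hypothetical helper name. Note that pandas stores a numeric column containing NaN as float64 by default, which is one likely reason why integer columns read from a CSV file look complete.

```python
import pandas as pd


def missing_share_by_dtype(df):
    """Mean fraction of missing values per column, averaged within each dtype."""
    na_fraction = df.isna().mean()         # one value per column
    dtypes = df.dtypes.astype(str)         # aligned on the same column names
    return na_fraction.groupby(dtypes).mean().sort_values()


example = pd.DataFrame({"id": [1, 2, 3, 4], "price": [10.0, None, None, 40.0]})
print(missing_share_by_dtype(example).to_dict())  # {'int64': 0.0, 'float64': 0.5}
```

With the data file in place, missing_share_by_dtype(airbnb_df) gives the same summary for the Airbnb listings.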
- Support for Python 3.9 and higher.
- Poetry as the dependency manager. See configuration in pyproject.toml and setup.cfg.
- Automatic codestyle with black, isort and pyupgrade.
- Ready-to-use pre-commit hooks with code formatting.
- Type checks with mypy; docstring checks with darglint; security checks with safety and bandit.
- Testing with pytest.
- Ready-to-use .editorconfig, .dockerignore, and .gitignore. You don't have to worry about those things.
- GitHub integration: issue and PR templates.
- GitHub Actions with a predefined build workflow as the default CI/CD.
- Everything is already set up for security checks, codestyle checks, code formatting, testing, linting, docker builds, etc. with Makefile. More details in makefile-usage.
- Dockerfile for your package.
- Always up-to-date dependencies with @dependabot. You only need to enable it.
- Automatic drafts of new releases with Release Drafter. You can see the list of labels in release-drafter.yml. Works perfectly with the Semantic Versions specification.
The Makefile contains many commands for faster development.
1. Download and remove Poetry
To download and install Poetry, run:
make poetry-download
To uninstall:
make poetry-remove
2. Install all dependencies and pre-commit hooks
Install requirements:
make install
Pre-commit hooks can be installed after git init via:
make pre-commit-install
3. Codestyle
Automatic formatting uses pyupgrade, isort and black.
make codestyle
# or use synonym
make formatting
Codestyle checks only, without rewriting files:
make check-codestyle
Note: check-codestyle uses the isort, black and darglint libraries.
Update all dev libraries to the latest version using one command:
make update-dev-deps
4. Code security
make check-safety
This command launches Poetry integrity checks and identifies security issues with Safety and Bandit.
5. Type checks
Run the mypy static type checker:
make mypy
6. Tests with coverage badges
Run pytest:
make test
7. All linters
Of course, there is a command to run all linters at once:
make lint
which is the same as:
make test && make check-codestyle && make mypy && make check-safety
8. Docker
make docker-build
which is equivalent to:
make docker-build VERSION=latest
Remove the Docker image with:
make docker-remove
More information about docker.
9. Cleanup
Delete pycache files:
make pycache-remove
Remove package build:
make build-remove
Delete .DS_STORE files:
make dsstore-remove
Remove .mypycache:
make mypycache-remove
Or, to remove all of the above, run:
make cleanup
You can see the list of available releases on the GitHub Releases page.
We follow the Semantic Versions specification.
We use Release Drafter. As pull requests are merged, a draft release is kept up-to-date listing the changes, ready to publish when you're ready. With the categories option, you can categorize pull requests in release notes using labels.
| Label | Title in Releases |
|---|---|
| enhancement, feature | 🚀 Features |
| bug, refactoring, bugfix, fix | 🔧 Fixes & Refactoring |
| build, ci, testing | 📦 Build System & CI/CD |
| breaking | 💥 Breaking Changes |
| documentation | 📝 Documentation |
| dependencies | ⬆️ Dependencies updates |
You can update it in release-drafter.yml.
GitHub creates the bug, enhancement, and documentation labels for you. Dependabot creates the dependencies label. Create the remaining labels on the Issues tab of your GitHub repository, when you need them.
This project is licensed under the terms of the MIT license. See LICENSE for more details.
@misc{nafig,
author = {VladimirShitov},
title = {Package for plotting figures with NA data distribution},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/VladimirShitov/nafig}}
}
This project was generated with python-package-template.





