Modifications to Pareto GAN

This repository modifies the original Pareto GAN code, which accompanies the Pareto GAN paper.

Pareto GAN shows that a GAN cannot generate data with heavier tails than its latent space, which generally follows a light-tailed distribution (such as a Gaussian). That paper, like many others, changes the architecture and training procedure of GANs to improve training on heavy-tailed data.

This repo investigates whether a simpler approach can achieve similar results: transform the training data to a Gaussian distribution, train the GAN as normal, then transform the generated samples back to the original distribution. We call this method cauchy2gaussian. If this approach is acceptable, it avoids the need for more complex modifications to GANs, which were generally developed with Gaussian data in mind.
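The transform itself can be sketched with a probability integral transform: push the data through a fitted Cauchy CDF, then through the standard normal quantile function, and invert both steps after generation. This is a minimal sketch, not the code in `exps.py`; the helper names `to_gaussian` and `from_gaussian`, the clipping constant, and the use of `scipy.stats.cauchy.fit` are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats

EPS = 1e-12  # assumed guard value; keeps the quantile functions finite


def to_gaussian(x, loc, scale):
    """Map Cauchy(loc, scale) data to roughly standard normal values."""
    u = stats.cauchy.cdf(x, loc=loc, scale=scale)
    return stats.norm.ppf(np.clip(u, EPS, 1.0 - EPS))


def from_gaussian(z, loc, scale):
    """Inverse map: standard normal values back to the Cauchy scale."""
    u = stats.norm.cdf(z)
    return stats.cauchy.ppf(np.clip(u, EPS, 1.0 - EPS), loc=loc, scale=scale)


rng = np.random.default_rng(0)
x = stats.cauchy.rvs(loc=0.0, scale=1.0, size=10_000, random_state=rng)

# Fit a single Cauchy to the training data, then transform.
loc, scale = stats.cauchy.fit(x)
z = to_gaussian(x, loc, scale)        # what the GAN would be trained on
x_back = from_gaussian(z, loc, scale)  # applied to generated samples in practice

print(f"transformed mean={z.mean():.3f}, std={z.std():.3f}")
```

In this sketch the GAN only ever sees `z`, so the generator does not have to produce heavy tails itself; the tails are restored by `from_gaussian`.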

Results

Histograms left to right: normal, pareto, cauchy2gaussian

Result metrics are shown in the table below. The KS statistic measures the maximum distance between the empirical cumulative distribution functions of the generated and real data. Lower is better. The area function computes the area between the log-log complementary cumulative distribution functions (CCDFs). This metric is particularly sensitive to tail behavior in heavy-tailed distributions. Lower is better.
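For readers who want to reproduce comparable numbers, the two metrics can be approximated as follows. This is a rough sketch under stated assumptions, not the repository's implementation: `loglog_ccdf_area` is a hypothetical helper, it works on absolute values, it uses a log-spaced grid over the range shared by both samples, and it integrates with the trapezoid rule. The repository's area function may differ in all of these choices, so absolute values will not match the table.

```python
import numpy as np
from scipy import stats


def empirical_ccdf(samples, grid):
    """Fraction of samples strictly greater than each grid point."""
    s = np.sort(samples)
    return 1.0 - np.searchsorted(s, grid, side="right") / s.size


def loglog_ccdf_area(real, fake, n_grid=200):
    """Area between the two empirical CCDFs of |x| on log-log axes."""
    a, b = np.abs(real), np.abs(fake)
    lo = max(a.min(), b.min(), np.finfo(float).tiny)
    hi = min(a.max(), b.max())
    grid = np.logspace(np.log10(lo), np.log10(hi), n_grid)
    ca, cb = empirical_ccdf(a, grid), empirical_ccdf(b, grid)
    keep = (ca > 0) & (cb > 0)  # log of an empty tail is undefined
    lx = np.log(grid[keep])
    d = np.abs(np.log(ca[keep]) - np.log(cb[keep]))
    # Trapezoid rule over log(x).
    return float(np.sum(0.5 * (d[1:] + d[:-1]) * np.diff(lx)))


rng = np.random.default_rng(0)
real = stats.cauchy.rvs(size=20_000, random_state=rng)
fake_heavy = stats.cauchy.rvs(size=20_000, random_state=rng)  # stand-in for a good generator
fake_light = stats.norm.rvs(size=20_000, random_state=rng)    # stand-in for a light-tailed one

ks_heavy = stats.ks_2samp(real, fake_heavy).statistic
ks_light = stats.ks_2samp(real, fake_light).statistic
area_heavy = loglog_ccdf_area(real, fake_heavy)
area_light = loglog_ccdf_area(real, fake_light)
print(f"KS: heavy={ks_heavy:.4f} light={ks_light:.4f}")
print(f"log-log area: heavy={area_heavy:.2f} light={area_light:.2f}")
```

The KS statistic is dominated by the bulk of the distribution, while the log-log area weights every decade of the tail equally, which is why the two metrics can rank experiments differently.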

Experiment	KS Statistic (↓)	Log-Log Area (↓)
`normal`	0.0332	52.35
`pareto`	0.0123	1.97
`cauchy2gaussian`	0.0059	2.07

Even though the training data is a mixture of Cauchy distributions, we find that fitting a single Cauchy distribution to the mixture gives adequate results. This suggests that the method may still be suitable for data composed of a mixture of unknown heavy-tailed distributions, as long as the fitted distribution approximates the data reasonably well.
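One way to check whether a single fitted distribution is a reasonable approximation is to compare the data against the fitted CDF. The sketch below does this for a two-component Cauchy mixture. The mixture weights, locations, and scales are illustrative assumptions and are not taken from the Dual Cauchy dataset in `datasets.py`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical 50/50 mixture of two Cauchy components (parameters are made up).
pick_first = rng.random(n) < 0.5
mix = np.where(
    pick_first,
    stats.cauchy.rvs(loc=-1.0, scale=1.0, size=n, random_state=rng),
    stats.cauchy.rvs(loc=1.0, scale=1.0, size=n, random_state=rng),
)

# Fit one Cauchy to the whole mixture by maximum likelihood.
loc, scale = stats.cauchy.fit(mix)

# Distance between the mixture's empirical CDF and the single fitted Cauchy CDF.
ks_fit = stats.kstest(mix, "cauchy", args=(loc, scale)).statistic
print(f"fitted loc={loc:.3f}, scale={scale:.3f}, KS vs. fit={ks_fit:.4f}")
```

A small distance here means the transformed training data will be close to Gaussian; a large one suggests fitting a different family before applying the transform.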

Pareto GAN

Install dependencies

pip install torch numpy matplotlib pandas scipy

Note: we recommend installing torch with GPU support.

Run an experiment

python exps.py -ds 0 -type normal -seed 1000
python exps.py -ds 0 -type pareto -seed 1000
python exps.py -ds 0 -type cauchy2gaussian -seed 1000

Options

GAN type (-type):

pareto
uniform
normal
lognormal
cauchy2gaussian

Dataset (-ds):

0: Dual Cauchy

Note: real datasets may not be available anymore. Dual Cauchy is a good "dataset" to illustrate the concept.
