Skip to content

[WIP] Gensim docker container#1368

Merged
menshikh-iv merged 21 commits into
piskvorky:developfrom
parulsethi:gensim_docker
Jun 30, 2017
Merged

[WIP] Gensim docker container#1368
menshikh-iv merged 21 commits into
piskvorky:developfrom
parulsethi:gensim_docker

Conversation

@parulsethi
Copy link
Copy Markdown
Contributor

@parulsethi parulsethi commented May 27, 2017

TODO:

  • Run gensim tests in docker container

Comment thread docker/Dockerfile
# Installs python, pip and setup tools (with fixed versions)
RUN apt-get update \
&& apt-get install -y \
ant=1.9.6-1ubuntu1 \
Copy link
Copy Markdown
Contributor Author

@parulsethi parulsethi May 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmylk @menshikh-iv Should I remove version pinning from here also?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If fixation is not required, then, of course, you can remove it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinning versions in a container is actually very good. It guarantees that it keeps running.

Just don't pin the gensim version.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So please keep the pins here

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guys, the fixation is needed due best practices with docker. Without these versions fixed the container can broke when a components changes it versions and when someone try to get it from a docker pull.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing: In my previous PR I forgot to add a modification in dockerfile that fix some dependencies for git dependencies to avoid this problems.

I was following the docker contribution guidelines to turn this image into an official one in the future.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @danielbdias, I've kept the pinned versions which were already there in your PR and will add version pinning for rest of the packages which I added later.

@parulsethi parulsethi changed the title Gensim docker container [WIP] Gensim docker container May 27, 2017
Copy link
Copy Markdown
Contributor

@tmylk tmylk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The container is a very needed feature, just left some comments to make things more clear

Comment thread docker/Dockerfile Outdated
&& mv ./gensim-$GENSIM_VERSION/* /gensim \
&& rm -rf /gensim/download \
&& cd /gensim \
&& python setup.py install
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please checkout master branch first

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will already download the package from branch which is specified in GENSIM_VERSION variable (and I use a branch on my personal fork for now which has the docker folder).

Comment thread docker/Dockerfile Outdated
ENV WR_HOME gensim_dependencies/wordrank
ENV MALLET_HOME gensim_dependencies/mallet
ENV DTM_PATH gensim_dependencies/dtm/bin/dtm-linux64
ENV VOWPAL_WABBIT_PATH gensim_dependencies/vowpal_wabbit/vowpalwabbit/vw
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add varembed path in order to run varembed tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to test_varembed_wrapper.py varembed path is not required. It only tests on these test_data files.

Comment thread docker/check_fast_version.py Outdated
import sys

try:
from gensim.models.word2vec_inner import FAST_VERSION, MAX_WORDS_IN_BATCH
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no need for max_words_in_batch

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in latest commit

@menshikh-iv
Copy link
Copy Markdown
Contributor

@parulsethi What's a status of PR? Have you any problems with container?

@parulsethi
Copy link
Copy Markdown
Contributor Author

Update: Proper installation of wordrank dependencies is left now (need to install OpenMPI with multithreading enabled).
All other gensim tests (except wordrank) passes.

Comment thread docker/Dockerfile

ENV GENSIM_REPOSITORY https://github.com/parulsethi/gensim/archive
ENV GENSIM_BRANCH gensim_docker

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add gensim version as env variable

Comment thread docker/Dockerfile Outdated
&& python3 setup.py install

# Set ENV variables for wrappers
ENV FT_HOME gensim_dependencies/fastText
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace relative to absolute (for all *_PATH and *_HOME), it's very important.


try:
from gensim.models.word2vec_inner import FAST_VERSION

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add additional info for this script (like python version, numpy/scipy/gensim version) and create alias in docker for this script

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does run from here in dockerfile with both python 2 and 3. And only the pinned versions of numpy/scipy/gensim installed in the container are used

Comment thread docker/docker-compose.yml Outdated
@@ -0,0 +1,7 @@
version: '2'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you need docker-compose file?

Comment thread docker/wordrank_install.sh Outdated
@@ -0,0 +1,19 @@
#!/bin/bash

printf "1. clean up workspace\n"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you add this script here? This script from wordrank repo.

Comment thread docker/Dockerfile

# Install fastText
RUN cd /gensim/gensim_dependencies \
&& git clone https://github.com/facebookresearch/fastText.git \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please pin version for FastText/Wordrank/etc (you can use commit hash or version)

Comment thread docker/Dockerfile Outdated
&& cd /gensim/gensim_dependencies/fastText \
&& make

# Install WordRank
Copy link
Copy Markdown
Contributor

@menshikh-iv menshikh-iv Jun 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment all things connected with wordrank in dockerfile (ompi problem)

Comment thread docker/README.md Outdated
python2 setup.py test
```

To push the image to docker hub:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No needed this block (about push to dockerhub)

Comment thread docker/README.md Outdated
docker push [my_user]/gensim
```

# Run gensim image from anywhere
Copy link
Copy Markdown
Contributor

@menshikh-iv menshikh-iv Jun 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace to Run ipython notebook with installed gensim

@menshikh-iv
Copy link
Copy Markdown
Contributor

Nice feature, thank you @parulsethi 🥇

@menshikh-iv menshikh-iv merged commit bd6db9a into piskvorky:develop Jun 30, 2017
saparina pushed a commit to saparina/gensim that referenced this pull request Jul 9, 2017
* added dockerfile

* remove fasttext from pip installs

* remove syntax errors

* remove unused imports

* modified dockerfile

* add subversion, locales

* use both python2 and python3

* upgrade numpy version

* add readme with relevant commands

* add fixed versions for wrapper dependencies

* made requested changes

* update readme

* change vw pin and remove docker-yml

* change vw version and make absolute paths for wrappers

* specify original gensim repo for download

* change maintainer

* correct missing slash

* use git clone for gensim

* correct gensim folder sequences
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants