Skip to content

[MRG] Added Jensen Shannon metric and dendrogram visualization#1484

Merged
menshikh-iv merged 21 commits into
piskvorky:developfrom
parulsethi:jenson_shannon
Aug 23, 2017
Merged

[MRG] Added Jensen Shannon metric and dendrogram visualization#1484
menshikh-iv merged 21 commits into
piskvorky:developfrom
parulsethi:jenson_shannon

Conversation

@parulsethi
Copy link
Copy Markdown
Contributor

@parulsethi parulsethi commented Jul 13, 2017

Adds Jensen-Shannon distance metric and topic dendrogram-heatmap visualization notebook.

@piskvorky
Copy link
Copy Markdown
Owner

Hanging indent everywhere please -- vertical indent makes the code hard to read and hard to maintain.

@piskvorky
Copy link
Copy Markdown
Owner

@parulsethi It's Jensen-Shannon (after Johan Jensen and Claude Shannon).

@parulsethi
Copy link
Copy Markdown
Contributor Author

@piskvorky Can you please point to which line should be changed to hanging indent? As if i understand correctly hanging/vertical indent is used in case of long arguments? and there weren't any in this PR so I didn't use any line breaks in code

@piskvorky
Copy link
Copy Markdown
Owner

Unfortunately Github says the notebook is too large and its diff cannot be displayed, so I cannot review directly. But the indentation is broken around annotation_html =, update( and a few other places.

There are also a few minor code style inconsistencies, like spaces around , or :. @menshikh-iv will direct you :)

@menshikh-iv is there a way to somehow review the notebook? I'd like to point out some constructs which, while not wrong, could be improved a little for better readability (like map(int(), to be more "Pythonic".

@piskvorky piskvorky changed the title Added Jenson shannon metric [WIP] Added Jensen Shannon metric Jul 24, 2017
@parulsethi
Copy link
Copy Markdown
Contributor Author

@piskvorky I'll remove the cell outputs once the notebook is complete. It will then have a much smaller diff containing only the input code cells which could be displayed in Github for code review.

@piskvorky
Copy link
Copy Markdown
Owner

piskvorky commented Aug 3, 2017

@parulsethi let's fix that Jenson everywhere, before it propagates too far through the code/notebooks.

@parulsethi
Copy link
Copy Markdown
Contributor Author

I've removed the plotly code and cell outputs for smaller diff on github, will add it back after the code review of notebook is complete. If needed, the output visualization can be seen here in a previous commit.

@piskvorky corrected the name in a notebook comment it was being used in. Elsewhere it's the function name.

@piskvorky
Copy link
Copy Markdown
Owner

piskvorky commented Aug 4, 2017

The name is incorrect and needs to be fixed, that's what I mean.

@parulsethi
Copy link
Copy Markdown
Contributor Author

parulsethi commented Aug 4, 2017

Oh, the spelling. Sorry, misunderstood earlier

@menshikh-iv
Copy link
Copy Markdown
Contributor

@piskvorky About notebook - nbdime good tool for view diffs between notebooks, but this tool doesn't integrate with github.

Also, we can use nbconvert, but this again does not suit us. Probably it makes sense to talk with tech GitHub support and discuss their plans to improve the work with notebooks.

Comment thread docs/notebooks/Topic_dendrogram.ipynb Outdated
"outputs": [],
"source": [
"from gensim.models.ldamodel import LdaModel\n",
"from gensim.corpora import Dictionary, MmCorpus\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused imports: gensim.corpora.MmCorpus, scipy.spatial.distance.squareform, plotly.figure_factory

Comment thread docs/notebooks/Topic_dendrogram.ipynb Outdated
"texts = []\n",
"for line in df_fake.text:\n",
" lowered = line.lower()\n",
" words = re.findall(r'\\w+', lowered, flags = re.UNICODE | re.LOCALE)\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary spaces flags = re.UNICODE | re.LOCALE

Comment thread docs/notebooks/Topic_dendrogram.ipynb Outdated
"source": [
"from gensim.matutils import jensen_shannon\n",
"\n",
"from random import sample\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused imports: random.sample, scipy

Comment thread docs/notebooks/Topic_dendrogram.ipynb Outdated
"\n",
"# Plot dendrogram\n",
"dendro = create_dendrogram(topic_dist, distfun=js_dist, labels=range(1, 36), annotation=annotation)\n",
"dendro['layout'].update({'width':1000, 'height':600})\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spaces after :

Comment thread docs/notebooks/Topic_dendrogram.ipynb Outdated
"annotation = text_annotation(topic_dist, topic_terms, n_ann_terms)\n",
"\n",
"# Initialize figure by creating upper dendrogram\n",
"figure = create_dendrogram(topic_dist, distfun=js_dist, labels = range(1, 36), annotation=annotation)\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no needed spaces labels = range(1, 36)

Comment thread docs/notebooks/Topic_dendrogram.ipynb Outdated
"# Add Heatmap Data to Figure\n",
"figure['data'].extend(heatmap)\n",
"\n",
"dendro_leaves = [x+1 for x in dendro_leaves]\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing whitespace x+1

Comment thread docs/notebooks/Topic_dendrogram.ipynb Outdated
"dendro_leaves = [x+1 for x in dendro_leaves]\n",
"\n",
"# Edit Layout\n",
"figure['layout'].update({'width':800, 'height':800,\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing spaces after : (and below)

Comment thread docs/notebooks/Topic_dendrogram.ipynb Outdated
" 'showline': False,\n",
" \"showticklabels\": True, \n",
" \"tickmode\": \"array\",\n",
" \"ticktext\" : dendro_leaves,\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary space (before :) and below

Comment thread docs/notebooks/Topic_dendrogram.ipynb Outdated
" 'showline': False,\n",
" \"showticklabels\": True, \n",
" \"tickmode\": \"array\",\n",
" \"ticktext\" : dendro_leaves,\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary space (before :) and below

Comment thread docs/notebooks/Topic_dendrogram.ipynb Outdated
" 'showline': False,\n",
" 'zeroline': False,\n",
" 'showticklabels': False,\n",
" 'ticks':\"\"}})\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing space

@menshikh-iv
Copy link
Copy Markdown
Contributor

@parulsethi please fix PEP things, return all images and resolve merge conflict. LGTM for me

Copy link
Copy Markdown
Owner

@piskvorky piskvorky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo.

Comment thread docs/notebooks/Topic_dendrogram.ipynb Outdated
"# no. of terms to display in annotation\n",
"n_ann_terms = 10\n",
"\n",
"# use Jenson-Shannon distance metric in dendrogram\n",
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more "Jenson".

@parulsethi parulsethi changed the title [WIP] Added Jensen Shannon metric [MRG] Added Jensen Shannon metric Aug 23, 2017
@parulsethi parulsethi changed the title [MRG] Added Jensen Shannon metric [MRG] Added Jensen Shannon metric and dendrogram visualization Aug 23, 2017
@menshikh-iv
Copy link
Copy Markdown
Contributor

Good job @parulsethi 🥇

@menshikh-iv menshikh-iv merged commit de3e2cb into piskvorky:develop Aug 23, 2017
@parulsethi parulsethi deleted the jenson_shannon branch August 23, 2017 10:37
fabriciorsf pushed a commit to LINE-PESC/gensim that referenced this pull request Aug 23, 2017
* add notebook with js heatmap

* separate out vector conversion function

* modify notebook

* add cluster heatmap notebook

* print y-axis values

* correct y labels

* add nb details

* few more nb details

* add text labels

* print linkage/dendo data

* add annotations for all hierarchy levels

* make diff annotations symmetric

* add Plotly's dendrogram code

* train for more passes

* remove outputs

* remove plotly code and imports

* err.. re-run cells

* fix jensen spelling

* made requested changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants