
proximal gradient method @ group lasso fixed#249

Merged
pavanramkumar merged 7 commits into glm-tools:master from AnchorBlues:_prox_fix
Sep 4, 2018

Conversation

@AnchorBlues
Contributor

In the proximal gradient method, weights whose absolute values are smaller than the threshold must be set to 0.
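The fix can be sketched as follows. This is a minimal illustration of group-wise soft-thresholding, not pyglmnet's actual `_prox` code; the function and variable names here are hypothetical:

```python
import numpy as np

def group_soft_threshold(beta, groups, thresh):
    """Group-lasso proximal step (sketch): each penalized group is
    shrunk by its L2 norm, and any group whose norm falls at or below
    the threshold is set exactly to zero."""
    result = beta.copy()
    for g in np.unique(groups):
        if g == 0:  # convention assumed here: group 0 is unpenalized
            continue
        idx = groups == g
        norm = np.linalg.norm(beta[idx])
        if norm <= thresh:
            result[idx] = 0.0  # the point of the fix: exactly zero
        else:
            result[idx] = (1.0 - thresh / norm) * beta[idx]
    return result

beta = np.array([0.01, 0.02, 3.0, 4.0])
groups = np.array([1, 1, 2, 2])
out = group_soft_threshold(beta, groups, 0.5)
# group 1 (norm ~0.022) is zeroed out; group 2 (norm 5.0) is shrunk
```

Without the explicit zeroing branch, small-norm groups keep tiny nonzero values instead of being exactly zero, which is what this PR corrects.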

@codecov-io

Codecov Report

Merging #249 into master will decrease coverage by 0.04%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #249      +/-   ##
==========================================
- Coverage    56.8%   56.75%   -0.05%     
==========================================
  Files           7        7              
  Lines        1301     1302       +1     
  Branches      261      261              
==========================================
  Hits          739      739              
- Misses        492      495       +3     
+ Partials       70       68       -2
Impacted Files Coverage Δ
pyglmnet/pyglmnet.py 73.33% <0%> (-0.17%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 84869a0...bfdb111. Read the comment docs.

@codecov-io

codecov-io commented Aug 31, 2018

Codecov Report

Merging #249 into master will increase coverage by 1.64%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #249      +/-   ##
==========================================
+ Coverage    56.8%   58.44%   +1.64%     
==========================================
  Files           7        7              
  Lines        1301     1302       +1     
  Branches      261      261              
==========================================
+ Hits          739      761      +22     
+ Misses        492      472      -20     
+ Partials       70       69       -1
Impacted Files Coverage Δ
pyglmnet/pyglmnet.py 78.22% <100%> (+4.72%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 84869a0...5b9668e. Read the comment docs.

@jasmainak
Member

Thanks. Could you add a tiny test?

@jasmainak
Member

closes #235

@AnchorBlues
Contributor Author

AnchorBlues commented Aug 31, 2018

@jasmainak
Sorry, I don't know what kind of test code I should write.
I would be glad if you could show me some example test code.

@jasmainak
Member

Think about how you discovered the bug. The test should basically reproduce that
behavior in some sense. It should fail on master but pass on this branch.

You can add the test in this function. One easy check could be whether the estimated
betas have the same non-zero indices as the simulated ones.
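As a sketch of that support check, with hypothetical coefficient values standing in for the simulated and fitted betas:

```python
import numpy as np

# Hypothetical coefficients: `beta_sim` plays the role of the simulated
# ground truth and `beta_hat` the fitted estimate from the model.
beta_sim = np.array([0.0, 0.0, 1.2, -0.7, 0.0, 2.1])
beta_hat = np.array([0.0, 0.0, 0.9, -0.5, 0.0, 1.8])

# The suggested check: the estimated betas should have the same
# non-zero indices (support) as the simulated ones.
assert np.array_equal(beta_hat != 0, beta_sim != 0)
```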

@AnchorBlues
Contributor Author

@jasmainak
Thank you for your advice.
I added test code to the test_group_lasso function.
Please confirm.

Member

@jasmainak jasmainak left a comment


Overall this looks like a good start! If you can address my comments, I will take another look. Thanks again for the fix.

# create an instance of the GLM class
glm_group = GLM(distr='softplus', alpha=1.)
lams = [0.5, 0.3237394, 0.2096144, 0.13572088, 0.08787639,
0.0568981, 0.03684031, 0.02385332, 0.01544452, 0.01]
Member


Do we really need all these lams for the test? It will make the tests slow. I would use only one lam.

Contributor Author


I understand.
I will choose 0.2 as the single lam value.
With lam == 0.2, I made sure that some groups become zero while others do not.
So I think it will be a good test case.


for lam in lams:
# create an instance of the GLM class
glm_group = GLM(distr='softplus', alpha=alpha, reg_lambda=lam, group=groups)
Member


Please use the same variable names we use in the rest of the code base. Instead of lam, it should be reg_lambda.

Contributor Author


OK, I'll change the variable name lam to reg_lambda.

beta = glm_group.beta_
group_norms = np.abs(beta)
for target_group_idx in unique_group_idxs:
if target_group_idx == 0:
Member


why this?

Contributor Author


Please check the _prox method of the GLM class (https://github.com/glm-tools/pyglmnet/blob/master/pyglmnet/pyglmnet.py),
lines 519-522.
I just followed the same process there.

# in each group, coef must be [all nonzero] or [all zero].
unique_group_idxs = np.unique(groups)
beta = glm_group.beta_
group_norms = np.abs(beta)
Member


This is confusing. You should explicitly initialize an array of zeros.

Contributor Author


OK, I'll fix it.
However, I would point out that the _prox method of the GLM class is written the same way.

@AnchorBlues
Contributor Author

@jasmainak
I found that the last assertion of the test_group_lasso function,

assert (beta[group_norms <= thresh] == 0.0).all()

is not appropriate, because beta[group_norms <= thresh] can be nonzero if the values of result[idx_to_update] drop below thresh on the final iteration of the optimization.
So I removed the assertion from the test code.


target_beta = beta[groups == group_id]
n_nonzero = (target_beta != 0.0).sum()
assert n_nonzero in (len(target_beta), 0)
Member


I like the test as it's compact. But the problem is that it passes on master. Ideally, you want the test to fail on the master branch so the bug does not happen again.

Contributor Author


OK, I added the following assertion to the test code.

assert (beta[groups != 0] == 0.0).any()

I confirmed that this assertion fails on the master branch and passes after the fix.

reg_lambda = 0.2
# create an instance of the GLM class
glm_group = GLM(distr='softplus', alpha=1.)
glm_group = GLM(distr='softplus', alpha=alpha, reg_lambda=reg_lambda, group=groups)
Member


Can you make sure you conform to pep8 conventions? You can add a plugin to your editor or install a command-line checker. Right now, the line exceeds 79 characters.

Contributor Author


I see.
I fixed the test code so that it conforms to pep8 conventions.

Member

@jasmainak jasmainak left a comment


Hopefully the last round of changes. Thanks a lot!

assert n_nonzero in (len(target_beta), 0)

# one of the groups must be [all zero]
assert (beta[groups != 0] == 0.0).any()
Member


Nope, this is not checking that all the elements of a group are zero. You need to do something like:

assert np.any([beta[groups == g].sum() == 0 for g in group_ids[1:]])
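One caveat with a check along these lines, sketched below: `sum() == 0` can also hold when positive and negative coefficients within a group cancel, so summing absolute values is a safer way to assert that every coefficient in some penalized group is exactly zero. The `beta`, `groups`, and `group_ids` values here are hypothetical:

```python
import numpy as np

# Hypothetical fitted coefficients and their group labels.
beta = np.array([0.5, -0.5, 0.0, 0.0, 1.0, 2.0])
groups = np.array([1, 1, 2, 2, 3, 3])
group_ids = np.unique(groups)  # assumed: 0 would label unpenalized terms

# Plain sum() misfires on group 1 (0.5 and -0.5 cancel), while the
# absolute-value sum correctly identifies only group 2 as all-zero.
all_zero = [np.abs(beta[groups == g]).sum() == 0 for g in group_ids]
assert np.any(all_zero)
```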

Member


Also, there needs to be one more empty line after this according to pep8.

Contributor Author


@jasmainak
Thanks.
I fixed the code according to the above advice.
Please confirm.

assert n_nonzero in (len(target_beta), 0)

# one of the groups must be [all zero]
assert np.any([beta[groups == group_id].sum() == 0 \
Member


You don't need the trailing backslash here, but I can live with it.

@jasmainak
Member

@pavanramkumar please merge if you are happy. LGTM

@pavanramkumar
Collaborator

Thanks @AnchorBlues and @jasmainak — great contribution!

@pavanramkumar pavanramkumar merged commit 65bb9f7 into glm-tools:master Sep 4, 2018
@jasmainak
Member

@AnchorBlues looking forward to more contributions and bugfixes. Thanks!

@AnchorBlues
Contributor Author

@jasmainak @pavanramkumar
Thank you for reviewing my fix and merging it.
I am glad to contribute to your package.
I'll continue to use this package for data science.
Thanks!
