
Extending test coverage for 2D visualization#5000

Merged
orbeckst merged 11 commits into MDAnalysis:develop from TRY-ER:visualize_test_coverage
Apr 9, 2025

Conversation

@TRY-ER
Contributor

@TRY-ER TRY-ER commented Mar 30, 2025

Issue 597

GSoC Primary Contribution for test coverage!

Changes made in this Pull Request:

  • Previously, visualization.streamlines.py had 54% test coverage.
  • The gaps were mostly due to certain functions being declared inside another function.
  • Those functions were separated out and corresponding test cases were added.
  • The case of the maximum number of cores being utilized is now handled.

Code coverage in this file increased by 38%.

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5000.org.readthedocs.build/en/5000/

@github-actions github-actions bot left a comment

Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.

@codecov

codecov bot commented Mar 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.58%. Comparing base (f960aa1) to head (afdd4f8).
Report is 18 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5000      +/-   ##
===========================================
+ Coverage    93.43%   93.58%   +0.15%     
===========================================
  Files          177      189      +12     
  Lines        21894    22960    +1066     
  Branches      3095     3095              
===========================================
+ Hits         20457    21488    +1031     
- Misses         986     1019      +33     
- Partials       451      453       +2     


Member

@orbeckst orbeckst left a comment

Quick surface-level review: Given that these functions were previously private, we should keep them out of the public API by naming them with a leading underscore:

  • _produce_list_indices_point_in_polygon_this_frame()
  • _produce_list_centroids_this_frame()
    Then they also don't have to show up in docs.

Also add yourselves to AUTHORS.

We don't need a CHANGELOG entry for tests/internal refactoring.
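The leading-underscore convention suggested above can be demonstrated in isolation. This standalone sketch (not MDAnalysis code) shows that a star-import skips underscore-prefixed names when the module defines no `__all__`:

```python
import types

# Build a throwaway module with one public and one private function.
mod = types.ModuleType("demo")
exec("def public(): pass\ndef _private(): pass", mod.__dict__)

# `from demo import *` would bind exactly the non-underscore names
# (when no __all__ is defined), so `_private` stays internal.
star_names = [n for n in vars(mod) if not n.startswith("_")]
print(star_names)  # → ['public']
```

Sphinx autodoc similarly skips underscore-prefixed members by default, which is why the renamed helpers no longer need doc entries.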

@orbeckst
Member

@lilyminium @tylerjereddy would you be able to add a review — should be quick. Please leave an accept or changes requested to be clear about your intentions. This is GSOC relevant so needs to get done soon and also needs to be unambiguous. Thank you!

@orbeckst orbeckst self-assigned this Mar 30, 2025
@tylerjereddy
Member

done soon

By end of day Monday or sooner?

@orbeckst
Member

End of Monday works!

@TRY-ER TRY-ER force-pushed the visualize_test_coverage branch from 5c5e5a5 to a293f3a Compare March 31, 2025 03:49
@TRY-ER
Contributor Author

TRY-ER commented Mar 31, 2025

Quick surface-level review: Given that these functions were previously private, we should keep them out of the public API by naming them with a leading underscore:

  • _produce_list_indices_point_in_polygon_this_frame()
  • _produce_list_centroids_this_frame()
    Then they also don't have to show up in docs.

Also add yourselves to AUTHORS.

We don't need a CHANGELOG entry for tests/internal refactoring.

Made the requested changes!

Member

@lilyminium lilyminium left a comment

Great additions to the test coverage @TRY-ER, I've added some suggestions below.

indices_tuple = (np.array([1, 3]),)
list_indices = [indices_tuple]
result = streamlines._produce_list_centroids_this_frame(list_indices, pts)
expected = np.average(pts[[1, 3]], axis=0)
Member

Suggested change
expected = np.average(pts[[1, 3]], axis=0)
expected = np.array([2., 2.])

It's better to test actual values instead of re-running through the function logic.
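A minimal sketch of the point above, using hypothetical `pts` data rather than the actual test fixture: hard-coding the literal expectation catches regressions that re-deriving it with the same NumPy call would silently reproduce.

```python
import numpy as np

# Hypothetical points; indices 1 and 3 average to a known centroid.
pts = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])

# Recomputing the expectation re-runs the logic under test:
recomputed = np.average(pts[[1, 3]], axis=0)

# A literal value is an independent check of the same quantity:
expected = np.array([2.0, 2.0])
np.testing.assert_allclose(recomputed, expected)
```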

Contributor Author

Added actual values directly!

Comment on lines +116 to +117
expected1 = np.average(pts[[0, 2]], axis=0)
expected2 = np.average(pts[[1, 3, 4]], axis=0)
Member

Here as well could you please use the actual values?

Contributor Author

Added actual values !

Comment on lines +150 to +174
(np.array([0, 0]), [0.7999992370605469, 0.5399990081787109]),
(np.array([1, 0]), [0.8000001907348633, 0.5399971008300781]),
(np.array([2, 0]), [0.8000020980834961, 0.5400047302246094]),
(np.array([3, 0]), [0.8000001907348633, 0.5400009155273438]),
(np.array([4, 0]), [0.7999982833862305, 0.5400009155273438]),
(np.array([0, 1]), [0.7999992370605469, 0.5399999618530273]),
(np.array([1, 1]), [0.7999954223632812, 0.5400009155273438]),
(np.array([2, 1]), [0.7999992370605469, 0.5400047302246094]),
(np.array([3, 1]), [0.7999954223632812, 0.5399932861328125]),
(np.array([4, 1]), [0.8000030517578125, 0.5399932861328125]),
(np.array([0, 2]), [0.8000068664550781, 0.5399999618530273]),
(np.array([1, 2]), [0.7999992370605469, 0.5400009155273438]),
(np.array([2, 2]), [0.8000106811523438, 0.5400009155273438]),
(np.array([3, 2]), [0.8000106811523438, 0.5399932861328125]),
(np.array([4, 2]), [0.8000030517578125, 0.5399932861328125]),
(np.array([0, 3]), [0.7999954223632812, 0.5399999618530273]),
(np.array([1, 3]), [0.7999954223632812, 0.5400009155273438]),
(np.array([2, 3]), [0.8000030517578125, 0.5399971008300781]),
(np.array([3, 3]), [0.8000030517578125, 0.5399932861328125]),
(np.array([4, 3]), [0.8000030517578125, 0.5400009155273438]),
(np.array([0, 4]), [0.79998779296875, 0.5400009155273438]),
(np.array([1, 4]), [0.8000106811523438, 0.5399971008300781]),
(np.array([2, 4]), [0.7999954223632812, 0.5400047302246094]),
(np.array([3, 4]), [0.8000030517578125, 0.5400009155273438]),
(np.array([4, 4]), [0.7999954223632812, 0.5399932861328125]),
Member

It looks like these are all roughly (0.8, 0.54) -- could you use the atol parameter to check for that instead of including such fine precision?

Contributor Author

As suggested, a check logic was assigned to check the approximate values to 0.8 and 0.5. (rather checking whole output)

Member

@tylerjereddy tylerjereddy left a comment

At first glance, I think I agree with Lily--this cleans up some nested functions I never should have written that way (the code is very old, I was still learning) and exposes them to some tests.

The tests may be mostly around basic functionality, but that still has some value of course. I added a few inline comments/suggestions.

One more thing--the description of this PR cites gh-597 and I don't see the connection--that issue is about checking that errors are raised properly in our source code by testing faulty inputs, etc. The only relationship to this PR is the theme of improving test coverage, but that particular issue is about that specific type of test coverage, for ensuring that errors are raised as expected.

list_indices = [indices_tuple]
result = streamlines._produce_list_centroids_this_frame(list_indices, pts)
expected = np.array([2.0, 2.0])
np.testing.assert_array_almost_equal(result[0], expected)
Member

We should prefer assert_allclose in new code per the docstring of this NumPy function I think.
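For illustration, with hypothetical values: `assert_array_almost_equal` compares to a fixed number of decimals, and its own docstring recommends `assert_allclose`, which makes the relative and absolute tolerances explicit.

```python
import numpy as np

result = np.array([1.9999999, 2.0])
expected = np.array([2.0, 2.0])

# Preferred in new code: tolerances are stated explicitly rather than
# implied by a decimal count.
np.testing.assert_allclose(result, expected, rtol=1e-6)
```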

Contributor Author

Used assert_allclose in recent changes.

[0.5, 0.5],
[1.5, 0.5],
]
)
Member

minor: I think we could write this more compactly

Contributor Author

Not sure why, but black was flagging the more compact form, hence I wrote the list this way.

Member

Even though I agree with @tylerjereddy , I am going to succumb to black's opinion here and just leave it in whichever way black wants it to be. Not worthwhile to mark it as an exception.

vertex_list, points
)
expected = [(np.array([0], dtype=int),)]
np.testing.assert_array_equal(result[0][0], expected[0][0])
Member

Maybe I'm misunderstanding something, but above it says:

matplotlib.path.Path.contains_points does not include boundary points.

But expected has the index-0 point detected as interior, no?

Contributor Author

My bad: I checked, and boundary points are included in the Matplotlib path here, which returns a 1D zero NumPy array. I have updated the comment accordingly.

Member

I think you're just checking the 0-index here, which wouldn't check if result had more than one entry -- could you convert both result and expected to arrays?

Contributor Author

@TRY-ER TRY-ER Apr 4, 2025

In this specific test, I was checking a single point on the shape's boundary. As there will be only one expected element in the array, I tested only the first element in the initial test. As you commented, I have added a check over all the items in the list rather than only the first one.

for entry in values:
res = entry[1]
assert res[0] == pytest.approx(0.8, abs=1e-1)
assert res[1] == pytest.approx(0.5, abs=1e-1)
Member

probably minor, but I suspect using assert_allclose here with a slice of the first two indices will tell us if only one of these is failing while as written the first failure could mask the success/failure of the second assertion
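The masking concern above can be sketched with a hypothetical entry, where only the first two components are under test:

```python
import numpy as np

# Hypothetical entry: (x, y, extra), with only x and y under test.
entry = np.array([0.81, 0.52, 7.0])

# One vectorized comparison on the first two components reports every
# deviation in a single failure message, whereas two sequential scalar
# asserts stop at the first failure and hide the second.
np.testing.assert_allclose(entry[:2], [0.8, 0.5], atol=1e-1)
```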

Contributor Author

@TRY-ER TRY-ER Apr 1, 2025

Done with numpy.testing.assert_allclose()!

ymin=univ.atoms.positions[..., 1].min(),
ymax=univ.atoms.positions[..., 1].max(),
maximum_delta_magnitude=2.0,
num_cores="maximum",
Member

I might be a little hesitant to test num_cores="maximum" by default, at least in the CI. One option might be to have a slow test marker for the multi-core test and only do that locally/when needed, but I'm not sure if we have such a marker at the moment (it is quite common upstream for resource-intensive tests where you occasionally may want to check/bisect, but don't want to/can't run continuously).

I suspect the pragmatic thing for now may be to just use a single core, or perhaps 2 so that we get concurrency with reduced risk of locking things up.
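A sketch of the marker idea, assuming a hypothetical "slow" marker and test names (it would need registering under `markers =` in the pytest config to avoid warnings; CI could then deselect it with `-m "not slow"`):

```python
import pytest

# Hypothetical marker and test names, for illustration only.
@pytest.mark.slow
def test_streamlines_maximum_cores():
    ...  # resource-intensive num_cores="maximum" variant, run locally

def test_streamlines_two_cores():
    ...  # CI-friendly variant with num_cores=2
```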

Contributor Author

The test case was added to cover the num_cores="maximum" case. I have changed it to 2 cores. If such a marker is implemented later, I will update the test accordingly to verify that the "maximum" case works as expected.

@TRY-ER
Contributor Author

TRY-ER commented Apr 1, 2025

At first glance, I think I agree with Lily--this cleans up some nested functions I never should have written that way (the code is very old, I was still learning) and exposes them to some tests.

The tests may be mostly around basic functionality, but that still has some value of course. I added a few inline comments/suggestions.

One more thing--the description of this PR cites gh-597 and I don't see the connection--that issue is about checking that errors are raised properly in our source code by testing faulty inputs, etc. The only relationship to this PR is the theme of improving test coverage, but that particular issue is about that specific type of test coverage, for ensuring that errors are raised as expected.

You are right that the referenced issue does not entirely align with this contribution, but among the issues tagged as GSoC starter issues, it was the most relevant one for improving test coverage.

@TRY-ER TRY-ER requested review from lilyminium and orbeckst April 1, 2025 14:57
Member

@orbeckst orbeckst left a comment

Overall looks good to me. As formalities

  • add your GH handle to the author list for the 2.10 release in CHANGELOG – we don't need an entry but you should be visible as a contributor
  • fix author order in AUTHORS

@lilyminium and @tylerjereddy please let me know if you have any remaining concerns/comments. Thank you for reviewing, much appreciated!!

- Matthew Davies
- Jia-Xin Zhu
- Tanish Yelgoe
2025
Member

oops, thanks for fixing :-)

package/AUTHORS Outdated
- Namir Oues
- Lexi Xu
- BHM-Bob G
- Debasish Mohanty
Member

Please add your name at the end of the list.

Contributor Author

Done


@orbeckst
Member

orbeckst commented Apr 3, 2025

@TRY-ER sorry I said in my earlier review

We don't need a CHANGELOG entry for tests/internal refactoring.

I should have been more precise: we don't need a bullet point in CHANGELOG but we do want your name in there as a contributor. I realize that my earlier review may have come across as not valuing your contribution here. I am sorry for that. ALL contributions are important for the success of the project and we want to give credit to all contributors.

@TRY-ER
Contributor Author

TRY-ER commented Apr 3, 2025

Just for information! My name is Debasish Mohanty (In case my Github handle confuses anyone). [ I am proposing the MDAKit for polymers for GSoC 2025. ]

Member

@orbeckst orbeckst left a comment

LGTM from my vantage point! Thanks. I am waiting for opinions from @tylerjereddy and @lilyminium .

Comment on lines +147 to +149
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
vertex_list = [square]
points = np.array([[1, 0.5]]) # exactly on the boundary
Member

Thanks for adding this test! Just to be rigorous (and because I'm curious), what happens if vertex_list contains two adjacent squares both including the boundary that points lies on? Does _produce_list_indices_point_in_polygon_this_frame identify the point as being in both squares?

Contributor Author

Yes, as implemented in _produce_list_indices_point_in_polygon_this_frame, the index of a point lying on the shared boundary of two adjacent squares is included for both sets of vertices (i.e., the point is identified as being in both squares).
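A standalone sketch of the shared-boundary scenario (not the actual test code): two unit squares share the edge x == 1 and the probe sits on that edge. Note matplotlib does not guarantee boundary membership in `contains_points` (it can depend on path direction and the `radius` argument), so a real test should pin the observed behavior rather than assume it.

```python
import numpy as np
from matplotlib.path import Path

# Two unit squares sharing the edge x == 1; the probe lies on that edge.
left = Path([(0, 0), (1, 0), (1, 1), (0, 1)])
right = Path([(1, 0), (2, 0), (2, 1), (1, 1)])
probe = np.array([[1.0, 0.5]])

# Boundary membership is implementation-dependent: inspect, don't assume.
print(left.contains_points(probe), right.contains_points(probe))

# Strictly interior / exterior points, by contrast, are unambiguous:
assert left.contains_points([[0.5, 0.5]])[0]
assert not left.contains_points([[5.0, 0.5]])[0]
```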

Member

I see, thanks. Sorry, I should have been clearer -- could you please update the tests to include this case and make it explicit? Otherwise the PR looks good!

Contributor Author

A test has been added to check that _produce_list_indices_point_in_polygon_this_frame returns the expected indices for points shared by adjacent squares.

@TRY-ER TRY-ER requested a review from lilyminium April 4, 2025 09:32
Member

@lilyminium lilyminium left a comment

LGTM -- thank you @TRY-ER!

@orbeckst orbeckst merged commit 1cdb055 into MDAnalysis:develop Apr 9, 2025
23 of 24 checks passed
@orbeckst
Member

orbeckst commented Apr 9, 2025

Thank you for your contribution @TRY-ER ! Congratulations on the merged PR 🎉 !

@TRY-ER
Contributor Author

TRY-ER commented Apr 9, 2025

Thank you @orbeckst , @lilyminium , @tylerjereddy for your kind guidance and support.
