Parallelize lindensity by tulga-rdn · Pull Request #5007 · MDAnalysis/mdanalysis

tulga-rdn · 2025-03-31T21:32:46Z

Changes made in this Pull Request:

Parallelizes the mass and charge density profile calculation class (MDAnalysis.analysis.lineardensity.LinearDensity). Because density profiles are computed independently for each timestep, the existing parallelization backends can compute them without further changes to the algorithm.
Moves some variable calculations (masses, charges) to enable parallelization.
Adds a boilerplate fixture to testsuite/analysis/conftest.py, analogous to the existing ones.
Adds the client_... fixtures to all tests in testsuite/MDAnalysisTests/analysis/test_lineardensity.py and modifies the way the run() method is called.
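
As an illustration of the independence argument above (a toy sketch, not the LinearDensity implementation): if every frame contributes its own mass-weighted histogram, the frames can be split into chunks, each chunk summed separately, and the partial sums added afterwards. All names below (frame_histogram, analyze_chunk, split_frames, aggregate) are hypothetical, and the workers are simulated by a plain loop.

```python
def frame_histogram(positions, masses, n_bins, box_length):
    """Mass-weighted histogram of one frame's coordinates along a single axis."""
    hist = [0.0] * n_bins
    bin_width = box_length / n_bins
    for x, m in zip(positions, masses):
        # Clamp so a particle sitting exactly on the upper edge lands in the last bin.
        idx = min(int(x / bin_width), n_bins - 1)
        hist[idx] += m
    return hist


def analyze_chunk(frames, masses, n_bins, box_length):
    """Sum the per-frame histograms of one chunk; this is what one worker would do."""
    total = [0.0] * n_bins
    for positions in frames:
        hist = frame_histogram(positions, masses, n_bins, box_length)
        total = [t + h for t, h in zip(total, hist)]
    return total, len(frames)


def split_frames(frames, n_parts):
    """Split the frame list into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(frames), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(frames[start:stop])
        start = stop
    return chunks


def aggregate(partials):
    """Combine partial sums from all chunks and average over the total frame count."""
    n_bins = len(partials[0][0])
    total = [0.0] * n_bins
    n_frames = 0
    for hist, count in partials:
        total = [t + h for t, h in zip(total, hist)]
        n_frames += count
    return [t / n_frames for t in total]


# Toy data: 6 frames, 3 particles, box length 10, masses chosen so sums are exact.
masses = [1.0, 16.0, 1.0]
frames = [[0.5 + i, 4.0, 9.5 - i] for i in range(6)]

serial = aggregate([analyze_chunk(frames, masses, 5, 10.0)])
chunked = aggregate(
    [analyze_chunk(c, masses, 5, 10.0) for c in split_frames(frames, 3)]
)
```

In this toy setup the chunked result matches the serial one because the only cross-frame operation is a sum followed by a single division by the frame count.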

PR Checklist

Issue raised/referenced?
Tests updated/added?
Documentation updated/added?
package/CHANGELOG file updated?
Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.

📚 Documentation preview 📚: https://mdanalysis--5007.org.readthedocs.build/en/5007/

codecov · 2025-04-01T15:59:50Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.62%. Comparing base (e64755c) to head (b468be9).
Report is 14 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #5007      +/-   ##
===========================================
- Coverage    93.62%   93.62%   -0.01%     
===========================================
  Files          177      177              
  Lines        21978    21995      +17     
  Branches      3110     3112       +2     
===========================================
+ Hits         20578    20593      +15     
- Misses         946      947       +1     
- Partials       454      455       +1


orbeckst · 2025-04-02T07:27:10Z

@marinegor would you be able to review?

tulga-rdn · 2025-04-02T13:50:00Z

My primary concern is that parallelization seems to be kinda slow.

I also changed the definition of self.masses and self.charges from simply declaring them None to what's done in _single_frame. It seems correct and it fixes the errors I had during parallelization, but I would like to get it double checked.
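
One plausible reading of the errors mentioned above, offered as an assumption and not confirmed anywhere in this thread: with process-based backends each worker operates on its own copy of the analysis object, so an attribute that is only assigned inside _single_frame never changes on the object held by the parent process. The toy class below shows the effect. ToyAnalysis and run_in_workers are hypothetical names, and copy.deepcopy stands in for sending the object to a worker.

```python
import copy


class ToyAnalysis:
    """Stand-in for an analysis class; not the MDAnalysis API."""

    def __init__(self, source_masses, compute_in_init):
        self._source = list(source_masses)
        # Either fill the attribute right away or leave it as a None placeholder.
        self.masses = list(self._source) if compute_in_init else None

    def _single_frame(self):
        # Per-frame work assigns the attribute on whichever object runs it.
        self.masses = list(self._source)
        return sum(self.masses)


def run_in_workers(analysis, n_frames, n_workers):
    """Simulate separate processes: every worker gets an independent copy."""
    per_frame = []
    for _ in range(n_workers):
        worker_copy = copy.deepcopy(analysis)
        for _ in range(n_frames // n_workers):
            per_frame.append(worker_copy._single_frame())
    # Only the returned values travel back; attributes set on copies do not.
    return per_frame


lazy = ToyAnalysis([1.0, 16.0, 1.0], compute_in_init=False)
eager = ToyAnalysis([1.0, 16.0, 1.0], compute_in_init=True)
lazy_results = run_in_workers(lazy, n_frames=4, n_workers=2)
eager_results = run_in_workers(eager, n_frames=4, n_workers=2)
```

After the run, lazy.masses is still None on the parent object while eager.masses holds the values, even though both variants return the same per-frame results.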

orbeckst · 2025-04-02T15:53:42Z

Can you show benchmark data?

tulga-rdn · 2025-04-02T19:11:00Z

On waterPSF and waterDCD, the parallelized version is about 65x slower (2.43 s vs. 0.037 s):

import time

from numpy.testing import assert_allclose
from MDAnalysis.analysis.lineardensity import LinearDensity

# selection, grouping and the expected_* arrays come from the test parametrization
start = time.monotonic()
ld_obj = LinearDensity(selection, grouping, binsize=5)
ld = ld_obj.run(backend="multiprocessing", n_workers=10)
assert_allclose(ld.masses, expected_masses_atoms)
assert_allclose(ld.charges, expected_charges_atoms)
assert_allclose(ld.results.x.mass_density, expected_xmass_atoms, rtol=1e-06)
assert_allclose(ld.results.x.charge_density, expected_xcharge_atoms)
print(time.monotonic() - start)

2.42919

start = time.monotonic()
ld_obj = LinearDensity(selection, grouping, binsize=5)
ld = ld_obj.run(backend="serial")
assert_allclose(ld.masses, expected_masses_atoms)
assert_allclose(ld.charges, expected_charges_atoms)
assert_allclose(ld.results.x.mass_density, expected_xmass_atoms, rtol=1e-06)
assert_allclose(ld.results.x.charge_density, expected_xcharge_atoms)
print(time.monotonic() - start)

0.0373989
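
For reference, the two timings printed above correspond to the ratio computed below. This is plain arithmetic on the reported numbers, with the worker count taken from the first snippet.

```python
t_parallel = 2.42919    # seconds, multiprocessing backend with 10 workers
t_serial = 0.0373989    # seconds, serial backend
n_workers = 10

slowdown = t_parallel / t_serial      # how many times slower the parallel run was
speedup = t_serial / t_parallel       # conventional speed-up, below 1 here
efficiency = speedup / n_workers      # speed-up per worker

print(f"slowdown: {slowdown:.1f}x, speed-up: {speedup:.4f}, efficiency: {efficiency:.5f}")
```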

orbeckst · 2025-04-02T20:21:21Z

10 workers is a lot, and DCD is terrible for parallel trajectory access. What about n_workers = 1, 2, 4, 8?
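
A small timing harness makes it easy to compare worker counts such as the ones suggested here. This is a sketch under assumptions: benchmark_workers and speedups are hypothetical helpers, and run_analysis stands for any callable that accepts a worker count (for the case in this thread it would wrap a call like ld_obj.run(backend="multiprocessing", n_workers=n)). The dummy workload exists only so that the block runs on its own.

```python
import time


def benchmark_workers(run_analysis, worker_counts, repeats=3):
    """Return the best wall-clock time (seconds) for each worker count."""
    timings = {}
    for n in worker_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_analysis(n)
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings


def speedups(timings, baseline=1):
    """Speed-up of each entry relative to the run with `baseline` workers."""
    return {n: timings[baseline] / t for n, t in timings.items()}


def dummy_analysis(n_workers):
    """Placeholder workload; replace with a call into the real analysis."""
    sum(i * i for i in range(20_000 // n_workers))


timings = benchmark_workers(dummy_analysis, [1, 2, 4, 8])
relative = speedups(timings)
```

Taking the minimum over a few repeats is one common way to reduce noise from other processes; it does not address file-system caching effects.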

tulga-rdn · 2025-04-02T21:01:05Z

It becomes faster as I decrease the number of workers. Would you recommend switching the test case to another trajectory type? (If so, which trajectory format exactly would you recommend?)

orbeckst · 2025-04-02T21:48:47Z

Benchmarking for performance is not easy. For instance, you have overheads that may eat up all your parallel gains on short trajectories. You also have to look at trajectory formats, invalidate OS caches for file access, etc...

https://doi.org/10.6084/m9.figshare.9695852 – Notes on benchmarking RMSF and DensityAnalysis for PMDA (the predecessor to parallel analysis); see discussion of the serial fraction and the stream plots (Fig 5) that show where code spends time
https://doi.org/10.25080/majora-1b6fd038-005 benchmarking parallel analysis, includes different file formats

I'd say it would be good enough if you can show some speed-up relative to serial for 2-4 cores. Post a plot as part of your PR.
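
The point about overheads can be made concrete with a simple cost model. This is an illustrative sketch, not a measurement: it assumes a fixed start-up cost per worker, a serial part, and a per-frame cost that divides evenly among workers. The function names and all numbers are made up for the example.

```python
def predicted_time(n_frames, n_workers, t_frame, t_serial, t_startup):
    """Wall-clock estimate: serial part + worker start-up + evenly divided frame work."""
    if n_workers == 1:
        # A serial run pays no worker start-up cost in this model.
        return t_serial + n_frames * t_frame
    return t_serial + n_workers * t_startup + n_frames * t_frame / n_workers


def predicted_speedup(n_frames, n_workers, **costs):
    """Ratio of the modelled serial time to the modelled parallel time."""
    serial = predicted_time(n_frames, 1, **costs)
    return serial / predicted_time(n_frames, n_workers, **costs)


# Assumed costs in seconds, chosen only to illustrate the trend.
costs = dict(t_frame=0.004, t_serial=0.01, t_startup=0.2)

short_run = predicted_speedup(10, 4, **costs)       # few frames: start-up dominates
long_run = predicted_speedup(10_000, 4, **costs)    # many frames: frame work dominates
```

With these assumed costs the model predicts a slowdown for the 10-frame case and a speed-up below the ideal factor of 4 for the 10,000-frame case, which is the qualitative behaviour described above.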

You'll probably need a trajectory with a few hundred frames. You can always create one on the fly from your test trajectory:

import numpy as np
import MDAnalysis as mda
from MDAnalysisTests.datafiles import PSF, DCD

n_frames = 300    # how many frames do we want?
u = mda.Universe(PSF, DCD)

# repeat the short trajectory often enough to reach at least n_frames
n_repeats = int(np.ceil(n_frames/u.trajectory.n_frames))
u_long = mda.Universe(PSF, n_repeats * [DCD])
u_long.atoms.write("long.dcd", frames="all")

This will produce the long.dcd file with at least 300 frames. (If you want exactly 300 frames then set frames=np.arange(n_frames) ... I think).
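
The frame bookkeeping in the snippet above is simple ceiling arithmetic. The sketch below assumes, purely for illustration, a source trajectory of 98 frames; substitute u.trajectory.n_frames for the real value. repeats_needed is a hypothetical helper.

```python
import math


def repeats_needed(n_target, n_source):
    """Smallest number of copies of the source trajectory giving at least n_target frames."""
    return math.ceil(n_target / n_source)


n_frames = 300   # desired trajectory length
n_source = 98    # assumed length of the source trajectory (illustrative)

n_repeats = repeats_needed(n_frames, n_source)
n_total = n_repeats * n_source    # frames in the concatenated trajectory
keep = range(n_frames)            # indices to keep for exactly n_frames frames
```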

orbeckst · 2025-04-11T05:52:28Z

@PicoCentauri would you be able to review?

Copilot

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

orbeckst

I had a quick look and overall this looks good. Please address my minor comments. Also do the following:

add your name to AUTHORS (if not in there yet)
add an entry to Enhancements in CHANGELOG (and your GH handle to the 2.10.0 line)
add a .. versionchanged:: 2.10.0 entry to the docs for LinearDensity (see e.g. DensityAnalysis for what we write when we add parallelization)
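
For orientation, such a note in a class docstring might look like the sketch below. The class is a stand-in, and the wording of the entry is an assumption; the actual text should follow the existing DensityAnalysis note.

```python
class LinearDensitySketch:
    """Stand-in class showing where a version note goes in the docstring.

    .. versionchanged:: 2.10.0
       Enabled parallel execution (wording is illustrative; mirror the
       phrasing already used for other parallelized analysis classes).
    """


docstring = LinearDensitySketch.__doc__
```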

I'd still be grateful for comments from @marinegor @talagayev @PicoCentauri even if they don't manage to review everything but at least to check that I am not overlooking anything obvious.

orbeckst · 2025-04-13T16:16:16Z

@tulga-rdn getting this PR in would be good for your GSOC proposal. If you could address my comments quickly (and assuming that nobody else finds any major issues), I think this could get merged soon.

orbeckst

Thank you for addressing my comments.

@PicoCentauri @marinegor @talagayev it would be super-helpful if you could have a quick look at the PR and, if it looks good from your point of view, approve it. My plan is to merge it in the next two days. Thank you!!

(This is relevant for a GSOC application so this should be done in a timely manner.)

PicoCentauri

Thanks @tulga-rdn and sorry @orbeckst for the delay.

I am overall happy with the changes.

orbeckst · 2025-04-16T15:35:51Z

Please also resolve conflicts. Thanks.

orbeckst · 2025-04-16T15:36:26Z

Thanks for the review @PicoCentauri !

orbeckst

Please make sure that no arrays get accidentally changed.

Please fix conflicts.

orbeckst

Can you please do some clean-up on the code? It looks as if there's duplicated code between __init__ and _single_frame. Thanks.

orbeckst · 2025-04-17T23:26:12Z

+        else:
+            raise AttributeError(
+                f"{self.grouping} is not a valid value for grouping."
+            )


Can you add

self.totalmass = np.sum(self.masses)

here?

Also fails the tests

May be for updating atomgroups. Ok.

Sorry, this one works, I was testing on an old iteration of the code

orbeckst · 2025-04-17T23:32:41Z

+        )

    def _single_frame(self):
-        # Get masses and charges for the selection


You added all the mass/charge extraction to __init__ which makes sense to me. Can you then remove the code here that's now in init? Neither masses nor charges should change during the simulation.

You should then also move the self.totalmass = np.sum(self.masses) line to init for completeness.

Can you please check that this will still work and pass the tests? (Or do you see a problem arising by doing this?)

No, it doesn't pass the tests :(

Thanks for testing.

I also read the line in the versionchanged 2.2.0 note, "LinearDensity now works with updating atom groups." For this to work, we do need to keep the masses/charges extraction in _single_frame.

I do not quite get why the parallel analysis fails when you initialize them to None in __init__ but I think we'll go with it.

Suggested change

-        # Get masses and charges for the selection
+        # Get masses and charges for the selection (e.g. UpdatingAtomGroup)
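
The reason for keeping the extraction in _single_frame can be shown with a toy dynamic selection. This is an illustration only: select_below and the data are hypothetical, and the real UpdatingAtomGroup machinery is not used. The selection is re-evaluated every frame, so the set of selected masses can differ between frames.

```python
def select_below(positions, cutoff):
    """Indices of particles whose coordinate is below the cutoff in this frame."""
    return [i for i, x in enumerate(positions) if x < cutoff]


all_masses = [1.0, 16.0, 1.0, 12.0]
frames = [
    [1.0, 2.0, 8.0, 9.0],   # frame 0: particles 0 and 1 are selected
    [1.0, 7.0, 3.0, 9.0],   # frame 1: particles 0 and 2 are selected
]

# Masses captured once, from the first frame only (what a one-time setup would see).
static_masses = [all_masses[i] for i in select_below(frames[0], 5.0)]

# Masses re-read for every frame, as a per-frame extraction would do.
per_frame_masses = [
    [all_masses[i] for i in select_below(positions, 5.0)] for positions in frames
]
```

In frame 1 the per-frame masses differ from the values captured at setup, which is why a one-time extraction would give wrong weights for a selection that updates.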

orbeckst

Thanks for the fix-ups.

I looked at totalmass and it's neither used anywhere nor documented. It's also ambiguous as to what it should contain. Therefore, let's remove it completely:

Please delete the self.totalmass line(s).
Add a note to your versionchanged 2.10.0 entry: "Removed undocumented and unused attribute totalmass."
Add to CHANGELOG under Changes: "Removed undocumented and unused attribute analysis.lineardensity.LinearDensity.totalmass (PR #5007)"

(Normally we don't remove anything without deprecation but because it's not documented (and may even lead people to using it wrongly) we can just remove it.)

Thanks. Otherwise ready to merge.

tulga-rdn · 2025-04-18T17:15:20Z

Done. For some reason, Read the Docs couldn't pull the last commit. @orbeckst maybe you can re-run the Read the Docs build?

orbeckst

Looks good. Thank you for the contribution.

(I am rerunning RTD and once all of this looks good we can merge.)

orbeckst · 2025-04-18T18:29:20Z

Congratulations @tulga-rdn , PR is merged 🎉 !

tulga-rdn added 4 commits March 31, 2025 23:09

almost working except masses and charges

2ab2bf5

mass and charge work, but densities are not close enough to ref

164790d

moved variables around

cea4f7d

add tests

3151d79

tulga-rdn marked this pull request as ready for review April 1, 2025 15:25

fix errors

31427a2

orbeckst requested a review from Copilot April 11, 2025 05:52

Copilot AI reviewed Apr 11, 2025

Comment thread package/MDAnalysis/analysis/lineardensity.py Outdated

Comment thread package/MDAnalysis/analysis/lineardensity.py Outdated

orbeckst added Component-Analysis parallelization labels Apr 11, 2025

orbeckst requested changes Apr 13, 2025

orbeckst requested review from PicoCentauri, marinegor and talagayev April 13, 2025 16:14

tulga-rdn added 3 commits April 16, 2025 00:41

minor changes to tests + comments for clarity

02dd5d6

linting

551914e

changelog and authors

09d1a15

tulga-rdn mentioned this pull request Apr 15, 2025

Update README.rst #5020

Merged

1 task

orbeckst approved these changes Apr 15, 2025

orbeckst self-assigned this Apr 15, 2025

PicoCentauri reviewed Apr 16, 2025

Comment thread package/MDAnalysis/analysis/lineardensity.py

Comment thread testsuite/MDAnalysisTests/analysis/conftest.py

Comment thread package/MDAnalysis/analysis/lineardensity.py

Comment thread package/MDAnalysis/analysis/lineardensity.py

minor improvement in lineardensity.py + readd to CHANGELOG

12b487d

tulga-rdn force-pushed the parallelize_lindensity branch from b4dc272 to 12b487d Compare April 16, 2025 11:01

orbeckst requested changes Apr 16, 2025

Comment thread package/MDAnalysis/analysis/lineardensity.py

orbeckst requested changes Apr 17, 2025

orbeckst and others added 4 commits April 17, 2025 16:35

Merge branch 'develop' into parallelize_lindensity

cc66edc

minor cleanup

56e771e

minor comment edit

c767ce7

yet another minor edit

2d6a33c

orbeckst requested changes Apr 18, 2025

Comment thread package/MDAnalysis/analysis/lineardensity.py Outdated

remove totalmass

b468be9

orbeckst approved these changes Apr 18, 2025

orbeckst merged commit e213f2b into MDAnalysis:develop Apr 18, 2025
