Skip to content

Parallelize lindensity#5007

Merged
orbeckst merged 14 commits into
MDAnalysis:developfrom
tulga-rdn:parallelize_lindensity
Apr 18, 2025
Merged

Parallelize lindensity#5007
orbeckst merged 14 commits into
MDAnalysis:developfrom
tulga-rdn:parallelize_lindensity

Conversation

@tulga-rdn

@tulga-rdn tulga-rdn commented Mar 31, 2025

Copy link
Copy Markdown
Contributor

Fixes #4678

Changes made in this Pull Request:

  • Parallelizes the mass and charge density profile calculation class (MDAnalysis.analysis.lineardensity.LinearDensity). As density profiles are computed independently for each timestep, the current parallelization methods allow the calculation of the density profiles without any problems.
  • Some variable calculations (masses, charges) are moved to enable parallelization
  • a boilerplate fixture to testsuite/analysis/conftest.py, analogous with existing ones
  • a client_... fixtures to all tests using in testsuite/MDAnalysisTests/analysis/test_lineardensity.py, and modify the way run() method is called

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5007.org.readthedocs.build/en/5007/

@tulga-rdn tulga-rdn marked this pull request as ready for review April 1, 2025 15:25
@codecov

codecov Bot commented Apr 1, 2025

Copy link
Copy Markdown

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.62%. Comparing base (e64755c) to head (b468be9).
Report is 14 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5007      +/-   ##
===========================================
- Coverage    93.62%   93.62%   -0.01%     
===========================================
  Files          177      177              
  Lines        21978    21995      +17     
  Branches      3110     3112       +2     
===========================================
+ Hits         20578    20593      +15     
- Misses         946      947       +1     
- Partials       454      455       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@orbeckst

orbeckst commented Apr 2, 2025

Copy link
Copy Markdown
Member

@marinegor would you be able to review?

@tulga-rdn

Copy link
Copy Markdown
Contributor Author

My primary concern is that parallelization seems to be kinda slow.

I also changed the definition of self.masses and self.charges from simply declaring them None to what's done in _single_frame. It seems correct and it fixes the errors I had during parallelization, but I would like to get it double checked.

@orbeckst

orbeckst commented Apr 2, 2025

Copy link
Copy Markdown
Member

Can you show benchmark data?

@tulga-rdn

Copy link
Copy Markdown
Contributor Author

On waterPSF and waterDCD, the parallelized version is about 100x slower:

start = time.monotonic()
ld_obj = LinearDensity(selection, grouping, binsize=5)
ld = ld_obj.run(backend='multiprocessing', n_workers=10)
assert_allclose(ld.masses, expected_masses_atoms)
assert_allclose(ld.charges, expected_charges_atoms)
assert_allclose(ld.results.x.mass_density, expected_xmass_atoms, rtol=1e-06)
assert_allclose(ld.results.x.charge_density, expected_xcharge_atoms)
print(time.monotonic() - start)

2.42919

start = time.monotonic()
ld_obj = LinearDensity(selection, grouping, binsize=5)
ld = ld_obj.run(backend="serial")
assert_allclose(ld.masses, expected_masses_atoms)
assert_allclose(ld.charges, expected_charges_atoms)
assert_allclose(ld.results.x.mass_density, expected_xmass_atoms, rtol=1e-06)
assert_allclose(ld.results.x.charge_density, expected_xcharge_atoms)
print(time.monotonic() - start)

0.0373989

@orbeckst

orbeckst commented Apr 2, 2025

Copy link
Copy Markdown
Member

10 workers is a lot and DCD is terrible for parallel trajectory access. What about n_workers = 1, 2, 4, 8 ?

@tulga-rdn

Copy link
Copy Markdown
Contributor Author

It becomes faster as I decrease the number of workers, would you recommend to switch the test case to other trajectory types? (if yes, then please tell your recommendations to which trajectory format exactly)

@orbeckst

orbeckst commented Apr 2, 2025

Copy link
Copy Markdown
Member

Benchmarking for performance is not easy. For instance, you have overheads that may eat up all your parallel gains on short trajectories. You also have to look at trajectory formats, invalidate OS caches for file access, etc...

I'd say it would be good enough if you can show some speed-up relative to serial for 2-4 cores. Post a plot as part of your PR.

You'll probably need a trajectory with a few hundred frames. You can always create one on the fly from your test traj

n_frames = 300    # how many frames do we want?
u = mda.Universe(PSF, DCD)

n_repeats = int(np.ceil(n_frames/u.trajectory.n_frames))
u_long = mda.Universe(PSF, n_repeats * [DCD])
u_long.atoms.write("long.dcd", frames="all")

This will produce the long.dcd file with at least 300 frames. (If you want exactly 300 frames then set frames=np.arange(n_frames) ... I think).

@orbeckst orbeckst requested a review from Copilot April 11, 2025 05:52
@orbeckst

Copy link
Copy Markdown
Member

@PicoCentauri would you be able to review?

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread package/MDAnalysis/analysis/lineardensity.py Outdated
Comment thread package/MDAnalysis/analysis/lineardensity.py Outdated

@orbeckst orbeckst left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a quick look and overall this looks good. Please address my minor comments. Also do the following

  • add your name to AUTHORS (if not in there yet)
  • add an entry to Enhancements in CHANGELOG (and your GH handle to the 2.10.0 line)
  • add a .. versionchanged:: 2.10.0 entry to the docs for LinearDensity (see eg DensityAnalysis for what we write when we add parallelization)

I'd still be grateful for comments from @marinegor @talagayev @PicoCentauri even if they don't manage to review everything but at least to check that I am not overlooking anything obvious.

Comment thread package/MDAnalysis/analysis/lineardensity.py Outdated
Comment thread package/MDAnalysis/analysis/lineardensity.py Outdated
Comment thread package/MDAnalysis/analysis/lineardensity.py Outdated
Comment thread testsuite/MDAnalysisTests/analysis/test_lineardensity.py
Comment thread testsuite/MDAnalysisTests/analysis/test_lineardensity.py Outdated
@orbeckst

Copy link
Copy Markdown
Member

@tulga-rdn getting this PR in would be good for your GSOC proposal. If you could address my comments quickly (and assuming that nobody else finds any major issues), I think this could get merged soon.

@tulga-rdn tulga-rdn mentioned this pull request Apr 15, 2025
1 task

@orbeckst orbeckst left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing my comments.

@PicoCentauri @marinegor @talagayev it would be super-helpful if you could have a quick look at the PR and if looks good from your view, approve it. My plan is to merge it in the next two days. Thank you!!

(This is relevant for a GSOC application so this should be done in a timely manner.)

@orbeckst orbeckst self-assigned this Apr 15, 2025

@PicoCentauri PicoCentauri left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tulga-rdn and sorry @orbeckst for the delay.

I am overall happy with the changes.

Comment thread package/MDAnalysis/analysis/lineardensity.py
Comment thread testsuite/MDAnalysisTests/analysis/conftest.py
Comment thread package/MDAnalysis/analysis/lineardensity.py
Comment thread package/MDAnalysis/analysis/lineardensity.py
@tulga-rdn tulga-rdn force-pushed the parallelize_lindensity branch from b4dc272 to 12b487d Compare April 16, 2025 11:01
@orbeckst

Copy link
Copy Markdown
Member

Please also resolve conflicts. Thanks.

@orbeckst

Copy link
Copy Markdown
Member

Thanks for the review @PicoCentauri !

@orbeckst orbeckst left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure that no arrays get accidentally changed.

Please fix conflicts.

Comment thread package/MDAnalysis/analysis/lineardensity.py

@orbeckst orbeckst left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please do some clean-up on the code? It looks as if there's duplicated code between __init__ and _single_frame. Thanks.

else:
raise AttributeError(
f"{self.grouping} is not a valid value for grouping."
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add

self.totalmass = np.sum(self.masses)

here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fails the tests

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be for updating atomgroups. Ok.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this one works, I was testing on an old iteration of the code

Comment thread package/MDAnalysis/analysis/lineardensity.py
)

def _single_frame(self):
# Get masses and charges for the selection

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You added all the mass/charge extraction to __init__ which makes sense to me. Can you then remove the code here that's now in init? Neither masses nor charges should change during the simulation.

You should then also move the self.totalmass = np.sum(self.masses) line to init for completeness.

Can you please check that this will still work and pass the tests? (Or do you see a problem arising by doing this?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't pass the tests :(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for testing.

I also read the line in the versionchanged 2.2.0 LinearDensity now works with updating atom groups. — for this to work, we do need to keep the masses/charges extraction in _single_frame.

I do not quite get why the parallel analysis fails when you initialize them to None in __init__ but I think we'll go with it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Get masses and charges for the selection
# Get masses and charges for the selection (e.g. UpdatingAtomGroup)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@orbeckst orbeckst left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix-ups.

I looked at totalmass and it's neither used anywhere nor documented. It's also ambiguous as to what it should contain. Therefore, let's remove it completely:

  • Please delete the self.totalmass line(s).
  • Add a note to your versionchanged 2.10.0. "Removed undocumented and unused attribute totalmass."
  • Add to CHANGELOG under Changes: "Removed undocumented and unused attribute analysis.lineardensity.LinearDensity.totalmass (PR #5007)"

(Normally we don't remove anything without deprecation but because it's not documented (and may even lead people to using it wrongly) we can just remove it.)

Thanks. Otherwise ready to merge.

Comment thread package/MDAnalysis/analysis/lineardensity.py Outdated
@tulga-rdn

Copy link
Copy Markdown
Contributor Author

Done, for some reason, read the docs couldn't pull the last commit. @orbeckst maybe you can re-run the read the docs build?

@orbeckst orbeckst left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thank you for the contribution.

(I am rerunning RTD and once all of this looks good we can merge.)

@orbeckst orbeckst merged commit e213f2b into MDAnalysis:develop Apr 18, 2025
@orbeckst

Copy link
Copy Markdown
Member

Congratulations @tulga-rdn , PR is merged 🎉 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MDAnalysis.analysis.lineardensity: Implement parallelization or mark as unparallelizable

4 participants