Parallelize lindensity#5007
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #5007 +/- ##
===========================================
- Coverage 93.62% 93.62% -0.01%
===========================================
Files 177 177
Lines 21978 21995 +17
Branches 3110 3112 +2
===========================================
+ Hits 20578 20593 +15
- Misses 946 947 +1
- Partials 454 455 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@marinegor would you be able to review? |
|
My primary concern is that parallelization seems to be kinda slow. I also changed the definition of |
|
Can you show benchmark data? |
|
On waterPSF and waterDCD, the parallelized version is about 100x slower:
|
|
10 workers is a lot and DCD is terrible for parallel trajectory access. What about n_workers = 1, 2, 4, 8 ? |
|
It becomes faster as I decrease the number of workers, would you recommend to switch the test case to other trajectory types? (if yes, then please tell your recommendations to which trajectory format exactly) |
|
Benchmarking for performance is not easy. For instance, you have overheads that may eat up all your parallel gains on short trajectories. You also have to look at trajectory formats, invalidate OS caches for file access, etc...
I'd say it would be good enough if you can show some speed-up relative to serial for 2-4 cores. Post a plot as part of your PR. You'll probably need a trajectory with a few hundred frames. You can always create one on the fly from your test traj n_frames = 300 # how many frames do we want?
u = mda.Universe(PSF, DCD)
n_repeats = int(np.ceil(n_frames/u.trajectory.n_frames))
u_long = mda.Universe(PSF, n_repeats * [DCD])
u_long.atoms.write("long.dcd", frames="all")This will produce the long.dcd file with at least 300 frames. (If you want exactly 300 frames then set |
|
@PicoCentauri would you be able to review? |
There was a problem hiding this comment.
I had a quick look and overall this looks good. Please address my minor comments. Also do the following
- add your name to AUTHORS (if not in there yet)
- add an entry to Enhancements in CHANGELOG (and your GH handle to the 2.10.0 line)
- add a
.. versionchanged:: 2.10.0entry to the docs for LinearDensity (see eg DensityAnalysis for what we write when we add parallelization)
I'd still be grateful for comments from @marinegor @talagayev @PicoCentauri even if they don't manage to review everything but at least to check that I am not overlooking anything obvious.
|
@tulga-rdn getting this PR in would be good for your GSOC proposal. If you could address my comments quickly (and assuming that nobody else finds any major issues), I think this could get merged soon. |
orbeckst
left a comment
There was a problem hiding this comment.
Thank you for addressing my comments.
@PicoCentauri @marinegor @talagayev it would be super-helpful if you could have a quick look at the PR and if looks good from your view, approve it. My plan is to merge it in the next two days. Thank you!!
(This is relevant for a GSOC application so this should be done in a timely manner.)
PicoCentauri
left a comment
There was a problem hiding this comment.
Thanks @tulga-rdn and sorry @orbeckst for the delay.
I am overall happy with the changes.
b4dc272 to
12b487d
Compare
|
Please also resolve conflicts. Thanks. |
|
Thanks for the review @PicoCentauri ! |
orbeckst
left a comment
There was a problem hiding this comment.
Please make sure that no arrays get accidentally changed.
Please fix conflicts.
orbeckst
left a comment
There was a problem hiding this comment.
Can you please do some clean-up on the code? It looks as if there's duplicated code between __init__ and _single_frame. Thanks.
| else: | ||
| raise AttributeError( | ||
| f"{self.grouping} is not a valid value for grouping." | ||
| ) |
There was a problem hiding this comment.
Can you add
self.totalmass = np.sum(self.masses)here?
There was a problem hiding this comment.
Also fails the tests
There was a problem hiding this comment.
May be for updating atomgroups. Ok.
There was a problem hiding this comment.
Sorry, this one works, I was testing on an old iteration of the code
| ) | ||
|
|
||
| def _single_frame(self): | ||
| # Get masses and charges for the selection |
There was a problem hiding this comment.
You added all the mass/charge extraction to __init__ which makes sense to me. Can you then remove the code here that's now in init? Neither masses nor charges should change during the simulation.
You should then also move the self.totalmass = np.sum(self.masses) line to init for completeness.
Can you please check that this will still work and pass the tests? (Or do you see a problem arising by doing this?)
There was a problem hiding this comment.
No, it doesn't pass the tests :(
There was a problem hiding this comment.
Thanks for testing.
I also read the line in the versionchanged 2.2.0 LinearDensity now works with updating atom groups. — for this to work, we do need to keep the masses/charges extraction in _single_frame.
I do not quite get why the parallel analysis fails when you initialize them to None in __init__ but I think we'll go with it.
There was a problem hiding this comment.
| # Get masses and charges for the selection | |
| # Get masses and charges for the selection (e.g. UpdatingAtomGroup) |
orbeckst
left a comment
There was a problem hiding this comment.
Thanks for the fix-ups.
I looked at totalmass and it's neither used anywhere nor documented. It's also ambiguous as to what it should contain. Therefore, let's remove it completely:
- Please delete the self.totalmass line(s).
- Add a note to your versionchanged 2.10.0. "Removed undocumented and unused attribute
totalmass." - Add to CHANGELOG under Changes: "
Removed undocumented and unused attribute analysis.lineardensity.LinearDensity.totalmass (PR #5007)"
(Normally we don't remove anything without deprecation but because it's not documented (and may even lead people to using it wrongly) we can just remove it.)
Thanks. Otherwise ready to merge.
|
Done, for some reason, read the docs couldn't pull the last commit. @orbeckst maybe you can re-run the read the docs build? |
orbeckst
left a comment
There was a problem hiding this comment.
Looks good. Thank you for the contribution.
(I am rerunning RTD and once all of this looks good we can merge.)
|
Congratulations @tulga-rdn , PR is merged 🎉 ! |
Fixes #4678
Changes made in this Pull Request:
MDAnalysis.analysis.lineardensity.LinearDensity). As density profiles are computed independently for each timestep, the current parallelization methods allow the calculation of the density profiles without any problems.PR Checklist
package/CHANGELOGfile updated?package/AUTHORS? (If it is not, add it!)Developers Certificate of Origin
I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.
📚 Documentation preview 📚: https://mdanalysis--5007.org.readthedocs.build/en/5007/