Beam Diagnostics is too Slow #1102

@ax3l

The diagnostics code in reduced_beam_characteristics(pc) is too slow. In single-MPI-rank simulations such as the HTU beamline, setting sim.particle_container().store_beam_moments = True makes it the dominant cost: it takes ~1.5x the runtime of the next most expensive element of the actual simulation (impactx::Push::ChrQuad in the profile below).

TinyProfiler total time across processes [min...avg...max]: 0.02604 ... 0.02604 ... 0.02604

-------------------------------------------------------------------------------------------------------
Name                                                    NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
-------------------------------------------------------------------------------------------------------
impactx::diagnostics::reduced_beam_characteristics(pc)      91    0.01197    0.01197    0.01197  45.96%
impactx::Push::ChrQuad                                      34   0.007997   0.007997   0.007997  30.71%
impactx::Push::ExactDrift                                   33   0.001654   0.001654   0.001654   6.35%
impactx::Push::ExactSbend                                    5  0.0004234  0.0004234  0.0004234   1.63%
impactX::collect_lost_particles                             91  0.0003877  0.0003877  0.0003877   1.49%
ImpactX::evolve::slice_step                                 91  0.0003815  0.0003815  0.0003815   1.47%
ImpactX::add_particles                                       1  0.0003395  0.0003395  0.0003395   1.30%
impactx::Push::Kicker                                        8  0.0002024  0.0002024  0.0002024   0.78%
ImpactXParticleContainer::record_beam_moments               91  0.0001794  0.0001794  0.0001794   0.69%
DistributionMapping::LeastUsedCPUs()                         1  0.0001495  0.0001495  0.0001495   0.57%
ImpactX::track_particles                                     1   3.08e-05   3.08e-05   3.08e-05   0.12%
impactx::Push                                               91  1.807e-05  1.807e-05  1.807e-05   0.07%
AmrMesh::MakeDistributionMap()                               1  7.808e-06  7.808e-06  7.808e-06   0.03%
DistributionMapping::SFCProcessorMapDoIt()                   1  2.937e-06  2.937e-06  2.937e-06   0.01%
Other                                                      357  0.0001655  0.0001655  0.0001655   0.64%
-------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------
Name                                                    NCalls  Incl. Min  Incl. Avg  Incl. Max   Max %
-------------------------------------------------------------------------------------------------------
ImpactX::track_particles                                     1    0.02335    0.02335    0.02335  89.69%
ImpactX::evolve::slice_step                                 91    0.02331    0.02331    0.02331  89.52%
ImpactXParticleContainer::record_beam_moments               91    0.01215    0.01215    0.01215  46.65%
impactx::diagnostics::reduced_beam_characteristics(pc)      91    0.01197    0.01197    0.01197  45.96%
impactx::Push                                               91     0.0103     0.0103     0.0103  39.56%
impactx::Push::ChrQuad                                      34   0.007999   0.007999   0.007999  30.72%
impactx::Push::ExactDrift                                   33   0.001656   0.001656   0.001656   6.36%
impactx::Push::ExactSbend                                    5  0.0004239  0.0004239  0.0004239   1.63%
ImpactX::add_particles                                       1  0.0003912  0.0003912  0.0003912   1.50%
impactX::collect_lost_particles                             91  0.0003877  0.0003877  0.0003877   1.49%
impactx::Push::Kicker                                        8   0.000203   0.000203   0.000203   0.78%
AmrMesh::MakeDistributionMap()                               1  0.0001608  0.0001608  0.0001608   0.62%
DistributionMapping::SFCProcessorMapDoIt()                   1   0.000153   0.000153   0.000153   0.59%
DistributionMapping::LeastUsedCPUs()                         1  0.0001495  0.0001495  0.0001495   0.57%
Other                                                      357  0.0001655  0.0001655  0.0001655   0.64%
-------------------------------------------------------------------------------------------------------
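
As a quick check of the ~1.5x figure, the ratio can be recomputed from the exclusive times in the table above. This is only a small arithmetic sketch; the dictionary is hand-copied from the profiler output and is not produced by any ImpactX or AMReX tool.

```python
# Exclusive times in seconds, hand-copied from the TinyProfiler table above.
excl_time = {
    "impactx::diagnostics::reduced_beam_characteristics(pc)": 0.01197,
    "impactx::Push::ChrQuad": 0.007997,
    "impactx::Push::ExactDrift": 0.001654,
}
total_time = 0.02604  # "TinyProfiler total time across processes"

# Rank the entries by exclusive time, most expensive first.
ranked = sorted(excl_time.items(), key=lambda kv: kv[1], reverse=True)
(top_name, top_t), (next_name, next_t) = ranked[0], ranked[1]

ratio = top_t / next_t      # diagnostics cost relative to the next element
share = top_t / total_time  # diagnostics share of the total runtime

print(f"{top_name} vs. {next_name}: {ratio:.2f}x")
print(f"share of total runtime: {share:.1%}")
```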

I think that amrex::ParticleReduce is OpenMP-parallelized over particle tiles, but maybe that is not working or can be optimized?

Additionally, can some operations that are not auto-vectorized be vectorized on CPU?

Or do we just calculate/reduce way too many variables (currently: two full-Np reductions, the second one over 22 variables) and need to introduce a more fine-grained approach, as we already do for optionally calculating the (costly) eigenemittances?
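
To illustrate one possible direction for the "two full-Np reductions" point, here is a minimal sketch, not the ImpactX implementation: it compares a two-pass calculation (means first, then centered second moments) with a single pass that accumulates raw sums and centers them afterwards. The function names, the restriction to x and px, and the synthetic data are all assumptions made for illustration. The single-pass form can lose precision through cancellation when the means are large compared to the beam spread, so it would need validation before being used for real diagnostics.

```python
import random


def moments_two_pass(x, px):
    """Two loops over all particles: means first, then centered moments."""
    n = len(x)
    sum_x = sum_px = 0.0
    for xi, pi in zip(x, px):  # pass 1: means
        sum_x += xi
        sum_px += pi
    mean_x, mean_px = sum_x / n, sum_px / n

    var_x = var_px = cov = 0.0
    for xi, pi in zip(x, px):  # pass 2: centered second moments
        dx, dp = xi - mean_x, pi - mean_px
        var_x += dx * dx
        var_px += dp * dp
        cov += dx * dp
    return mean_x, mean_px, var_x / n, var_px / n, cov / n


def moments_one_pass(x, px):
    """One loop over all particles: raw sums, centered after the loop."""
    n = len(x)
    s_x = s_p = s_xx = s_pp = s_xp = 0.0
    for xi, pi in zip(x, px):
        s_x += xi
        s_p += pi
        s_xx += xi * xi
        s_pp += pi * pi
        s_xp += xi * pi
    mean_x, mean_px = s_x / n, s_p / n
    return (
        mean_x,
        mean_px,
        s_xx / n - mean_x * mean_x,
        s_pp / n - mean_px * mean_px,
        s_xp / n - mean_x * mean_px,
    )


# Synthetic, hypothetical beam: 10,000 particles with correlated x and px.
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0e-3) for _ in range(10_000)]
px = [0.5 * xi + rng.gauss(0.0, 1.0e-4) for xi in x]

two_pass = moments_two_pass(x, px)
one_pass = moments_one_pass(x, px)
max_diff = max(abs(a - b) for a, b in zip(two_pass, one_pass))
print(f"largest absolute difference: {max_diff:.3e}")
```

Whether merging the passes pays off in practice depends on whether the reduction is bound by memory traffic or by the per-particle arithmetic, which the profile above does not distinguish.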
