Skip to content

Fix a memory leak in trajectory sampler#4057

Merged
mathomp4 merged 7 commits intodevelopfrom
feature/ygyu/bugfix_redist
Oct 14, 2025
Merged

Fix a memory leak in trajectory sampler#4057
mathomp4 merged 7 commits intodevelopfrom
feature/ygyu/bugfix_redist

Conversation

@metdyn
Copy link
Contributor

@metdyn metdyn commented Sep 15, 2025

Types of change(s)

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Trivial change (affects only documentation or cleanup)
  • Refactor (no functional changes, no api changes)

Checklist

  • Tested this change with a run of GEOSgcm
  • Ran the Unit Tests (make tests)

Description

Thanks to @atrayano for finding a bug in trajectory sampler code that is causing a big memory leakage. It shows up when repeatedly working on destroy grid, regenerate grid, regrid, then redist. Running ExtDadaDriver with synthetic data created an output on airs_aqua trajectory. A slice of 3D data shows reasonable value.

Screenshot 2025-09-15 at 2 37 36 PM

Related Issue

@metdyn metdyn requested a review from a team as a code owner September 15, 2025 20:40
@metdyn metdyn added the 🪲 Bugfix This fixes a bug! label Sep 15, 2025
tclune
tclune previously approved these changes Sep 16, 2025
Copy link
Collaborator

@tclune tclune left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this would be a memory leak. Yes, Regrid does more work than Redist, but both create a RouteHandle.

@mathomp4
Copy link
Member

@metdyn I fixed up CHANGELOG.md as your entry was in the 2.60 part rather than unreleased.

@mathomp4
Copy link
Member

Query for @metdyn and @atrayano: I looked at MAPL and I see other places we do an ESMF_FieldRedist:

gridcomps/History/Sampler/MAPL_MaskMod_smod.F90
451:       call ESMF_FieldRedistStore (fieldA, fieldB, RH, _RC)
454:       call ESMF_FieldRedist      (fieldA, fieldB, RH, _RC)
460:       call ESMF_FieldRedist      (fieldA, fieldB, RH, _RC)
474:       call ESMF_FieldRedistRelease(RH, noGarbage=.true., _RC)

gridcomps/History/Sampler/MAPL_TrajectoryMod_smod.F90
1196:         call ESMF_FieldRedistStore (this%fieldA, this%fieldB, RH, _RC)
1197:         call ESMF_FieldRedist      (this%fieldA, this%fieldB, RH, _RC)
1199:         call ESMF_FieldRedistRelease(RH, noGarbage=.true., _RC)
1300:         call ESMF_FieldRedistStore(src_field,chunk_field,RH,_RC)
1369:                  call ESMF_FieldRedist( acc_field,  acc_field_2d_chunk, RH, _RC )
1557:         call ESMF_FieldRedistRelease(RH, noGarbage=.true., _RC)

gridcomps/History/Sampler/MAPL_StationSamplerMod.F90
419:    call ESMF_FieldRedistStore(src_field,chunk_field,this%RH,_RC)
620:             call ESMF_FieldRedist (field_ds_2d, field_chunk_2d, this%RH, _RC )
649:             call ESMF_FieldRedist (new_dst_field, field_chunk_3d, this%RH, _RC)

In the first two, we do Store→Redist→Release, but in the MAPL_StationSamplerMod.F90 there is no Release. Should there be? Or is that Release being "taken care of" in MaskMod or TrajectoryMod?

@mathomp4
Copy link
Member

Also, in MAPL_StationSamplerMod.F90 there might be a %destroy missing:

History/Sampler/MAPL_TrajectoryMod_smod.F90
545:         this%regridder = LocStreamRegridder(grid,this%LS_ds,_RC)
1649:                    call this%regridder%regrid(p_src_2d,p_dst_2d,_RC)
1664:                       call this%regridder%regrid(p_new_lev,p_dst_3d,_RC)
1669:                       call this%regridder%regrid(p_src_3d,p_dst_3d,_RC)
1700:           call this%regridder%destroy(_RC)

History/Sampler/MAPL_StationSamplerMod.F90
409:    this%regridder = LocStreamRegridder(grid,this%LS_ds,_RC)
619:             call ESMF_FieldRegrid (src_field, field_ds_2d, this%regridder%route_handle, _RC)
643:             call ESMF_FieldRegrid (new_src_field, new_dst_field, this%regridder%route_handle, _RC)

In Trajectory we seem to do create → regrid → destroy on %regridder. But in Station it is more like create → use → nothing

@metdyn
Copy link
Contributor Author

metdyn commented Sep 17, 2025

Hi @mathomp4,
Thank you for checking the code, you are right on each of them.
We have five sampler types and each of them in each of the individual files are independent and does not mix using RH.

Thanks for pointing out the incomplete cycle in station sampler:
call ESMF_FieldRedistRelease
call this%regridder%destroy
I will create a finalize subroutine and add them in. station_samper%fianlize will be called by the MAPL_HistoryGridComp.F90: subroutine Finalize.

Lastly but not the least, station sample does not vary with time, thus memory did not leak in this case, luckily.

@tclune
Copy link
Collaborator

tclune commented Sep 17, 2025

I wanted @atrayano to be aware that a PR is coming.

@metdyn
Copy link
Contributor Author

metdyn commented Oct 8, 2025

@atrayano,
I see I missed to release another RH:
https://github.com/GEOS-ESM/MAPL/blob/develop/gridcomps/History/Sampler/MAPL_MaskMod_smod.F90#L503-L504
It is in the mask sampler, not in the trajectory sampler, though (less impactful to memory leakage).

@metdyn metdyn added the 0 Diff Trivial The changes in this pull request are trivially zero-diff (documentation, build failure, &c.) label Oct 14, 2025
@metdyn
Copy link
Contributor Author

metdyn commented Oct 14, 2025

Hi @mathomp4,
This PR has been here for long. The one line revision in trajectory sampler found by @atrayano is essential. The line changes such as destroy RH in station sampler is now added here, but it does not contribute to memory change, since grids for Aeronet and GHCND are time independent and the route handle annihilation will be automatic at the finalize step in history grid component. Hence this PR does not affect others' activity. And I have tested the output for five samplers types on Mac. I donot see an anomaly in lat/lon coordinates or interpolated field variables. Hence I feel this PR can be merged.

@mathomp4 mathomp4 added 0 Diff The changes in this pull request have verified to be zero-diff with the target branch. and removed 0 Diff Trivial The changes in this pull request are trivially zero-diff (documentation, build failure, &c.) labels Oct 14, 2025
@mathomp4
Copy link
Member

@metdyn Okay. If it passes the CI tests, I'll merge into develop.

@mathomp4
Copy link
Member

Also, @metdyn, do you need a release for this? Or is working from develop okay for now?

@mathomp4 mathomp4 merged commit fe5b5ca into develop Oct 14, 2025
53 of 54 checks passed
@mathomp4 mathomp4 deleted the feature/ygyu/bugfix_redist branch October 14, 2025 19:11
@mathomp4 mathomp4 added the 😖 Shutdown This occurred during shutdown label Oct 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

0 Diff The changes in this pull request have verified to be zero-diff with the target branch. 🪲 Bugfix This fixes a bug! 😖 Shutdown This occurred during shutdown

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants