
Conversation

benegee (Collaborator) commented Aug 25, 2023

Some experimental code to run a gtest with MPI.

The good thing is that it seems to work within our CI.

The bad thing is that it fails. What actually fails is, however, instructive:

  • The number of elements is 128 instead of 256. Given the fact that 2 processes were used, this seems plausible.
  • Some computed cell average value is just 0. So it seems the solution data was not synchronized, which is plausible, as I did not take care of that.

benegee mentioned this pull request Aug 25, 2023
sloede (Member) commented Aug 28, 2023

  • The number of elements is 128 instead of 256. Given the fact that 2 processes were used, this seems plausible.

Yes, this is good news 👍 However, it also shows that we should update the AnalysisCallback in Trixi.jl to report the global number of elements, not just the local number (which it does right now). It also shows that you need to make GTest MPI-aware. Thus, you should probably link it against MPI and initialize MPI within GTest, so that you also have access to MPI_Comm_rank, MPI_Comm_size, etc. to figure out how to adjust your expectations (e.g., for the number of local elements).

Another option would be to set a special environment variable (e.g., LIBTRIXI_NUM_RANKS) and if it is set, get the information from there. This might be easier in the beginning, but eventually you will have to use actual MPI support, so maybe just go down this road immediately.
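For illustration, a rough sketch of what such an MPI-aware GTest setup could look like (everything here is hypothetical: the test, the get_local_element_count placeholder standing in for the corresponding libtrixi call, and the commented-out LIBTRIXI_NUM_RANKS fallback; it only mimics the 256-element case discussed in this PR):

```cpp
// Sketch of an MPI-aware GoogleTest driver (illustrative, not the actual
// test code of this PR): MPI is initialized before the tests run and
// finalized afterwards, so individual tests can query the communicator.
#include <cstdlib>
#include <gtest/gtest.h>
#include <mpi.h>

// Hypothetical placeholder for the libtrixi call that returns the number of
// elements owned by this rank; here it simply mimics a uniform partition of
// the 256-element mesh used in this PR.
int get_local_element_count() {
  int nranks = 1;
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);
  return 256 / nranks;
}

TEST(MeshInfo, LocalElementCount) {
  int nranks = 1;
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);

  // Alternative without MPI awareness (would replace the MPI_Comm_size
  // query above), as mentioned in the comment:
  // const char* env = std::getenv("LIBTRIXI_NUM_RANKS");
  // const int nranks = (env != nullptr) ? std::atoi(env) : 1;

  // With a uniform partition, each rank is expected to hold 256 / nranks
  // elements (128 for the 2-rank CI run).
  EXPECT_EQ(get_local_element_count(), 256 / nranks);
}

int main(int argc, char** argv) {
  ::testing::InitGoogleTest(&argc, argv);
  MPI_Init(&argc, &argv);

  const int result = RUN_ALL_TESTS();

  MPI_Finalize();
  return result;
}
```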

  • Some computed cell average value is just 0. So it seems the solution data was not synchronized, which is plausible, as I did not take care of that.

I'm not sure it's necessarily a synchronization problem: each rank is supposed to compute the cell averages for its own cells only. So when you're checking for the average in cell 929 (928 + 1) in https://github.com/trixi-framework/libtrixi/pull/85/files#diff-bf51675732cd01fd6ac5d3c6f3645b10163b540a95927b3c2b8dc15a529ee673R55, not only does it give you the value of a different cell, but also of a different variable:

In the original case, you had 1600 entries per variable (structure-of-arrays layout); with 2 ranks you only have 800 on each rank. Thus, before you were checking the 929th density value, whereas with 2 ranks you are checking the 129th x-velocity value.
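To spell out the index arithmetic, here is a small standalone illustration using the numbers from this discussion (the variable and constant names are just for this sketch):

```cpp
// Why flat index 928 hits a different (variable, cell) pair once the data is
// split over 2 ranks: with a structure-of-arrays layout, the flat index of
// variable v in cell c is v * nelements + c.
#include <cstdio>

int main() {
  const int flat_index       = 928;       // index checked in the test (0-based)
  const int nelements_serial = 1600;      // 1 rank: 1600 entries per variable
  const int nelements_local  = 1600 / 2;  // 2 ranks: 800 entries per variable

  // serial run: variable 0 (density), cell 929 (1-based)
  std::printf("serial:  variable %d, cell %d\n",
              flat_index / nelements_serial, flat_index % nelements_serial + 1);

  // 2 ranks: variable 1 (x-velocity), cell 129 (1-based)
  std::printf("2 ranks: variable %d, cell %d\n",
              flat_index / nelements_local, flat_index % nelements_local + 1);
  return 0;
}
```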

Here's a good resource on using Google Test with MPI: https://elib.dlr.de/144193/1/Testing%20HPC%20C%2B%2B%20software%20with%20GoogleTest.pdf
Maybe this will give you some ideas on how to add parallel testing with GTest?

It seems that the t8code folks use it as well (see, e.g., https://github.com/DLR-AMR/t8code/blob/main/test/t8_gtest_main.cxx), so maybe this could be something to bring up at the next ADAPTEX meeting where t8code folks are in attendance?

benegee (Collaborator, Author) commented Aug 28, 2023

Thanks for the detailed analysis and the helpful hints!

benegee (Collaborator, Author) commented Aug 28, 2023

Just for reference: https://github.com/DLR-SC/googletest_mpi
It seems a little outdated, though.

sloede (Member) commented Aug 31, 2023

I don't understand the errors: for example, https://github.com/trixi-framework/libtrixi/actions/runs/6038153366/job/16383897333?pr=85#step:19:382 compares two values, 0.0028566952356658794 and 0.0028566952356658751, which differ by only -4.336808689942018e-18. Maybe your tolerances are too tight?

codecov bot commented Aug 31, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.03% 🎉

Comparison is base (dd260de) 98.20% compared to head (e2e8102) 98.23%.

❗ Current head e2e8102 differs from the pull request's most recent head 28ef707. Consider uploading reports for commit 28ef707 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
+ Coverage   98.20%   98.23%   +0.03%     
==========================================
  Files          12       12              
  Lines         500      510      +10     
==========================================
+ Hits          491      501      +10     
  Misses          9        9              
Flag        Coverage Δ
unittests   98.23% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown.

Files Changed                Coverage Δ
LibTrixi.jl/src/LibTrixi.jl  100.00% <ø> (ø)
src/api.f90                   97.36% <ø> (ø)
LibTrixi.jl/src/api_c.jl     100.00% <100.00%> (ø)
LibTrixi.jl/src/api_jl.jl     98.38% <100.00%> (+0.08%) ⬆️
src/api.c                    100.00% <100.00%> (ø)


benegee (Collaborator, Author) commented Aug 31, 2023

Maybe your tolerances are too tight?

Exactly! I had used EXPECT_DOUBLE_EQ, which according to the docs

Verifies that the two double values val1 and val2 are approximately equal, to within 4 ULPs from each other.

I had not heard of units in the last place (ULPs) before.

I now use EXPECT_NEAR with an absolute tolerance of 1e-14 (the values to compare are in the range [0.1, 0.9]).
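For reference, a minimal sketch of the change (the two values are the pair from the CI log quoted above; the test name is made up):

```cpp
// Illustration of the tolerance change: EXPECT_DOUBLE_EQ allows a deviation
// of at most 4 ULPs, which these two values exceed, while EXPECT_NEAR takes
// an explicit absolute tolerance.
#include <gtest/gtest.h>

TEST(CellAverages, Tolerance) {
  const double reference = 0.0028566952356658794;  // value from the CI log
  const double computed  = 0.0028566952356658751;

  // fails: the values are more than 4 ULPs apart
  // EXPECT_DOUBLE_EQ(computed, reference);

  // passes: absolute tolerance of 1e-14
  EXPECT_NEAR(computed, reference, 1e-14);
}
```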

benegee (Collaborator, Author) commented Aug 31, 2023

A Julia test for the new trixi_nelements_global was missing.
Coverage should recover.

benegee marked this pull request as ready for review, August 31, 2023 15:24
benegee requested a review from sloede, August 31, 2023 15:24
sloede (Member) left a comment:

Minor suggestions, but other than that, this looks great!

sloede enabled auto-merge (squash), August 31, 2023 16:17
sloede merged commit 734c787 into main, Aug 31, 2023
sloede deleted the bg/gtest-mpi branch, August 31, 2023 16:46