[DO-NOT-MERGE] Enable tests on GHA#8452
Conversation
ee48d49 to
2f14ca3
Compare
| options: --user ${{ needs.build-image.outputs.uid }} | ||
| strategy: | ||
| fail-fast: false | ||
| max-parallel: 6 |
There was a problem hiding this comment.
note, this runs in the contributor's forked repo, it's free but has concurrency limitations, also remember that each split requires a build, larger parallel splits consume more overall time.
| include: | ||
| - comment: hdfs | ||
| modules: | ||
| -pl :hadoop-hdfs |
There was a problem hiding this comment.
IIRC, this single module takes more than 4 hours to complete (even though more than 20 tests are excluded for now), we need to improve the slow tests and use tags to split it into more groups
There was a problem hiding this comment.
okay, it actually takes 5 hours to complete (with ~30 tests excluded).
I split it into 2 groups:
- slow, ~70 classes, which single test class takes more than 60s, all of them take ~150min
- other, the rest of them
hope it can be completed in 2.5 hours next round
There was a problem hiding this comment.
@ajfabbri, given the situation, I expect we may
- exclude fewer than 200 test classes
- keep others running stable on GHA, within 3 hours
There was a problem hiding this comment.
not ready but almost here, with ~160 test suites (classes) excluded, all remaining tests run successfully in 2.5h
https://github.com/pan3793/hadoop/actions/runs/24870046901
I still need to re-run several rounds to ensure the initial test list runs stably.
@ajfabbri @steveloughran @slfan1989 would be great if you could take a look first.
There was a problem hiding this comment.
@pan3793 Will look at it when I get back. I am on vacation until tomorrow night.
There was a problem hiding this comment.
This seems fine. We have some work to do on improving tests: We need to make them faster, and completely eliminate flaky tests (ideally). This gives us an iterative way to tackle it. 👍
8a5ffe4 to
1dcdec9
Compare
2dffde6 to
30c19c0
Compare
1a874a0 to
5ee44de
Compare
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
ajfabbri
left a comment
There was a problem hiding this comment.
Made an initial review of the current Draft. I had a couple of questions, but it looks good overall.
| # | ||
|
|
||
| # hadoop-common | ||
| **/org/apache/hadoop/ipc/TestRPC.java |
There was a problem hiding this comment.
Which of these are just slow, and which tests are flaky / failures? Would be good to leave comments here and separate into sections, e.g.
# hadoop-hdfs
...
# slow
...
# flaky
for each project.
There was a problem hiding this comment.
this list does not contain slow tests, only flaky or consistently failing cases
TBH, it will take a lot of effort to classify that, but it might not be worth doing. Contributors who want to improve it are easy to classify by running the test locally a few times, fixing a flaky or consistently failing test does not have much difference IMO
There was a problem hiding this comment.
Agreed. Please add a comment at the top of the file, e.g. "Flaky and/or broken tests needing attention.". The intention is that we will fix or remove these over time. Slow tests are handled with the annotation. 👍
| include: | ||
| - comment: hdfs | ||
| modules: | ||
| -pl :hadoop-hdfs |
There was a problem hiding this comment.
This seems fine. We have some work to do on improving tests: We need to make them faster, and completely eliminate flaky tests (ideally). This gives us an iterative way to tackle it. 👍
| @@ -22,6 +22,8 @@ | |||
| import static org.junit.jupiter.api.Assertions.assertFalse; | |||
There was a problem hiding this comment.
This needs to be a separate commit. When you are ready to merge, we can do an interactive rebase to squash all commits except this one, and add a descriptive log message to this commit.
There was a problem hiding this comment.
I will split this PR into independent PRs, each of which focuses on one topic.
There was a problem hiding this comment.
Either way is fine with me.
| } | ||
|
|
||
| @test "hadoop_stop_daemon_force_kill" { | ||
| skip "Skip on GitHub Actions now" |
There was a problem hiding this comment.
Why are we disabling this? Does this affect Yetus CI? We could do a conditional skip based on an environment variable which is only set for GH actions.
There was a problem hiding this comment.
yea, need to change the condition to skip only on GHA.
| # Verify that host directories get mounted without z option | ||
| # and INFO messages get printed out | ||
| @test "start-build-env.sh (Docker without z mount option)" { | ||
| skip "Skip on GitHub Actions now" |
|
|
||
| # Verify that host directories get mounted with z option | ||
| @test "start-build-env.sh (Docker with z mount option)" { | ||
| skip "Skip on GitHub Actions now" |
| <configuration> | ||
| <forkCount>0</forkCount> | ||
| </configuration> | ||
| </plugin> |
There was a problem hiding this comment.
Interesting. I agree we should fix the underlying issue. How can we make sure we catch "silent VM crashes" on jenkins?
There was a problem hiding this comment.
It is ok if we don't have an answer, but should we start a mailing list thread, or file a JIRA? Just don't want to accidentally, silently, break a test.
| -Drequire.snappy | ||
| -Drequire.valgrind | ||
| -Dmaven.test.failure.ignore=false | ||
| -Dsurefire.rerunFailingTestsCount=2 |
There was a problem hiding this comment.
same as Jenkins, rerun will cover some flaky tests
Description of PR
This PR demonstrates how to run tests in parallel on GitHub Actions, but there are a lot of issues that need to be addressed or discussed before moving forward.
Current status:
Run the GHA workflow, if a test fails or aborts, add it to
exclude-tests.txt. Repeat until 5 consecutive successes.Current goals:
How was this patch tested?
For code changes:
LICENSE,LICENSE-binary,NOTICE-binaryfiles?AI Tooling
No AI usage.