Filter out incomplete activity datapoints #295
So strange that we can't find the issues with that dropoff...
Culprit found. You'd have to ask PRUNERS/llvm-project what they're doing differently. They accounted for at least half of our weekly activity, but suddenly dropped off.
Interesting... @lee218llnl -- Pruners is in your sphere of knowledge, isn't it? I popped over to https://github.com/PRUNERS/llvm-project and found more information. From the looks of it, that is a fork of the upstream llvm/llvm-project repo, and the fork hasn't been updated since Sept 1 -- so that explains the drop-off. That then makes me wonder... Should our activity graph exclude counts from forked repositories? Could we leave them out of the graph that we generate?
Aha, that makes sense now.
If they aren't included in the activity, should they still be included in the other stats?
I guess the first question in response to that is "how many forks do we have across all the repos?" -- In @llnl we apparently have 16 (https://github.com/LLNL?utf8=✓&q=&type=fork&language=); not sure about other orgs.
Let me know if there are still specific questions about PRUNERS and LLVM, but it sounds like this is a more general issue that applies to other projects too.
It's definitely worth reviewing how we want to handle LLNL to LLNL forks. (Forking between internal and external repos shouldn't cause the same problem.) In the meantime, I think this PR is still a worthwhile independent fix to the other problem introduced by the oddities of the GitHub API itself.
IanLee1521
left a comment
Agreed and merged. Thanks @LRWeber !
GitHub's year-long repo activity start and end points can vary by a day or two between different repos. Therefore, datapoints at the beginning and end of the period only include a fraction of the repos, causing misleading dips in the activity graph.
This fix filters out any datapoints that do not include data from the full number of repos.
Before:

After:

Corresponding Filter Logs:
Note: The expected count is 582 even though we show 584 total repos at this time; two repos are empty and therefore have no activity data.
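For anyone reading along, here is a minimal sketch of the filtering idea described above. It is not the code from this PR: the function name `filter_complete_weeks`, the `repo_activity` input shape (repo name mapped to a dict of week timestamp to commit count), and the `expected_count` parameter are all assumptions made for illustration. It sums activity per week, counts how many repos contributed to each week, and keeps only the weeks where every non-empty repo reported data.

```python
from collections import defaultdict


def filter_complete_weeks(repo_activity, expected_count=None):
    """Return {week: total_commits} for weeks covered by every repo.

    repo_activity is a hypothetical structure: repo name -> {week: commits}.
    Repos with no activity data (empty dicts) are not counted toward the
    expected total, mirroring the 582-of-584 note above.
    """
    totals = defaultdict(int)        # week -> summed commits
    contributors = defaultdict(int)  # week -> number of repos reporting

    for weeks in repo_activity.values():
        for week, commits in weeks.items():
            totals[week] += commits
            contributors[week] += 1

    if expected_count is None:
        # Assumption: the expected count is the number of non-empty repos.
        expected_count = sum(1 for weeks in repo_activity.values() if weeks)

    # Drop any week that is missing data from at least one repo.
    return {
        week: totals[week]
        for week in sorted(totals)
        if contributors[week] == expected_count
    }


if __name__ == "__main__":
    sample = {
        "repo-a": {1: 2, 2: 3, 3: 1},  # year window starts one week early
        "repo-b": {2: 4, 3: 5, 4: 6},  # year window ends one week late
        "repo-empty": {},              # empty repo, no activity data
    }
    print(filter_complete_weeks(sample))
```

In the sample, weeks 1 and 4 are each reported by only one of the two non-empty repos, so only the two middle weeks would survive the filter. Those partial edge weeks are what produce the misleading dips.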