Filter out incomplete activity datapoints #295

Merged
IanLee1521 merged 1 commit into master from activityGraphFix on Oct 18, 2019

@LRWeber
Member

@LRWeber LRWeber commented Oct 16, 2019

GitHub's year-long repo activity start and end points can vary by a day or two between different repos. Therefore, datapoints at the beginning and end of the period only include a fraction of the repos, causing misleading dips in the activity graph.

This fix filters out any datapoints that do not include data from the full number of repos.

Before:
ActivityBefore

After:
ActivityAfter

Corresponding Filter Logs:

line_repoActivity.js:227 Repo count mismatch for activity on 2018-10-01: expected 582, found 14
line_repoActivity.js:227 Repo count mismatch for activity on 2018-10-08: expected 582, found 50
line_repoActivity.js:227 Repo count mismatch for activity on 2019-10-07: expected 582, found 568
line_repoActivity.js:227 Repo count mismatch for activity on 2019-10-14: expected 582, found 532

Note: The expected count is 582 even though we show 584 total repos at this time; two repos are empty and therefore have no activity data.
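
For illustration, the filter described above can be sketched roughly as follows. This is a minimal sketch, not the code in `line_repoActivity.js`: the function name `sumCompleteWeeks` and the input shape (an object mapping each repo name to an array of `{ week, total }` entries) are assumptions made for the example. Repos with no entries are left out of the expected count, which mirrors the note about empty repos.

```javascript
// Hypothetical helper: sum weekly totals across repos and drop any week
// that is not reported by every repo that has activity data.
function sumCompleteWeeks(activityByRepo) {
  // Repos with no activity entries (e.g. empty repos) do not count
  // toward the expected number of repos per datapoint.
  const repos = Object.values(activityByRepo).filter((weeks) => weeks.length > 0);
  const expected = repos.length;

  // week (ISO date string) -> { total, count }
  const byWeek = new Map();
  for (const weeks of repos) {
    for (const { week, total } of weeks) {
      const entry = byWeek.get(week) || { total: 0, count: 0 };
      entry.total += total;
      entry.count += 1;
      byWeek.set(week, entry);
    }
  }

  // ISO date strings sort chronologically when compared as text.
  const sorted = [...byWeek].sort((a, b) => a[0].localeCompare(b[0]));

  const result = [];
  for (const [week, { total, count }] of sorted) {
    if (count !== expected) {
      // Same message format as the filter logs above.
      console.warn(
        `Repo count mismatch for activity on ${week}: expected ${expected}, found ${count}`
      );
      continue;
    }
    result.push({ week, total });
  }
  return result;
}
```

Only weeks reported by every non-empty repo survive, so the partial datapoints at either end of the year-long window are dropped instead of being plotted as dips.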

@LRWeber LRWeber requested review from a team and IanLee1521 October 16, 2019 23:29
@IanLee1521
Collaborator

So strange that we can't find the issues with that dropoff...

@LRWeber
Member Author

LRWeber commented Oct 18, 2019

> So strange that we can't find the issues with that dropoff...

Culprit found. You'd have to ask PRUNERS/llvm-project what they're doing differently. They accounted for half of our weekly activity at least, but suddenly dropped off.
{ "total": 626, "week": "2019-08-26" }, { "total": 10, "week": "2019-09-02" }
https://github.com/PRUNERS/llvm-project/graphs/commit-activity

@IanLee1521
Collaborator

Interesting... @lee218llnl -- Pruners is in your sphere of knowledge isn't it?

I popped over to https://github.com/PRUNERS/llvm-project and found more information... from the looks of it, that is a fork of the upstream llvm/llvm-project repo and the fork hasn't been updated since Sept 1 -- So that explains the drop off.

That then makes me wonder... Should our activity graph not include counts from forked repositories? Could we not include them in the graph that we generate?

@LRWeber
Member Author

LRWeber commented Oct 18, 2019

> I popped over to https://github.com/PRUNERS/llvm-project and found more information... from the looks of it, that is a fork of the upstream llvm/llvm-project repo and the fork hasn't been updated since Sept 1 -- So that explains the drop off.

Aha, that makes sense now.

> That then makes me wonder... Should our activity graph not include counts from forked repositories? Could we not include them in the graph that we generate?

If they aren't included in the activity, should they still be included in the other stats?
I'd imagine there are certainly cases in which forks are legitimately active/extended projects in their own right, but it is strange to have some of them essentially multiplying certain statistics. (Activity is likely the biggest issue, but they could also affect repo counts for internal and external contributions, and ratios for licenses, languages, and topics.)
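
To make the question concrete, here is a rough sketch of how fork exclusion could be made optional when summing weekly activity. Everything in it is hypothetical: the function name `weeklyTotals`, the `includeForks` option, and the repo shape `{ name, isFork, weeks }` are assumptions for illustration, not the project's existing data model.

```javascript
// Hypothetical: repos is an array of
//   { name, isFork, weeks: [{ week, total }] }
// Returns a Map of week -> summed total, optionally skipping forks.
function weeklyTotals(repos, { includeForks = true } = {}) {
  const totals = new Map();
  for (const repo of repos) {
    // Skip forked repos only when the caller asks for it.
    if (!includeForks && repo.isFork) continue;
    for (const { week, total } of repo.weeks) {
      totals.set(week, (totals.get(week) || 0) + total);
    }
  }
  return totals;
}
```

Calling it once with the default and once with `{ includeForks: false }` would show how much of each week's total comes from forks, without committing to dropping them from the other stats.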

@IanLee1521
Collaborator

I guess the first question in response to that is "how many forks do we have across all the repos?" -- In @llnl we apparently have 16 (https://github.com/LLNL?utf8=✓&q=&type=fork&language=); not sure about other orgs.

@lee218llnl
Contributor

Let me know if there are still specific questions about PRUNERS and LLVM, but it sounds like this is a more general issue that applies to other projects too.

@LRWeber
Member Author

LRWeber commented Oct 18, 2019

It's definitely worth reviewing how we want to handle LLNL to LLNL forks. (Forking between internal and external repos shouldn't cause the same problem.)
I added an issue to continue tracking. #300

In the meantime, I think this PR is still a worthwhile independent fix to the other problem introduced by the oddities of the GitHub API itself.

Collaborator

@IanLee1521 IanLee1521 left a comment

Agreed and merged. Thanks @LRWeber !

@IanLee1521 IanLee1521 merged commit 53aff0e into master Oct 18, 2019
@LRWeber LRWeber deleted the activityGraphFix branch October 18, 2019 16:25