-
Notifications
You must be signed in to change notification settings - Fork 70
Add src users clean to src-cli
#826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Noting from @mike-r-mclaughlin that there should be a flag to bypass the scanln safety check, so users can run this programmatically ✅ |
|
@DaedalusG Do you actually want a review now? The PR description seems to imply we have more work to do on it. |
unknwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is quite a bit of TODOs, could you collect them to a GitHub issue and be linked from these TODOs?
|
Please re-request review once you think it's ready again 😁 |
|
Hey @unknwon @mrnugget addressed many of your review comments here. The current output from the command is as below: |
cmd/src/users_clean.go
Outdated
| query Users($first: Int, $query: String) { | ||
| users(first: $first, query: $query) { | ||
| nodes { | ||
| ...UserFields | ||
| } | ||
| } | ||
| } | ||
| ` + userFragment | ||
|
|
||
| // get users to delete | ||
| var result struct { | ||
| Users struct { | ||
| Nodes []User | ||
| } | ||
| } | ||
| if ok, err := client.NewRequest(query, nil).Do(ctx, &result); err != nil || !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell you don't use $first and $query so I'd remove them.
But more importantly: don't you need to handle pagination here? Does this request always return all users? Is that okay?
Taking a step back, I think the better approach would be to extend the GraphQL API and add an activeSince: date parameter to the users query, so it only returns users that have been active since the given that. That should be easy (he naively said) to query on the backend and it would return the data being sent to src-cli and you could save yourself the client-side filtering. If you want, I can help you do this by pairing with you on it. I don't think it's necessary for a first version of src users clean to land though, but I'd highly recommend it.
Edit: I just checked on S2 with this query and it feels like there's an N+1 query on the backend, which makes this slow and might make the backend changes required on large instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrnugget this is great feedback. This request does always return all users, and that makes sense because we need to check every users activity for this. That being said I'd be way down to learn about altering the graphQL schema -- and just generally exploring the n+1 problem for the backend. This does need to be stable on very large instances as its being developed in direct relation to requests from such orgs 👍 My naive understanding is that the client side computation shouldn't be too big a deal though, since its just basic dates and arithmetic?
Will throw some time on to discuss with you further 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My naive understanding is that the client side computation shouldn't be too big a deal though, since its just basic dates and arithmetic?
True. The computation here is not the problem. It would be totally fine to compute this locally if you already had the data locally.
In this case, though, you first need to fetch the data. And if possible it's good to reduce amount of data being sent over the wire. Especially here, since you're loading all users just to filter them (imagine a scenario in which you'd want to compute "average time since last activity" - for that you'd need to fetch all users locally).
On top of that: that particular query is slow, because I assume it's doing an N+1 query. The 1 query: it loads all users. The N queries: load usageStatistics for each users. So on an instance with 5000 users you end up running 5001 SQL queries, just to fetch the list of users and their usageStatistics. (I haven't looked at the code yet, so might be completely off).
Try running this query on sourcegraph.com and see how long it takes.
We can fix the N+1. That might make the request faster. But I'd still think it'd nicer to do the filtering on the server and send less data. The database would have to do less work, the internet would have to transfer less data, the client would need to do less.
Let's use an extreme example to kinda highlight the pattern: say you want to build something like src download-archives -repo-has-file=horse.txt that downloads ZIP archives of all repositories that contain horse.txt files. You could download all archives of all repositories and then check them locally and throw away the rest. Even ignoring that it'd take up disk space locally, that feels a bit wasteful, right? So instead we can move the compution to the data (instead of moving data to the computation) and ask our server to find repositories that contain this horse.txt file (since the server already has the files on disk) and only return those.
unknwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @mrnugget's other comment on pushing computation to the backend, but I think could be a followup PR if you commit 😁
Co-authored-by: Thorsten Ball <[email protected]>
Co-authored-by: Thorsten Ball <[email protected]>
Co-authored-by: Thorsten Ball <[email protected]>
Co-authored-by: Joe Chen <[email protected]>
c3940eb to
3065b9f
Compare
* register users clean * added usage statistics to User Node type * mark * define worker function for user removal * before implementing time.Parse(time.RFC3339, payload.UsageStatistics.LastActiveTime) * now computes time since last active * added utility function structure * initialize array to store users to be deleted * working query to remove users and flags * added a user verification to command * corrects logic around removeNeverActive flag * better warning messaging * formating warning * add flag to skip verify check * commented out placeholder code and added TODO comments * addressed many review concerns, added lower bound on -days flag, made better table * Update cmd/src/users.go Co-authored-by: Thorsten Ball <[email protected]> * Update cmd/src/users_clean.go Co-authored-by: Thorsten Ball <[email protected]> * Update cmd/src/users_clean.go Co-authored-by: Thorsten Ball <[email protected]> * correct variable naming bug * remove unused params in get users query * Update cmd/src/users.go Co-authored-by: Joe Chen <[email protected]> * admins must be explcitly removed * commit dependencies * camel case * ensure clean doesnt clean the user issuing the command Co-authored-by: Thorsten Ball <[email protected]> Co-authored-by: Joe Chen <[email protected]>
* register users clean * added usage statistics to User Node type * mark * define worker function for user removal * before implementing time.Parse(time.RFC3339, payload.UsageStatistics.LastActiveTime) * now computes time since last active * added utility function structure * initialize array to store users to be deleted * working query to remove users and flags * added a user verification to command * corrects logic around removeNeverActive flag * better warning messaging * formating warning * add flag to skip verify check * commented out placeholder code and added TODO comments * addressed many review concerns, added lower bound on -days flag, made better table * Update cmd/src/users.go Co-authored-by: Thorsten Ball <[email protected]> * Update cmd/src/users_clean.go Co-authored-by: Thorsten Ball <[email protected]> * Update cmd/src/users_clean.go Co-authored-by: Thorsten Ball <[email protected]> * correct variable naming bug * remove unused params in get users query * Update cmd/src/users.go Co-authored-by: Joe Chen <[email protected]> * admins must be explcitly removed * commit dependencies * camel case * ensure clean doesnt clean the user issuing the command Co-authored-by: Thorsten Ball <[email protected]> Co-authored-by: Joe Chen <[email protected]>
This draft PR tracks work on registering a new
src userssubcommandsrc users cleanwhich allows admins to remove users in bulk given some conditions like admin status and days since last active use registered.https://github.com/sourcegraph/sourcegraph/issues/40102
The command is still in development at the time of publishing this draft, it seems near MVP though
Two Additional nice to have features would be:
-send-emailflagPlease check carefully, I'm new at this!
Test plan
Manually test each command option against
cse-awsandcse-k8sOpen to additional testing