Skip to content

Conversation

@DaedalusG
Copy link
Contributor

This draft PR tracks work on registering a new src users subcommand src users clean which allows admins to remove users in bulk given some conditions like admin status and days since last active use registered.
https://github.com/sourcegraph/sourcegraph/issues/40102

The command is still in development at the time of publishing this draft, it seems near MVP though

Two Additional nice to have features would be:

  • output json file containing users removed and some data on the users (for use to send emails to removed users mostly)
  • open an editor to compose general email to be sent to removed users and send. Functionality behind -send-email flag

Please check carefully, I'm new at this!

Test plan

Manually test each command option against cse-aws and cse-k8s

Open to additional testing

@DaedalusG
Copy link
Contributor Author

DaedalusG commented Aug 18, 2022

Noting from @mike-r-mclaughlin that there should be a flag to bypass the scanln safety check, so users can run this programmatically ✅

@unknwon
Copy link
Contributor

unknwon commented Aug 30, 2022

@DaedalusG Do you actually want a review now? The PR description seems to imply we have more work to do on it.

@DaedalusG
Copy link
Contributor Author

Yes please @unknwon, I need to add a lower bound for the -d flag after talking with @ryphil but aside for some styling I think this is about ready to merge 👍

Copy link
Contributor

@unknwon unknwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is quite a bit of TODOs, could you collect them to a GitHub issue and be linked from these TODOs?

@unknwon
Copy link
Contributor

unknwon commented Sep 1, 2022

Please re-request review once you think it's ready again 😁

@DaedalusG
Copy link
Contributor Author

Hey @unknwon @mrnugget addressed many of your review comments here. The current output from the command is as below:

warrengifford@Warrens-MacBook-Pro cmd % src users clean
Users to remove from instance at https://cse-k8s.sgdev.org
╭───────────────────┬──────────────────────────────────┬────────────────────────╮
│ USERNAME          │ EMAIL                            │ DAYS SINCE LAST ACTIVE │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ adeola            │ [email protected]           │                    300 │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ nonso             │ [email protected]     │                    160 │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ sourcegraph       │ [email protected]  │                    273 │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ g-bot             │ [email protected]           │                    253 │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ Keegan            │ [email protected]           │                    211 │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ testAtSourcegraph │ [email protected] │                    224 │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ Malo              │ [email protected]             │                    191 │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ ErikSeliger       │ [email protected]             │                    162 │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ support           │                                  │                    103 │
╰───────────────────┴──────────────────────────────────┴────────────────────────╯
Do you  wish to proceed with user removal [y/N]: n
Aborting removal
warrengifford@Warrens-MacBook-Pro cmd % src users clean -noAdmin
Users to remove from instance at https://cse-k8s.sgdev.org
╭───────────────────┬──────────────────────────────────┬────────────────────────╮
│ USERNAME          │ EMAIL                            │ DAYS SINCE LAST ACTIVE │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ sourcegraph       │ [email protected]  │                    273 │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ g-bot             │ [email protected]           │                    253 │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ testAtSourcegraph │ [email protected] │                    224 │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ Malo              │ [email protected]             │                    191 │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ ErikSeliger       │ [email protected]             │                    162 │
├───────────────────┼──────────────────────────────────┼────────────────────────┤
│ support           │                                  │                    103 │
╰───────────────────┴──────────────────────────────────┴────────────────────────╯
Do you  wish to proceed with user removal [y/N]: n
Aborting removal
warrengifford@Warrens-MacBook-Pro cmd % src users clean -days 1
-days flag must be set to 60 or greater
warrengifford@Warrens-MacBook-Pro cmd % src users clean -days 1000 -force
warrengifford@Warrens-MacBook-Pro cmd %

Comment on lines 54 to 69
query Users($first: Int, $query: String) {
users(first: $first, query: $query) {
nodes {
...UserFields
}
}
}
` + userFragment

// get users to delete
var result struct {
Users struct {
Nodes []User
}
}
if ok, err := client.NewRequest(query, nil).Do(ctx, &result); err != nil || !ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell you don't use $first and $query so I'd remove them.

But more importantly: don't you need to handle pagination here? Does this request always return all users? Is that okay?

Taking a step back, I think the better approach would be to extend the GraphQL API and add an activeSince: date parameter to the users query, so it only returns users that have been active since the given that. That should be easy (he naively said) to query on the backend and it would return the data being sent to src-cli and you could save yourself the client-side filtering. If you want, I can help you do this by pairing with you on it. I don't think it's necessary for a first version of src users clean to land though, but I'd highly recommend it.

Edit: I just checked on S2 with this query and it feels like there's an N+1 query on the backend, which makes this slow and might make the backend changes required on large instances.

Copy link
Contributor Author

@DaedalusG DaedalusG Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrnugget this is great feedback. This request does always return all users, and that makes sense because we need to check every users activity for this. That being said I'd be way down to learn about altering the graphQL schema -- and just generally exploring the n+1 problem for the backend. This does need to be stable on very large instances as its being developed in direct relation to requests from such orgs 👍 My naive understanding is that the client side computation shouldn't be too big a deal though, since its just basic dates and arithmetic?

Will throw some time on to discuss with you further 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My naive understanding is that the client side computation shouldn't be too big a deal though, since its just basic dates and arithmetic?

True. The computation here is not the problem. It would be totally fine to compute this locally if you already had the data locally.

In this case, though, you first need to fetch the data. And if possible it's good to reduce amount of data being sent over the wire. Especially here, since you're loading all users just to filter them (imagine a scenario in which you'd want to compute "average time since last activity" - for that you'd need to fetch all users locally).

On top of that: that particular query is slow, because I assume it's doing an N+1 query. The 1 query: it loads all users. The N queries: load usageStatistics for each users. So on an instance with 5000 users you end up running 5001 SQL queries, just to fetch the list of users and their usageStatistics. (I haven't looked at the code yet, so might be completely off).

Try running this query on sourcegraph.com and see how long it takes.

We can fix the N+1. That might make the request faster. But I'd still think it'd nicer to do the filtering on the server and send less data. The database would have to do less work, the internet would have to transfer less data, the client would need to do less.

Let's use an extreme example to kinda highlight the pattern: say you want to build something like src download-archives -repo-has-file=horse.txt that downloads ZIP archives of all repositories that contain horse.txt files. You could download all archives of all repositories and then check them locally and throw away the rest. Even ignoring that it'd take up disk space locally, that feels a bit wasteful, right? So instead we can move the compution to the data (instead of moving data to the computation) and ask our server to find repositories that contain this horse.txt file (since the server already has the files on disk) and only return those.

Copy link
Contributor

@unknwon unknwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @mrnugget's other comment on pushing computation to the backend, but I think could be a followup PR if you commit 😁

@DaedalusG DaedalusG enabled auto-merge (squash) September 8, 2022 20:41
@DaedalusG DaedalusG merged commit 8d1c247 into main Sep 8, 2022
@DaedalusG DaedalusG deleted the users-clean branch September 8, 2022 20:43
eseliger pushed a commit that referenced this pull request Sep 8, 2022
* register users clean

* added usage statistics to User Node type

* mark

* define worker function for user removal

* before implementing time.Parse(time.RFC3339, payload.UsageStatistics.LastActiveTime)

* now computes time since last active

* added utility function structure

* initialize array to store users to be deleted

* working query to remove users and flags

* added a user verification to command

* corrects logic around removeNeverActive flag

* better warning messaging

* formating warning

* add flag to skip verify check

* commented out placeholder code and added TODO comments

* addressed many review concerns, added lower bound on -days flag, made better table

* Update cmd/src/users.go

Co-authored-by: Thorsten Ball <[email protected]>

* Update cmd/src/users_clean.go

Co-authored-by: Thorsten Ball <[email protected]>

* Update cmd/src/users_clean.go

Co-authored-by: Thorsten Ball <[email protected]>

* correct variable naming bug

* remove unused params in get users query

* Update cmd/src/users.go

Co-authored-by: Joe Chen <[email protected]>

* admins must be explcitly removed

* commit dependencies

* camel case

* ensure clean doesnt clean the user issuing the command

Co-authored-by: Thorsten Ball <[email protected]>
Co-authored-by: Joe Chen <[email protected]>
scjohns pushed a commit that referenced this pull request Apr 24, 2023
* register users clean

* added usage statistics to User Node type

* mark

* define worker function for user removal

* before implementing time.Parse(time.RFC3339, payload.UsageStatistics.LastActiveTime)

* now computes time since last active

* added utility function structure

* initialize array to store users to be deleted

* working query to remove users and flags

* added a user verification to command

* corrects logic around removeNeverActive flag

* better warning messaging

* formating warning

* add flag to skip verify check

* commented out placeholder code and added TODO comments

* addressed many review concerns, added lower bound on -days flag, made better table

* Update cmd/src/users.go

Co-authored-by: Thorsten Ball <[email protected]>

* Update cmd/src/users_clean.go

Co-authored-by: Thorsten Ball <[email protected]>

* Update cmd/src/users_clean.go

Co-authored-by: Thorsten Ball <[email protected]>

* correct variable naming bug

* remove unused params in get users query

* Update cmd/src/users.go

Co-authored-by: Joe Chen <[email protected]>

* admins must be explcitly removed

* commit dependencies

* camel case

* ensure clean doesnt clean the user issuing the command

Co-authored-by: Thorsten Ball <[email protected]>
Co-authored-by: Joe Chen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants