Multi tier database start/stop operations#4081
Conversation
64b44e9 to
6a9ff90
Compare
8277be8 to
70fdb28
Compare
70fdb28 to
cf93a8b
Compare
|
@arbulu89 I am not going to be able to test this until past Eastern. If you are confident that things are working overall fine, go ahead and merge it and I will run a battery of tests from main when I get back. |
nelsonkopliku
left a comment
There was a problem hiding this comment.
Hey Xabi, I just have a couple of questions.
For the rest looks good.
| runningOperations, | ||
| database.id, | ||
| DATABASE_START, | ||
| matchesSite(null) |
There was a problem hiding this comment.
question: matchesSite(null) means "a database instance that doesn't seem to be on any site"
There was a problem hiding this comment.
Partially true XD.
Basically, we need to this for the new Stop/start of full database, as we are not providing the site (better said, we are given null as site). This way the spinning icon on the operations button will be only in the top operations button.
The database instance can be sited, but we don't care, as we the operation applies to all sites.
Anyway, I said partially as it covers the scenario where system replication is not enabled that you mentioned.
| disabled={!someHostHeartbeatPassing} | ||
| disabledTooltip={operationNotAllowedMsg} |
There was a problem hiding this comment.
question: help me remember, why are we removing hasSystemReplication from the check?
Is it because it would be caught by the API and eventually forbid the operation anyway?
There was a problem hiding this comment.
Until now, in the databases view we had 2 ways of doing operations.
- Site operations: the operation applies to a specific site and the button is in the site table
- Operations for databases without system replication: As we didn't have site tables here, to start/stop a database we needed to use this button with the affected code change, but only it made sense in scenarios where system replication was not configured. That's why it was disabled as well.
Now, we can start/stop complete databases with system replication, so the condition must go away as we reuse this buttom.
| |> Enum.group_by(& &1.system_replication_site) | ||
| |> Enum.reject(fn {_site, grouped_instances} -> | ||
| Enum.any?(grouped_instances, fn %DatabaseInstanceReadModel{ | ||
| host: %HostReadModel{heartbeat: heartbeat} | ||
| } -> | ||
| heartbeat == :passing | ||
| end) | ||
| end) | ||
| |> Enum.map(fn {site, _} -> site end) |
There was a problem hiding this comment.
thought: at the end we want a unique list of sites with all the instances in that site not passing.
Would it work if we swapped to the following?
instances
|> filter_by_site(params)
|> reject all instances with passing heartbeat
|> uniq by site
it might avoid us grouping by site and ungrouping later.
There was a problem hiding this comment.
thought: at the end we want a unique list of sites with all the instances in that site not passing.
Correct.
Would it work if we swapped to the following?
But if I reject before grouping, how do I know if there is some passing host in that site?
For example:
I have:
- Site1: host1 passing, host 2 critical
- Site2: host3 passing, host4 critical
If I reject all instances with passing, i would have [host 2, host 4]. And here, unique sites would be Site1, Site2. But in my case, I should have no sites, as each of them have a passing host as well.
I think doing some grouping is mandatory at some point
PD:
At the end, this is the reverse logic of what a "positive" filtering would be, but as I wanted to have the list of sites without passing heartbeat, we have the reject.
Initially I had:
instances
|> filter_by_site
|> group_by_site
# look if every site has at least one passing host
|> Enum.all?(fn {_site, grouped_instances} ->
Enum.any?(grouped_instances, fn %DatabaseInstanceReadModel{
host: %HostReadModel{heartbeat: heartbeat}
} ->
heartbeat == :passing
end)
end)
There was a problem hiding this comment.
I think I got you, thanks!
Description
Enalbe multi-tier database start/stop operations.
system_replication_tiervalue to wanda during operation request. We just need to send one instance per site, that's why theuniq_byDid you add the right label?
How was this tested?
UT and manual testing
@abravosuse I did a bunch of tests with a real scale up cluster, but feel free to use the env image to test it yourself (update wanda to latest version)
Did you update the documentation?
Pending to update user facing docs