Replies: 1 comment
Any thoughts on this from the team, @darrachequesne?
As far as I understand, connection state recovery works like this, on a high level, when used with the `redis-streams-adapter`:

1. Events are stored in a Redis stream, which is a list where every position has a certain offset (index).
2. When an event is delivered to a client, the client keeps track of the offset of that event.
3. When a client reconnects, it sends its last processed offset to the server.
4. The server checks whether that offset still exists in the stream. If it does, it reads all events from that offset to the end of the stream into memory.
5. It then loops over the events and, for each one, decides whether it should be sent to this particular client (by checking the event against the rooms the client is a member of).
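To make that concrete, here is a minimal sketch of the read-and-filter step as I understand it. This is not the adapter's actual code: the stream key `socket.io`, the `payload` field name, and the `recoverSession` helper are all illustrative assumptions.

```ts
import { createClient } from "redis";

// Illustrative shape of a stored event; not the adapter's real internals.
interface StoredEvent {
  room?: string; // room the event was broadcast to, if any
  data: unknown; // serialized packet
}

const redisClient = createClient({ url: "redis://localhost:6379" });
await redisClient.connect();

// Hypothetical helper: replay everything one reconnecting client missed.
async function recoverSession(lastOffset: string, clientRooms: Set<string>) {
  // 1. Read *every* event from the client's last offset to the end of the
  //    global stream into memory; the stream is not segmented by room.
  const entries = await redisClient.xRange("socket.io", lastOffset, "+");

  // 2. Filter in application code: only events targeting one of the client's
  //    rooms are kept; everything else was read (and parsed) for nothing.
  const missed: StoredEvent[] = [];
  for (const entry of entries) {
    const event: StoredEvent = JSON.parse(entry.message.payload);
    if (!event.room || clientRooms.has(event.room)) {
      missed.push(event);
    }
  }
  return missed; // delivered to this one client only
}
```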
And this is the key part: the stream is global, meaning it contains all events that have been emitted, not segmented by room or anything else. So if you are a client with last processed offset X, the server will read and inspect every message emitted between X and the end of the stream, even if none of them will actually be sent to you. And it does this for each client in isolation; the events that are read are not cached for other reconnections to reuse, so every recovery performs its own `XRANGE {offset} +` command.
Let's say we use a `maxDisconnectionDuration` setting of 30 seconds for the connection state recovery feature. This means that, in the worst case, every client could force the server to read the last 30 seconds' worth of events into memory. In our scenario, we usually see up to ~3000 events per 30 seconds during peaks. You could argue this is not a particularly huge number.
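For context, the setup being discussed would look roughly like this (the Redis URL and port are placeholders):

```ts
import { createServer } from "node:http";
import { Server } from "socket.io";
import { createClient } from "redis";
import { createAdapter } from "@socket.io/redis-streams-adapter";

const redisClient = createClient({ url: "redis://localhost:6379" }); // placeholder URL
await redisClient.connect();

const httpServer = createServer();
const io = new Server(httpServer, {
  adapter: createAdapter(redisClient),
  connectionStateRecovery: {
    // keep missed events recoverable for up to 30 seconds
    maxDisconnectionDuration: 30_000,
  },
});

io.on("connection", (socket) => {
  // socket.recovered is true when the state was successfully restored
  console.log("connected, recovered:", socket.recovered);
});

httpServer.listen(3000);
```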
Based on observations from our production environment, each event occupies ~360 bytes of Redis memory on average. So in the worst case, the Socket.IO server could read 3000 × 360 B ≈ 1 MB of events into memory for each client (assuming an event takes up roughly the same amount of memory in the application process, which might not be exactly true but seems like a reasonable simplification).
It is not hard to see how this becomes a problem once you have 1000+ active connections, since they will all reconnect at the same time when you perform a deployment, reload the ingress configuration, restart a host machine, or similar.
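Putting those numbers together (the 1000-client figure is just the round number from above):

```ts
// Back-of-the-envelope worst case for a burst of simultaneous recoveries,
// using the numbers observed in our production environment.
const eventsPerWindow = 3_000;     // events per 30 s peak window
const avgEventSizeBytes = 360;     // average Redis memory per event
const reconnectingClients = 1_000; // e.g. everyone reconnects on a deploy

const perClientBytes = eventsPerWindow * avgEventSizeBytes; // 1,080,000 B ≈ 1 MB
const totalBytes = perClientBytes * reconnectingClients;    // ≈ 1.08 GB

console.log(`${(perClientBytes / 1e6).toFixed(2)} MB per client`);
console.log(`${(totalBytes / 1e9).toFixed(2)} GB across ${reconnectingClients} clients`);
```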
We would really like to use connection state recovery, but when we have tried it - even for a small portion of clients - the memory spikes are huge and can easily cause the Socket.IO server to run out of memory.
So what could be done about it? Perhaps the Socket.IO server could stream events from the adapter rather than reading them all into memory at once? Alternatively - and this might be easier to implement - the adapter could cache events in memory for a certain amount of time, so that simultaneous connection state recoveries could share the same event objects instead of each holding its own copy of the stream segment?
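As a rough illustration of the second idea (a sketch only, not a proposal for the adapter's actual API; `xRangeFrom`, `eventsSince`, and the 5-second TTL are made up for the example): a short-lived, shared copy of the recent tail of the stream would let a burst of recoveries hit Redis once instead of once per client.

```ts
// Sketch: keep a short-lived, shared copy of the recent tail of the stream so
// that simultaneous recoveries reuse one Redis read instead of each issuing
// their own XRANGE.
type StreamEntry = { id: string; message: Record<string, string> };

const CACHE_TTL_MS = 5_000; // arbitrary; anything on the order of a reconnect burst
let cachedTail: { readAt: number; fromId: string; entries: StreamEntry[] } | null = null;

// Stand-in for the adapter's actual `XRANGE <offset> +` read.
async function xRangeFrom(_offset: string): Promise<StreamEntry[]> {
  return []; // the real implementation would read from Redis here
}

// Compare Redis stream ids of the form "<ms>-<seq>".
function idGte(a: string, b: string): boolean {
  const [ams, aseq] = a.split("-").map(Number);
  const [bms, bseq] = b.split("-").map(Number);
  return ams > bms || (ams === bms && aseq >= bseq);
}

async function eventsSince(offset: string): Promise<StreamEntry[]> {
  const cacheCoversOffset =
    cachedTail !== null &&
    Date.now() - cachedTail.readAt < CACHE_TTL_MS &&
    idGte(offset, cachedTail.fromId);

  if (!cacheCoversOffset) {
    // One read serves the whole burst; later clients just slice from it.
    cachedTail = { readAt: Date.now(), fromId: offset, entries: await xRangeFrom(offset) };
  }

  // Each client still only gets the events from its own offset onwards.
  return cachedTail!.entries.filter((entry) => idGte(entry.id, offset));
}
```

A real implementation would also want to deduplicate in-flight reads and bound the cache size, but even a naive version like this turns N near-identical `XRANGE` calls during a reconnect storm into roughly one.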