I was trying to make it so that we didn't have to start a background task when handling RDATA, but that is a bigger job (due to all the code in `generic_worker`). However I still think not pulling the event from the DB may help reduce some DB usage due to replication, even if most workers will simply go and pull that event from the DB later anyway.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
This is a requirement for [knocking](https://github.com/matrix-org/synapse/pull/6739), and is abstracting some code that was originally used by the invite flow. I'm separating it out into this PR as it's a fairly contained change.
For a bit of context: when you invite a user to a room, you send them [stripped state events](https://matrix.org/docs/spec/server_server/unstable#put-matrix-federation-v2-invite-roomid-eventid) as part of `invite_room_state`. This is so that their client can display useful information such as the room name and avatar. The same requirement applies to knocking, as it would be nice for clients to be able to display a list of rooms you've knocked on - room name and avatar included.
The reason we're sending membership events down as well is in the case that you are invited to a room that does not have an avatar or name set. In that case, the client should use the displayname/avatar of the inviter. That information is located in the inviter's membership event.
This is optional as knocks don't really have any user in the room to link up to. When you knock on a room, your knock is sent by you and inserted into the room. It wouldn't *really* make sense to show the avatar of a random user - plus it'd be a data leak. So I've opted not to send membership events to the client here. The UX on the client for when you knock on a room without a name/avatar is a separate problem.
In essence this is just moving some inline code to a reusable store method.
There's no need for it to be in the dict as well as the events table. Instead,
we store it in a separate attribute in the EventInternalMetadata object, and
populate that on load.
This means that we can rely on it being correctly populated for any event which
has been persited to the database.
On startup `MultiWriteIdGenerator` fetches the maximum stream ID for
each instance from the table and uses that as its initial "current
position" for each writer. This is problematic as a) it involves either
a scan of events table or an index (neither of which is ideal), and b)
if rows are being persisted out of order elsewhere while the process
restarts then using the maximum stream ID is not correct. This could
theoretically lead to race conditions where e.g. events that are
persisted out of order are not sent down sync streams.
We fix this by creating a new table that tracks the current positions of
each writer to the stream, and update it each time we finish persisting
a new entry. This is a relatively small overhead when persisting events.
However for the cache invalidation stream this is a much bigger relative
overhead, so instead we note that for invalidation we don't actually
care about reliability over restarts (as there's no caches to
invalidate) and simply don't bother reading and writing to the new table
in that particular case.
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.