MatrixSynapse


Author	SHA1	Message	Date
Erik Johnston	fef08cbee8	Fix sending out of order `POSITION` over replication (#16639) If a worker reconnects to Redis, we send out the current positions of all our streams. However, if we're also trying to send out a backlog of RDATA at the same time, then we can end up sending a `POSITION` with the current token before we've sent all the RDATA before the current token. This doesn't cause actual bugs, as the receiving servers see the POSITION, fetch the relevant rows from the DB, and then ignore the old RDATA as they come in. However, this is inefficient, so it'd be better if we didn't send out-of-order positions.	2023-11-16 13:05:09 +00:00
Erik Johnston	898655fd12	More efficiently handle no-op POSITION (#16640) We may receive `POSITION` commands where we already know that worker has advanced past that position, so there is no point in handling it.	2023-11-16 12:32:17 +00:00
Erik Johnston	408c13801a	Add fast path for replication events stream fetch (#16580) We can bail early if the from token is greater than or equal to the current token.	2023-10-30 14:47:57 +00:00
Erik Johnston	5413cefe32	Reduce the number of cache POSITIONs we send (#16561) Follow on from / actually correctly does #16557	2023-10-27 16:07:11 +01:00
Erik Johnston	89dbbd68e1	Reduce spurious replication catchup (#16555)	2023-10-27 13:27:20 +00:00
Erik Johnston	0680d76659	Reduce replication traffic due to reflected cache stream POSITION (#16557)	2023-10-27 12:51:08 +01:00
Erik Johnston	ba47fea528	Allow multiple workers to write to receipts stream. (#16432) Fixes #16417	2023-10-25 16:16:19 +01:00
Jason Little	ffbe9b7666	Remove duplicate call to wake a remote destination when using federation sending worker (#16515)	2023-10-24 08:09:59 -04:00
Erik Johnston	8f35f8148e	Fix bug where a new writer advances their token too quickly (#16473) * Fix bug where a new writer advances their token too quickly When starting a new writer (e.g. for persisting events), the `MultiWriterIdGenerator` doesn't have a minimum token for it, as there are no rows matching that new writer in the DB. This results in the first stream ID it acquired being announced as persisted before it actually finishes persisting, if another writer gets and persists a subsequent stream ID. This is due to the logic of setting the minimum persisted position to the minimum known position across all writers, and the new writer starts off not being considered. * Fix sending out POSITIONs when our token advances without update Broke in #14820 * For replication HTTP requests, only wait for minimal position	2023-10-23 16:57:30 +01:00
Patrick Cloke	49c9745b45	Avoid sending massive replication updates when purging a room. (#16510)	2023-10-18 12:26:01 -04:00
Patrick Cloke	ae5b997cfa	Fix comments related to replication. (#16428)	2023-10-06 07:25:44 -04:00
Patrick Cloke	4e302b30b6	Add __slots__ to replication commands. (#16429) To slightly reduce the amount of memory each command takes.	2023-10-05 07:38:55 -04:00
Erik Johnston	80ec81dcc5	Some refactors around receipts stream (#16426)	2023-10-04 16:28:40 +01:00
Erik Johnston	20fb08ec80	Downgrade repl stream time out error to warning (#16401) This is because if a worker reaches ~100% CPU, then everything starts lagging and we hit the log line a lot. When it is logged at error level we invoke Sentry, and that has a lot of overhead, which then puts even more pressure on the worker.	2023-09-29 11:52:48 +00:00
Patrick Cloke	f84da3c32e	Add a cache around server ACL checking (#16360) * Pre-compiles the server ACLs onto an object per room and invalidates them when new events come in. * Converts the server ACL checking into Rust.	2023-09-26 11:57:50 -04:00
Erik Johnston	329597022e	Some minor performance fixes for task scheduler (#16313)	2023-09-14 16:20:47 +01:00
Erik Johnston	ab13fb08bf	Improve logging of replication (#16309)	2023-09-13 09:51:50 +00:00
Patrick Cloke	55c20da4a3	Merge remote-tracking branch 'origin/release-v1.91' into release-v1.92	2023-09-06 11:25:28 -04:00
Quentin Gliech	1940d990a3	Revert MSC3861 introspection cache, admin impersonation and account lock (#16258)	2023-09-06 15:19:51 +01:00
Erik Johnston	d35bed8369	Don't wake up destination transaction queue if they're not due for retry. (#16223)	2023-09-04 17:14:09 +01:00
Patrick Cloke	e9235d92f2	Track currently syncing users by device for presence (#16172) Refactoring to use both the user ID & the device ID when tracking the currently syncing users in the presence handler. This is done both locally and over replication. Note that the device ID is discarded but will be used in a future change.	2023-08-29 11:44:07 -04:00
Mathieu Velten	501da8ecd8	Task scheduler: add replication notify for new task to launch ASAP (#16184)	2023-08-28 14:03:51 +00:00
Erik Johnston	803f63df1c	Fix perf of `wait_for_stream_positions` (#16148)	2023-08-22 15:11:22 +00:00
Shay	69048f7b48	Add an admin endpoint to allow authorizing server to signal token revocations (#16125)	2023-08-22 14:15:34 +00:00
Patrick Cloke	ad3f43be9a	Run pyupgrade for python 3.7 & 3.8. (#16110)	2023-08-15 08:11:20 -04:00
Erik Johnston	ae55cc1e6b	Add ability to wait for locks and add locks to purge history / room deletion (#15791) cf. #13476	2023-07-31 10:58:03 +01:00
Jason Little	c835befd10	Add Unix socket support for Redis connections (#15644) Adds a new configuration setting to connect to Redis via a Unix socket instead of over TCP. Disabled by default.	2023-05-26 15:28:39 -04:00
Patrick Cloke	375b0a8a11	Update code to refer to "workers". (#15606) A bunch of comments and variables are out of date and use obsolete terms.	2023-05-16 15:56:38 -04:00
Roel ter Maat	2611433b70	Add redis SSL configuration options (#15312) * Add SSL options to redis config * fix lint issues * Add documentation and changelog file * add missing . at the end of the changelog * Move client context factory to new file * Rename ssl to tls and fix typo * fix lint issues * Added when redis attributes were added	2023-05-11 13:02:51 +01:00
Mathieu Velten	9228ae633f	Add some clarification to the doc/comments regarding TCP replication (#15354)	2023-03-30 12:51:35 +02:00
Patrick Cloke	afb216c202	Remove no-op send_command for Redis replication. (#15274) With Redis, commands do not need to be re-issued by the main process (they fan out to all processes at once), and thus it is no longer necessary to worry about them reflecting recursively forever.	2023-03-16 11:13:30 -04:00
Patrick Cloke	3bf973edc7	Remove unused class: DirectTcpReplicationClientFactory. (#15272)	2023-03-15 15:42:20 -04:00
H. Shay	b2fd03d075	Merge branch 'master' into develop	2023-02-28 10:14:20 -08:00
Erik Johnston	b2357a898c	Fix bug where 5s delays would occasionally happen. (#15150) This only affects deployments using workers.	2023-02-24 14:39:50 +00:00
dependabot[bot]	9bb2eac719	Bump black from 22.12.0 to 23.1.0 (#15103)	2023-02-22 15:29:09 -05:00
reivilibre	addd12f16d	Tweak logging for when a worker waits for its view of a replication stream to catch up. (#15120) Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Improve logging messages for the 'wait for repl stream' read-after-write consistency feature * Newsfile Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> * Update synapse/replication/tcp/client.py Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> --------- Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>	2023-02-21 12:26:00 +00:00
David Robertson	80d44060c9	Faster joins: omit partial rooms from eager syncs until the resync completes (#14870) * Allow `AbstractSet` in `StrCollection` Or else frozensets are excluded. This will be useful in an upcoming commit where I plan to change a function that accepts `List[str]` to accept `StrCollection` instead. * `rooms_to_exclude` -> `rooms_to_exclude_globally` I am about to make use of this exclusion mechanism to exclude rooms for a specific user and a specific sync. This rename helps to clarify the distinction between the global config and the rooms to exclude for a specific sync. * Better function names for internal sync methods * Track a list of excluded rooms on SyncResultBuilder I plan to feed a list of partially stated rooms for this sync to ignore * Exclude partial state rooms during eager sync using the mechanism established in the previous commit * Track un-partial-state stream in sync tokens So that we can work out which rooms have become fully-stated during a given sync period. * Fix mutation of `@cached` return value This was fouling up a complement test added alongside this PR. Excluding a room would mean the set of forgotten rooms in the cache would be extended. This means that room could be erroneously considered forgotten in the future. Introduced in #12310, Synapse 1.57.0. I don't think this had any user-visible side effects (until now). * SyncResultBuilder: track rooms to force as newly joined Similar plan as before. We've omitted rooms from certain sync responses; now we establish the mechanism to reintroduce them into future syncs. * Read new field, to present rooms as newly joined * Force un-partial-stated rooms to be newly-joined for eager incremental syncs only, provided they're still fully stated * Notify user stream listeners to wake up long polling syncs * Changelog * Typo fix Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Unnecessary list cast Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Rephrase comment Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Another comment Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Fixup merge(?) * Poke notifier when receiving un-partial-stated msg over replication * Fixup merge whoops Thanks MV :) Co-authored-by: Mathieu Velen <mathieuv@matrix.org> Co-authored-by: Mathieu Velten <mathieuv@matrix.org> Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>	2023-01-23 15:44:39 +00:00
Sean Quah	2ec9c58496	Faster joins: Update room stats and the user directory on workers when finishing join (#14874) * Faster joins: Update room stats and user directory on workers when done When finishing a partial state join to a room, we update the current state of the room without persisting additional events. Workers receive notice of the current state update over replication, but neglect to wake the room stats and user directory updaters, which then get incidentally triggered the next time an event is persisted or an unrelated event persister sends out a stream position update. We wake the room stats and user directory updaters at the appropriate time in this commit. Part of #12814 and #12815. Signed-off-by: Sean Quah <seanq@matrix.org> * fixup comment Signed-off-by: Sean Quah <seanq@matrix.org>	2023-01-23 10:31:36 +00:00
reivilibre	22cc93afe3	Enable Faster Remote Room Joins against worker-mode Synapse. (#14752) * Enable Complement tests for Faster Remote Room Joins on worker-mode * (dangerous) Add an override to allow Complement to use FRRJ under workers * Newsfile Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> * Fix race where we didn't send out replication notification * MORE HACKS * Fix get_un_partial_stated_rooms_token to take instance_name * Fix bad merge * Remove warning * Correctly advance un_partial_stated_room_stream * Fix merge * Add another notify_replication * Fixups * Create a separate ReplicationNotifier * Fix test * Fix portdb * Create a separate ReplicationNotifier * Fix test * Fix portdb * Fix presence test * Newsfile * Apply suggestions from code review * Update changelog.d/14752.misc Co-authored-by: Erik Johnston <erik@matrix.org> * lint Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> Co-authored-by: Erik Johnston <erik@matrix.org>	2023-01-22 21:10:11 +00:00
Erik Johnston	0ec12a3753	Reduce max time we wait for stream positions (#14881) Now that we wait for stream positions whenever we do an HTTP replication hit, we need to be less brutal in the case where we do time out (as we have bugs around this).	2023-01-20 21:04:33 +00:00
Erik Johnston	cdf2707678	Fix bug in wait for stream position (#14872) This caused some requests to fail. This really only started causing issues due to #14856	2023-01-19 22:19:56 +00:00
Erik Johnston	9187fd940e	Wait for streams to catch up when processing HTTP replication. (#14820) This should hopefully mitigate a class of races where data gets out of sync due to an HTTP replication request racing with the replication streams.	2023-01-18 19:35:29 +00:00
Erik Johnston	316590d1ea	Fix bug in `wait_for_stream_position` (#14856) We were incorrectly checking if the local token had been advanced, rather than the token for the remote instance. In practice, I don't think this has caused any bugs due to where we use `wait_for_stream_position`, as critically we don't use it on instances that also write to the given streams (and so the local token will lag behind all remote tokens).	2023-01-17 09:58:22 +00:00
Erik Johnston	2b084c5b71	Merge device list replication streams (#14833)	2023-01-17 09:29:58 +00:00
Erik Johnston	73ff493dfb	Merge account data streams (#14826)	2023-01-13 14:57:43 +00:00
Nick Mills-Barrett	db1cfe9c80	Update all stream IDs after processing replication rows (#14723) This creates a new store method, `process_replication_position`, that is called after `process_replication_rows`. By moving stream ID advances here, this guarantees any relevant cache invalidations will have been applied before the stream is advanced. This avoids race conditions where Python switches between threads midway through processing the `process_replication_rows` method, where stream IDs may be advanced before caches are invalidated due to class resolution ordering. See this comment/issue for further discussion: https://github.com/matrix-org/synapse/issues/14158#issuecomment-1344048703	2023-01-04 11:49:26 +00:00
reivilibre	2888d7ec83	Faster remote room joins: invalidate caches and unblock requests when receiving un-partial-stated event notifications over replication. [rei:frrj/streams/unpsr] (#14546)	2022-12-19 14:57:51 +00:00
reivilibre	fb60cb16fe	Faster remote room joins: stream the un-partial-stating of events over replication. [rei:frrj/streams/unpsr] (#14545)	2022-12-14 14:47:11 +00:00
reivilibre	9e82caac45	Faster remote room joins: unblock tasks waiting for full room state when the un-partial-stating of that room is received over the replication stream. [rei:frrj/streams/unpsr] (#14474)	2022-12-06 15:48:42 +00:00
reivilibre	501f62d1a6	Faster remote room joins: stream the un-partial-stating of rooms over replication. [rei:frrj/streams/unpsr] (#14473)	2022-12-05 13:07:55 +00:00
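Several of the commits above (#16640, #16580, #14856) depend on comparing stream tokens per writer instance. The sketch below is a hypothetical toy model and not Synapse's code. `StreamTokenTracker`, `on_position`, `get_updates` and `has_reached` are invented names, and an in-memory list stands in for the database. It illustrates three checks:

- Skipping a `POSITION` that is at or behind what is already known for that instance.
- Bailing early when the from token is not behind the current token.
- Checking the remote writer's token, rather than the local one, before treating a wait as satisfied. This is roughly what the #14856 fix to `wait_for_stream_position` describes.

```python
from typing import Dict, List, Tuple


class StreamTokenTracker:
    """Hypothetical toy model of one replication stream as seen by a worker."""

    def __init__(self) -> None:
        # Last token announced by each writer instance, e.g. {"persister1": 10}.
        self.positions: Dict[str, int] = {}
        # (token, row) pairs written so far; stands in for the database.
        self.rows: List[Tuple[int, str]] = []

    def on_position(self, instance_name: str, token: int) -> bool:
        """Handle a POSITION command; return False when it is a no-op."""
        if token <= self.positions.get(instance_name, 0):
            # That writer is already known to be at or past this token,
            # so there is nothing new to fetch.
            return False
        self.positions[instance_name] = token
        return True

    def get_updates(self, from_token: int, current_token: int) -> List[Tuple[int, str]]:
        """Return the rows with from_token < token <= current_token."""
        if from_token >= current_token:
            # Fast path: the range is empty, so skip the scan entirely.
            return []
        return [(t, r) for t, r in self.rows if from_token < t <= current_token]

    def has_reached(self, instance_name: str, token: int) -> bool:
        """Compare against the named writer's token, not this worker's own."""
        return self.positions.get(instance_name, 0) >= token
```

For example, after `on_position("persister1", 5)` returns `True`, a later `on_position("persister1", 3)` returns `False`. A call to `get_updates(5, 5)` returns `[]` without looking at `rows`.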

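#16473 describes how taking the minimum known position across writers can announce a stream ID as persisted too early when a new writer has no rows yet. The toy below reproduces that failure. `ToyIdTracker` is hypothetical and is not how `MultiWriterIdGenerator` is implemented. The "safe" rule is a simpler approach swapped in purely for illustration: treat everything below the oldest in-flight stream ID as persisted.

```python
from typing import Dict, Set


class ToyIdTracker:
    """Hypothetical toy model of one stream shared by several writers."""

    def __init__(self) -> None:
        self.next_id = 1
        # Highest stream ID each writer has finished persisting.
        self.writer_positions: Dict[str, int] = {}
        # Stream IDs handed out but not yet persisted, from any writer.
        self.in_flight: Set[int] = set()

    def allocate(self) -> int:
        stream_id = self.next_id
        self.next_id += 1
        self.in_flight.add(stream_id)
        return stream_id

    def finish(self, writer: str, stream_id: int) -> None:
        self.in_flight.discard(stream_id)
        previous = self.writer_positions.get(writer, 0)
        self.writer_positions[writer] = max(previous, stream_id)

    def persisted_upto_naive(self) -> int:
        # Minimum over writers that have persisted something. A brand-new
        # writer with only in-flight IDs has no entry, so it is ignored.
        return min(self.writer_positions.values(), default=0)

    def persisted_upto_safe(self) -> int:
        # Everything strictly below the oldest in-flight ID is persisted,
        # regardless of which writer holds that ID.
        if self.in_flight:
            return min(self.in_flight) - 1
        return self.next_id - 1
```

Consider one writer persisting IDs 1 and 3 while a new writer still holds ID 2. In that case `persisted_upto_naive()` returns 3, wrongly covering ID 2, while `persisted_upto_safe()` returns 1.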

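#14723 orders worker-side processing so that cache invalidation happens before a stream position advances. The following is a minimal sketch under assumptions. `ToyWorkerStore`, its dict cache, and `on_rdata` are hypothetical. Only the method names `process_replication_rows` and `process_replication_position` come from the commit message, and the signatures here are simplified guesses.

```python
from typing import Dict, Iterable


class ToyWorkerStore:
    """Hypothetical worker-side store with one cache and one stream position."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.stream_position = 0

    def process_replication_rows(self, token: int, keys: Iterable[str]) -> None:
        # Step 1: drop cache entries for every key touched by the new rows.
        for key in keys:
            self.cache.pop(key, None)

    def process_replication_position(self, token: int) -> None:
        # Step 2: only now advance the stream, so the position never runs
        # ahead of the invalidations for the rows it covers.
        self.stream_position = max(self.stream_position, token)

    def on_rdata(self, token: int, keys: Iterable[str]) -> None:
        self.process_replication_rows(token, keys)
        self.process_replication_position(token)
```

Keeping the position update in its own method, called strictly after row processing, means an override chain in the row handler cannot advance the stream part-way through.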
320 Commits (bdb0cbc5cafb40f6d4920caac763feeeff0f60cf)