MatrixSynapse

Commit Graph

Author	SHA1	Message	Date
Patrick Cloke	3bf973edc7	Remove unused class: DirectTcpReplicationClientFactory. (#15272 )	2023-03-15 15:42:20 -04:00
reivilibre	addd12f16d	Tweak logging for when a worker waits for its view of a replication stream to catch up. (#15120 )Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Improve logging messages for the 'wait for repl stream' read-after-write consistency feature * Newsfile Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> * Update synapse/replication/tcp/client.py Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> --------- Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>	2023-02-21 12:26:00 +00:00
David Robertson	80d44060c9	Faster joins: omit partial rooms from eager syncs until the resync completes (#14870 ) * Allow `AbstractSet` in `StrCollection` Or else frozensets are excluded. This will be useful in an upcoming commit where I plan to change a function that accepts `List[str]` to accept `StrCollection` instead. * `rooms_to_exclude` -> `rooms_to_exclude_globally` I am about to make use of this exclusion mechanism to exclude rooms for a specific user and a specific sync. This rename helps to clarify the distinction between the global config and the rooms to exclude for a specific sync. * Better function names for internal sync methods * Track a list of excluded rooms on SyncResultBuilder I plan to feed a list of partially stated rooms for this sync to ignore * Exclude partial state rooms during eager sync using the mechanism established in the previous commit * Track un-partial-state stream in sync tokens So that we can work out which rooms have become fully-stated during a given sync period. * Fix mutation of `@cached` return value This was fouling up a complement test added alongside this PR. Excluding a room would mean the set of forgotten rooms in the cache would be extended. This means that room could be erroneously considered forgotten in the future. Introduced in #12310, Synapse 1.57.0. I don't think this had any user-visible side effects (until now). * SyncResultBuilder: track rooms to force as newly joined Similar plan as before. We've omitted rooms from certain sync responses; now we establish the mechanism to reintroduce them into future syncs. * Read new field, to present rooms as newly joined * Force un-partial-stated rooms to be newly-joined for eager incremental syncs only, provided they're still fully stated * Notify user stream listeners to wake up long polling syncs * Changelog * Typo fix Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Unnecessary list cast Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Rephrase comment Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Another comment Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Fixup merge(?) * Poke notifier when receiving un-partial-stated msg over replication * Fixup merge whoops Thanks MV :) Co-authored-by: Mathieu Velen <mathieuv@matrix.org> Co-authored-by: Mathieu Velten <mathieuv@matrix.org> Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>	2023-01-23 15:44:39 +00:00
Sean Quah	2ec9c58496	Faster joins: Update room stats and the user directory on workers when finishing join (#14874 ) * Faster joins: Update room stats and user directory on workers when done When finishing a partial state join to a room, we update the current state of the room without persisting additional events. Workers receive notice of the current state update over replication, but neglect to wake the room stats and user directory updaters, which then get incidentally triggered the next time an event is persisted or an unrelated event persister sends out a stream position update. We wake the room stats and user directory updaters at the appropriate time in this commit. Part of #12814 and #12815. Signed-off-by: Sean Quah <seanq@matrix.org> * fixup comment Signed-off-by: Sean Quah <seanq@matrix.org>	2023-01-23 10:31:36 +00:00
Erik Johnston	0ec12a3753	Reduce max time we wait for stream positions (#14881 ) Now that we wait for stream positions whenever we do a HTTP replication hit, we need to be less brutal in the case where we do timeout (as we have bugs around this).	2023-01-20 21:04:33 +00:00
Erik Johnston	cdf2707678	Fix bug in wait for stream position (#14872 ) This caused some requests to fail. This caused some requests to fail. This really only started causing issues due to #14856	2023-01-19 22:19:56 +00:00
Erik Johnston	9187fd940e	Wait for streams to catch up when processing HTTP replication. (#14820 ) This should hopefully mitigate a class of races where data gets out of sync due a HTTP replication request racing with the replication streams.	2023-01-18 19:35:29 +00:00
Erik Johnston	316590d1ea	Fix bug in `wait_for_stream_position` (#14856 ) We were incorrectly checking if the local token had been advanced, rather than the token for the remote instance. In practice, I don't think this has caused any bugs due to where we use `wait_for_stream_position`, as critically we don't use it on instances that also write to the given streams (and so the local token will lag behind all remote tokens).	2023-01-17 09:58:22 +00:00
Erik Johnston	2b084c5b71	Merge device list replication streams (#14833 )	2023-01-17 09:29:58 +00:00
Erik Johnston	73ff493dfb	Merge account data streams (#14826 )	2023-01-13 14:57:43 +00:00
Nick Mills-Barrett	db1cfe9c80	Update all stream IDs after processing replication rows (#14723 ) This creates a new store method, `process_replication_position` that is called after `process_replication_rows`. By moving stream ID advances here this guarantees any relevant cache invalidations will have been applied before the stream is advanced. This avoids race conditions where Python switches between threads mid way through processing the `process_replication_rows` method where stream IDs may be advanced before caches are invalidated due to class resolution ordering. See this comment/issue for further discussion: https://github.com/matrix-org/synapse/issues/14158#issuecomment-1344048703	2023-01-04 11:49:26 +00:00
reivilibre	2888d7ec83	Faster remote room joins: invalidate caches and unblock requests when receiving un-partial-stated event notifications over replication. [rei:frrj/streams/unpsr] (#14546 )	2022-12-19 14:57:51 +00:00
reivilibre	9e82caac45	Faster remote room joins: unblock tasks waiting for full room state when the un-partial-stating of that room is received over the replication stream. [rei:frrj/streams/unpsr] (#14474 )	2022-12-06 15:48:42 +00:00
Shay	7b7478e8b6	Batch up notifications after event persistence (#14033 )	2022-10-05 10:12:48 -07:00
Patrick Cloke	efd108b45d	Accept & store thread IDs for receipts (implement MSC3771). (#13782 ) Updates the `/receipts` endpoint and receipt EDU handler to parse a `thread_id` from the body and insert it in the database.	2022-09-23 14:33:28 +00:00
Brendan Abolivier	8ae42ab8fa	Support enabling/disabling pushers (from MSC3881) (#13799 ) Partial implementation of MSC3881	2022-09-21 14:39:01 +00:00
Šimon Brandner	0e99f07952	Remove support for unstable private read receipts (#13653 ) Signed-off-by: Šimon Brandner <simon.bra.ag@gmail.com>	2022-09-01 13:31:54 +01:00
Šimon Brandner	ab18441573	Support stable identifiers for MSC2285: private read receipts. (#13273 ) This adds support for the stable identifiers of MSC2285 while continuing to support the unstable identifiers behind the configuration flag. These will be removed in a future version.	2022-08-05 11:09:33 -04:00
David Robertson	b977867358	Rate limit joins per-room (#13276 )	2022-07-19 11:45:17 +00:00
Erik Johnston	f721f1baba	Revert "Make all `process_replication_rows` methods async (#13304 )" (#13312 ) This reverts commit `5d4028f217`.	2022-07-18 14:28:14 +01:00
Nick Mills-Barrett	5d4028f217	Make all `process_replication_rows` methods async (#13304 ) More prep work for asyncronous caching, also makes all process_replication_rows methods consistent (presence handler already is so). Signed off by Nick @ Beeper (@Fizzadar)	2022-07-17 22:19:43 +01:00
Patrick Cloke	cf05258f76	Remove groups replication code. (#12900 ) The replication logic for groups is no longer used, so the message passing infrastructure can be removed.	2022-05-31 13:04:08 -04:00
Andrew Morgan	83be72d76c	Add `StreamKeyType` class and replace string literals with constants (#12567 )	2022-05-16 15:35:31 +00:00
Šimon Brandner	ef86cf3d28	Update `_on_new_receipts()` to work with MSC2285 changes. (#12636 )	2022-05-05 13:25:51 +00:00
Sean Quah	800ba87cc8	Refactor and convert `Linearizer` to async (#12357 ) Refactor and convert `Linearizer` to async. This makes a `Linearizer` cancellation bug easier to fix. Also refactor to use an async context manager, which eliminates an unlikely footgun where code that doesn't immediately use the context manager could forget to release the lock. Signed-off-by: Sean Quah <seanq@element.io>	2022-04-05 15:43:52 +01:00
Patrick Cloke	3e4af36bc8	Rename get_tcp_replication to get_replication_command_handler. (#12192 ) Since the object it returns is a ReplicationCommandHandler. This is clean-up from adding support to Redis where the command handler was added as an additional layer of abstraction from the TCP protocol.	2022-03-10 13:01:56 +00:00
Erik Johnston	423cca9efe	Spread out sending device lists to remote hosts (#12132 )	2022-03-04 11:48:15 +00:00
Richard van der Hoff	e24ff8ebe3	Remove `HomeServer.get_datastore()` (#12031 ) The presence of this method was confusing, and mostly present for backwards compatibility. Let's get rid of it. Part of #11733	2022-02-23 11:04:02 +00:00
Patrick Cloke	d0e78af35e	Add missing type hints to synapse.replication. (#11938 )	2022-02-08 11:03:08 -05:00
Brendan Abolivier	c7a5e49664	Implement an `on_new_event` callback (#11126 ) Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2021-10-26 15:17:36 +02:00
Patrick Cloke	f4b1a9a527	Require direct references to configuration variables. (#10985 ) This removes the magic allowing accessing configurable variables directly from the config object. It is now required that a specific configuration class is used (e.g. `config.foo` must be replaced with `config.server.foo`).	2021-10-06 10:47:41 -04:00
Patrick Cloke	01c88a09cd	Use direct references for some configuration variables (#10798 ) Instead of proxying through the magic getter of the RootConfig object. This should be more performant (and is more explicit).	2021-09-13 13:07:12 -04:00
Richard van der Hoff	d9cb658c78	Fix up type hints for Twisted 21.7 (#10490 ) Mostly this involves decorating a few Deferred declarations with extra type hints. We wrap the types in quotes to avoid runtime errors when running against older versions of Twisted that don't have generics on Deferred.	2021-07-28 12:04:11 +00:00
Šimon Brandner	c3b037795a	Support for MSC2285 (hidden read receipts) (#10413 ) Implementation of matrix-org/matrix-doc#2285	2021-07-28 10:05:11 +02:00
Jonathan de Jong	bf72d10dbf	Use inline type hints in various other places (in `synapse/`) (#10380 )	2021-07-15 11:02:43 +01:00
Richard van der Hoff	b378d98c8f	Add debug logging for issue #9533 (#9959 ) Hopefully this will help us track down where to-device messages are getting lost/delayed.	2021-05-11 11:04:03 +01:00
Erik Johnston	de0d088adc	Add presence federation stream (#9819 )	2021-04-20 14:11:24 +01:00
Erik Johnston	00a6db9676	Move some replication processing out of generic_worker (#9796 ) Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>	2021-04-14 17:06:06 +01:00
Jonathan de Jong	4b965c862d	Remove redundant "coding: utf-8" lines (#9786 ) Part of #9744 Removes all redundant `# -- coding: utf-8 --` lines from files, as python 3 automatically reads source code as utf-8 now. `Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`	2021-04-14 15:34:27 +01:00
Patrick Cloke	33a02f0f52	Fix additional type hints from Twisted upgrade. (#9518 )	2021-03-03 15:47:38 -05:00
Erik Johnston	a6ea1a957e	Don't pull event from DB when handling replication traffic. (#8669 ) I was trying to make it so that we didn't have to start a background task when handling RDATA, but that is a bigger job (due to all the code in `generic_worker`). However I still think not pulling the event from the DB may help reduce some DB usage due to replication, even if most workers will simply go and pull that event from the DB later anyway. Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>	2020-10-28 12:11:45 +00:00
Erik Johnston	8de3703d21	Make event persisters periodically announce position over replication. (#8499 ) Currently background proccesses stream the events stream use the "minimum persisted position" (i.e. `get_current_token()`) rather than the vector clock style tokens. This is broadly fine as it doesn't matter if the background processes lag a small amount. However, in extreme cases (i.e. SyTests) where we only write to one event persister the background processes will never make progress. This PR changes it so that the `MultiWriterIDGenerator` keeps the current position of a given instance as up to date as possible (i.e using the latest token it sees if its not in the process of persisting anything), and then periodically announces that over replication. This then allows the "minimum persisted position" to advance, albeit with a small lag.	2020-10-12 15:51:41 +01:00
Erik Johnston	ea70f1c362	Various clean ups to room stream tokens. (#8423 )	2020-09-29 21:48:33 +01:00
Erik Johnston	ac11fcbbb8	Add EventStreamPosition type (#8388 ) The idea is to remove some of the places we pass around `int`, where it can represent one of two things: 1. the position of an event in the stream; or 2. a token that partitions the stream, used as part of the stream tokens. The valid operations are then: 1. did a position happen before or after a token; 2. get all events that happened before or after a token; and 3. get all events between two tokens. (Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)	2020-09-24 13:24:17 +01:00
Erik Johnston	5d3e306d9f	Clean up `Notifier.on_new_room_event` code path (#8288 ) The idea here is that we pass the `max_stream_id` to everything, and only use the stream ID of the particular event to figure out when the max stream position has caught up to the event and we can notify people about it. This is to maintain the distinction between the position of an item in the stream (i.e. event A has stream ID 513) and a token that can be used to partition the stream (i.e. give me all events after stream ID 352). This distinction becomes important when the tokens are more complicated than a single number, which they will be once we start tracking the position of multiple writers in the tokens. The valid operations here are: 1. Is a position before or after a token 2. Fetching all events between two tokens 3. Merging multiple tokens to get the "max", i.e. `C = max(A, B)` means that for all positions P where P is before A or before B, then P is before C. Future PR will change the token type to a dedicated type.	2020-09-10 13:24:43 +01:00
Erik Johnston	c9dbee50ae	Fixup pusher pool notifications (#8287 ) `pusher_pool.on_new_notifications` expected a min and max stream ID, however that was not what we were passing in. Instead, let's just pass it the current max stream ID and have it track the last stream ID it got passed. I believe that it mostly worked as we called the function for every event. However, it would break for events that got persisted out of order, i.e, that were persisted but the max stream ID wasn't incremented as not all preceding events had finished persisting, and push for that event would be delayed until another event got pushed to the effected users.	2020-09-09 16:56:08 +01:00
Erik Johnston	dc9dcdbd59	Revert "Fixup pusher pool notifications" This reverts commit `e7fd336a53`.	2020-09-09 16:19:22 +01:00
Erik Johnston	e7fd336a53	Fixup pusher pool notifications	2020-09-09 16:17:50 +01:00
Erik Johnston	3b4556cf87	Fix `wait_for_stream_position` for multiple waiters. (#8196 ) This fixes a bug where having multiple callers waiting on the same stream and position will cause it to try and compare two deferreds, which fails (due to the sorted list having an entry of `Tuple[int, Deferred]`).	2020-08-28 17:12:45 +01:00
Erik Johnston	84d099ae11	Fix typing replication not being handled on master (#7959 ) Handling of incoming typing stream updates from replication was not hooked up on master, effecting set ups where typing was handled on a different worker. This is really only a problem if the master process is also handling sync requests, which is unlikely for those that are at the stage of moving typing off. The other observable effect is that if a worker restarts or a replication connect drops then the typing worker will issue a `POSITION typing`, triggering master process to try and stream all typing updates from position 0. Fixes #7907	2020-07-27 14:10:53 +01:00

1 2

82 Commits (5bf9ec9e3e2c54448708f5d534aa50a68d680cc0)