Matrix/Synapse

Commit Graph

Author	SHA1	Message	Date
David Robertson	c3627d0f99	Correctly read to-device stream pos on SQLite (#16682 )	2023-11-24 13:42:38 +00:00
Erik Johnston	6fec2d035f	Also discard 'caches' and 'backfill' stream POSITIONS (#16655 ) Follow on from #16640	2023-11-17 14:14:29 +00:00
Erik Johnston	fef08cbee8	Fix sending out of order `POSITION` over replication (#16639 ) If a worker reconnects to Redis we send out the current positions of all our streams. However, if we're also trying to send out a backlog of RDATA at the same time then we can end up sending a `POSITION` with the current token before we've sent all the RDATA before the current token. This doesn't cause actual bugs as the receiving servers see the POSITION, fetch the relevant rows from the DB, and then ignore the old RDATA as they come in. However, this is inefficient, so it'd be better if we didn't send out-of-order positions.	2023-11-16 13:05:09 +00:00
Erik Johnston	898655fd12	More efficiently handle no-op POSITION (#16640 ) We may receive `POSITION` commands where we already know that worker has advanced past that position, so there is no point in handling it.	2023-11-16 12:32:17 +00:00
Erik Johnston	408c13801a	Add fast path for replication events stream fetch (#16580 ) We can bail early if the from token is greater than or equal to the current token.	2023-10-30 14:47:57 +00:00
Erik Johnston	8c63e93286	Fix HTTP repl response to use minimum token (#16578 )	2023-10-30 12:27:14 +00:00
Erik Johnston	5413cefe32	Reduce amount of caches POSITIONS we send (#16561 ) Follow on from / actually correctly does #16557	2023-10-27 16:07:11 +01:00
Erik Johnston	89dbbd68e1	Reduce spurious replication catchup (#16555 )	2023-10-27 13:27:20 +00:00
Erik Johnston	0680d76659	Reduce replication traffic due to reflected cache stream POSITION (#16557 )	2023-10-27 12:51:08 +01:00
Erik Johnston	ba47fea528	Allow multiple workers to write to receipts stream. (#16432 ) Fixes #16417	2023-10-25 16:16:19 +01:00
Jason Little	ffbe9b7666	Remove duplicate call to wake a remote destination when using federation sending worker (#16515 )	2023-10-24 08:09:59 -04:00
Erik Johnston	8f35f8148e	Fix bug where a new writer advances their token too quickly (#16473 ) * Fix bug where a new writer advances their token too quickly When starting a new writer (e.g. for persisting events), the `MultiWriterIdGenerator` doesn't have a minimum token for it as there are no rows matching that new writer in the DB. This results in the first stream ID it acquired being announced as persisted before it actually finishes persisting, if another writer gets and persists a subsequent stream ID. This is due to the logic of setting the minimum persisted position to the minimum known position across all writers, and the new writer starts off not being considered. * Fix sending out POSITIONs when our token advances without update Broke in #14820 * For replication HTTP requests, only wait for minimal position	2023-10-23 16:57:30 +01:00
Patrick Cloke	49c9745b45	Avoid sending massive replication updates when purging a room. (#16510 )	2023-10-18 12:26:01 -04:00
Richard van der Hoff	109882230c	Clean up logging on event persister endpoints (#16488 )	2023-10-14 17:57:27 +01:00
Patrick Cloke	ae5b997cfa	Fix comments related to replication. (#16428 )	2023-10-06 07:25:44 -04:00
Patrick Cloke	4e302b30b6	Add __slots__ to replication commands. (#16429 ) To slightly reduce the amount of memory each command takes.	2023-10-05 07:38:55 -04:00
Erik Johnston	80ec81dcc5	Some refactors around receipts stream (#16426 )	2023-10-04 16:28:40 +01:00
Erik Johnston	20fb08ec80	Downgrade repl stream time out error to warning (#16401 ) This is because if a worker reaches ~100% CPU then everything starts lagging and we hit the log line a lot. When at error we invoke sentry and that has a lot of overhead, which then puts even more pressure on the worker.	2023-09-29 11:52:48 +00:00
Patrick Cloke	f84da3c32e	Add a cache around server ACL checking (#16360 ) * Pre-compiles the server ACLs onto an object per room and invalidates them when new events come in. * Converts the server ACL checking into Rust.	2023-09-26 11:57:50 -04:00
Erik Johnston	329597022e	Some minor performance fixes for task scheduler (#16313 )	2023-09-14 16:20:47 +01:00
Erik Johnston	ab13fb08bf	Improve logging of replication (#16309 )	2023-09-13 09:51:50 +00:00
Erik Johnston	1cd410a783	Recheck if remote device is cached before requesting it (#16252 ) This fixes a bug where we could get stuck re-requesting the device over replication again and again.	2023-09-07 12:45:43 +00:00
Patrick Cloke	55c20da4a3	Merge remote-tracking branch 'origin/release-v1.91' into release-v1.92	2023-09-06 11:25:28 -04:00
Quentin Gliech	1940d990a3	Revert MSC3861 introspection cache, admin impersonation and account lock (#16258 )	2023-09-06 15:19:51 +01:00
Erik Johnston	d35bed8369	Don't wake up destination transaction queue if they're not due for retry. (#16223 )	2023-09-04 17:14:09 +01:00
David Robertson	e9eb26e3af	Cache device resync requests over replication (#16241 )	2023-09-04 11:57:59 +01:00
Patrick Cloke	e9235d92f2	Track currently syncing users by device for presence (#16172 ) Refactoring to use both the user ID & the device ID when tracking the currently syncing users in the presence handler. This is done both locally and over replication. Note that the device ID is discarded but will be used in a future change.	2023-08-29 11:44:07 -04:00
Patrick Cloke	40901af5e0	Pass the device ID around in the presence handler (#16171 ) Refactoring to pass the device ID (in addition to the user ID) through the presence handler (specifically the `user_syncing`, `set_state`, and `bump_presence_active_time` methods and their replication versions).	2023-08-28 13:08:49 -04:00
Patrick Cloke	1bf143699c	Combine logic about not overriding BUSY presence. (#16170 ) Simplify some of the presence code by reducing duplicated code between worker & non-worker modes. The main change is to push some of the logic from `user_syncing` into `set_state`. This is done by passing whether the user is setting the presence via a `/sync` with a new `is_sync` flag to `set_state`. If this is `true` some additional logic is performed: * Don't override `busy` presence. * Update the `last_user_sync_ts`. * Never update the status message.	2023-08-28 11:03:23 -04:00
Mathieu Velten	501da8ecd8	Task scheduler: add replication notify for new task to launch ASAP (#16184 )	2023-08-28 14:03:51 +00:00
Erik Johnston	803f63df1c	Fix perf of `wait_for_stream_positions` (#16148 )	2023-08-22 15:11:22 +00:00
Shay	69048f7b48	Add an admin endpoint to allow authorizing server to signal token revocations (#16125 )	2023-08-22 14:15:34 +00:00
Patrick Cloke	ad3f43be9a	Run pyupgrade for python 3.7 & 3.8. (#16110 )	2023-08-15 08:11:20 -04:00
Erik Johnston	ae55cc1e6b	Add ability to wait for locks and add locks to purge history / room deletion (#15791 ) c.f. #13476	2023-07-31 10:58:03 +01:00
Shay	68b2611783	Clarify comment on key uploads over replication (#16016 )	2023-07-27 15:08:46 -07:00
Jason Little	c835befd10	Add Unix socket support for Redis connections (#15644 ) Adds a new configuration setting to connect to Redis via a Unix socket instead of over TCP. Disabled by default.	2023-05-26 15:28:39 -04:00
Jason Little	1df0221bda	Use a custom scheme & the worker name for replication requests. (#15578 ) All the information needed is already in the `instance_map`, so use that instead of passing the hostname / IP & port manually for each replication request. This consolidates logic for future improvements of using e.g. UNIX sockets for workers.	2023-05-23 09:05:30 -04:00
Patrick Cloke	375b0a8a11	Update code to refer to "workers". (#15606 ) A bunch of comments and variables are out of date and use obsolete terms.	2023-05-16 15:56:38 -04:00
Roel ter Maat	2611433b70	Add redis SSL configuration options (#15312 ) * Add SSL options to redis config * fix lint issues * Add documentation and changelog file * add missing . at the end of the changelog * Move client context factory to new file * Rename ssl to tls and fix typo * fix lint issues * Added when redis attributes were added	2023-05-11 13:02:51 +01:00
Jason Little	e4f545c452	Remove `worker_replication_` settings (#15491 ) Add master to the instance_map as part of Complement, have ReplicationEndpoint look at instance_map for master. * Fix typo in drive by. * Remove unnecessary worker_replication_* bits from unit tests and add master to instance_map(hopefully in the right place) * Several updates: 1. Switch from master to main for naming the main process in the instance_map. Add useful constants for easier adjustment of names in the future. 2. Add backwards compatibility for worker_replication_* to allow time to transition to new style. Make sure to prioritize declaring main directly on the instance_map. 3. Clean up old comments/commented out code. 4. Adjust unit tests to match with new code. 5. Adjust Complement setup infrastructure to only add main to the instance_map if workers are used and remove now unused options from the worker.yaml template. * Initial Docs upload * Changelog * Missed some commented out code that can go now * Remove TODO comment that no longer holds true. * Fix links in docs * More docs * Remove debug logging * Apply suggestions from code review Co-authored-by: reivilibre <olivier@librepush.net> * Apply suggestions from code review Co-authored-by: reivilibre <olivier@librepush.net> * Update version to latest, include completeish before/after examples in upgrade notes. * Fix up and docs too --------- Co-authored-by: reivilibre <olivier@librepush.net>	2023-05-11 11:30:56 +01:00
Jason Little	d3bd03559b	HTTP Replication Client (#15470 ) Separate out a HTTP client for replication in preparation for also supporting using UNIX sockets. The major difference from the base class is that this does not use treq to handle HTTP requests.	2023-05-09 14:25:20 -04:00
Alok Kumar Singh	197fbb123b	Remove legacy code of single user device resync api (#15418 ) * Removed single-user resync usage and updated it to use multi-user counterpart Signed-off-by: Alok Kumar Singh alokaks601@gmail.com	2023-04-21 12:06:39 +01:00
Mathieu Velten	9228ae633f	Add some clarification to the doc/comments regarding TCP replication (#15354 )	2023-03-30 12:51:35 +02:00
David Robertson	1bc9985eb7	Have replication clients remove _INT_STREAM_POS (#15309 ) * Have replication clients remove _INT_STREAM_POS Suppose worker A makes an internal HTTP request to worker B. B may make changes that A later learns about over replication. We want A's request to block until it has seen those changes, mainly to ensure A's caches are invalidated promptly. This helps provide read-after-write consistency, eliminating entire categories of races and test flakes. To implement this, B includes a top-level field `_INT_STREAM_POS` in its response JSON. Roughly speaking, the field's value tells A what to wait for. But we weren't removing that internal field before A's request completed! Introduced in https://github.com/matrix-org/synapse/pull/14820. Fixes #15308. * Changelog	2023-03-22 12:53:55 +00:00
Patrick Cloke	afb216c202	Remove no-op send_command for Redis replication. (#15274 ) With Redis, commands do not need to be re-issued by the main process (they fan out to all processes at once), so it is no longer necessary to worry about them reflecting recursively forever.	2023-03-16 11:13:30 -04:00
Patrick Cloke	3bf973edc7	Remove unused class: DirectTcpReplicationClientFactory. (#15272 )	2023-03-15 15:42:20 -04:00
Dirk Klimpel	ecbe0ddbe7	Add support for knocking to workers. (#15133 )	2023-03-02 12:59:53 -05:00
H. Shay	b2fd03d075	Merge branch 'master' into develop	2023-02-28 10:14:20 -08:00
Erik Johnston	b2357a898c	Fix bug where 5s delays would occasionally happen. (#15150 ) This only affects deployments using workers.	2023-02-24 14:39:50 +00:00
dependabot[bot]	9bb2eac719	Bump black from 22.12.0 to 23.1.0 (#15103 )	2023-02-22 15:29:09 -05:00


704 Commits (77882b6a7d1ad1ab76b0ff878b3daed894bdb26e)