Commit Graph

646 Commits (e1b2c7095d2107b1f466f1688e3a2040e23c7e9c)

Author SHA1 Message Date
Erik Johnston 316590d1ea
Fix bug in `wait_for_stream_position` (#14856)
We were incorrectly checking if the *local* token had been advanced, rather than the token for the remote instance.

In practice, I don't think this has caused any bugs due to where we use `wait_for_stream_position`, as critically we don't use it on instances that also write to the given streams (and so the local token will lag behind all remote tokens).
2023-01-17 09:58:22 +00:00
Erik Johnston 2b084c5b71
Merge device list replication streams (#14833) 2023-01-17 09:29:58 +00:00
Erik Johnston 73ff493dfb
Merge account data streams (#14826) 2023-01-13 14:57:43 +00:00
reivilibre ba4ea7d13f
Batch up replication requests to request the resyncing of remote users's devices. (#14716) 2023-01-10 11:17:59 +00:00
Nick Mills-Barrett db1cfe9c80
Update all stream IDs after processing replication rows (#14723)
This creates a new store method, `process_replication_position` that
is called after `process_replication_rows`. By moving stream ID advances
here this guarantees any relevant cache invalidations will have been
applied before the stream is advanced.

This avoids race conditions where Python switches between threads mid
way through processing the `process_replication_rows` method where stream
IDs may be advanced before caches are invalidated due to class resolution
ordering.

See this comment/issue for further discussion:
	https://github.com/matrix-org/synapse/issues/14158#issuecomment-1344048703
2023-01-04 11:49:26 +00:00
Andrew Morgan c4456114e1
Add experimental support for MSC3391: deleting account data (#14714) 2023-01-01 03:40:46 +00:00
reivilibre 2888d7ec83
Faster remote room joins: invalidate caches and unblock requests when receiving un-partial-stated event notifications over replication. [rei:frrj/streams/unpsr] (#14546) 2022-12-19 14:57:51 +00:00
reivilibre fb60cb16fe
Faster remote room joins: stream the un-partial-stating of events over replication. [rei:frrj/streams/unpsr] (#14545) 2022-12-14 14:47:11 +00:00
reivilibre 9e82caac45
Faster remote room joins: unblock tasks waiting for full room state when the un-partial-stating of that room is received over the replication stream. [rei:frrj/streams/unpsr] (#14474) 2022-12-06 15:48:42 +00:00
reivilibre 501f62d1a6
Faster remote room joins: stream the un-partial-stating of rooms over replication. [rei:frrj/streams/unpsr] (#14473) 2022-12-05 13:07:55 +00:00
Patrick Cloke 6d47b7e325
Add a type hint for `get_device_handler()` and fix incorrect types. (#14055)
This was the last untyped handler from the HomeServer object. Since
it was being treated as Any (and thus unchecked) it was being used
incorrectly in a few places.
2022-11-22 14:08:04 -05:00
Andrew Morgan e7132c3f81
Fix check to ignore blank lines in incoming TCP replication (#14449) 2022-11-17 16:09:56 +00:00
David Robertson 115f0eb233
Reintroduce #14376, with bugfix for monoliths (#14468)
* Add tests for StreamIdGenerator

* Drive-by: annotate all defs

* Revert "Revert "Remove slaved id tracker (#14376)" (#14463)"

This reverts commit d63814fd73, which in
turn reverted 36097e88c4. This restores
the latter.

* Fix StreamIdGenerator not handling unpersisted IDs

Spotted by @erikjohnston.

Closes #14456.

* Changelog

Co-authored-by: Nick Mills-Barrett <nick@fizzadar.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
2022-11-16 22:16:46 +00:00
realtyem c15e9a0edb
Remove need for `worker_main_http_uri` setting to use /keys/upload. (#14400) 2022-11-16 22:16:25 +00:00
Patrick Cloke d8cc86eff4
Remove redundant types from comments. (#14412)
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.

Also adds some missing type hints which were simple to fill in.
2022-11-16 15:25:24 +00:00
Erik Johnston d63814fd73
Revert "Remove slaved id tracker (#14376)" (#14463)
This reverts commit 36097e88c4.
2022-11-16 13:50:07 +00:00
Tuomas Ojamies b5ab2c428a
Support using SSL on worker endpoints. (#14128)
* Fix missing SSL support in worker endpoints.

* Add changelog

* SSL for Replication endpoint

* Remove unit test change

* Refactor listener creation to reduce duplicated code

* Fix the logger message

* Update synapse/app/_base.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Update synapse/app/_base.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Update synapse/app/_base.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Add config documentation for new TLS option

Co-authored-by: Tuomas Ojamies <tojamies@palantir.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
2022-11-15 12:55:00 +00:00
Nick Mills-Barrett 36097e88c4
Remove slaved id tracker (#14376)
This matches the multi instance writer ID generator class which can
both handle advancing the current token over replication and by calling
the database.
2022-11-14 17:31:36 +00:00
Nick Mills-Barrett 3a4f80f8c6
Merge/remove `Slaved*` stores into `WorkerStores` (#14375) 2022-11-11 10:51:49 +00:00
Patrick Cloke bc2bd92b93 Merge remote-tracking branch 'origin/release-v1.69' into develop 2022-10-14 14:11:27 -04:00
Brendan Abolivier 422cff7df6
Fallback if 'approved' isn't included in a registration replication request (#14135) 2022-10-11 14:41:06 +02:00
Shay 7b7478e8b6
Batch up notifications after event persistence (#14033) 2022-10-05 10:12:48 -07:00
Brendan Abolivier be76cd8200
Allow admins to require a manual approval process before new accounts can be used (using MSC3866) (#13556) 2022-09-29 15:23:24 +02:00
Shay 8ab16a92ed
Persist CreateRoom events to DB in a batch (#13800) 2022-09-28 10:11:48 +00:00
Patrick Cloke efd108b45d
Accept & store thread IDs for receipts (implement MSC3771). (#13782)
Updates the `/receipts` endpoint and receipt EDU handler to parse a
`thread_id` from the body and insert it in the database.
2022-09-23 14:33:28 +00:00
Brendan Abolivier 8ae42ab8fa
Support enabling/disabling pushers (from MSC3881) (#13799)
Partial implementation of MSC3881
2022-09-21 14:39:01 +00:00
Patrick Cloke 32fc3b7ba4
Remove configuration options for direct TCP replication. (#13647)
Removes the ability to configure legacy direct TCP replication. Workers now require Redis to run.
2022-09-06 07:50:02 +00:00
Šimon Brandner 0e99f07952
Remove support for unstable private read receipts (#13653)
Signed-off-by: Šimon Brandner <simon.bra.ag@gmail.com>
2022-09-01 13:31:54 +01:00
reivilibre 7bc110a19e
Generalise the `@cancellable` annotation so it can be used on functions other than just servlet methods. (#13662) 2022-08-31 11:16:05 +00:00
Erik Johnston aec87a0f93
Speed up fetching large numbers of push rules (#13592) 2022-08-23 13:15:43 +01:00
Šimon Brandner ab18441573
Support stable identifiers for MSC2285: private read receipts. (#13273)
This adds support for the stable identifiers of MSC2285 while
continuing to support the unstable identifiers behind the configuration
flag. These will be removed in a future version.
2022-08-05 11:09:33 -04:00
Nick Mills-Barrett 86e366a46e
Remove old empty/redundant slaved stores. (#13349) 2022-07-21 17:56:45 +00:00
Nick Mills-Barrett 190f49d8ab
Use cache store remove base slaved (#13329)
This comes from two identical definitions in each of the base stores, and means the base slaved store is now empty and can be removed.
2022-07-21 11:51:30 +01:00
Patrick Cloke a6895dd576
Add type annotations to `trace` decorator. (#13328)
Functions that are decorated with `trace` are now properly typed
and the type hints for them are fixed.
2022-07-19 14:14:30 -04:00
David Robertson b977867358
Rate limit joins per-room (#13276) 2022-07-19 11:45:17 +00:00
Erik Johnston f721f1baba
Revert "Make all `process_replication_rows` methods async (#13304)" (#13312)
This reverts commit 5d4028f217.
2022-07-18 14:28:14 +01:00
Nick Mills-Barrett 5d4028f217
Make all `process_replication_rows` methods async (#13304)
More prep work for asyncronous caching, also makes all process_replication_rows methods consistent (presence handler already is so).

Signed off by Nick @ Beeper (@Fizzadar)
2022-07-17 22:19:43 +01:00
Sean Quah 1391a76cd2
Faster room joins: fix race in recalculation of current room state (#13151)
Bounce recalculation of current state to the correct event persister and
move recalculation of current state into the event persistence queue, to
avoid concurrent updates to a room's current state.

Also give recalculation of a room's current state a real stream
ordering.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-07-07 12:19:31 +00:00
Sean Quah 68db233f0c
Handle race between persisting an event and un-partial stating a room (#13100)
Whenever we want to persist an event, we first compute an event context,
which includes the state at the event and a flag indicating whether the
state is partial. After a lot of processing, we finally try to store the
event in the database, which can fail for partial state events when the
containing room has been un-partial stated in the meantime.

We detect the race as a foreign key constraint failure in the data store
layer and turn it into a special `PartialStateConflictError` exception,
which makes its way up to the method in which we computed the event
context.

To make things difficult, the exception needs to cross a replication
request: `/fed_send_events` for events coming over federation and
`/send_event` for events from clients. We transport the
`PartialStateConflictError` as a `409 Conflict` over replication and
turn `409`s back into `PartialStateConflictError`s on the worker making
the request.

All client events go through
`EventCreationHandler.handle_new_client_event`, which is called in
*a lot* of places. Instead of trying to update all the code which
creates client events, we turn the `PartialStateConflictError` into a
`429 Too Many Requests` in
`EventCreationHandler.handle_new_client_event` and hope that clients
take it as a hint to retry their request.

On the federation event side, there are 7 places which compute event
contexts. 4 of them use outlier event contexts:
`FederationEventHandler._auth_and_persist_outliers_inner`,
`FederationHandler.do_knock`, `FederationHandler.on_invite_request` and
`FederationHandler.do_remotely_reject_invite`. These events won't have
the partial state flag, so we do not need to do anything for then.

The remaining 3 paths which create events are
`FederationEventHandler.process_remote_join`,
`FederationEventHandler.on_send_membership_event` and
`FederationEventHandler._process_received_pdu`.

We can't experience the race in `process_remote_join`, unless we're
handling an additional join into a partial state room, which currently
blocks, so we make no attempt to handle it correctly.

`on_send_membership_event` is only called by
`FederationServer._on_send_membership_event`, so we catch the
`PartialStateConflictError` there and retry just once.

`_process_received_pdu` is called by `on_receive_pdu` for incoming
events and `_process_pulled_event` for backfill. The latter should never
try to persist partial state events, so we ignore it. We catch the
`PartialStateConflictError` in `on_receive_pdu` and retry just once.

Refering to the graph of code paths in
https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648
may make the above make more sense.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-07-05 16:12:52 +01:00
David Robertson 97e9fbe1b2
Type annotations in `synapse.databases.main.devices` (#13025)
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2022-06-15 15:20:04 +00:00
Patrick Cloke cf05258f76
Remove groups replication code. (#12900)
The replication logic for groups is no longer used, so the message
passing infrastructure can be removed.
2022-05-31 13:04:08 -04:00
Erik Johnston 1e453053cb
Rename storage classes (#12913) 2022-05-31 12:17:50 +00:00
reivilibre 39dee30f01
Send `USER_IP` commands on a different Redis channel, in order to reduce traffic to workers that do not process these commands. (#12809) 2022-05-20 15:28:23 +01:00
reivilibre 177b884ad7
Lay some foundation work to allow workers to only subscribe to some kinds of messages, reducing replication traffic. (#12672) 2022-05-19 16:29:08 +01:00
Andrew Morgan 83be72d76c
Add `StreamKeyType` class and replace string literals with constants (#12567) 2022-05-16 15:35:31 +00:00
Sean Quah a559c8b0d9
Respect the `@cancellable` flag for `ReplicationEndpoint`s (#12700)
While `ReplicationEndpoint`s register themselves via `JsonResource`,
they pass a method that calls the handler, instead of the handler itself,
to `register_paths`. As a result, `JsonResource` will not correctly pick
up the `@cancellable` flag and we have to apply it ourselves.

Signed-off-by: Sean Quah <seanq@element.io>
2022-05-11 12:25:39 +01:00
Shay d80a7ab151
Update `replication.md` with info on TCP module structure (#12621) 2022-05-09 14:46:43 -07:00
Šimon Brandner ef86cf3d28
Update `_on_new_receipts()` to work with MSC2285 changes. (#12636) 2022-05-05 13:25:51 +00:00
Erik Johnston c0379d6e5b
Reduce log spam when running multiple event persisters (#12610) 2022-05-05 10:20:23 +01:00
Erik Johnston d1cd96ce29
Add opentracing spans to calls to external cache (#12380) 2022-04-07 13:18:29 +01:00