MatrixSynapse

Commit Graph

Author	SHA1	Message	Date
reivilibre	e17e5c97e0	Faster Room Joins: don't leave a stuck room partial state flag if the join fails. (#13403 )	2022-08-01 16:45:39 +00:00
David Teller	11f811470f	Uniformize spam-checker API, part 5: expand other spam-checker callbacks to return `Tuple[Codes, dict]` (#13044 ) Signed-off-by: David Teller <davidt@element.io> Co-authored-by: Brendan Abolivier <babolivier@matrix.org>	2022-07-11 16:52:10 +00:00
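The spam-checker callbacks in this series return either an "allow" value or an error code. Part 5 also lets them return a dict of extra fields alongside the code. Below is a minimal sketch of how a caller might normalise those return shapes. It assumes a `NOT_SPAM` sentinel string and a small `Codes` enum. Both of these, and `normalise_spam_check_result`, are hypothetical stand-ins rather than Synapse's module API.

```python
from enum import Enum
from typing import Dict, Optional, Tuple, Union


class Codes(str, Enum):
    """Stand-in for a Matrix error-code enum; only two members are shown."""
    FORBIDDEN = "M_FORBIDDEN"
    LIMIT_EXCEEDED = "M_LIMIT_EXCEEDED"


# Hypothetical sentinel a callback returns when it has no objection.
NOT_SPAM = "NOT_SPAM"

JsonDict = Dict[str, object]
SpamCheckResult = Union[str, Codes, Tuple[Codes, JsonDict]]


def normalise_spam_check_result(
    result: SpamCheckResult,
) -> Optional[Tuple[Codes, JsonDict]]:
    """Return None if the action is allowed, else (error code, extra error fields)."""
    # Check Codes first: it subclasses str, so it must not fall through to the
    # NOT_SPAM comparison below.
    if isinstance(result, Codes):
        # "Part 4" shape: a bare error code.
        return result, {}
    if isinstance(result, tuple):
        # "Part 5" shape: an error code plus a dict of extra fields.
        code, extra = result
        return code, dict(extra)
    if result == NOT_SPAM:
        return None
    raise TypeError(f"unexpected spam-checker result: {result!r}")
```

A caller could then turn a non-`None` result into a 403 whose JSON body carries the error code plus the extra fields. That mapping is likewise an assumption.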
Sean Quah	1391a76cd2	Faster room joins: fix race in recalculation of current room state (#13151 ) Bounce recalculation of current state to the correct event persister and move recalculation of current state into the event persistence queue, to avoid concurrent updates to a room's current state. Also give recalculation of a room's current state a real stream ordering. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-07-07 12:19:31 +00:00
Sean Quah	68db233f0c	Handle race between persisting an event and un-partial stating a room (#13100 ) Whenever we want to persist an event, we first compute an event context, which includes the state at the event and a flag indicating whether the state is partial. After a lot of processing, we finally try to store the event in the database, which can fail for partial state events when the containing room has been un-partial stated in the meantime. We detect the race as a foreign key constraint failure in the data store layer and turn it into a special `PartialStateConflictError` exception, which makes its way up to the method in which we computed the event context. To make things difficult, the exception needs to cross a replication request: `/fed_send_events` for events coming over federation and `/send_event` for events from clients. We transport the `PartialStateConflictError` as a `409 Conflict` over replication and turn `409`s back into `PartialStateConflictError`s on the worker making the request. All client events go through `EventCreationHandler.handle_new_client_event`, which is called in a lot of places. Instead of trying to update all the code which creates client events, we turn the `PartialStateConflictError` into a `429 Too Many Requests` in `EventCreationHandler.handle_new_client_event` and hope that clients take it as a hint to retry their request. On the federation event side, there are 7 places which compute event contexts. 4 of them use outlier event contexts: `FederationEventHandler._auth_and_persist_outliers_inner`, `FederationHandler.do_knock`, `FederationHandler.on_invite_request` and `FederationHandler.do_remotely_reject_invite`. These events won't have the partial state flag, so we do not need to do anything for them. The remaining 3 paths which create events are `FederationEventHandler.process_remote_join`, `FederationEventHandler.on_send_membership_event` and `FederationEventHandler._process_received_pdu`.
We can't experience the race in `process_remote_join`, unless we're handling an additional join into a partial state room, which currently blocks, so we make no attempt to handle it correctly. `on_send_membership_event` is only called by `FederationServer._on_send_membership_event`, so we catch the `PartialStateConflictError` there and retry just once. `_process_received_pdu` is called by `on_receive_pdu` for incoming events and `_process_pulled_event` for backfill. The latter should never try to persist partial state events, so we ignore it. We catch the `PartialStateConflictError` in `on_receive_pdu` and retry just once. Referring to the graph of code paths in https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648 may make the above easier to follow. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-07-05 16:12:52 +01:00
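The error-propagation scheme described above can be sketched as follows: a 409 across the replication hop, a 429 towards clients, and one retry on the federation paths. `HttpResponseError`, `persist_client_event` and `persist_federation_event` are hypothetical simplifications. Only the exception name and the status codes come from the commit message.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class PartialStateConflictError(Exception):
    """The room lost its partial-state flag between computing an event's
    context and persisting the event."""


class HttpResponseError(Exception):
    """Hypothetical HTTP-level error carrying a status code."""

    def __init__(self, code: int, msg: str) -> None:
        super().__init__(msg)
        self.code = code


def to_replication_error(e: PartialStateConflictError) -> HttpResponseError:
    # Worker serving the replication request: report the race as 409 Conflict.
    return HttpResponseError(409, str(e))


def from_replication_error(e: HttpResponseError) -> Exception:
    # Worker that made the request: turn a 409 back into the original error.
    if e.code == 409:
        return PartialStateConflictError(str(e))
    return e


async def persist_client_event(persist: Callable[[], Awaitable[T]]) -> T:
    # Client path: surface the race as 429 and rely on the client retrying.
    try:
        return await persist()
    except PartialStateConflictError as e:
        raise HttpResponseError(429, "Room state changed, please retry") from e


async def persist_federation_event(compute_and_persist: Callable[[], Awaitable[T]]) -> T:
    # Federation path: recompute the event context and retry exactly once.
    try:
        return await compute_and_persist()
    except PartialStateConflictError:
        return await compute_and_persist()


if __name__ == "__main__":
    async def always_conflicts() -> str:
        raise PartialStateConflictError("room un-partial-stated meanwhile")

    try:
        asyncio.run(persist_client_event(always_conflicts))
    except HttpResponseError as err:
        print(err.code)  # 429
```

Retrying only once is enough in this sketch because a room can stop having partial state only once. After that, a freshly computed context no longer carries the partial-state flag.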
David Teller	a164a46038	Uniformize spam-checker API, part 4: port other spam-checker callbacks to return `Union[Allow, Codes]`. (#12857 ) Co-authored-by: Brendan Abolivier <babolivier@matrix.org>	2022-06-13 18:16:16 +00:00
Richard van der Hoff	f68b5e5773	Merge branch 'rav/simplify_event_auth_interface' into develop	2022-06-13 11:34:59 +01:00
Richard van der Hoff	c1b28b8842	Remove redundant `room_version` param from `check_auth_rules_from_context` It's now implied by the room_version property on the event.	2022-06-12 23:13:10 +01:00
Richard van der Hoff	68be42f6b6	Remove `room_version` param from `validate_event_for_room_version` Instead, use the `room_version` property of the event we're validating. The `room_version` was originally added as a parameter somewhere around #4482, but really it's been redundant since #6875 added a `room_version` field to `EventBase`.	2022-06-12 23:13:09 +01:00
Richard van der Hoff	7c6b2204d1	Faster joins: add issue links to the TODOs (#13004 ) ... to help us keep track of these things	2022-06-09 10:13:03 +00:00
Erik Johnston	e3163e2e11	Reduce the amount of state we pull from the DB (#12811 )	2022-06-06 09:24:12 +01:00
Erik Johnston	888a29f412	Wait for lazy join to complete when getting current state (#12872 )	2022-06-01 16:02:53 +01:00
Sean Quah	641908f72f	Faster room joins: Resume state re-syncing after a Synapse restart (#12813 ) Signed-off-by: Sean Quah <seanq@matrix.org>	2022-05-31 15:15:08 +00:00
Sean Quah	2fba1076c5	Faster room joins: Try other destinations when resyncing the state of a partial-state room (#12812 ) Signed-off-by: Sean Quah <seanq@matrix.org>	2022-05-31 15:50:29 +01:00
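Below is a minimal sketch of the "try other destinations" idea. It assumes a hypothetical async `fetch(destination)` callable that raises on failure. The function name and error handling are illustrative, not Synapse's code.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


async def fetch_from_any_destination(
    destinations: Iterable[str],
    fetch: Callable[[str], Awaitable[T]],
) -> T:
    """Ask each candidate server in turn and return the first successful result."""
    last_error: Optional[Exception] = None
    for destination in destinations:
        try:
            return await fetch(destination)
        except Exception as e:  # a real server would catch narrower error types
            last_error = e  # remember why, then move on to the next server
    raise RuntimeError("no destination could provide the room state") from last_error


if __name__ == "__main__":
    async def demo_fetch(destination: str) -> str:
        if destination == "down.example.org":
            raise ConnectionError(destination)
        return f"state from {destination}"

    print(asyncio.run(fetch_from_any_destination(["down.example.org", "up.example.org"], demo_fetch)))
```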
Erik Johnston	1e453053cb	Rename storage classes (#12913 )	2022-05-31 12:17:50 +00:00
Erik Johnston	4660d9fdcf	Fix up `state_store` naming (#12871 )	2022-05-25 12:59:04 +01:00
Shay	71e8afe34d	Update EventContext `get_current_event_ids` and `get_prev_event_ids` to accept state filters and update calls where possible (#12791 )	2022-05-20 09:54:12 +01:00
Erik Johnston	c72d26c1e1	Refactor `EventContext` (#12689 ) Refactor how the `EventContext` class works, with the intention of reducing the amount of state we fetch from the DB during event processing. The idea here is to get rid of the cached `current_state_ids` and `prev_state_ids` that live in the `EventContext`, and instead defer straight to the database (and its caching). One change that may have a noticeable effect is that we no longer prefill the `get_current_state_ids` cache on a state change. However, that query is relatively light, since it's just a case of reading a table from the DB (unlike fetching state at an event, which is more heavyweight). For deployments with workers this cache isn't even used. Part of #12684	2022-05-10 19:43:13 +00:00
andrew do	01e625513a	remove constantly lib use and switch to enums. (#12624 )	2022-05-04 11:26:11 +00:00
Richard van der Hoff	17d99f758a	Optimise backfill calculation (#12522 ) Try to avoid an OOM by checking fewer extremities. Generally this is a big rewrite of _maybe_backfill, to try and fix some of the TODOs and other problems in it. It's best reviewed commit-by-commit.	2022-04-26 10:27:11 +01:00
Richard van der Hoff	320186319a	Resync state after partial-state join (#12394 ) We work through all the events with partial state, updating the state at each of them. Once it's done, we recalculate the state for the whole room, and then mark the room as having complete state.	2022-04-12 13:23:43 +00:00
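The three phases described above can be sketched with in-memory structures: fix up each partial-state event, recalculate the room's state, then clear the flag. `PartialStateRoom`, `resync_partial_state_room` and the two callables are hypothetical. A real server would persist each step and fetch state over federation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

# (event type, state_key) -> event_id
StateMap = Dict[Tuple[str, str], str]


@dataclass
class PartialStateRoom:
    room_id: str
    partial_state_events: Set[str]
    state_at_event: Dict[str, StateMap] = field(default_factory=dict)
    current_state: StateMap = field(default_factory=dict)
    has_partial_state: bool = True


def resync_partial_state_room(
    room: PartialStateRoom,
    fetch_full_state_at: Callable[[str], StateMap],
    recalculate_current_state: Callable[[PartialStateRoom], StateMap],
) -> None:
    # 1. Replace the partial state at each affected event with the full state.
    for event_id in list(room.partial_state_events):
        room.state_at_event[event_id] = fetch_full_state_at(event_id)
        room.partial_state_events.discard(event_id)  # this event is now complete
    # 2. Recalculate the state for the room as a whole.
    room.current_state = recalculate_current_state(room)
    # 3. Only now mark the room as having complete state.
    room.has_partial_state = False
```

The room-level flag is cleared only at the end. An interrupted resync therefore leaves the room still marked as partial-state.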
Sean Quah	800ba87cc8	Refactor and convert `Linearizer` to async (#12357 ) Refactor and convert `Linearizer` to async. This makes a `Linearizer` cancellation bug easier to fix. Also refactor to use an async context manager, which eliminates an unlikely footgun where code that doesn't immediately use the context manager could forget to release the lock. Signed-off-by: Sean Quah <seanq@element.io>	2022-04-05 15:43:52 +01:00
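Below is a minimal sketch of a per-key async linearizer that is used through an async context manager and built on `asyncio.Lock`. The class and its `queue` method are a hypothetical illustration, not Synapse's `Linearizer`. The lock is only taken inside `async with`, so it cannot be acquired and then forgotten. That is the footgun the commit mentions.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Dict, Hashable


class Linearizer:
    """Hypothetical minimal linearizer: at most one holder per key at a time."""

    def __init__(self) -> None:
        self._locks: Dict[Hashable, asyncio.Lock] = {}
        self._users: Dict[Hashable, int] = {}

    @asynccontextmanager
    async def queue(self, key: Hashable) -> AsyncIterator[None]:
        lock = self._locks.setdefault(key, asyncio.Lock())
        self._users[key] = self._users.get(key, 0) + 1
        try:
            async with lock:  # waits (FIFO) until earlier holders of `key` are done
                yield
        finally:
            # Runs even if the body raises or the task is cancelled while waiting.
            self._users[key] -= 1
            if self._users[key] == 0:
                # Nobody holds or awaits this key any more: drop its lock.
                del self._users[key]
                del self._locks[key]
```

Usage would look like `async with linearizer.queue(room_id): ...`. Bodies for the same key run one after another, and different keys do not block each other.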
Richard van der Hoff	afa17f0eab	Return a 404 from `/state` for an outlier (#12087 ) * Replace `get_state_for_pdu` with `get_state_ids_for_pdu` and `get_events_as_list`. * Return a 404 from `/state` and `/state_ids` for an outlier	2022-03-21 11:23:32 +00:00
Richard van der Hoff	dc8d825ef2	Skip attempt to get state at backwards-extremities (#12173 ) We don't have the state at a backwards-extremity, so this is never going to do anything useful.	2022-03-09 11:00:48 +00:00
Richard van der Hoff	e2e1d90a5e	Faster joins: persist to database (#12012 ) When we get a partial_state response from send_join, store information in the database about it: * store a record about the room as a whole having partial state, and stash the list of member servers too. * flag the join event itself as having partial state * also, for any new events whose prev-events are partial-stated, note that they will also be partial-stated. We don't yet make any attempt to interpret this data, so API calls (and a bunch of other things) are just going to get incorrect data.	2022-03-01 12:49:54 +00:00
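Below is a minimal in-memory sketch of the bookkeeping described above. It includes the rule that an event is partial-stated if any of its prev-events is. `PartialStateStore` and its methods are hypothetical; the commit stores these records in database tables.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class PartialStateStore:
    """Hypothetical in-memory stand-in for the records the commit describes."""

    # room_id -> servers listed in the partial-state send_join response
    partial_state_rooms: Dict[str, Set[str]] = field(default_factory=dict)
    # ids of events whose state is known to be incomplete
    partial_state_events: Set[str] = field(default_factory=set)

    def record_partial_state_join(
        self, room_id: str, join_event_id: str, servers: Iterable[str]
    ) -> None:
        self.partial_state_rooms[room_id] = set(servers)
        self.partial_state_events.add(join_event_id)

    def record_new_event(self, event_id: str, prev_event_ids: Iterable[str]) -> bool:
        """Flag the event as partial-stated if any prev event is; return the flag."""
        partial = any(p in self.partial_state_events for p in prev_event_ids)
        if partial:
            self.partial_state_events.add(event_id)
        return partial
```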
Richard van der Hoff	e24ff8ebe3	Remove `HomeServer.get_datastore()` (#12031 ) The presence of this method was confusing, and mostly present for backwards compatibility. Let's get rid of it. Part of #11733	2022-02-23 11:04:02 +00:00
Richard van der Hoff	81364db49b	Run `_handle_queued_pdus` as a background process (#12041 ) ... to ensure it gets a proper log context, mostly.	2022-02-22 13:33:22 +00:00
Richard van der Hoff	3070af4809	remote join processing: get create event from state, not auth_chain (#12039 ) A follow-up to #12005, in which I apparently missed that there are a bunch of other places that assume the create event is in the auth chain.	2022-02-21 19:27:35 +00:00
Eric Eastwood	fef2e792be	Fix historical messages backfilling in random order on remote homeservers (MSC2716) (#11114 ) Fix https://github.com/matrix-org/synapse/issues/11091 Fix https://github.com/matrix-org/synapse/issues/10764 (side-stepping the issue because we no longer have to deal with `fake_prev_event_id`) 1. Made the `/backfill` response return messages in `(depth, stream_ordering)` order (previously only sorted by `depth`) - Technically, it shouldn't really matter how `/backfill` returns things but I'm just trying to make the `stream_ordering` a little more consistent from the origin to the remote homeservers in order to get the order of messages from `/messages` consistent ([sorted by `(topological_ordering, stream_ordering)`](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)). - Even now that we return backfilled messages in order, it still doesn't guarantee the same `stream_ordering` (and more importantly the [`/messages` order](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)) on the other server. For example, if a room has a bunch of history imported and someone visits a permalink to a historical message back in time, their homeserver will skip over the historical messages in between and insert the permalink as the next message in the `stream_order` and totally throw off the sort. - This will be even more the case when we add the [MSC3030 jump to date API endpoint](https://github.com/matrix-org/matrix-doc/pull/3030) so the static archives can navigate and jump to a certain date. - We're solving this in the future by switching to [online topological ordering](https://github.com/matrix-org/gomatrixserverlib/issues/187) and [chunking](https://github.com/matrix-org/synapse/issues/3785) which by its nature will apply retroactively to fix any inconsistencies introduced by people permalinking 2. 
As we're navigating `prev_events` to return in `/backfill`, we order by `depth` first (newest -> oldest) and now also tie-break based on the `stream_ordering` (newest -> oldest). This is technically important because MSC2716 inserts a bunch of historical messages at the same `depth` so it's best to be prescriptive about which ones we should process first. In reality, I think the code already looped over the historical messages as expected because the database is already in order. 3. Making the historical state chain and historical event chain float on their own by having no `prev_events` instead of a fake `prev_event` which caused backfill to get clogged with an unresolvable event. Fixes https://github.com/matrix-org/synapse/issues/11091 and https://github.com/matrix-org/synapse/issues/10764 4. We no longer find connected insertion events by finding a potential `prev_event` connection to the current event we're iterating over. We now solely rely on marker events which, when processed, add the insertion event as an extremity, and the federating homeserver can ask about it when the time comes. - Related discussion, https://github.com/matrix-org/synapse/pull/11114#discussion_r741514793 Before | After --- | --- ![](https://user-images.githubusercontent.com/558581/139218681-b465c862-5c49-4702-a59e-466733b0cf45.png) | ![](https://user-images.githubusercontent.com/558581/146453159-a1609e0a-8324-439d-ae44-e4bce43ac6d1.png) #### Why aren't we sorting topologically when receiving backfill events? > The main reason we're going to opt to not sort topologically when receiving backfill events is because it's probably best to do whatever is easiest to make it just work. People will probably have opinions once they look at [MSC2716](https://github.com/matrix-org/matrix-doc/pull/2716) which could change whatever implementation anyway.
> > As mentioned, ideally we would do this but code necessary to make the fake edges but it gets confusing and gives an impression of “just whyyyy” (feels icky). This problem also dissolves with online topological ordering. > > -- https://github.com/matrix-org/synapse/pull/11114#discussion_r741517138 See https://github.com/matrix-org/synapse/pull/11114#discussion_r739610091 for the technical difficulties	2022-02-07 15:54:13 -06:00
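The ordering in point 2 can be sketched with a priority queue: walk `prev_events`, newest first by `depth`, and tie-break on `stream_ordering`. The `Event` dataclass and `backfill_order` are hypothetical simplifications.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    event_id: str
    depth: int
    stream_ordering: int
    prev_events: Tuple[str, ...] = ()


def backfill_order(events: Dict[str, Event], start_ids: List[str], limit: int) -> List[str]:
    """Walk prev_events from start_ids, newest first by (depth, stream_ordering)."""
    # heapq is a min-heap, so negate both keys to pop the newest event first;
    # the event_id is a final tie-break so the tuples always compare.
    queue: List[Tuple[int, int, str]] = []
    seen: Set[str] = set()

    def push(event_id: str) -> None:
        # Skip events we don't have and events already queued.
        if event_id in events and event_id not in seen:
            seen.add(event_id)
            e = events[event_id]
            heapq.heappush(queue, (-e.depth, -e.stream_ordering, event_id))

    for event_id in start_ids:
        push(event_id)

    result: List[str] = []
    while queue and len(result) < limit:
        _, _, event_id = heapq.heappop(queue)
        result.append(event_id)
        for prev_id in events[event_id].prev_events:
            push(prev_id)
    return result
```

Events at the same depth, such as the historical messages MSC2716 inserts, come out in descending `stream_ordering`.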
Richard van der Hoff	251b5567ec	Remove `log_function` and its uses (#11761 ) I've never found this terribly useful. I think it was added in the early days of Synapse, without much thought as to what would actually be useful to log, and has just been cargo-culted ever since. Rather, it tends to clutter up debug logs with useless information.	2022-01-18 13:06:04 +00:00
Sean Quah	0147b3de20	Add missing type hints to `synapse.logging.context` (#11556 )	2021-12-14 17:35:28 +00:00
Eric Eastwood	a6f1a3abec	Add MSC3030 experimental client and federation API endpoints to get the closest event to a given timestamp (#9445 ) MSC3030: https://github.com/matrix-org/matrix-doc/pull/3030 Client API endpoint. This will also go and fetch from the federation API endpoint if unable to find an event locally or we found an extremity with possibly a closer event we don't know about. ``` GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction> { "event_id": ... "origin_server_ts": ... } ``` Federation API endpoint: ``` GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction> { "event_id": ... "origin_server_ts": ... } ``` Co-authored-by: Erik Johnston <erik@matrix.org>	2021-12-02 01:02:20 -06:00
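Below is a sketch of the lookup behind `timestamp_to_event`. It assumes an in-memory list of `(origin_server_ts, event_id)` pairs sorted by timestamp. It also assumes that `dir=f` means "first event at or after `ts`" and `dir=b` means "last event at or before `ts`". `closest_event` is hypothetical and ignores the federation fallback.

```python
import bisect
from typing import List, Optional, Tuple


def closest_event(
    events: List[Tuple[int, str]], ts: int, direction: str
) -> Optional[dict]:
    """events: (origin_server_ts, event_id) pairs sorted by timestamp.

    direction "f": first event at or after ts; "b": last event at or before ts.
    """
    timestamps = [t for t, _ in events]
    if direction == "f":
        i = bisect.bisect_left(timestamps, ts)
        if i == len(events):
            return None  # nothing known locally in this direction
    elif direction == "b":
        i = bisect.bisect_right(timestamps, ts) - 1
        if i < 0:
            return None
    else:
        raise ValueError("direction must be 'f' or 'b'")
    origin_server_ts, event_id = events[i]
    # Same shape as the endpoint responses shown in the commit message.
    return {"event_id": event_id, "origin_server_ts": origin_server_ts}
```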
Richard van der Hoff	f3efa0036b	Move _persist_auth_tree into FederationEventHandler (#11115 ) This is just a lift-and-shift, because it fits more naturally here. We do rename it to `process_remote_join` at the same time though.	2021-10-19 10:24:09 +01:00
Richard van der Hoff	a5d2ea3d08	Check all auth events for room id and rejection (#11009 ) This fixes a bug where we would accept an event whose `auth_events` include rejected events, if the rejected event was shadowed by another `auth_event` with same `(type, state_key)`. The approach is to pass a list of auth events into `check_auth_rules_for_event` instead of a dict, which of course means updating the call sites. This is an extension of #10956.	2021-10-18 18:28:30 +01:00
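Below is a minimal sketch of the bug and the fix, with auth events as plain dicts (a hypothetical shape). Keying them by `(type, state_key)` lets a later event shadow a rejected one, whereas checking the list sees every event. The function names are illustrative only.

```python
from typing import Dict, Iterable, List, Tuple

AuthEvent = Dict[str, object]


def _check_each(events: Iterable[AuthEvent], room_id: str) -> None:
    for e in events:
        if e["room_id"] != room_id:
            raise ValueError(f"auth event {e['event_id']} is in the wrong room")
        if e["rejected"]:
            raise ValueError(f"auth event {e['event_id']} was rejected")


def check_auth_events_as_dict(auth_events: List[AuthEvent], room_id: str) -> None:
    """Buggy variant: a later event with the same key overwrites an earlier one."""
    by_key: Dict[Tuple[object, object], AuthEvent] = {
        (e["type"], e["state_key"]): e for e in auth_events
    }
    _check_each(by_key.values(), room_id)


def check_auth_events_as_list(auth_events: List[AuthEvent], room_id: str) -> None:
    """Fixed variant: every supplied auth event is checked."""
    _check_each(auth_events, room_id)
```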
Eric Eastwood	daf498e099	Fix 500 error on `/messages` when we accumulate more than 5 backward extremities (#11027 ) Found while working on the Gitter backfill script and noticed it only happened after we sent 7 batches, https://gitlab.com/gitterHQ/webapp/-/merge_requests/2229#note_665906390 When there are more than 5 backward extremities for a given depth, backfill will throw an error because we sliced the extremity list to 5 but then try to iterate over the full list. This causes us to look for state that we never fetched and we get a `KeyError`. Before when calling `/messages` when there are more than 5 backward extremities: ``` Traceback (most recent call last): File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 258, in _async_render_wrapper callback_return = await self._async_render(request) File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 446, in _async_render callback_return = await raw_callback_return File "/usr/local/lib/python3.8/site-packages/synapse/rest/client/room.py", line 580, in on_GET msgs = await self.pagination_handler.get_messages( File "/usr/local/lib/python3.8/site-packages/synapse/handlers/pagination.py", line 396, in get_messages await self.hs.get_federation_handler().maybe_backfill( File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 133, in maybe_backfill return await self._maybe_backfill_inner(room_id, current_depth, limit) File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 386, in _maybe_backfill_inner likely_extremeties_domains = get_domains_from_state(states[e_id]) KeyError: '$zpFflMEBtZdgcMQWTakaVItTLMjLFdKcRWUPHbbSZJl' ```	2021-10-14 18:53:45 -05:00
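Below is a sketch of the corrected loop. It assumes a hypothetical `get_members(event_id)` that returns `{user_id: membership}` for the state at an extremity. The point is to iterate over the same slice whose state was fetched.

```python
from typing import Callable, Dict, List, Set

MAX_EXTREMITIES = 5  # the slice size mentioned in the commit message


def domains_for_extremities(
    extremities: List[str],
    get_members: Callable[[str], Dict[str, str]],
) -> Dict[str, Set[str]]:
    """Map each checked extremity to the servers with joined members there."""
    chosen = extremities[:MAX_EXTREMITIES]
    states = {e_id: get_members(e_id) for e_id in chosen}
    # Iterate over `chosen`, the same slice we fetched state for. Looping over
    # the full `extremities` list here is what raised the KeyError.
    return {
        e_id: {
            user_id.split(":", 1)[1]  # "@alice:one.example" -> "one.example"
            for user_id, membership in states[e_id].items()
            if membership == "join"
        }
        for e_id in chosen
    }
```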
Patrick Cloke	eb9ddc8c2e	Remove the deprecated BaseHandler. (#11005 ) The shared ratelimit function was replaced with a dedicated RequestRatelimiter class (accessible from the HomeServer object). Other properties were copied to each sub-class that inherited from BaseHandler.	2021-10-08 07:44:43 -04:00
Patrick Cloke	d1bf5f7c9d	Strip "join_authorised_via_users_server" from join events which do not need it. (#10933 ) This fixes an "Event not signed by authorising server" error when a room member transitions from join -> join, e.g. when updating a display name or avatar URL in restricted rooms.	2021-09-30 11:13:59 -04:00
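Below is a minimal sketch of the stripping rule. It assumes, as the commit describes, that only the join -> join case needs handling. `prepare_membership_content` is a hypothetical helper, not Synapse's code.

```python
from typing import Any, Dict, Optional

AUTHORISING_KEY = "join_authorised_via_users_server"


def prepare_membership_content(
    content: Dict[str, Any], current_membership: Optional[str]
) -> Dict[str, Any]:
    """Return a copy of m.room.member content ready to send."""
    if content.get("membership") == "join" and current_membership == "join":
        # join -> join (e.g. a display name or avatar change): the key is not
        # needed, and keeping it would demand the authorising server's signature.
        return {k: v for k, v in content.items() if k != AUTHORISING_KEY}
    return dict(content)
```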
Richard van der Hoff	428174f902	Split `event_auth.check` into two parts (#10940 ) Broadly, the existing `event_auth.check` function has two parts: * a validation section: checks that the event isn't too big, that it has the right signatures, etc. This bit is independent of the rest of the state in the room, and so need only be done once for each event. * an auth section: ensures that the event is allowed, given the rest of the state in the room. This gets done multiple times, against various sets of room state, because it forms part of the state res algorithm. Currently, this is implemented with `do_sig_check` and `do_size_check` parameters, but I think that makes everything hard to follow. Instead, we split the function in two and call each part separately where it is needed.	2021-09-29 18:59:15 +01:00
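The two halves described above can be sketched as separate functions. A state-independent `validate_event` runs once, and a state-dependent `check_auth` runs against any number of state maps. Both are hypothetical simplifications:

* The size limit is the Matrix spec's 65536-byte PDU limit.
* The JSON encoding only approximates canonical JSON.
* The signature check merely looks for the sender's server among the signatures.

```python
import json
from typing import Dict, Tuple

# (event type, state_key) -> event content
StateMap = Dict[Tuple[str, str], dict]

MAX_PDU_SIZE = 65536  # bytes: the Matrix spec's size limit for a PDU


def validate_event(event: dict) -> None:
    """Room-state-independent checks, done once per event."""
    # Approximates canonical JSON (sorted keys, compact separators, UTF-8).
    encoded = json.dumps(
        event, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    if len(encoded) > MAX_PDU_SIZE:
        raise ValueError("event is too large")
    # Only checks that a signature from the sender's server is present;
    # real code verifies the signature cryptographically.
    sender_server = event["sender"].split(":", 1)[1]
    if sender_server not in event.get("signatures", {}):
        raise ValueError("event is not signed by the sender's server")


def check_auth(event: dict, state: StateMap) -> None:
    """Room-state-dependent check; may run many times, e.g. during state resolution."""
    # A single sample rule: the sender must be joined in the given state.
    member = state.get(("m.room.member", event["sender"]))
    if member is None or member.get("membership") != "join":
        raise PermissionError(f"{event['sender']} is not joined")
```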
Patrick Cloke	94b620a5ed	Use direct references for configuration variables (part 6). (#10916 )	2021-09-29 06:44:15 -04:00
Richard van der Hoff	5279b9161b	Use `RoomVersion` objects (#10934 ) Various refactors to use `RoomVersion` objects instead of room version identifiers.	2021-09-29 10:57:10 +01:00
Patrick Cloke	bb7fdd821b	Use direct references for configuration variables (part 5). (#10897 )	2021-09-24 07:25:21 -04:00
Andrew Morgan	aa2c027792	Remove unnecessary parentheses around tuples returned from methods (#10889 )	2021-09-23 11:59:07 +01:00
Richard van der Hoff	26f2bfedbf	Factor out a separate `EventContext.for_outlier` (#10883 ) Constructing an EventContext for an outlier is actually really simple, and there's no sense in going via an `async` method in the `StateHandler`. This also means that we can resolve a bunch of FIXMEs.	2021-09-22 17:58:57 +01:00
Richard van der Hoff	8f2a52766b	Ensure we mark sent knocks as outliers (#10873 )	2021-09-22 15:20:18 +01:00
Patrick Cloke	b3590614da	Require type hints in the handlers module. (#10831 ) Adds missing type hints to methods in the synapse.handlers module and requires all methods to have type hints there. This also removes the unused construct_auth_difference method from the FederationHandler.	2021-09-20 08:56:23 -04:00
Patrick Cloke	01c88a09cd	Use direct references for some configuration variables (#10798 ) Instead of proxying through the magic getter of the RootConfig object. This should be more performant (and is more explicit).	2021-09-13 13:07:12 -04:00
Eric Eastwood	dc75fb7f05	Populate `rooms.creator` field for easy lookup (#10697 ) Part of https://github.com/matrix-org/synapse/pull/10566 - Fill in creator whenever we insert into the rooms table - Add background update to backfill any missing creator values	2021-09-01 16:27:58 +01:00
Richard van der Hoff	1800aabfc2	Split `FederationHandler` in half (#10692 ) The idea here is to take anything to do with incoming events and move it out to a separate handler, as a way of making FederationHandler smaller.	2021-08-26 21:41:44 +01:00
Richard van der Hoff	96715d7633	Make `backfill` and `get_missing_events` use the same codepath (#10645 ) Given that backfill and get_missing_events are basically the same thing, it's somewhat crazy that we have entirely separate code paths for them. This makes backfill use the existing get_missing_events code, and then clears up all the unused code.	2021-08-26 18:34:57 +01:00
Richard van der Hoff	e81d62009e	Split `on_receive_pdu` in half (#10640 ) Here we split on_receive_pdu into two functions (on_receive_pdu and process_pulled_event), rather than having both cases in the same method. There's a tiny bit of overlap, but not that much.	2021-08-19 17:05:12 +00:00
Richard van der Hoff	50af1efe4b	Extract `_resolve_state_at_missing_prevs` (#10624 ) This is a follow-up to #10615: it takes the code that constructs the state at a backwards extremity, and extracts it to a separate method.	2021-08-19 17:31:40 +01:00
Richard van der Hoff	964f29cb6f	Refactor `on_receive_pdu` code (#10615 ) * drop room pdu linearizer sooner No point holding onto it while we recheck the db * move out `missing_prevs` calculation we're going to need `missing_prevs` whatever we do, so we may as well calculate it eagerly and just update it if it gets outdated. * Add another `if missing_prevs` condition this should be a no-op, since all the code inside the block already checks `if missing_prevs` * reorder if conditions This shouldn't change the logic at all. * Push down `min_depth` read No point reading it from the database unless we're going to use it. * Collect the sent_to_us_directly code together Move the remaining `sent_to_us_directly` code inside the `if sent_to_us_directly` block. * Properly separate the `not sent_to_us_directly` branch Since the only way this second block is now reachable is if we didn't go into the `sent_to_us_directly` branch, we can replace it with a simple `else`. * changelog	2021-08-18 12:36:22 +01:00
Richard van der Hoff	272b89d547	Stop setting the outlier flag for things that aren't (#10614 ) Marking things as outliers to inhibit pushes is a sledgehammer to crack a nut. Move the test further down the stack so that we just inhibit the thing we want.	2021-08-17 13:13:42 +01:00
Richard van der Hoff	2d9ca4ca77	Clean up some logging in the federation event handler (#10591 ) * Include outlier status in `str(event)` In places where we log event objects, knowing whether or not you're dealing with an outlier is super useful. * Remove duplicated logging in get_missing_events When we process events received from get_missing_events, we log them twice (once in `_get_missing_events_for_pdu`, and once in `on_receive_pdu`). Reduce the duplication by removing the logging in `on_receive_pdu`, and ensuring the call sites do sensible logging. * log in `on_receive_pdu` when we already have the event * Log which prev_events we are missing * changelog	2021-08-16 13:19:02 +01:00
Richard van der Hoff	1bebc0b78c	Clean up federation event auth code (#10539 ) * drop old-room hack pretty sure we don't need this any more. * Remove incorrect comment about modifying `context` It doesn't look like the supplied context is ever modified. * Stop `_auth_and_persist_event` modifying its parameters This is only called in three places. Two of them don't pass `auth_events`, and the third doesn't use the dict after passing it in, so this should be non-functional. * Stop `_check_event_auth` modifying its parameters `_check_event_auth` is only called in three places. `on_send_membership_event` doesn't pass an `auth_events`, and `prep` and `_auth_and_persist_event` do not use the map after passing it in. * Stop `_update_auth_events_and_context_for_auth` modifying its parameters Return the updated auth event dict, rather than modifying the parameter. This is only called from `_check_event_auth`. * Improve documentation on `_auth_and_persist_event` Rename `auth_events` parameter to better reflect what it contains. * Improve documentation on `_NewEventInfo` * Improve documentation on `_check_event_auth` rename `auth_events` parameter to better describe what it contains * changelog	2021-08-06 13:54:23 +01:00
Eric Eastwood	684d19a11c	Add support for MSC2716 marker events (#10498 ) * Make historical messages available to federated servers Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716 Follow-up to https://github.com/matrix-org/synapse/pull/9247 * Debug message not available on federation * Add base starting insertion point when no chunk ID is provided * Fix messages from multiple senders in historical chunk Follow-up to https://github.com/matrix-org/synapse/pull/9247 Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716 --- Previously, Synapse would throw a 403, `Cannot force another user to join.`, because we were trying to use `?user_id` from a single virtual user which did not match with messages from other users in the chunk. * Remove debug lines * Messing with selecting insertion event extremities * Move db schema change to new version * Add better comments * Make a fake requester with just what we need See https://github.com/matrix-org/synapse/pull/10276#discussion_r660999080 * Store insertion events in table * Make base insertion event float off on its own See https://github.com/matrix-org/synapse/pull/10250#issuecomment-875711889 Conflicts: synapse/rest/client/v1/room.py * Validate that the app service can actually control the given user See https://github.com/matrix-org/synapse/pull/10276#issuecomment-876316455 Conflicts: synapse/rest/client/v1/room.py * Add some better comments on what we're trying to check for * Continue debugging * Share validation logic * Add inserted historical messages to /backfill response * Remove debug sql queries * Some marker event implementation trials * Clean up PR * Rename insertion_event_id to just event_id * Add some better sql comments * More accurate description * Add changelog * Make it clear what MSC the change is part of * Add more detail on which insertion event came through * Address review and improve sql queries * Only use event_id as unique constraint * Fix test case where
insertion event is already in the normal DAG * Remove debug changes * Add support for MSC2716 marker events * Process markers when we receive it over federation * WIP: make hs2 backfill historical messages after marker event * hs2 to better ask for insertion event extremity But running into the `sqlite3.IntegrityError: NOT NULL constraint failed: event_to_state_groups.state_group` error * Add insertion_event_extremities table * Switch to chunk events so we can auth via power_levels Previously, we were using `content.chunk_id` to connect one chunk to another. But these events can be from any `sender` and we can't tell who should be able to send historical events. We know we only want the application service to do it but these events have the sender of a real historical message, not the application service user ID as the sender. Other federated homeservers also have no indicator which senders are an application service on the originating homeserver. So we want to auth all of the MSC2716 events via power_levels and have them be sent by the application service with proper PL levels in the room. 
* Switch to chunk events for federation * Add unstable room version to support new historical PL * Messy: Fix undefined state_group for federated historical events ``` 2021-07-13 02:27:57,810 - synapse.handlers.federation - 1248 - ERROR - GET-4 - Failed to backfill from hs1 because NOT NULL constraint failed: event_to_state_groups.state_group Traceback (most recent call last): File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1216, in try_backfill await self.backfill( File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1035, in backfill await self._auth_and_persist_event(dest, event, context, backfilled=True) File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 2222, in _auth_and_persist_event await self._run_push_actions_and_persist_event(event, context, backfilled) File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 2244, in _run_push_actions_and_persist_event await self.persist_events_and_notify( File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 3290, in persist_events_and_notify events, max_stream_token = await self.storage.persistence.persist_events( File "/usr/local/lib/python3.8/site-packages/synapse/logging/opentracing.py", line 774, in _trace_inner return await func(*args, **kwargs) File "/usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 320, in persist_events ret_vals = await yieldable_gather_results(enqueue, partitioned.items()) File "/usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 237, in handle_queue_loop ret = await self._per_item_callback( File "/usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 577, in _persist_event_batch await self.persist_events_store._persist_events_and_state_updates( File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 176, in
_persist_events_and_state_updates await self.db_pool.runInteraction( File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 681, in runInteraction result = await self.runWithConnection( File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 770, in runWithConnection return await make_deferred_yieldable( File "/usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py", line 238, in inContext result = inContext.theWork() # type: ignore[attr-defined] File "/usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py", line 254, in <lambda> inContext.theWork = lambda: context.call( # type: ignore[attr-defined] File "/usr/local/lib/python3.8/site-packages/twisted/python/context.py", line 118, in callWithContext return self.currentContext().callWithContext(ctx, func, *args, **kw) File "/usr/local/lib/python3.8/site-packages/twisted/python/context.py", line 83, in callWithContext return func(*args, **kw) File "/usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py", line 293, in _runWithConnection compat.reraise(excValue, excTraceback) File "/usr/local/lib/python3.8/site-packages/twisted/python/deprecate.py", line 298, in deprecatedFunction return function(*args, **kwargs) File "/usr/local/lib/python3.8/site-packages/twisted/python/compat.py", line 403, in reraise raise exception.with_traceback(traceback) File "/usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py", line 284, in _runWithConnection result = func(conn, *args, **kw) File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 765, in inner_func return func(db_conn, *args, **kwargs) File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 549, in new_transaction r = func(cursor, *args, **kwargs) File "/usr/local/lib/python3.8/site-packages/synapse/logging/utils.py", line 69, in wrapped return f(*args, **kwargs) File
"/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 385, in _persist_events_txn self._store_event_state_mappings_txn(txn, events_and_contexts) File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 2065, in _store_event_state_mappings_txn self.db_pool.simple_insert_many_txn( File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 923, in simple_insert_many_txn txn.execute_batch(sql, vals) File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 280, in execute_batch self.executemany(sql, args) File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 300, in executemany self._do_execute(self.txn.executemany, sql, args) File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 330, in _do_execute return func(sql, *args) sqlite3.IntegrityError: NOT NULL constraint failed: event_to_state_groups.state_group ``` Revert "Messy: Fix undefined state_group for federated historical events" This reverts commit `187ab28611`. * Fix federated events being rejected for no state_groups Add fix from https://github.com/matrix-org/synapse/pull/10439 until it merges. 
* Adapting to experimental room version * Some log cleanup * Add better comments around extremity fetching code and why * Rename to be more accurate to what the function returns * Add changelog * Ignore rejected events * Use simplified upsert * Add Erik's explanation of extra event checks See https://github.com/matrix-org/synapse/pull/10498#discussion_r680880332 * Clarify that the depth is not directly correlated to the backwards extremity that we return See https://github.com/matrix-org/synapse/pull/10498#discussion_r681725404 * lock only matters for sqlite See https://github.com/matrix-org/synapse/pull/10498#discussion_r681728061 * Move new SQL changes to its own delta file * Clean up upsert docstring * Bump database schema version (62)	2021-08-04 12:07:57 -05:00
Eric Eastwood	d0b294ad97	Make historical events discoverable from backfill for servers without any scrollback history (MSC2716) (#10245 ) * Make historical messages available to federated servers Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716 Follow-up to https://github.com/matrix-org/synapse/pull/9247 * Debug message not available on federation * Add base starting insertion point when no chunk ID is provided * Fix messages from multiple senders in historical chunk Follow-up to https://github.com/matrix-org/synapse/pull/9247 Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716 --- Previously, Synapse would throw a 403, `Cannot force another user to join.`, because we were trying to use `?user_id` from a single virtual user which did not match with messages from other users in the chunk. * Remove debug lines * Messing with selecting insertion event extremities * Move db schema change to new version * Add more better comments * Make a fake requester with just what we need See https://github.com/matrix-org/synapse/pull/10276#discussion_r660999080 * Store insertion events in table * Make base insertion event float off on its own See https://github.com/matrix-org/synapse/pull/10250#issuecomment-875711889 Conflicts: synapse/rest/client/v1/room.py * Validate that the app service can actually control the given user See https://github.com/matrix-org/synapse/pull/10276#issuecomment-876316455 Conflicts: synapse/rest/client/v1/room.py * Add some better comments on what we're trying to check for * Continue debugging * Share validation logic * Add inserted historical messages to /backfill response * Remove debug sql queries * Some marker event implementation trials * Clean up PR * Rename insertion_event_id to just event_id * Add some better sql comments * More accurate description * Add changelog * Make it clear what MSC the change is part of * Add more detail on which insertion event came through * Address review and improve sql queries * Only 
use event_id as unique constraint * Fix test case where insertion event is already in the normal DAG * Remove debug changes * Switch to chunk events so we can auth via power_levels Previously, we were using `content.chunk_id` to connect one chunk to another. But these events can be from any `sender` and we can't tell who should be able to send historical events. We know we only want the application service to do it but these events have the sender of a real historical message, not the application service user ID as the sender. Other federated homeservers also have no indicator of which senders are an application service on the originating homeserver. So we want to auth all of the MSC2716 events via power_levels and have them be sent by the application service with proper PL levels in the room. * Switch to chunk events for federation * Add unstable room version to support new historical PL * Fix federated events being rejected for no state_groups Add fix from https://github.com/matrix-org/synapse/pull/10439 until it merges. * Only connect base insertion event to prev_event_ids Per discussion with @erikjohnston, https://matrix.to/#/!UytJQHLQYfvYWsGrGY:jki.re/$12bTUiObDFdHLAYtT7E-BvYRp3k_xv8w0dUQHibasJk?via=jki.re&via=matrix.org * Make it possible to get the room_version with txn * Allow but ignore historical events in unsupported room version See https://github.com/matrix-org/synapse/pull/10245#discussion_r675592489 We can't reject historical events on unsupported room versions because homeservers without knowledge of MSC2716 or the new room version don't reject historical events either. Since we can't rely on the auth check here to stop historical events on unsupported room versions, I've added some additional checks in the processing/persisting code (`synapse/storage/databases/main/events.py` -> `_handle_insertion_event` and `_handle_chunk_event`). I've had to do some refactoring so there is a method to fetch the room version by `txn`. 
* Move to unique index syntax See https://github.com/matrix-org/synapse/pull/10245#discussion_r675638509 * High-level document how the insertion->chunk lookup works * Remove create_event fallback for room_versions See https://github.com/matrix-org/synapse/pull/10245/files#r677641879 * Use updated method name	2021-07-28 10:46:37 -05:00
Patrick Cloke	228decfce1	Update the MSC3083 support to verify if joins are from an authorized server. (#10254 )	2021-07-26 12:17:00 -04:00
Brendan Abolivier	a743bf4694	Port the ThirdPartyEventRules module interface to the new generic interface (#10386 ) Port the third-party event rules interface to the generic module interface introduced in v1.37.0	2021-07-20 12:39:46 +02:00
Jonathan de Jong	95e47b2e78	[pyupgrade] `synapse/` (#10348 ) This PR is tantamount to running ``` pyupgrade --py36-plus --keep-percent-format `find synapse/ -type f -name "*.py"` ``` Part of #9744	2021-07-19 15:28:05 +01:00
Jonathan de Jong	98aec1cc9d	Use inline type hints in `handlers/` and `rest/`. (#10382 )	2021-07-16 18:22:36 +01:00
Erik Johnston	7695ca0618	Fix a number of logged errors caused by remote servers being down. (#10400 )	2021-07-15 10:35:46 +01:00
Patrick Cloke	8d609435c0	Move methods involving event authentication to EventAuthHandler. (#10268 ) Instead of mixing them with user authentication methods.	2021-07-01 14:25:37 -04:00
Richard van der Hoff	8165ba48b1	Return errors from `send_join` etc if the event is rejected (#10243 ) Rather than persisting rejected events via `send_join` and friends, raise a 403 if someone tries to pull a fast one.	2021-06-24 16:00:08 +01:00
Richard van der Hoff	6e8fb42be7	Improve validation for `send_{join,leave,knock}` (#10225 ) The idea here is to stop people sending things that aren't joins/leaves/knocks through these endpoints: previously you could send anything you liked through them. I wasn't able to find any security holes from doing so, but it doesn't sound like a good thing.	2021-06-24 15:30:49 +01:00
Richard van der Hoff	8beead66ae	Send out invite rejections and knocks over federation (#10223 ) ensure that events sent via `send_leave` and `send_knock` are sent on to the rest of the federation.	2021-06-23 12:54:50 +01:00
Andrew Morgan	182147195b	Check third party rules before persisting knocks over federation (#10212 ) An accidental mis-ordering of operations during #6739 technically allowed an incoming knock event over federation in before checking it against any configured Third Party Access Rules modules. This PR corrects that by performing the TPAR check before persisting the event.	2021-06-21 11:57:09 +01:00
Marcus	8070b893db	update black to 21.6b0 (#10197 ) Reformat all files with the new version. Signed-off-by: Marcus Hoffmann <bubu@bubu1.eu>	2021-06-17 15:20:06 +01:00
Eric Eastwood	a911dd768b	Add fields to better debug where events are being soft_failed (#10168 ) Follow-up to https://github.com/matrix-org/synapse/pull/10156#discussion_r650292223	2021-06-17 14:59:45 +01:00
Patrick Cloke	9e5ab6dd58	Remove the experimental flag for knocking and use stable prefixes / endpoints. (#10167 ) * Room version 7 for knocking. * Stable prefixes and endpoints (both client and federation) for knocking. * Removes the experimental configuration flag.	2021-06-15 07:45:14 -04:00
Eric Eastwood	b31daac01c	Add metrics to track how often events are `soft_failed` (#10156 ) Spawned from missing messages we were seeing on `matrix.org` from a federated Gitter bridged room, https://gitlab.com/gitterHQ/webapp/-/issues/2770. The underlying issue in Synapse is tracked by https://github.com/matrix-org/synapse/issues/10066 where the message and join event race and the message is `soft_failed` before the `join` event reaches the remote federated server. Fewer soft_failed events = better and usually this should only trigger for events where people are doing bad things and trying to fuzz and fake everything.	2021-06-11 10:12:35 +01:00
Sorunome	d936371b69	Implement knock feature (#6739 ) This PR aims to implement the knock feature as proposed in https://github.com/matrix-org/matrix-doc/pull/2403 Signed-off-by: Sorunome <mail@sorunome.de> Signed-off-by: Andrew Morgan <andrewm@element.io>	2021-06-09 19:39:51 +01:00
Erik Johnston	a0101fc021	Handle /backfill returning no events (#10133 ) Fixes #10123	2021-06-08 10:37:01 +01:00
Erik Johnston	a0cd8ae8cb	Don't try and backfill the same room in parallel. (#10116 ) If backfilling is slow then the client may time out and retry, causing Synapse to start a new `/backfill` before the existing backfill has finished, duplicating work.	2021-06-04 10:47:58 +01:00
Erik Johnston	c96ab31dff	Limit number of events in a replication request (#10118 ) Fixes #9956.	2021-06-04 10:35:47 +01:00
Richard van der Hoff	b4b2fd2ece	add a cache to have_seen_event (#9953 ) Empirically, this helped my server considerably when handling gaps in Matrix HQ. The problem was that we would repeatedly call have_seen_events for the same set of (50K or so) auth_events, each of which would take many minutes to complete, even though it's only an index scan.	2021-06-01 12:04:47 +01:00
Brendan Abolivier	f828a70be3	Limit the number of events sent over replication when persisting events. (#10082 )	2021-05-27 17:10:58 +01:00
Patrick Cloke	ac6bfcd52f	Refactor checking restricted join rules (#10007 ) To be more consistent with similar code. The check now automatically raises an AuthError instead of passing back a boolean. It also absorbs some shared logic between callers.	2021-05-18 12:17:04 -04:00
Erik Johnston	2b2985b5cf	Improve performance of backfilling in large rooms. (#9935 ) We were pulling the full auth chain for the room out of the DB each time we backfilled, which can be huge for large rooms and is totally unnecessary.	2021-05-10 13:29:02 +01:00
Erik Johnston	de8f0a03a3	Don't set the external cache if it's been done recently (#9905 )	2021-05-05 16:53:22 +01:00
Patrick Cloke	d924827da1	Check for space membership during a remote join of a restricted room (#9814 ) When receiving a /send_join request for a room with join rules set to 'restricted', check if the user is a member of the spaces defined in the 'allow' key of the join rules. This only applies to an experimental room version, as defined in MSC3083.	2021-04-23 07:05:51 -04:00
Jonathan de Jong	495b214f4f	Fix (final) Bugbear violations (#9838 )	2021-04-20 11:50:49 +01:00
Patrick Cloke	936e69825a	Separate creating an event context from persisting it in the federation handler (#9800 ) This refactoring allows adding logic that uses the event context before persisting it.	2021-04-14 12:35:28 -04:00
Patrick Cloke	e8816c6ace	Revert "Check for space membership during a remote join of a restricted room. (#9763 )" This reverts commit `cc51aaaa7a`. The PR was prematurely merged and not yet approved.	2021-04-14 12:33:37 -04:00
Patrick Cloke	cc51aaaa7a	Check for space membership during a remote join of a restricted room. (#9763 ) When receiving a /send_join request for a room with join rules set to 'restricted', check if the user is a member of the spaces defined in the 'allow' key of the join rules. This only applies to an experimental room version, as defined in MSC3083.	2021-04-14 12:32:20 -04:00
Jonathan de Jong	4b965c862d	Remove redundant "coding: utf-8" lines (#9786 ) Part of #9744 Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now. `Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`	2021-04-14 15:34:27 +01:00
Jonathan de Jong	2ca4e349e9	Bugbear: Add Mutable Parameter fixes (#9682 ) Part of #9366 Adds in fixes for B006 and B008, both relating to mutable parameter lint errors. Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>	2021-04-08 22:38:54 +01:00
Patrick Cloke	d959d28730	Add type hints to the federation handler and server. (#9743 )	2021-04-06 07:21:57 -04:00
Erik Johnston	963f4309fe	Make RateLimiter class check for ratelimit overrides (#9711 ) This should fix a class of bug where we forget to check if e.g. the appservice shouldn't be ratelimited. We also check the `ratelimit_override` table to check if the user has ratelimiting disabled. That table is really only meant to override the event sender ratelimiting, so we don't use any values from it (as they might not make sense for different rate limits), but we do infer that if ratelimiting is disabled for the user we should disable all ratelimits. Fixes #9663	2021-03-30 12:06:09 +01:00
Richard van der Hoff	af2248f8bf	Optimise missing prev_event handling (#9601 ) Background: When we receive incoming federation traffic, and notice that we are missing prev_events from the incoming traffic, first we do a `/get_missing_events` request, and then if we still have missing prev_events, we set up new backwards-extremities. To do that, we need to make a `/state_ids` request to ask the remote server for the state at those prev_events, and then we may need to then ask the remote server for any events in that state which we don't already have, as well as the auth events for those missing state events, so that we can auth them. This PR attempts to optimise the processing of that state request. The `state_ids` API returns a list of the state events, as well as a list of all the auth events for all of those state events. The optimisation comes from the observation that we are currently loading all of those auth events into memory at the start of the operation, but we almost certainly aren't going to need all of the auth events. Rather, we can check that we have them, and leave the actual load into memory for later. (Ideally the federation API would tell us which auth events we're actually going to need, but it doesn't.) The effect of this is to reduce the number of events that I need to load for an event in Matrix HQ from about 60000 to about 22000, which means it can stay in my in-memory cache, whereas previously the sheer number of events meant that all 60K events had to be loaded from db for each request, due to the amount of cache churn. (NB I've already tripled the size of the cache from its default of 10K). Unfortunately I've ended up basically C&Ping `_get_state_for_room` and `_get_events_from_store_or_dest` into a new method, because `_get_state_for_room` is also called during backfill, which expects the auth events to be returned, so the same tricks don't work. 
That said, I don't really know why that codepath is completely different (ultimately we're doing the same thing in setting up a new backwards extremity) so I've left a TODO suggesting that we clean it up.	2021-03-15 13:51:02 +00:00
Richard van der Hoff	2b328d7e02	Improve logging when processing incoming transactions (#9596 ) Put the room id in the logcontext, to make it easier to understand what's going on.	2021-03-12 15:08:03 +00:00
Patrick Cloke	2a99cc6524	Use the chain cover index in get_auth_chain_ids. (#9576 ) This uses a simplified version of get_chain_cover_difference to calculate auth chain of events.	2021-03-10 09:57:59 -05:00
Eric Eastwood	0a00b7ff14	Update black, and run auto formatting over the codebase (#9381 ) - Update black version to the latest - Run black auto formatting over the codebase - Run autoformatting according to `docs/code_style.md` (as of `80d6dc9783`) - Update `code_style.md` docs around installing black to use the correct version	2021-02-16 22:32:34 +00:00
Andrew Morgan	594f2853e0	Remove dead handled_events set in invite_join (#9394 ) This PR removes a set that was created and initially used (see `1d2a0040cf (diff-0bc92da3d703202f5b9be2d3f845e375f5b1a6bc6ba61705a8af9be1121f5e42R435-R436)`), but is no longer used today. May help cut down a bit on the time it takes to accept invites.	2021-02-12 22:15:50 +00:00
Erik Johnston	ff55300b91	Honour ratelimit flag for application services for invite ratelimiting (#9302 )	2021-02-03 10:17:37 +00:00
Erik Johnston	f2c1560eca	Ratelimit invites by room and target user (#9258 )	2021-01-29 16:38:29 +00:00
Erik Johnston	dd8da8c5f6	Precompute joined hosts and store in Redis (#9198 )	2021-01-26 13:57:31 +00:00
David Teller	f14428b25c	Allow spam-checker modules to provide async methods. (#8890 ) Spam checker modules can now provide async methods. This is implemented in a backwards-compatible manner.	2020-12-11 14:05:15 -05:00
Patrick Cloke	30fba62108	Apply an IP range blacklist to push and key revocation requests. (#8821 ) Replaces the `federation_ip_range_blacklist` configuration setting with an `ip_range_blacklist` setting with wider scope. It now applies to: * Federation * Identity servers * Push notifications * Checking key validity for third-party invite events The old `federation_ip_range_blacklist` setting is still honored if present, but with reduced scope (it only applies to federation and identity servers).	2020-12-02 11:09:24 -05:00
Richard van der Hoff	950bb0305f	Consistently use room_id from federation request body (#8776 ) * Consistently use room_id from federation request body Some federation APIs have a redundant `room_id` path param (see https://github.com/matrix-org/matrix-doc/issues/2330). We should make sure we consistently use either the path param or the body param, and the body param is easier. * Kill off some references to "context" Once upon a time, "rooms" were known as "contexts". I think this kills off the last references to "contexts".	2020-11-19 10:05:33 +00:00
Andrew Morgan	e8d0853739	Generalise _maybe_store_room_on_invite (#8754 ) There's a handy function called maybe_store_room_on_invite which allows us to create an entry in the rooms table for a room and its version, for a room we aren't joined to yet but which we can reference when ingesting events about it. This is currently used for invites where we receive some stripped state about the room and pass it down via /sync to the client, without us being in the room yet. There is a similar requirement for knocking, where we will eventually do the same thing, and need an entry in the rooms table as well. Thus, reusing this function works, however its name needs to be generalised a bit. Separated out from #6739.	2020-11-13 16:24:04 +00:00

821 Commits (25c412b3c57962104d7a9452f03a0fca7e999bc2)