Commit Graph

1106 Commits (dc901a885f9a7111c97b2935d51d2d05a26db47b)

Author SHA1 Message Date
David Robertson 8e37ece015
Bump the client-side timeout for /state (#14912)
* Bump the client-side timeout for /state

to allow faster joins resyncs the chance to complete for large rooms.
We have seen this fair poorly (~90s for Matrix HQ's /state) in testing,
causing the resync to advance to another HS who hasn't seen our join yet.

* Changelog

* Milliseconds!!!!
2023-01-25 16:11:06 +00:00
Sean Quah d329a566df
Faster joins: Fix incompatibility with restricted joins (#14882)
* Avoid clearing out forward extremities when doing a second remote join

When joining a restricted room where the local homeserver does not have
a user able to issue invites, we perform a second remote join. We want
to avoid clearing out forward extremities in this case because the
forward extremities we have are up to date and clearing out forward
extremities creates a window in which the room can get bricked if
Synapse crashes.

Signed-off-by: Sean Quah <seanq@matrix.org>

* Do a full join when doing a second remote join into a full state room

We cannot persist a partial state join event into a joined full state
room, so we perform a full state join for such rooms instead. As a
future optimization, we could always perform a partial state join and
compute or retrieve the full state ourselves if necessary.

Signed-off-by: Sean Quah <seanq@matrix.org>

* Add lock around partial state flag for rooms

Signed-off-by: Sean Quah <seanq@matrix.org>

* Preserve partial state info when doing a second partial state join

Signed-off-by: Sean Quah <seanq@matrix.org>

* Add newsfile

* Add a TODO(faster_joins) marker

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-22 19:19:31 +00:00
David Robertson 5b3af1c7d0
Stabilise serving partial join responses (#14839)
Serving partial join responses is no longer experimental. They will only be served under the stable identifier if the the undocumented config flag experimental.msc3706_enabled is set to true.

Synapse continues to request a partial join only if the undocumented config flag experimental.faster_joins is set to true; this setting remains present and unaffected.
2023-01-17 12:44:15 +00:00
Sean Quah db5145a31d
Add parameter to control whether we do a partial state join (#14843)
When the local homeserver is already joined to a room and wants to
perform another remote join, we may find it useful to do a non-partial
state join if we already have the full state for the room.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-16 23:15:17 +00:00
David Robertson 85a7a201fa
Also use stable name in SendJoinResponse struct (#14841)
* Also use stable name in SendJoinResponse struct

follow-up to #14832

* Changelog

* Fix a rename I missed

* Run black

* Update synapse/federation/federation_client.py

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2023-01-16 12:40:25 +00:00
David Robertson 52ae80dd1a
Use stable identifiers for faster joins (#14832)
* Use new query param when requesting a partial join

* Read new query param when serving partial join

* Provide new field names when serving partial joins

* Read new field names from partial join response

* Changelog
2023-01-13 17:58:53 +00:00
Patrick Cloke 9b6224577e
Failover on proper error responses. (#14620)
When querying a remote server handle a 404/405 with an
errcode of M_UNRECOGNIZED as an unimplemented endpoint.
2022-12-06 07:23:03 -05:00
Richard van der Hoff cb59e08062
Improve logging and opentracing for to-device message handling (#14598)
A batch of changes intended to make it easier to trace to-device messages through the system.

The intention here is that a client can set a property org.matrix.msgid in any to-device message it sends. That ID is then included in any tracing or logging related to the message. (Suggestions as to where this field should be documented welcome. I'm not enthusiastic about speccing it - it's very much an optional extra to help with debugging.)

I've also generally improved the data we send to opentracing for these messages.
2022-12-06 09:52:55 +00:00
Mathieu Velten 4569eda944
Use servers list approx to send read receipts when in partial state (#14549)
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
2022-11-30 13:39:47 +01:00
Eric Eastwood 8f10c8b054
Move MSC3030 `/timestamp_to_event` endpoint to stable v1 location (#14471)
Fix https://github.com/matrix-org/synapse/issues/14390

 - Client API: `/_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>` -> `/_matrix/client/v1/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>`
 - Federation API: `/_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>` -> `/_matrix/federation/v1/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>`

Complement test changes: https://github.com/matrix-org/complement/pull/559
2022-11-28 15:54:18 -06:00
Patrick Cloke d748bbc8f8
Include thread information when sending receipts over federation. (#14466)
Include the thread_id field when sending read receipts over
federation. This might result in the same user having multiple
read receipts per-room, meaning multiple EDUs must be sent
to encapsulate those receipts.

This restructures the PerDestinationQueue APIs to support
multiple receipt EDUs, queue_read_receipt now becomes linear
time in the number of queued threaded receipts in the room for
the given user, it is expected this is a small number since receipt
EDUs are sent as filler in transactions.
2022-11-28 14:40:17 +00:00
Mathieu Velten 39cde585bf
Faster joins: use initial list of servers if we don't have the full state yet (#14408)
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2022-11-24 18:09:47 +01:00
Mathieu Velten 1526ff389f
Faster joins: filter out non local events when a room doesn't have its full state (#14404)
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
2022-11-21 16:46:14 +01:00
Patrick Cloke d8cc86eff4
Remove redundant types from comments. (#14412)
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.

Also adds some missing type hints which were simple to fill in.
2022-11-16 15:25:24 +00:00
David Robertson 1eed795fc5
Include heroes in partial join responses' state (#14442)
* Pull out hero selection logic

* Include heroes in partial join response's state

* Changelog

* Fixup trial test

* Remove TODO
2022-11-15 17:35:19 +00:00
David Robertson d4fac8a3e2
Fix typo in #13320 which could cause log spam (#14347) 2022-11-01 19:20:35 +00:00
Eric Eastwood 40fa8294e3
Refactor MSC3030 `/timestamp_to_event` to move away from our snowflake pull from `destination` pattern (#14096)
1. `federation_client.timestamp_to_event(...)` now handles all `destination` looping and uses our generic `_try_destination_list(...)` helper.
 2. Consistently handling `NotRetryingDestination` and `FederationDeniedError` across `get_pdu` , backfill, and the generic `_try_destination_list` which is used for many places we use this pattern.
 3. `get_pdu(...)` now returns `PulledPduInfo` so we know which `destination` we ended up pulling the PDU from
2022-10-26 16:10:55 -05:00
Olivier Wilkinson (reivilibre) 85fcbba595 Merge branch 'release-v1.70' into develop 2022-10-25 15:39:35 +01:00
Erik Johnston 09b588854e
Fix `TypeError: 'dict_keys' object is not reversible` (#14280) 2022-10-24 13:05:14 +01:00
Andrew Morgan da2c93d4b6
Stop returning `unsigned.invite_room_state` in `PUT /_matrix/federation/v2/invite/{roomId}/{eventId}` responses (#14064)
Co-authored-by: David Robertson <davidr@element.io>
2022-10-20 15:17:45 +01:00
Eric Eastwood 70b3396506
Explain `SynapseError` and `FederationError` better (#14191)
Explain `SynapseError` and `FederationError` better

Spawning from https://github.com/matrix-org/synapse/pull/13816#discussion_r993262622
2022-10-19 15:39:43 -05:00
Andrew Morgan 97b3d037c0
Don't require optional `invite_room_state` field on fed v2 invite (#14083) 2022-10-14 13:48:33 +01:00
Andrew Morgan 9c23442ac9
Correct field name for stripped state events when knocking. `knock_state_events` -> `knock_room_state` (#14102) 2022-10-12 14:37:20 +01:00
Shay a86b2f6837
Fix a bug where redactions were not being sent over federation if we did not have the original event. (#13813) 2022-10-11 11:18:45 -07:00
David Robertson cb20b885cb
Always close _all_ `ijson` coroutines, even if doing so raises Exceptions (#14065) 2022-10-06 18:17:50 +00:00
Eric Eastwood 70a4317692
Track when the pulled event signature fails (#13815)
Because we're doing the recording in `_check_sigs_and_hash_for_pulled_events_and_fetch` (previously named `_check_sigs_and_hash_and_fetch`), this means we will track signature failures for `backfill`, `get_room_state`, `get_event_auth`, and `get_missing_events` (all pulled event scenarios). And we also record signature failures from `get_pdu`.

Part of https://github.com/matrix-org/synapse/issues/13700

Part of https://github.com/matrix-org/synapse/issues/13676 and https://github.com/matrix-org/synapse/issues/13356

This PR will be especially important for https://github.com/matrix-org/synapse/pull/13816 so we can avoid the costly `_get_state_ids_after_missing_prev_event` down the line when `/messages` calls backfill.
2022-10-03 14:53:29 -05:00
Erik Johnston 299b00d968
Prioritize outbound to-device over device list updates (#13922)
Otherwise device list changes for large accounts can temporarily delay to-device messages.
2022-09-27 15:17:41 +01:00
reivilibre c06b2b7142
Faster Remote Room Joins: tell remote homeservers that we are unable to authorise them if they query a room which has partial state on our server. (#13823) 2022-09-23 11:47:16 +01:00
Denis c802ef1411
Don't include redundant prev_state in new events (#13791) 2022-09-20 09:44:38 +01:00
reivilibre 21687ec189
Fix a long-standing spec compliance bug where Synapse would accept a trailing slash on the end of `/get_missing_events` federation requests. (#13789)
* Don't accept a trailing slash on the end of /get_missing_events

* Newsfile

Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>

Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
2022-09-14 09:28:12 +01:00
reivilibre 526f84bc2e
Fix Prometheus recording rules to not use legacy metric names. (#13718) 2022-09-08 15:01:42 +01:00
reivilibre c2fe48a6ff
Rename the `EventFormatVersions` enum values so that they line up with room version numbers. (#13706) 2022-09-07 11:08:20 +01:00
Erik Johnston 2318603772
Add some logging to help track down #13444 (#13679) 2022-09-01 13:54:52 +01:00
reivilibre 7bc110a19e
Generalise the `@cancellable` annotation so it can be used on functions other than just servlet methods. (#13662) 2022-08-31 11:16:05 +00:00
reivilibre ba882c0357
Faster Room Joins: fix `/make_knock` blocking indefinitely when the room in question is a partial-stated room. (#13583)
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2022-08-24 09:09:59 +00:00
Eric Eastwood 7af07f9716
Instrument `_check_sigs_and_hash_and_fetch` to trace time spent in child concurrent calls (#13588)
Instrument `_check_sigs_and_hash_and_fetch` to trace time spent in child concurrent calls because I've see `_check_sigs_and_hash_and_fetch` take [10.41s to process 100 events](https://github.com/matrix-org/synapse/issues/13587)

Fix https://github.com/matrix-org/synapse/issues/13587

Part of https://github.com/matrix-org/synapse/issues/13356
2022-08-23 21:53:37 -05:00
Eric Eastwood 0a4efbc1dd
Instrument the federation/backfill part of `/messages` (#13489)
Instrument the federation/backfill part of `/messages` so it's easier to follow what's going on in Jaeger when viewing a trace.

Split out from https://github.com/matrix-org/synapse/pull/13440

Follow-up from https://github.com/matrix-org/synapse/pull/13368

Part of https://github.com/matrix-org/synapse/issues/13356
2022-08-16 12:39:40 -05:00
Eric Eastwood 344a2f767c
Instrument `FederationStateIdsServlet` - `/state_ids` (#13499)
Instrument FederationStateIdsServlet - `/state_ids` so it's easier to follow what's going on in Jaeger when viewing a trace.
2022-08-15 19:41:23 +01:00
reivilibre e9e6aacfbe
Faster Room Joins: prevent Synapse from answering federated join requests for a room which it has not fully joined yet. (#13416) 2022-08-04 16:27:04 +01:00
Eric Eastwood 92d21faf12
Instrument `/messages` for understandable traces in Jaeger (#13368)
In Jaeger:

 - Before: huge list of uncategorized database calls
 - After: nice and collapsible into units of work
2022-08-03 10:57:38 -05:00
Will Hunt 502f075e96
Implement MSC3848: Introduce errcodes for specific event sending failures (#13343)
Implements MSC3848
2022-07-27 13:44:40 +01:00
reivilibre 39be5bc550
Make minor clarifications to the error messages given when we fail to join a room via any server. (#13160) 2022-07-27 10:37:50 +00:00
Eric Eastwood 4f3082d6bf
Fix `get_pdu` asking every remote destination even after it finds an event (#13346) 2022-07-27 10:40:04 +01:00
Patrick Cloke 50122754c8
Add missing types to opentracing. (#13345)
After this change `synapse.logging` is fully typed.
2022-07-21 12:01:52 +00:00
Eric Eastwood 0f971ca68e
Update `get_pdu` to return the original, pristine `EventBase` (#13320)
Update `get_pdu` to return the untouched, pristine `EventBase` as it was originally seen over federation (no metadata added). Previously, we returned the same `event` reference that we stored in the cache which downstream code modified in place and added metadata like setting it as an `outlier`  and essentially poisoned our cache. Now we always return a copy of the `event` so the original can stay pristine in our cache and re-used for the next cache call.

Split out from https://github.com/matrix-org/synapse/pull/13205

As discussed at:

 - https://github.com/matrix-org/synapse/pull/13205#discussion_r918365746
 - https://github.com/matrix-org/synapse/pull/13205#discussion_r918366125

Related to https://github.com/matrix-org/synapse/issues/12584. This PR doesn't fix that issue because it hits [`get_event` which exists from the local database before it tries to `get_pdu`](7864f33e28/synapse/federation/federation_client.py (L581-L594)).
2022-07-20 15:58:51 -05:00
Patrick Cloke a6895dd576
Add type annotations to `trace` decorator. (#13328)
Functions that are decorated with `trace` are now properly typed
and the type hints for them are fixed.
2022-07-19 14:14:30 -04:00
David Robertson b977867358
Rate limit joins per-room (#13276) 2022-07-19 11:45:17 +00:00
Nick Mills-Barrett 21eeacc995
Federation Sender & Appservice Pusher Stream Optimisations (#13251)
* Replace `get_new_events_for_appservice` with `get_all_new_events_stream`

The functions were near identical and this brings the AS worker closer
to the way federation senders work which can allow for multiple workers
to handle AS traffic.

* Pull received TS alongside events when processing the stream

This avoids an extra query -per event- when both federation sender
and appservice pusher process events.
2022-07-15 09:36:56 +01:00
Sean Quah 68db233f0c
Handle race between persisting an event and un-partial stating a room (#13100)
Whenever we want to persist an event, we first compute an event context,
which includes the state at the event and a flag indicating whether the
state is partial. After a lot of processing, we finally try to store the
event in the database, which can fail for partial state events when the
containing room has been un-partial stated in the meantime.

We detect the race as a foreign key constraint failure in the data store
layer and turn it into a special `PartialStateConflictError` exception,
which makes its way up to the method in which we computed the event
context.

To make things difficult, the exception needs to cross a replication
request: `/fed_send_events` for events coming over federation and
`/send_event` for events from clients. We transport the
`PartialStateConflictError` as a `409 Conflict` over replication and
turn `409`s back into `PartialStateConflictError`s on the worker making
the request.

All client events go through
`EventCreationHandler.handle_new_client_event`, which is called in
*a lot* of places. Instead of trying to update all the code which
creates client events, we turn the `PartialStateConflictError` into a
`429 Too Many Requests` in
`EventCreationHandler.handle_new_client_event` and hope that clients
take it as a hint to retry their request.

On the federation event side, there are 7 places which compute event
contexts. 4 of them use outlier event contexts:
`FederationEventHandler._auth_and_persist_outliers_inner`,
`FederationHandler.do_knock`, `FederationHandler.on_invite_request` and
`FederationHandler.do_remotely_reject_invite`. These events won't have
the partial state flag, so we do not need to do anything for then.

The remaining 3 paths which create events are
`FederationEventHandler.process_remote_join`,
`FederationEventHandler.on_send_membership_event` and
`FederationEventHandler._process_received_pdu`.

We can't experience the race in `process_remote_join`, unless we're
handling an additional join into a partial state room, which currently
blocks, so we make no attempt to handle it correctly.

`on_send_membership_event` is only called by
`FederationServer._on_send_membership_event`, so we catch the
`PartialStateConflictError` there and retry just once.

`_process_received_pdu` is called by `on_receive_pdu` for incoming
events and `_process_pulled_event` for backfill. The latter should never
try to persist partial state events, so we ignore it. We catch the
`PartialStateConflictError` in `on_receive_pdu` and retry just once.

Refering to the graph of code paths in
https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648
may make the above make more sense.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-07-05 16:12:52 +01:00
Patrick Cloke 81608490e3
Stop depending on `room_id` to be returned for children state in the hierarchy response. (#12991)
The `room_id` field was removed from MSC2946 before
it was accepted. It was initially kept for backwards compatibility
and should be removed now that the stable form of the API
is used.

This change only stops Synapse from validating that it is returned,
a future PR will remove returning it as part of the response.
2022-06-10 07:15:51 -04:00