Commit Graph

1037 Commits (4cc4229cd7a55d2556c798fecbb1c9660dc821c8)

Author SHA1 Message Date
Eric Eastwood a6f1a3abec
Add MSC3030 experimental client and federation API endpoints to get the closest event to a given timestamp (#9445)
MSC3030: https://github.com/matrix-org/matrix-doc/pull/3030

Client API endpoint. This will also go and fetch from the federation API endpoint if unable to find an event locally or we found an extremity with possibly a closer event we don't know about.
```
GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>
{
    "event_id": ...
    "origin_server_ts": ...
}
```

Federation API endpoint:
```
GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>
{
    "event_id": ...
    "origin_server_ts": ...
}
```

Co-authored-by: Erik Johnston <erik@matrix.org>
2021-12-02 01:02:20 -06:00
Patrick Cloke a4521ce0a8
Support the stable /hierarchy endpoint from MSC2946 (#11329)
This also makes additional updates where the implementation
had drifted from the approved MSC.

Unstable endpoints will be removed at a later data.
2021-11-29 14:32:20 -05:00
Patrick Cloke 9d1971a5c4
Return the stable `event` field from `/send_join` per MSC3083. (#11413)
This does not remove the unstable field and still parses both.
Handling of the unstable field will need to be removed in the
future.
2021-11-29 15:43:20 +00:00
Eric Eastwood f1d5c2f269
Split out federated PDU retrieval into a non-cached version (#11242)
Context: https://github.com/matrix-org/synapse/pull/11114/files#r741643968
2021-11-09 15:07:57 -06:00
Erik Johnston 98c8fc6ce8
Handle federation inbound instances being killed more gracefully (#11262)
* Make lock better handle process being killed

If the process gets killed and restarted (so that it didn't have a
chance to drop its locks gracefully) then there may still be locks in
the DB that are for the same instance that haven't yet timed out but are
safe to delete.

We handle this case by a) checking if the current instance already has
taken out the lock, and b) if not then ignoring locks that are for the
same instance.

* Periodically check for old staged events

This is to protect against other instances dying and their locks timing
out.
2021-11-08 09:54:47 +00:00
Nick Barrett af54167516
Enable passing typing stream writers as a list. (#11237)
This makes the typing stream writer config match the other stream writers
that only currently support a single worker.
2021-11-03 14:25:47 +00:00
Shay e81fa92648
Add `use_float=true` to ijson calls in Synapse (#11217)
* add use_float=true to ijson calls

* lints

* add changelog

* Update changelog.d/11217.bugfix

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2021-11-01 09:28:04 -07:00
reivilibre 75ca0a6168
Annotate `log_function` decorator (#10943)
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2021-10-27 17:27:23 +01:00
Sean Quah 2b82ec425f
Add type hints for most `HomeServer` parameters (#11095) 2021-10-22 18:15:41 +01:00
Patrick Cloke d1bf5f7c9d
Strip "join_authorised_via_users_server" from join events which do not need it. (#10933)
This fixes a "Event not signed by authorising server" error when
transition room member from join -> join, e.g. when updating a
display name or avatar URL for restricted rooms.
2021-09-30 11:13:59 -04:00
Richard van der Hoff 176aa55fd5
add event id to logcontext when handling incoming PDUs (#10936) 2021-09-29 11:59:43 +01:00
Patrick Cloke 94b620a5ed
Use direct references for configuration variables (part 6). (#10916) 2021-09-29 06:44:15 -04:00
Richard van der Hoff 85551b7a85
Factor out common code for persisting fetched auth events (#10896)
* Factor more stuff out of `_get_events_and_persist`

It turns out that the event-sorting algorithm in `_get_events_and_persist` is
also useful in other circumstances. Here we move the current
`_auth_and_persist_fetched_events` to `_auth_and_persist_fetched_events_inner`,
and then factor the sorting part out to `_auth_and_persist_fetched_events`.

* `_get_remote_auth_chain_for_event`: remove redundant `outlier` assignment

`get_event_auth` returns events with the outlier flag already set, so this is
redundant (though we need to update a test where `get_event_auth` is mocked).

* `_get_remote_auth_chain_for_event`: move existing-event tests earlier

Move a couple of tests outside the loop. This is a bit inefficient for now, but
a future commit will make it better. It should be functionally identical.

* `_get_remote_auth_chain_for_event`: use `_auth_and_persist_fetched_events`

We can use the same codepath for persisting the events fetched as part of an
auth chain as for those fetched individually by `_get_events_and_persist` for
building the state at a backwards extremity.

* `_get_remote_auth_chain_for_event`: use a dict for efficiency

`_auth_and_persist_fetched_events` sorts the events itself, so we no longer
need to care about maintaining the ordering from `get_event_auth` (and no
longer need to sort by depth in `get_event_auth`).

That means that we can use a map, making it easier to filter out events we
already have, etc.

* changelog

* `_auth_and_persist_fetched_events`: improve docstring
2021-09-24 11:56:33 +01:00
Patrick Cloke 47854c71e9
Use direct references for configuration variables (part 4). (#10893) 2021-09-23 12:03:01 -04:00
Andrew Morgan aa2c027792
Remove unnecessary parentheses around tuples returned from methods (#10889) 2021-09-23 11:59:07 +01:00
Patrick Cloke 8c7a531e27
Use direct references for some configuration variables (part 2) (#10812) 2021-09-15 08:34:52 -04:00
Patrick Cloke 01c88a09cd
Use direct references for some configuration variables (#10798)
Instead of proxying through the magic getter of the RootConfig
object. This should be more performant (and is more explicit).
2021-09-13 13:07:12 -04:00
reivilibre 524b8ead77
Add types to synapse.util. (#10601) 2021-09-10 17:03:18 +01:00
Richard van der Hoff 1800aabfc2
Split `FederationHandler` in half (#10692)
The idea here is to take anything to do with incoming events and move it out to a separate handler, as a way of making FederationHandler smaller.
2021-08-26 21:41:44 +01:00
Patrick Cloke 5548fe0978
Cache the result of fetching the room hierarchy over federation. (#10647) 2021-08-26 07:16:53 -04:00
Patrick Cloke 31dac7ffee
Do not include stack traces for known exceptions when trying multiple federation destinations. (#10662) 2021-08-23 08:00:25 -04:00
Richard van der Hoff e81d62009e
Split `on_receive_pdu` in half (#10640)
Here we split on_receive_pdu into two functions (on_receive_pdu and process_pulled_event), rather than having both cases in the same method. There's a tiny bit of overlap, but not that much.
2021-08-19 17:05:12 +00:00
Patrick Cloke c4cf0c0473
Attempt to pull from the legacy spaces summary API over federation. (#10583)
If the new /hierarchy API does not exist on all destinations,
fallback to querying the /spaces API and translating the results.

This is a backwards compatibility hack since not all of the
federated homeservers will update at the same time.
2021-08-17 08:19:12 -04:00
Patrick Cloke 5af83efe8d
Validate the max_rooms_per_space parameter to ensure it is non-negative. (#10611) 2021-08-16 12:01:30 -04:00
Michael Telatynski 0ace38b7b3
Experimental support for MSC3266 Room Summary API. (#10394) 2021-08-16 14:49:12 +00:00
Patrick Cloke 87b62f8bb2
Split `synapse.federation.transport.server` into multiple files. (#10590) 2021-08-16 10:14:31 -04:00
Richard van der Hoff 2d9ca4ca77
Clean up some logging in the federation event handler (#10591)
* Include outlier status in `str(event)`

In places where we log event objects, knowing whether or not you're dealing
with an outlier is super useful.

* Remove duplicated logging in get_missing_events

When we process events received from get_missing_events, we log them twice
(once in `_get_missing_events_for_pdu`, and once in `on_receive_pdu`). Reduce
the duplication by removing the logging in `on_receive_pdu`, and ensuring the
call sites do sensible logging.

* log in `on_receive_pdu` when we already have the event

* Log which prev_events we are missing

* changelog
2021-08-16 13:19:02 +01:00
Patrick Cloke 7de445161f
Support federation in the new spaces summary API (MSC2946). (#10569) 2021-08-16 08:06:17 -04:00
Patrick Cloke c12b5577f2
Fix a harmless exception when the staged events queue is empty. (#10592) 2021-08-13 11:49:06 +00:00
Patrick Cloke 1de26b3467
Convert Transaction and Edu object to attrs (#10542)
Instead of wrapping the JSON into an object, this creates concrete
instances for Transaction and Edu. This allows for improved type
hints and simplified code.
2021-08-06 09:39:59 -04:00
Erik Johnston 60f0534b6e
Fix exceptions in logs when failing to get remote room list (#10541) 2021-08-06 14:05:41 +01:00
Patrick Cloke 3b354faad0
Refactoring before implementing the updated spaces summary. (#10527)
This should have no user-visible changes, but refactors some pieces of
the SpaceSummaryHandler before adding support for the updated
MSC2946.
2021-08-05 12:39:17 +00:00
Erik Johnston 01d45fe964
Prune inbound federation queues if they get too long (#10390) 2021-08-02 13:37:25 +00:00
Patrick Cloke 3a541a7daa
Improve failover logic for MSC3083 restricted rooms. (#10447)
If the federation client receives an M_UNABLE_TO_AUTHORISE_JOIN or
M_UNABLE_TO_GRANT_JOIN response it will attempt another server
before giving up completely.
2021-07-29 11:50:14 +00:00
Patrick Cloke 228decfce1
Update the MSC3083 support to verify if joins are from an authorized server. (#10254) 2021-07-26 12:17:00 -04:00
Patrick Cloke 4fb92d93ea
Add type hints to synapse.federation.transport.client. (#10408) 2021-07-26 11:53:09 -04:00
Patrick Cloke 590cc4e888
Add type hints to additional servlet functions (#10437)
Improves type hints for:

* parse_{boolean,integer}
* parse_{boolean,integer}_from_args
* parse_json_{value,object}_from_request

And fixes any incorrect calls that resulted from unknown types.
2021-07-21 18:12:22 +00:00
Patrick Cloke d427f64724
Do not include signatures/hashes in make_{join,leave,knock} responses. (#10404)
These signatures would end up invalid since the joining/leaving/knocking
server would modify the response before calling send_{join,leave,knock}.
2021-07-16 10:36:38 -04:00
Erik Johnston ac5c221208
Stagger send presence to remotes (#10398)
This is to help with performance, where trying to connect to thousands
of hosts at once can consume a lot of CPU (due to TLS etc).

Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
2021-07-15 11:52:56 +01:00
Jonathan de Jong bf72d10dbf
Use inline type hints in various other places (in `synapse/`) (#10380) 2021-07-15 11:02:43 +01:00
Patrick Cloke 30b56f6925
Add type hints to get_domain_from_id and get_localpart_from_id. (#10385) 2021-07-13 12:08:47 -04:00
Erik Johnston 1579fdd54a
Ensure we always drop the federation inbound lock (#10336) 2021-07-09 10:16:54 +01:00
Erik Johnston c65067d673
Handle old staged inbound events (#10303)
We might have events in the staging area if the service was restarted while there were unhandled events in the staging area.

Fixes #10295
2021-07-06 13:02:37 +01:00
Patrick Cloke 8d609435c0
Move methods involving event authentication to EventAuthHandler. (#10268)
Instead of mixing them with user authentication methods.
2021-07-01 14:25:37 -04:00
Erik Johnston 329ef5c715
Fix the inbound PDU metric (#10279)
This broke in #10272
2021-06-30 12:07:16 +01:00
Richard van der Hoff 785bceef72 Merge branch 'release-v1.37' into develop 2021-06-29 20:25:47 +01:00
Erik Johnston c54db67d0e
Handle inbound events from federation asynchronously (#10272)
Fixes #9490

This will break a couple of SyTest that are expecting failures to be added to the response of a federation /send, which obviously doesn't happen now that things are asynchronous.

Two drawbacks:

    Currently there is no logic to handle any events left in the staging area after restart, and so they'll only be handled on the next incoming event in that room. That can be fixed separately.
    We now only process one event per room at a time. This can be fixed up further down the line.
2021-06-29 19:55:22 +01:00
Richard van der Hoff a0ed0f363e
Soft-fail spammy events received over federation (#10263) 2021-06-29 11:08:06 +01:00
Patrick Cloke 0555d7b0dc
Add additional types to the federation transport server. (#10213) 2021-06-28 07:36:41 -04:00
Richard van der Hoff 6e8fb42be7
Improve validation for `send_{join,leave,knock}` (#10225)
The idea here is to stop people sending things that aren't joins/leaves/knocks through these endpoints: previously you could send anything you liked through them. I wasn't able to find any security holes from doing so, but it doesn't sound like a good thing.
2021-06-24 15:30:49 +01:00
Richard van der Hoff 91fa9cca99
Expose opentracing trace id in response headers (#10199)
Fixes: #9480
2021-06-18 11:43:22 +01:00
Patrick Cloke 9e5ab6dd58
Remove the experimental flag for knocking and use stable prefixes / endpoints. (#10167)
* Room version 7 for knocking.
* Stable prefixes and endpoints (both client and federation) for knocking.
* Removes the experimental configuration flag.
2021-06-15 07:45:14 -04:00
Sorunome d936371b69
Implement knock feature (#6739)
This PR aims to implement the knock feature as proposed in https://github.com/matrix-org/matrix-doc/pull/2403

Signed-off-by: Sorunome mail@sorunome.de
Signed-off-by: Andrew Morgan andrewm@element.io
2021-06-09 19:39:51 +01:00
Patrick Cloke c7f3fb2745
Add type hints to the federation server transport. (#10080) 2021-06-08 11:19:25 -04:00
Erik Johnston c842c581ed
When joining a remote room limit the number of events we concurrently check signatures/hashes for (#10117)
If we do hundreds of thousands at once the memory overhead can easily reach 500+ MB.
2021-06-08 11:07:46 +01:00
Erik Johnston fc3d2dc269
Rewrite the KeyRing (#10035) 2021-06-02 16:37:59 +01:00
Andrew Morgan 3ff6fe2851 Merge branch 'master' into develop 2021-06-01 13:47:27 +01:00
Erik Johnston 84cf3e47a0
Allow response of `/send_join` to be larger. (#10093)
Fixes #10087.
2021-05-28 16:28:01 +01:00
Richard van der Hoff ed53bf314f
Set opentracing priority before setting other tags (#10092)
... because tags on spans which aren't being sampled get thrown away.
2021-05-28 16:14:08 +01:00
Erik Johnston 8e132fe64e Synapse 1.35.0rc2 (2021-05-27)
==============================
 
 Bugfixes
 --------
 
 - Fix a bug introduced in v1.35.0rc1 when calling the spaces summary API via a GET request. ([\#10079](https://github.com/matrix-org/synapse/issues/10079))
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEBTGR3/RnAzBGUif3pULk7RsPrAkFAmCvpZMQHGVyaWtAbWF0
 cml4Lm9yZwAKCRClQuTtGw+sCdPwCACQIlIWd6eIoXLUc+wDcrd+k5xL376EdYah
 x7ABswiYSm+9C4xr58gJD3xc6eiD2PCIWdZN0rsQDLIOfSXW6x1lyKD+Ds0HySok
 MaVpsoxbb9o/Zf9qtXF2bLSArZUQwfoNaA45NLgNzfUIijf1e+bd2wNEgHlRSoMz
 m10GggOFU0Ds/CCYZpxZw/eXDbLWL7eaHR30vw/jQ1cEsV+S4ucnUmLHFCF7YCyI
 Np80pnywH5cYKerecldFWenL4YZJswVbx+AW9e3lBzq5jOrRZkmLkaHg10mCR2f6
 CV03ie65Ce+7x5UU6v6nHA0DYUTQGIjlJBtyCN3tFglUduQ8Gpu0
 =Mfsf
 -----END PGP SIGNATURE-----

Merge tag 'v1.35.0rc2' into develop

Synapse 1.35.0rc2 (2021-05-27)
==============================

Bugfixes
--------

- Fix a bug introduced in v1.35.0rc1 when calling the spaces summary API via a GET request. ([\#10079](https://github.com/matrix-org/synapse/issues/10079))
2021-05-27 14:59:46 +01:00
Patrick Cloke 8e15c92c2f
Pass the origin when calculating the spaces summary over GET. (#10079)
Fixes a bug due to conflicting PRs which were merged. (One added a new caller to
a method, the other added a new parameter to the same method.)
2021-05-27 08:52:28 -04:00
Patrick Cloke f42e4c4eb9
Remove the experimental spaces enabled flag. (#10063)
In lieu of just always enabling the unstable spaces endpoint and
unstable room version.
2021-05-26 14:35:16 -04:00
Erik Johnston 3e831f24ff
Don't hammer the database for destination retry timings every ~5mins (#10036) 2021-05-21 17:57:08 +01:00
Erik Johnston 1c6a19002c
Add `Keyring.verify_events_for_server` and reduce memory usage (#10018)
Also add support for giving a callback to generate the JSON object to
verify. This should reduce memory usage, as we no longer have the event
in memory in dict form (which has a large memory footprint) for extend
periods of time.
2021-05-20 16:25:11 +01:00
Erik Johnston 64887f06fc
Use ijson to parse the response to `/send_join`, reducing memory usage. (#9958)
Instead of parsing the full response to `/send_join` into Python objects (which can be huge for large rooms) and *then* parsing that into events, we instead use ijson to stream parse the response directly into `EventBase` objects.
2021-05-20 16:11:48 +01:00
Patrick Cloke 551d2c3f4b
Allow a user who could join a restricted room to see it in spaces summary. (#9922)
This finishes up the experimental implementation of MSC3083 by showing
the restricted rooms in the spaces summary (from MSC2946).
2021-05-20 11:10:36 -04:00
Patrick Cloke f4833e0c06
Support fetching the spaces summary via GET over federation. (#9947)
Per changes in MSC2946, the C-S and S-S APIs for spaces summary
should use GET requests.

Until this is stable, the POST endpoints still exist.

This does not switch federation requests to use the GET version yet
since it is newly added and already deployed servers might not support
it. When switching to the stable endpoint we should switch to GET
requests.
2021-05-11 12:21:43 -04:00
Richard van der Hoff b378d98c8f
Add debug logging for issue #9533 (#9959)
Hopefully this will help us track down where to-device messages are getting
lost/delayed.
2021-05-11 11:04:03 +01:00
Richard van der Hoff 7967b36efe
Fix `m.room_key_request` to-device messages (#9961)
fixes #9960
2021-05-11 11:02:56 +01:00
Andrew Morgan 4e0fd35bc9 Revert "Experimental Federation Speedup (#9702)"
This reverts commit 05e8c70c05.
2021-04-28 11:38:33 +01:00
Patrick Cloke 1350b053da
Pass errors back to the client when trying multiple federation destinations. (#9868)
This ensures that something like an auth error (403) will be
returned to the requester instead of attempting to try more
servers, which will likely result in the same error, and then
passing back a generic 400 error.
2021-04-27 07:30:34 -04:00
Richard van der Hoff 294c675033
Remove `synapse.types.Collection` (#9856)
This is no longer required, since we have dropped support for Python 3.5.
2021-04-22 16:43:50 +01:00
Erik Johnston db70435de7
Fix bug where we sent remote presence states to remote servers (#9850) 2021-04-20 13:37:54 +01:00
Jonathan de Jong 495b214f4f
Fix (final) Bugbear violations (#9838) 2021-04-20 11:50:49 +01:00
Erik Johnston 2b7dd21655
Don't send normal presence updates over federation replication stream (#9828) 2021-04-19 10:50:49 +01:00
Richard van der Hoff 5a153772c1
remove `HomeServer.get_config` (#9815)
Every single time I want to access the config object, I have to remember
whether or not we use `get_config`. Let's just get rid of it.
2021-04-14 19:09:08 +01:00
Jonathan de Jong 05e8c70c05
Experimental Federation Speedup (#9702)
This basically speeds up federation by "squeezing" each individual dual database call (to destinations and destination_rooms), which previously happened per every event, into one call for an entire batch (100 max).

Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
2021-04-14 17:19:02 +01:00
Jonathan de Jong 4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Richard van der Hoff f946450184
Fix duplicate logging of exceptions in transaction processing (#9780)
There's no point logging this twice.
2021-04-09 18:12:15 +01:00
Jonathan de Jong 2ca4e349e9
Bugbear: Add Mutable Parameter fixes (#9682)
Part of #9366

Adds in fixes for B006 and B008, both relating to mutable parameter lint errors.

Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
2021-04-08 22:38:54 +01:00
Erik Johnston 3a569fb200 Fix sharded federation sender sometimes using 100% CPU.
We pull all destinations requiring catchup from the DB in batches.
However, if all those destinations get filtered out (due to the
federation sender being sharded), then the `last_processed` destination
doesn't get updated, and we keep requesting the same set repeatedly.
2021-04-08 17:34:07 +01:00
Andrew Morgan 04819239ba
Add a Synapse Module for configuring presence update routing (#9491)
At the moment, if you'd like to share presence between local or remote users, those users must be sharing a room together. This isn't always the most convenient or useful situation though.

This PR adds a module to Synapse that will allow deployments to set up extra logic on where presence updates should be routed. The module must implement two methods, `get_users_for_states` and `get_interested_users`. These methods are given presence updates or user IDs and must return information that Synapse will use to grant passing presence updates around.

A method is additionally added to `ModuleApi` which allows triggering a set of users to receive the current, online presence information for all users they are considered interested in. This is the equivalent of that user receiving presence information during an initial sync. 

The goal of this module is to be fairly generic and useful for a variety of applications, with hard requirements being:

* Sending state for a specific set or all known users to a defined set of local and remote users.
* The ability to trigger an initial sync for specific users, so they receive all current state.
2021-04-06 14:38:30 +01:00
Patrick Cloke 44bb881096
Add type hints to expiring cache. (#9730) 2021-04-06 08:58:18 -04:00
Patrick Cloke d959d28730
Add type hints to the federation handler and server. (#9743) 2021-04-06 07:21:57 -04:00
Erik Johnston 33548f37aa
Improve tracing for to device messages (#9686) 2021-04-01 17:08:21 +01:00
Erik Johnston 963f4309fe
Make RateLimiter class check for ratelimit overrides (#9711)
This should fix a class of bug where we forget to check if e.g. the appservice shouldn't be ratelimited.

We also check the `ratelimit_override` table to check if the user has ratelimiting disabled. That table is really only meant to override the event sender ratelimiting, so we don't use any values from it (as they might not make sense for different rate limits), but we do infer that if ratelimiting is disabled for the user we should disabled all ratelimits.

Fixes #9663
2021-03-30 12:06:09 +01:00
Patrick Cloke da75d2ea1f
Add type hints for the federation sender. (#9681)
Includes an abstract base class which both the FederationSender
and the FederationRemoteSendQueue must implement.
2021-03-29 11:43:20 -04:00
Erik Johnston c602ba8336
Fixed undefined variable error in catchup (#9664)
Broke in #9640

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2021-03-24 16:12:47 +00:00
Richard van der Hoff c73cc2c2ad
Spaces summary: call out to other servers (#9653)
When we hit an unknown room in the space tree, see if there are other servers that we might be able to poll to get the data.

Fixes: #9447
2021-03-24 12:45:39 +00:00
Richard van der Hoff 4ecba9bd5c
Federation API for Space summary (#9652)
Builds on the work done in #9643 to add a federation API for space summaries.

There's a bit of refactoring of the existing client-server code first, to avoid too much duplication.
2021-03-23 11:51:12 +00:00
Patrick Cloke b7748d3c00
Import HomeServer from the proper module. (#9665) 2021-03-23 07:12:48 -04:00
Erik Johnston dd71eb0f8a
Make federation catchup send last event from any server. (#9640)
Currently federation catchup will send the last *local* event that we
failed to send to the remote. This can cause issues for large rooms
where lots of servers have sent events while the remote server was down,
as when it comes back up again it'll be flooded with events from various
points in the DAG.

Instead, let's make it so that all the servers send the most recent
events, even if its not theirs. The remote should deduplicate the
events, so there shouldn't be much overhead in doing this.
Alternatively, the servers could only send local events if they were
also extremities and hope that the other server will send the event
over, but that is a bit risky.
2021-03-18 15:52:26 +00:00
Erik Johnston 026503fa3b
Don't go into federation catch up mode so easily (#9561)
Federation catch up mode is very inefficient if the number of events
that the remote server has missed is small, since handling gaps can be
very expensive, c.f. #9492.

Instead of going into catch up mode whenever we see an error, we instead
do so only if we've backed off from trying the remote for more than an
hour (the assumption being that in such a case it is more than a
transient failure).
2021-03-15 14:42:40 +00:00
Patrick Cloke 55da8df078
Fix additional type hints from Twisted 21.2.0. (#9591) 2021-03-12 11:37:57 -05:00
Richard van der Hoff 1e67bff833
Reject concurrent transactions (#9597)
If more transactions arrive from an origin while we're still processing the
first one, reject them.

Hopefully a quick fix to https://github.com/matrix-org/synapse/issues/9489
2021-03-12 15:14:55 +00:00
Richard van der Hoff 2b328d7e02
Improve logging when processing incoming transactions (#9596)
Put the room id in the logcontext, to make it easier to understand what's going on.
2021-03-12 15:08:03 +00:00
Patrick Cloke 2a99cc6524
Use the chain cover index in get_auth_chain_ids. (#9576)
This uses a simplified version of get_chain_cover_difference to calculate
auth chain of events.
2021-03-10 09:57:59 -05:00
Patrick Cloke 7fdc6cefb3
Fix additional type hints. (#9543)
Type hint fixes due to Twisted 21.2.0 adding type hints.
2021-03-09 07:41:32 -05:00
Jonathan de Jong d6196efafc
Add ResponseCache tests. (#9458) 2021-03-08 14:00:07 -05:00
Richard van der Hoff 8a4b3738f3
Replace `last_*_pdu_age` metrics with timestamps (#9540)
Following the advice at
https://prometheus.io/docs/practices/instrumentation/#timestamps-not-time-since,
it's preferable to export unix timestamps, not ages.

There doesn't seem to be any particular naming convention for timestamp
metrics.
2021-03-04 16:40:18 +00:00