Commit Graph

812 Commits (6d14fdc2710688014a7a66cc48485462c6e86a1e)

Author SHA1 Message Date
Sean Quah a302d3ecf7
Remove unnecessary reactor reference from `_PerHostRatelimiter` (#14842)
Fix up #14812 to avoid introducing a reference to the reactor.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-16 13:16:19 +00:00
Sean Quah 772e8c2385
Fix stack overflow in `_PerHostRatelimiter` due to synchronous requests (#14812)
When there are many synchronous requests waiting on a
`_PerHostRatelimiter`, each request will be started recursively just
after the previous request has completed. Under the right conditions,
this leads to stack exhaustion.

A common way for requests to become synchronous is when the remote
client disconnects early, because the homeserver is overloaded and slow
to respond.

Avoid stack exhaustion under these conditions by deferring subsequent
requests until the next reactor tick.

Fixes #14480.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-13 00:16:21 +00:00
reivilibre ba4ea7d13f
Batch up replication requests to request the resyncing of remote users's devices. (#14716) 2023-01-10 11:17:59 +00:00
Patrick Cloke 630d0aeaf6
Support RFC7636 PKCE in the OAuth 2.0 flow. (#14750)
PKCE can protect against certain attacks and is enabled by default. Support
can be controlled manually by setting the pkce_method of each oidc_providers
entry to 'auto' (default), 'always', or 'never'.

This is required by Twitter OAuth 2.0 support.
2023-01-04 14:58:08 -05:00
Patrick Cloke 3aeca2588b
Add missing type hints to tests.config. (#14681) 2022-12-16 08:53:28 -05:00
reivilibre 864c3f85b0
Improve type annotations for the helper methods on a `CachedFunction`. (#14685) 2022-12-16 13:04:54 +00:00
Mathieu Velten 54c012c5a8
Make `handle_new_client_event` throws `PartialStateConflictError` (#14665)
Then adapts calling code to retry when needed so it doesn't 500
to clients.

Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2022-12-15 16:04:23 +00:00
Patrick Cloke 9d8a3234ba
Respond with proper error responses on unknown paths. (#14621)
Returns a proper 404 with an errcode of M_RECOGNIZED for
unknown endpoints per MSC3743.
2022-12-08 11:37:05 -05:00
Patrick Cloke da77720752
Check the stream position before checking if the cache is empty. (#14639)
An empty cache does not mean the entity has no changed, if
it is earlier than the earliest known stream position return that
the entity *has* changed since the cache cannot accurately
answer that query.
2022-12-08 11:35:49 -05:00
Erik Johnston cee9445884
Better return type for `get_all_entities_changed` (#14604)
Help callers from using the return value incorrectly by ensuring
that callers explicitly check if there was a cache hit or not.
2022-12-05 15:19:14 -05:00
Patrick Cloke 6a8310f3df
Compare to the earliest known stream pos in the stream change cache. (#14435)
The internal methods of the StreamChangeCache were inconsistently
treating the earliest known stream position as valid. It is now treated as
invalid, meaning the cache cannot determine if an entity at the earliest
known stream position has changed or not.
2022-12-05 09:00:59 -05:00
Patrick Cloke 1799a54a54
Batch fetch bundled annotations (#14491)
Avoid an n+1 query problem and fetch the bundled aggregations for
m.annotation relations in a single query instead of a query per event.

This applies similar logic for as was previously done for edits in
8b309adb43 (#11660) and threads
in b65acead42 (#11752).
2022-11-22 07:26:11 -05:00
Patrick Cloke d8cc86eff4
Remove redundant types from comments. (#14412)
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.

Also adds some missing type hints which were simple to fill in.
2022-11-16 15:25:24 +00:00
Patrick Cloke 13ca8bb2fc
Remove duplicated code to evict entries. (#14410)
This code was factored out to a method, but also left in-place.

Calling this twice in a row makes no sense: the first call will reduce
the size appropriately, but the loop will immediately exit since the
cache size was already reduced.
2022-11-10 15:33:34 -05:00
Eric Eastwood 40fa8294e3
Refactor MSC3030 `/timestamp_to_event` to move away from our snowflake pull from `destination` pattern (#14096)
1. `federation_client.timestamp_to_event(...)` now handles all `destination` looping and uses our generic `_try_destination_list(...)` helper.
 2. Consistently handling `NotRetryingDestination` and `FederationDeniedError` across `get_pdu` , backfill, and the generic `_try_destination_list` which is used for many places we use this pattern.
 3. `get_pdu(...)` now returns `PulledPduInfo` so we know which `destination` we ended up pulling the PDU from
2022-10-26 16:10:55 -05:00
Quentin Gliech 8756d5c87e
Save login tokens in database (#13844)
* Save login tokens in database

Signed-off-by: Quentin Gliech <quenting@element.io>

* Add upgrade notes

* Track login token reuse in a Prometheus metric

Signed-off-by: Quentin Gliech <quenting@element.io>
2022-10-26 11:45:41 +01:00
Nick Mills-Barrett c9dffd5b33
Remove unused `@lru_cache` decorator (#13595)
* Remove unused `@lru_cache` decorator

Spotted this working on something else.

Co-authored-by: David Robertson <davidr@element.io>
2022-10-25 11:39:25 +01:00
dependabot[bot] 0b7830e457
Bump flake8-bugbear from 21.3.2 to 22.9.23 (#14042)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
2022-10-19 19:38:24 +00:00
Abdullah Osama a9934d48c1
Making parse_server_name more consistent (#14007)
Fixes #12122
2022-10-11 12:42:11 +00:00
David Robertson cb20b885cb
Always close _all_ `ijson` coroutines, even if doing so raises Exceptions (#14065) 2022-10-06 18:17:50 +00:00
David Robertson 6f0c3e669d
Don't require `setuptools_rust` at runtime (#13952) 2022-09-29 20:16:08 +00:00
Eric Eastwood 29269d9d3f
Fix `have_seen_event` cache not being invalidated (#13863)
Fix https://github.com/matrix-org/synapse/issues/13856
Fix https://github.com/matrix-org/synapse/issues/13865

> Discovered while trying to make Synapse fast enough for [this MSC2716 test for importing many batches](https://github.com/matrix-org/complement/pull/214#discussion_r741678240). As an example, disabling the `have_seen_event` cache saves 10 seconds for each `/messages` request in that MSC2716 Complement test because we're not making as many federation requests for `/state` (speeding up `have_seen_event` itself is related to https://github.com/matrix-org/synapse/issues/13625) 
> 
> But this will also make `/messages` faster in general so we can include it in the [faster `/messages` milestone](https://github.com/matrix-org/synapse/milestone/11).
> 
> *-- https://github.com/matrix-org/synapse/issues/13856*


### The problem

`_invalidate_caches_for_event` doesn't run in monolith mode which means we never even tried to clear the `have_seen_event` and other caches. And even in worker mode, it only runs on the workers, not the master (AFAICT).

Additionally there was bug with the key being wrong so `_invalidate_caches_for_event` never invalidates the `have_seen_event` cache even when it does run.

Because we were using the `@cachedList` wrong, it was putting items in the cache under keys like `((room_id, event_id),)` with a `set` in a `set` (ex. `(('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),)`) and we we're trying to invalidate with just `(room_id, event_id)` which did nothing.
2022-09-27 15:55:43 -05:00
Mathieu Velten 6bd8763804
Add cache invalidation across workers to module API (#13667)
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
2022-09-21 15:32:01 +02:00
reivilibre cf65433de2
Fix a memory leak when running the unit tests. (#13798) 2022-09-14 15:29:05 +00:00
Erik Johnston ebfeac7c5d
Check if Rust lib needs rebuilding. (#13759)
This protects against the common mistake of failing to remember to rebuild Rust code after making changes.
2022-09-12 10:03:42 +00:00
reivilibre cf11919ddd
Fix cache metrics not being updated when not using the legacy exposition module. (#13717) 2022-09-08 15:30:48 +01:00
reivilibre b455c2a5ec
Update Grafana dashboard to not use legacy metric names. (#13714) 2022-09-06 12:21:21 +01:00
reivilibre 7bc110a19e
Generalise the `@cancellable` annotation so it can be used on functions other than just servlet methods. (#13662) 2022-08-31 11:16:05 +00:00
David Robertson 4249082eed
Merge branch 'release-v1.66' into develop 2022-08-30 15:31:51 +01:00
Eric Eastwood 1eea73b413
Fix rate limit metrics registering twice and misreporting (#13649)
* Fix rate limit metrics registering twice and misreporting

Fix https://github.com/matrix-org/synapse/issues/13641

* Fix lints

* Add changelog

* Document `metrics_name=None`.
2022-08-30 12:08:29 +01:00
reivilibre be4250c7a8
Add experimental configuration option to allow disabling legacy Prometheus metric names. (#13540)
Co-authored-by: David Robertson <davidr@element.io>
2022-08-24 11:35:54 +00:00
Erik Johnston f7ddfe17a3
Speed up `@cachedList` (#13591)
This speeds things up by ~2x.

The vast majority of the time is now spent in `LruCache` moving things around the linked lists.

We do this via two things:
1. Don't create a deferred per-key during bulk set operations in `DeferredCache`. Instead, only create them if a subsequent caller asks for the key.
2. Add a bulk lookup API to `DeferredCache` rather than use a loop.
2022-08-23 14:53:27 +00:00
Nick Mills-Barrett 5e7847dc92
Cache user IDs instead of profile objects (#13573)
The profile objects are never used and increase cache size significantly.
2022-08-23 09:49:59 +00:00
Sean Quah b251cff819
Fix incorrect juggling of logging contexts in `_PerHostRatelimiter` (#13554)
Signed-off-by: Sean Quah <seanq@matrix.org>

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
2022-08-18 16:26:26 +01:00
Eric Eastwood d64653d062
Track number of hosts affected by the rate limiter (#13541)
Track number of hosts affected by the rate limiter so we can differentiate one really noisy homeserver from a general ratelimit tuning problem across the federation.

Follow-up to https://github.com/matrix-org/synapse/pull/13534

Part of https://github.com/matrix-org/synapse/issues/13356
2022-08-18 10:05:07 -05:00
Eric Eastwood 49d04e43df
Add metrics to track how the rate limiter is affecting requests (sleep/reject) (#13534)
Related to https://github.com/matrix-org/synapse/pull/13499

Part of https://github.com/matrix-org/synapse/issues/13356
2022-08-17 16:10:07 -05:00
Eric Eastwood c6ee9c0ee4
Add metrics to track rate limiter queue timing (#13544) 2022-08-17 10:38:05 +01:00
Eric Eastwood 344a2f767c
Instrument `FederationStateIdsServlet` - `/state_ids` (#13499)
Instrument FederationStateIdsServlet - `/state_ids` so it's easier to follow what's going on in Jaeger when viewing a trace.
2022-08-15 19:41:23 +01:00
Nick Mills-Barrett 41320a0554
Optimise async get event lookups (#13435)
Still maintains local in memory lookup optimisation, but does any external
lookup as part of the deferred that prevents duplicate lookups for the same
event at once. This makes the assumption that fetching from an external
cache is a non-zero load operation.
2022-08-04 15:49:55 +01:00
Dirk Klimpel d6e94ad9d9
Rename `RateLimitConfig` to `RatelimitSettings` (#13442) 2022-08-03 10:40:20 +01:00
Erik Johnston 0b87eb8e0c
Make DictionaryCache have better expiry properties (#13292) 2022-07-21 17:13:44 +01:00
Nick Mills-Barrett cc21a431f3
Async get event cache prep (#13242)
Some experimental prep work to enable external event caching based on #9379 & #12955. Doesn't actually move the cache at all, just lays the groundwork for async implemented caches.

Signed off by Nick @ Beeper (@Fizzadar)
2022-07-15 09:30:46 +00:00
David Robertson 6ba732fefe
Type `tests.utils` (#13028)
* Cast to postgres types when handling postgres db

* Remove unused method

* Easy annotations

* Annotate create_room

* Use `ParamSpec` to annotate looping_call

* Annotate `default_config`

* Track `now` as a float

`time_ms` returns an int like the proper Synapse `Clock`

* Introduce a `Timer` dataclass

* Introduce a Looper type

* Suppress checking of a mock

* tests.utils is typed

* Changelog

* Whoops, import ParamSpec from typing_extensions

* ditch the psycopg2 casts
2022-07-05 15:13:47 +01:00
Quentin Gliech fe1daad672
Move the "email unsubscribe" resource, refactor the macaroon generator & simplify the access token verification logic. (#12986)
This simplifies the access token verification logic by removing the `rights`
parameter which was only ever used for the unsubscribe link in email
notifications. The latter has been moved under the `/_synapse` namespace,
since it is not a standard API.

This also makes the email verification link more secure, by embedding the
app_id and pushkey in the macaroon and verifying it. This prevents the user
from tampering the query parameters of that unsubscribe link.

Macaroon generation is refactored:

- Centralised all macaroon generation and verification logic to the
  `MacaroonGenerator`
- Moved to `synapse.utils`
- Changed the constructor to require only a `Clock`, hostname, and a secret key
  (instead of a full `Homeserver`).
- Added tests for all methods.
2022-06-14 09:12:08 -04:00
David Robertson f30bcbd84a
Fix Synapse git info missing in version strings (#12973) 2022-06-07 15:24:11 +01:00
Patrick Cloke 759f9c09e1
Fix caching behavior for relations push rules. (#12859)
By always returning all requested values from the function
wrapped by cachedList. Otherwise implicit None values get
added into the cache, which are unexpected.
2022-05-25 07:49:54 -04:00
Sean Quah 2be5a2b07b
Fix `RetryDestinationLimiter` re-starting finished log contexts (#12803)
Signed-off-by: Sean Quah <seanq@matrix.org>
2022-05-19 20:17:10 +01:00
David Robertson d4713d3e33
Discard null-containing strings before updating the user directory (#12762) 2022-05-18 11:28:14 +01:00
Shay cde8af9a49
Add config flags to allow for cache auto-tuning (#12701) 2022-05-13 12:32:39 -07:00
Erik Johnston 8dd3e0e084
Immediately retry any requests that have backed off when a server comes back online. (#12500)
Otherwise it can take up to a minute for any in-flight `/send` requests to be retried.
2022-05-10 10:39:54 +01:00