Bounce recalculation of current state to the correct event persister and
move recalculation of current state into the event persistence queue, to
avoid concurrent updates to a room's current state.
Also give recalculation of a room's current state a real stream
ordering.
Signed-off-by: Sean Quah <seanq@matrix.org>
Whenever we want to persist an event, we first compute an event context,
which includes the state at the event and a flag indicating whether the
state is partial. After a lot of processing, we finally try to store the
event in the database, which can fail for partial state events when the
containing room has been un-partial stated in the meantime.
We detect the race as a foreign key constraint failure in the data store
layer and turn it into a special `PartialStateConflictError` exception,
which makes its way up to the method in which we computed the event
context.
To make things difficult, the exception needs to cross a replication
request: `/fed_send_events` for events coming over federation and
`/send_event` for events from clients. We transport the
`PartialStateConflictError` as a `409 Conflict` over replication and
turn `409`s back into `PartialStateConflictError`s on the worker making
the request.
All client events go through
`EventCreationHandler.handle_new_client_event`, which is called in
*a lot* of places. Instead of trying to update all the code which
creates client events, we turn the `PartialStateConflictError` into a
`429 Too Many Requests` in
`EventCreationHandler.handle_new_client_event` and hope that clients
take it as a hint to retry their request.
On the federation event side, there are 7 places which compute event
contexts. 4 of them use outlier event contexts:
`FederationEventHandler._auth_and_persist_outliers_inner`,
`FederationHandler.do_knock`, `FederationHandler.on_invite_request` and
`FederationHandler.do_remotely_reject_invite`. These events won't have
the partial state flag, so we do not need to do anything for then.
The remaining 3 paths which create events are
`FederationEventHandler.process_remote_join`,
`FederationEventHandler.on_send_membership_event` and
`FederationEventHandler._process_received_pdu`.
We can't experience the race in `process_remote_join`, unless we're
handling an additional join into a partial state room, which currently
blocks, so we make no attempt to handle it correctly.
`on_send_membership_event` is only called by
`FederationServer._on_send_membership_event`, so we catch the
`PartialStateConflictError` there and retry just once.
`_process_received_pdu` is called by `on_receive_pdu` for incoming
events and `_process_pulled_event` for backfill. The latter should never
try to persist partial state events, so we ignore it. We catch the
`PartialStateConflictError` in `on_receive_pdu` and retry just once.
Refering to the graph of code paths in
https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648
may make the above make more sense.
Signed-off-by: Sean Quah <seanq@matrix.org>
Instead, use the `room_version` property of the event we're validating.
The `room_version` was originally added as a parameter somewhere around #4482,
but really it's been redundant since #6875 added a `room_version` field to `EventBase`.
Refactor how the `EventContext` class works, with the intention of reducing the amount of state we fetch from the DB during event processing.
The idea here is to get rid of the cached `current_state_ids` and `prev_state_ids` that live in the `EventContext`, and instead defer straight to the database (and its caching).
One change that may have a noticeable effect is that we now no longer prefill the `get_current_state_ids` cache on a state change. However, that query is relatively light, since its just a case of reading a table from the DB (unlike fetching state at an event which is more heavyweight). For deployments with workers this cache isn't even used.
Part of #12684
Try to avoid an OOM by checking fewer extremities.
Generally this is a big rewrite of _maybe_backfill, to try and fix some of the TODOs and other problems in it. It's best reviewed commit-by-commit.
We work through all the events with partial state, updating the state at each
of them. Once it's done, we recalculate the state for the whole room, and then
mark the room as having complete state.
Refactor and convert `Linearizer` to async. This makes a `Linearizer`
cancellation bug easier to fix.
Also refactor to use an async context manager, which eliminates an
unlikely footgun where code that doesn't immediately use the context
manager could forget to release the lock.
Signed-off-by: Sean Quah <seanq@element.io>
When we get a partial_state response from send_join, store information in the
database about it:
* store a record about the room as a whole having partial state, and stash the
list of member servers too.
* flag the join event itself as having partial state
* also, for any new events whose prev-events are partial-stated, note that
they will *also* be partial-stated.
We don't yet make any attempt to interpret this data, so API calls (and a bunch
of other things) are just going to get incorrect data.
I've never found this terribly useful. I think it was added in the early days
of Synapse, without much thought as to what would actually be useful to log,
and has just been cargo-culted ever since.
Rather, it tends to clutter up debug logs with useless information.
MSC3030: https://github.com/matrix-org/matrix-doc/pull/3030
Client API endpoint. This will also go and fetch from the federation API endpoint if unable to find an event locally or we found an extremity with possibly a closer event we don't know about.
```
GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>
{
"event_id": ...
"origin_server_ts": ...
}
```
Federation API endpoint:
```
GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>
{
"event_id": ...
"origin_server_ts": ...
}
```
Co-authored-by: Erik Johnston <erik@matrix.org>
This fixes a bug where we would accept an event whose `auth_events` include
rejected events, if the rejected event was shadowed by another `auth_event`
with same `(type, state_key)`.
The approach is to pass a list of auth events into
`check_auth_rules_for_event` instead of a dict, which of course means updating
the call sites.
This is an extension of #10956.
Found while working on the Gitter backfill script and noticed
it only happened after we sent 7 batches, https://gitlab.com/gitterHQ/webapp/-/merge_requests/2229#note_665906390
When there are more than 5 backward extremities for a given depth,
backfill will throw an error because we sliced the extremity list
to 5 but then try to iterate over the full list. This causes
us to look for state that we never fetched and we get a `KeyError`.
Before when calling `/messages` when there are more than 5 backward extremities:
```
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 258, in _async_render_wrapper
callback_return = await self._async_render(request)
File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 446, in _async_render
callback_return = await raw_callback_return
File "/usr/local/lib/python3.8/site-packages/synapse/rest/client/room.py", line 580, in on_GET
msgs = await self.pagination_handler.get_messages(
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/pagination.py", line 396, in get_messages
await self.hs.get_federation_handler().maybe_backfill(
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 133, in maybe_backfill
return await self._maybe_backfill_inner(room_id, current_depth, limit)
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 386, in _maybe_backfill_inner
likely_extremeties_domains = get_domains_from_state(states[e_id])
KeyError: '$zpFflMEBtZdgcMQWTakaVItTLMjLFdKcRWUPHbbSZJl'
```
The shared ratelimit function was replaced with a dedicated
RequestRatelimiter class (accessible from the HomeServer
object).
Other properties were copied to each sub-class that inherited
from BaseHandler.
This fixes a "Event not signed by authorising server" error when
transition room member from join -> join, e.g. when updating a
display name or avatar URL for restricted rooms.
Broadly, the existing `event_auth.check` function has two parts:
* a validation section: checks that the event isn't too big, that it has the rught signatures, etc.
This bit is independent of the rest of the state in the room, and so need only be done once
for each event.
* an auth section: ensures that the event is allowed, given the rest of the state in the room.
This gets done multiple times, against various sets of room state, because it forms part of
the state res algorithm.
Currently, this is implemented with `do_sig_check` and `do_size_check` parameters, but I think
that makes everything hard to follow. Instead, we split the function in two and call each part
separately where it is needed.
Constructing an EventContext for an outlier is actually really simple, and
there's no sense in going via an `async` method in the `StateHandler`.
This also means that we can resolve a bunch of FIXMEs.
Adds missing type hints to methods in the synapse.handlers
module and requires all methods to have type hints there.
This also removes the unused construct_auth_difference method
from the FederationHandler.
Given that backfill and get_missing_events are basically the same thing, it's somewhat crazy that we have entirely separate code paths for them. This makes backfill use the existing get_missing_events code, and then clears up all the unused code.
Here we split on_receive_pdu into two functions (on_receive_pdu and process_pulled_event), rather than having both cases in the same method. There's a tiny bit of overlap, but not that much.
* drop room pdu linearizer sooner
No point holding onto it while we recheck the db
* move out `missing_prevs` calculation
we're going to need `missing_prevs` whatever we do, so we may as well calculate
it eagerly and just update it if it gets outdated.
* Add another `if missing_prevs` condition
this should be a no-op, since all the code inside the block already checks `if
missing_prevs`
* reorder if conditions
This shouldn't change the logic at all.
* Push down `min_depth` read
No point reading it from the database unless we're going to use it.
* Collect the sent_to_us_directly code together
Move the remaining `sent_to_us_directly` code inside the `if
sent_to_us_directly` block.
* Properly separate the `not sent_to_us_directly` branch
Since the only way this second block is now reachable is if we
*didn't* go into the `sent_to_us_directly` branch, we can replace it with a
simple `else`.
* changelog
Marking things as outliers to inhibit pushes is a sledgehammer to crack a
nut. Move the test further down the stack so that we just inhibit the thing we
want.
* Include outlier status in `str(event)`
In places where we log event objects, knowing whether or not you're dealing
with an outlier is super useful.
* Remove duplicated logging in get_missing_events
When we process events received from get_missing_events, we log them twice
(once in `_get_missing_events_for_pdu`, and once in `on_receive_pdu`). Reduce
the duplication by removing the logging in `on_receive_pdu`, and ensuring the
call sites do sensible logging.
* log in `on_receive_pdu` when we already have the event
* Log which prev_events we are missing
* changelog
* drop old-room hack
pretty sure we don't need this any more.
* Remove incorrect comment about modifying `context`
It doesn't look like the supplied context is ever modified.
* Stop `_auth_and_persist_event` modifying its parameters
This is only called in three places. Two of them don't pass `auth_events`, and
the third doesn't use the dict after passing it in, so this should be non-functional.
* Stop `_check_event_auth` modifying its parameters
`_check_event_auth` is only called in three places. `on_send_membership_event`
doesn't pass an `auth_events`, and `prep` and `_auth_and_persist_event` do not
use the map after passing it in.
* Stop `_update_auth_events_and_context_for_auth` modifying its parameters
Return the updated auth event dict, rather than modifying the parameter.
This is only called from `_check_event_auth`.
* Improve documentation on `_auth_and_persist_event`
Rename `auth_events` parameter to better reflect what it contains.
* Improve documentation on `_NewEventInfo`
* Improve documentation on `_check_event_auth`
rename `auth_events` parameter to better describe what it contains
* changelog
* Make historical messages available to federated servers
Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716
Follow-up to https://github.com/matrix-org/synapse/pull/9247
* Debug message not available on federation
* Add base starting insertion point when no chunk ID is provided
* Fix messages from multiple senders in historical chunk
Follow-up to https://github.com/matrix-org/synapse/pull/9247
Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716
---
Previously, Synapse would throw a 403,
`Cannot force another user to join.`,
because we were trying to use `?user_id` from a single virtual user
which did not match with messages from other users in the chunk.
* Remove debug lines
* Messing with selecting insertion event extremeties
* Move db schema change to new version
* Add more better comments
* Make a fake requester with just what we need
See https://github.com/matrix-org/synapse/pull/10276#discussion_r660999080
* Store insertion events in table
* Make base insertion event float off on its own
See https://github.com/matrix-org/synapse/pull/10250#issuecomment-875711889
Conflicts:
synapse/rest/client/v1/room.py
* Validate that the app service can actually control the given user
See https://github.com/matrix-org/synapse/pull/10276#issuecomment-876316455
Conflicts:
synapse/rest/client/v1/room.py
* Add some better comments on what we're trying to check for
* Continue debugging
* Share validation logic
* Add inserted historical messages to /backfill response
* Remove debug sql queries
* Some marker event implemntation trials
* Clean up PR
* Rename insertion_event_id to just event_id
* Add some better sql comments
* More accurate description
* Add changelog
* Make it clear what MSC the change is part of
* Add more detail on which insertion event came through
* Address review and improve sql queries
* Only use event_id as unique constraint
* Fix test case where insertion event is already in the normal DAG
* Remove debug changes
* Add support for MSC2716 marker events
* Process markers when we receive it over federation
* WIP: make hs2 backfill historical messages after marker event
* hs2 to better ask for insertion event extremity
But running into the `sqlite3.IntegrityError: NOT NULL constraint failed: event_to_state_groups.state_group`
error
* Add insertion_event_extremities table
* Switch to chunk events so we can auth via power_levels
Previously, we were using `content.chunk_id` to connect one
chunk to another. But these events can be from any `sender`
and we can't tell who should be able to send historical events.
We know we only want the application service to do it but these
events have the sender of a real historical message, not the
application service user ID as the sender. Other federated homeservers
also have no indicator which senders are an application service on
the originating homeserver.
So we want to auth all of the MSC2716 events via power_levels
and have them be sent by the application service with proper
PL levels in the room.
* Switch to chunk events for federation
* Add unstable room version to support new historical PL
* Messy: Fix undefined state_group for federated historical events
```
2021-07-13 02:27:57,810 - synapse.handlers.federation - 1248 - ERROR - GET-4 - Failed to backfill from hs1 because NOT NULL constraint failed: event_to_state_groups.state_group
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1216, in try_backfill
await self.backfill(
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1035, in backfill
await self._auth_and_persist_event(dest, event, context, backfilled=True)
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 2222, in _auth_and_persist_event
await self._run_push_actions_and_persist_event(event, context, backfilled)
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 2244, in _run_push_actions_and_persist_event
await self.persist_events_and_notify(
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 3290, in persist_events_and_notify
events, max_stream_token = await self.storage.persistence.persist_events(
File "/usr/local/lib/python3.8/site-packages/synapse/logging/opentracing.py", line 774, in _trace_inner
return await func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 320, in persist_events
ret_vals = await yieldable_gather_results(enqueue, partitioned.items())
File "/usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 237, in handle_queue_loop
ret = await self._per_item_callback(
File "/usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 577, in _persist_event_batch
await self.persist_events_store._persist_events_and_state_updates(
File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 176, in _persist_events_and_state_updates
await self.db_pool.runInteraction(
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 681, in runInteraction
result = await self.runWithConnection(
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 770, in runWithConnection
return await make_deferred_yieldable(
File "/usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py", line 238, in inContext
result = inContext.theWork() # type: ignore[attr-defined]
File "/usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py", line 254, in <lambda>
inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
File "/usr/local/lib/python3.8/site-packages/twisted/python/context.py", line 118, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/usr/local/lib/python3.8/site-packages/twisted/python/context.py", line 83, in callWithContext
return func(*args, **kw)
File "/usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py", line 293, in _runWithConnection
compat.reraise(excValue, excTraceback)
File "/usr/local/lib/python3.8/site-packages/twisted/python/deprecate.py", line 298, in deprecatedFunction
return function(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/twisted/python/compat.py", line 403, in reraise
raise exception.with_traceback(traceback)
File "/usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py", line 284, in _runWithConnection
result = func(conn, *args, **kw)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 765, in inner_func
return func(db_conn, *args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 549, in new_transaction
r = func(cursor, *args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/synapse/logging/utils.py", line 69, in wrapped
return f(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 385, in _persist_events_txn
self._store_event_state_mappings_txn(txn, events_and_contexts)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 2065, in _store_event_state_mappings_txn
self.db_pool.simple_insert_many_txn(
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 923, in simple_insert_many_txn
txn.execute_batch(sql, vals)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 280, in execute_batch
self.executemany(sql, args)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 300, in executemany
self._do_execute(self.txn.executemany, sql, *args)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 330, in _do_execute
return func(sql, *args)
sqlite3.IntegrityError: NOT NULL constraint failed: event_to_state_groups.state_group
```
* Revert "Messy: Fix undefined state_group for federated historical events"
This reverts commit 187ab28611.
* Fix federated events being rejected for no state_groups
Add fix from https://github.com/matrix-org/synapse/pull/10439
until it merges.
* Adapting to experimental room version
* Some log cleanup
* Add better comments around extremity fetching code and why
* Rename to be more accurate to what the function returns
* Add changelog
* Ignore rejected events
* Use simplified upsert
* Add Erik's explanation of extra event checks
See https://github.com/matrix-org/synapse/pull/10498#discussion_r680880332
* Clarify that the depth is not directly correlated to the backwards extremity that we return
See https://github.com/matrix-org/synapse/pull/10498#discussion_r681725404
* lock only matters for sqlite
See https://github.com/matrix-org/synapse/pull/10498#discussion_r681728061
* Move new SQL changes to its own delta file
* Clean up upsert docstring
* Bump database schema version (62)
* Make historical messages available to federated servers
Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716
Follow-up to https://github.com/matrix-org/synapse/pull/9247
* Debug message not available on federation
* Add base starting insertion point when no chunk ID is provided
* Fix messages from multiple senders in historical chunk
Follow-up to https://github.com/matrix-org/synapse/pull/9247
Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716
---
Previously, Synapse would throw a 403,
`Cannot force another user to join.`,
because we were trying to use `?user_id` from a single virtual user
which did not match with messages from other users in the chunk.
* Remove debug lines
* Messing with selecting insertion event extremeties
* Move db schema change to new version
* Add more better comments
* Make a fake requester with just what we need
See https://github.com/matrix-org/synapse/pull/10276#discussion_r660999080
* Store insertion events in table
* Make base insertion event float off on its own
See https://github.com/matrix-org/synapse/pull/10250#issuecomment-875711889
Conflicts:
synapse/rest/client/v1/room.py
* Validate that the app service can actually control the given user
See https://github.com/matrix-org/synapse/pull/10276#issuecomment-876316455
Conflicts:
synapse/rest/client/v1/room.py
* Add some better comments on what we're trying to check for
* Continue debugging
* Share validation logic
* Add inserted historical messages to /backfill response
* Remove debug sql queries
* Some marker event implemntation trials
* Clean up PR
* Rename insertion_event_id to just event_id
* Add some better sql comments
* More accurate description
* Add changelog
* Make it clear what MSC the change is part of
* Add more detail on which insertion event came through
* Address review and improve sql queries
* Only use event_id as unique constraint
* Fix test case where insertion event is already in the normal DAG
* Remove debug changes
* Switch to chunk events so we can auth via power_levels
Previously, we were using `content.chunk_id` to connect one
chunk to another. But these events can be from any `sender`
and we can't tell who should be able to send historical events.
We know we only want the application service to do it but these
events have the sender of a real historical message, not the
application service user ID as the sender. Other federated homeservers
also have no indicator which senders are an application service on
the originating homeserver.
So we want to auth all of the MSC2716 events via power_levels
and have them be sent by the application service with proper
PL levels in the room.
* Switch to chunk events for federation
* Add unstable room version to support new historical PL
* Fix federated events being rejected for no state_groups
Add fix from https://github.com/matrix-org/synapse/pull/10439
until it merges.
* Only connect base insertion event to prev_event_ids
Per discussion with @erikjohnston,
https://matrix.to/#/!UytJQHLQYfvYWsGrGY:jki.re/$12bTUiObDFdHLAYtT7E-BvYRp3k_xv8w0dUQHibasJk?via=jki.re&via=matrix.org
* Make it possible to get the room_version with txn
* Allow but ignore historical events in unsupported room version
See https://github.com/matrix-org/synapse/pull/10245#discussion_r675592489
We can't reject historical events on unsupported room versions because homeservers without knowledge of MSC2716 or the new room version don't reject historical events either.
Since we can't rely on the auth check here to stop historical events on unsupported room versions, I've added some additional checks in the processing/persisting code (`synapse/storage/databases/main/events.py` -> `_handle_insertion_event` and `_handle_chunk_event`). I've had to do some refactoring so there is method to fetch the room version by `txn`.
* Move to unique index syntax
See https://github.com/matrix-org/synapse/pull/10245#discussion_r675638509
* High-level document how the insertion->chunk lookup works
* Remove create_event fallback for room_versions
See https://github.com/matrix-org/synapse/pull/10245/files#r677641879
* Use updated method name
The idea here is to stop people sending things that aren't joins/leaves/knocks through these endpoints: previously you could send anything you liked through them. I wasn't able to find any security holes from doing so, but it doesn't sound like a good thing.
An accidental mis-ordering of operations during #6739 technically allowed an incoming knock event over federation in before checking it against any configured Third Party Access Rules modules.
This PR corrects that by performing the TPAR check *before* persisting the event.
* Room version 7 for knocking.
* Stable prefixes and endpoints (both client and federation) for knocking.
* Removes the experimental configuration flag.
Spawned from missing messages we were seeing on `matrix.org` from a
federated Gtiter bridged room, https://gitlab.com/gitterHQ/webapp/-/issues/2770.
The underlying issue in Synapse is tracked by https://github.com/matrix-org/synapse/issues/10066
where the message and join event race and the message is `soft_failed` before the
`join` event reaches the remote federated server.
Less soft_failed events = better and usually this should only trigger for events
where people are doing bad things and trying to fuzz and fake everything.
If backfilling is slow then the client may time out and retry, causing
Synapse to start a new `/backfill` before the existing backfill has
finished, duplicating work.
Empirically, this helped my server considerably when handling gaps in Matrix HQ. The problem was that we would repeatedly call have_seen_events for the same set of (50K or so) auth_events, each of which would take many minutes to complete, even though it's only an index scan.
To be more consistent with similar code. The check now automatically
raises an AuthError instead of passing back a boolean. It also absorbs
some shared logic between callers.
We were pulling the full auth chain for the room out of the DB each time
we backfilled, which can be *huge* for large rooms and is totally
unnecessary.
When receiving a /send_join request for a room with join rules set to 'restricted',
check if the user is a member of the spaces defined in the 'allow' key of the join rules.
This only applies to an experimental room version, as defined in MSC3083.
When receiving a /send_join request for a room with join rules set to 'restricted',
check if the user is a member of the spaces defined in the 'allow' key of the join
rules.
This only applies to an experimental room version, as defined in MSC3083.
Part of #9744
Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.
`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
This should fix a class of bug where we forget to check if e.g. the appservice shouldn't be ratelimited.
We also check the `ratelimit_override` table to check if the user has ratelimiting disabled. That table is really only meant to override the event sender ratelimiting, so we don't use any values from it (as they might not make sense for different rate limits), but we do infer that if ratelimiting is disabled for the user we should disabled all ratelimits.
Fixes#9663
Background: When we receive incoming federation traffic, and notice that we are missing prev_events from
the incoming traffic, first we do a `/get_missing_events` request, and then if we still have missing prev_events,
we set up new backwards-extremities. To do that, we need to make a `/state_ids` request to ask the remote
server for the state at those prev_events, and then we may need to then ask the remote server for any events
in that state which we don't already have, as well as the auth events for those missing state events, so that we
can auth them.
This PR attempts to optimise the processing of that state request. The `state_ids` API returns a list of the state
events, as well as a list of all the auth events for *all* of those state events. The optimisation comes from the
observation that we are currently loading all of those auth events into memory at the start of the operation, but
we almost certainly aren't going to need *all* of the auth events. Rather, we can check that we have them, and
leave the actual load into memory for later. (Ideally the federation API would tell us which auth events we're
actually going to need, but it doesn't.)
The effect of this is to reduce the number of events that I need to load for an event in Matrix HQ from about
60000 to about 22000, which means it can stay in my in-memory cache, whereas previously the sheer number
of events meant that all 60K events had to be loaded from db for each request, due to the amount of cache
churn. (NB I've already tripled the size of the cache from its default of 10K).
Unfortunately I've ended up basically C&Ping `_get_state_for_room` and `_get_events_from_store_or_dest` into
a new method, because `_get_state_for_room` is also called during backfill, which expects the auth events to be
returned, so the same tricks don't work. That said, I don't really know why that codepath is completely different
(ultimately we're doing the same thing in setting up a new backwards extremity) so I've left a TODO suggesting
that we clean it up.
- Update black version to the latest
- Run black auto formatting over the codebase
- Run autoformatting according to [`docs/code_style.md
`](80d6dc9783/docs/code_style.md)
- Update `code_style.md` docs around installing black to use the correct version
Replaces the `federation_ip_range_blacklist` configuration setting with an
`ip_range_blacklist` setting with wider scope. It now applies to:
* Federation
* Identity servers
* Push notifications
* Checking key validitity for third-party invite events
The old `federation_ip_range_blacklist` setting is still honored if present, but
with reduced scope (it only applies to federation and identity servers).
* Consistently use room_id from federation request body
Some federation APIs have a redundant `room_id` path param (see
https://github.com/matrix-org/matrix-doc/issues/2330). We should make sure we
consistently use either the path param or the body param, and the body param is
easier.
* Kill off some references to "context"
Once upon a time, "rooms" were known as "contexts". I think this kills of the
last references to "contexts".
There's a handy function called maybe_store_room_on_invite which allows us to create an entry in the rooms table for a room and its version for which we aren't joined to yet, but we can reference when ingesting events about.
This is currently used for invites where we receive some stripped state about the room and pass it down via /sync to the client, without us being in the room yet.
There is a similar requirement for knocking, where we will eventually do the same thing, and need an entry in the rooms table as well. Thus, reusing this function works, however its name needs to be generalised a bit.
Separated out from #6739.