MatrixSynapse/synapse
Sean Quah 89a71e7390
Fix a rare bug where initial /syncs would fail (#15383)
This change fixes a rare bug where initial /syncs would fail with a
`KeyError` under the following circumstances:
 1. A user fast joins a remote room.
 2. The user is kicked from the room before the room's full state has
    been synced.
 3. A second local user fast joins the room.
 4. Events are backfilled into the room with a higher topological
    ordering than the original user's leave. They are assigned a
    negative stream ordering. It's not clear how backfill happened here,
    since it is expected to be equivalent to syncing the full state.
 5. The second local user leaves the room before the room's full state
    has been synced. The homeserver does not complete the sync.
 6. The original user performs an initial /sync with lazy_load_members
    enabled.
     * Because they were kicked from the room, the room is included in
       the /sync response even though the include_leave option is not
       specified.
     * To populate the room's timeline, `_load_filtered_recents` /
       `get_recent_events_for_room` fetches events with a lower stream
       ordering than the leave event and picks the ones with the highest
       topological orderings (which are most recent). This captures the
       backfilled events after the leave, since they have a negative
       stream ordering. These events are filtered out of the timeline,
       since the user was not in the room at the time and cannot view
       them. The sync code ends up with an empty timeline for the room
       that notably does not include the user's leave event.
       This seems buggy, but at least we don't disclose events the user
       isn't allowed to see.
     * Normally, `compute_state_delta` would fetch the state at the
       start and end of the room's timeline to generate the sync
       response. Since the timeline is empty, it fetches the state at
       `min(now, last event in the room)`, which corresponds to the
       second user's leave. The state during the entirety of the second
       user's membership does not include the membership for the first
       user because of partial state.
       This part is also questionable, since we are fetching state from
       outside the bounds of the user's membership.
     * `compute_state_delta` then tries and fails to find the user's
       membership in the auth events of timeline events. Because there
       is no timeline event whose auth events are expected to contain
       the user's membership, a `KeyError` is raised.
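
The failure path in step 6 can be sketched with a toy model. This is not
Synapse's code: `ToyEvent`, `recent_events`, `filter_visible` and
`membership_from_auth_events` are hypothetical stand-ins, the user ID is made
up, and the selection is assumed to take events at or below the leave's stream
ordering with a limit of 2. It only illustrates how backfilled events with
negative stream orderings can crowd the leave out of the timeline and leave
the later lookup with nothing to search.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ToyEvent:
    event_id: str
    stream_ordering: int       # negative for backfilled events
    topological_ordering: int  # position in the room DAG; higher is more recent
    # Hypothetical: user ID -> membership event ID found in this event's auth events.
    auth_memberships: Dict[str, str] = field(default_factory=dict)


def recent_events(events: List[ToyEvent], max_stream: int, limit: int) -> List[ToyEvent]:
    """Take events at or below `max_stream`, keeping the `limit` highest topologically."""
    candidates = [e for e in events if e.stream_ordering <= max_stream]
    candidates.sort(key=lambda e: e.topological_ordering, reverse=True)
    return candidates[:limit]


def filter_visible(events: List[ToyEvent], leave: ToyEvent) -> List[ToyEvent]:
    """Toy visibility check: the user cannot see anything after their leave."""
    return [e for e in events if e.topological_ordering <= leave.topological_ordering]


def membership_from_auth_events(timeline: List[ToyEvent], user_id: str) -> str:
    """Look up the user's membership via the auth events of timeline events."""
    found: Dict[str, str] = {}
    for event in timeline:
        found.update(event.auth_memberships)
    return found[user_id]  # raises KeyError when the timeline is empty


user = "@alice:example.org"  # made-up user ID
join = ToyEvent("$join", stream_ordering=1, topological_ordering=5)
leave = ToyEvent("$leave", stream_ordering=2, topological_ordering=6)
# Backfilled after the leave: higher topological ordering, negative stream ordering.
backfill1 = ToyEvent("$backfill1", stream_ordering=-1, topological_ordering=7)
backfill2 = ToyEvent("$backfill2", stream_ordering=-2, topological_ordering=8)
room = [join, leave, backfill1, backfill2]

# The backfilled events win the selection and push the leave event out.
candidates = recent_events(room, max_stream=leave.stream_ordering, limit=2)
# Both candidates come after the leave, so the filtered timeline is empty.
timeline = filter_visible(candidates, leave)

try:
    membership_from_auth_events(timeline, user)
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

In this model the only way to avoid the `KeyError` is for the lookup to
tolerate an empty timeline; how the actual fix handles that case is not shown
here.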

Also contains a drive-by fix for a separate unlikely race condition.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-04-04 13:10:25 +01:00
| Name | Last commit | Date |
| --- | --- | --- |
| _scripts | Make cleaning up pushers depend on the device_id instead of the token_id (#15280) | 2023-03-24 11:09:39 -04:00 |
| api | Fix spinloop during partial state sync when a prev event is in backoff (#15351) | 2023-03-30 13:36:41 +01:00 |
| app | Experimental Unix socket support (#15353) | 2023-04-03 10:27:51 +01:00 |
| appservice | Call appservices on modern paths, falling back to legacy paths. (#15317) | 2023-04-03 13:20:32 -04:00 |
| config | Experimental Unix socket support (#15353) | 2023-04-03 10:27:51 +01:00 |
| crypto | Use immutabledict instead of frozendict (#15113) | 2023-03-22 17:15:34 +00:00 |
| events | Bump ruff from 0.0.252 to 0.0.259 (#15328) | 2023-03-28 09:46:47 +01:00 |
| federation | Implement MSC3984 to proxy /keys/query requests to appservices. (#15321) | 2023-03-30 08:39:38 -04:00 |
| handlers | Fix a rare bug where initial /syncs would fail (#15383) | 2023-04-04 13:10:25 +01:00 |
| http | Call appservices on modern paths, falling back to legacy paths. (#15317) | 2023-04-03 13:20:32 -04:00 |
| logging | Bump black from 22.12.0 to 23.1.0 (#15103) | 2023-02-22 15:29:09 -05:00 |
| media | Separate HTTP preview code and URL previewer. (#15269) | 2023-03-20 14:32:26 -04:00 |
| metrics | Bump black from 22.12.0 to 23.1.0 (#15103) | 2023-02-22 15:29:09 -05:00 |
| module_api | Move Account Validity callbacks to a dedicated file (#15237) | 2023-03-16 10:35:31 +00:00 |
| push | Fix missing app variable in mail subject for password resets (#15352) | 2023-03-30 11:44:53 +01:00 |
| replication | Add some clarification to the doc/comments regarding TCP replication (#15354) | 2023-03-30 12:51:35 +02:00 |
| res | Fix copyright year in SSO footer template (#15358) | 2023-03-31 18:20:40 +01:00 |
| rest | Load `/password_policy` endpoint on workers. (#15331) | 2023-03-27 07:37:17 -04:00 |
| server_notices | Remove unused `room_alias` field from `/createRoom` response (#15093) | 2023-02-22 11:07:28 +00:00 |
| spam_checker_api | | |
| state | Use immutabledict instead of frozendict (#15113) | 2023-03-22 17:15:34 +00:00 |
| static | | |
| storage | Revert pruning of old devices (#15360) | 2023-03-31 13:51:51 +01:00 |
| streams | Use mypy 1.0 (#15052) | 2023-02-16 16:09:11 +00:00 |
| types | Experimental Unix socket support (#15353) | 2023-04-03 10:27:51 +01:00 |
| util | Use immutabledict instead of frozendict (#15113) | 2023-03-22 17:15:34 +00:00 |
| __init__.py | Use immutabledict instead of frozendict (#15113) | 2023-03-22 17:15:34 +00:00 |
| event_auth.py | More speedups/fixes to creating batched events (#15195) | 2023-03-07 13:54:39 -08:00 |
| notifier.py | Fix a bug in the send_local_online_presence_to module API (#14880) | 2023-01-25 21:34:37 +00:00 |
| py.typed | | |
| server.py | Move Account Validity callbacks to a dedicated file (#15237) | 2023-03-16 10:35:31 +00:00 |
| visibility.py | Refactor `filter_events_for_server` (#15240) | 2023-03-10 15:31:25 +00:00 |