MatrixSynapse/synapse/storage
Eric Eastwood fa8616e65c
Fix MSC3030 `/timestamp_to_event` returning `outliers` that it has no idea whether are near a gap or not (#14215)
Fix MSC3030 `/timestamp_to_event` endpoint returning `outliers` that it has no idea whether are near a gap or not (and therefore unable to determine whether it's actually the closest event). The reason Synapse doesn't know whether an `outlier` is next to a gap is because our gap checks rely on entries in the `event_edges`, `event_forward_extremeties`, and `event_backward_extremities` tables which is [not the case for `outliers`](2c63cdcc3f/docs/development/room-dag-concepts.md (outliers)).

Also fixes MSC3030 Complement `can_paginate_after_getting_remote_event_from_timestamp_to_event_endpoint` test flake.  Although this acted flakey in Complement, if `sync_partial_state` raced and beat us before `/timestamp_to_event`, then even if we retried the failing `/context` request it wouldn't work until we made this Synapse change. With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.

Fix  https://github.com/matrix-org/synapse/issues/13944


### Why did this fail before? Why was it flakey?

Sleuthing the server logs on the [CI failure](https://github.com/matrix-org/synapse/actions/runs/3149623842/jobs/5121449357#step:5:5805), it looks like `hs2:/timestamp_to_event` found `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` event locally. Then when we went and asked for it via `/context`, since it's an `outlier`, it was filtered out of the results -> `You don't have permission to access that event.`

This is reproducible when `sync_partial_state` races and persists `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` before we evaluate `get_event_for_timestamp(...)`. To consistently reproduce locally, just add a delay at the [start of `get_event_for_timestamp(...)`](cb20b885cb/synapse/handlers/room.py (L1470-L1496)) so it always runs after `sync_partial_state` completes.

```py
from twisted.internet import task as twisted_task
d = twisted_task.deferLater(self.hs.get_reactor(), 3.5)
await d
```

In a run where it passes, on `hs2`, `get_event_for_timestamp(...)` finds a different event locally which is next to a gap and we request from a closer one from `hs1` which gets backfilled. And since the backfilled event is not an `outlier`, it's returned as expected during `/context`.

With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.
2022-10-18 19:46:25 -05:00
..
controllers Fix performance regression in `get_users_in_room` (#13972) 2022-09-30 13:15:32 +01:00
databases Fix MSC3030 `/timestamp_to_event` returning `outliers` that it has no idea whether are near a gap or not (#14215) 2022-10-18 19:46:25 -05:00
engines Snapshot schema 72 (#13873) 2022-09-26 18:28:32 +01:00
schema Update the thread_id right before use (in case the bg update hasn't finished) (#14222) 2022-10-18 14:55:41 +00:00
util Cancel the processing of key query requests when they time out. (#13680) 2022-09-07 12:03:32 +01:00
__init__.py
_base.py Optimise get_rooms_for_user (drop with_stream_ordering) (#13787) 2022-09-29 13:55:12 +00:00
background_updates.py Generate separate snapshots for logical databases (#13792) 2022-09-20 14:14:12 +01:00
database.py When restarting a partial join resync, prioritise the server which actioned a partial join (#14126) 2022-10-18 12:33:18 +01:00
keys.py
prepare_database.py Snapshot schema 72 (#13873) 2022-09-26 18:28:32 +01:00
push_rule.py
roommember.py
state.py Update the rejected state of events during resync (#13459) 2022-08-11 10:42:24 +00:00
types.py