This field is no longer read from, so we should stop populating it. Once we're
happy that this doesn't break everything, and a rollback is unlikely, we can
think about dropping the column.
We've long passed the point where it's possible to have the same event_id in
different tables, so these join conditions are redundant: we can just join on
event_id.
event_edges is of non-trivial size, and the room_id column is wasteful, so
let's stop reading from it. In future, we can stop writing to it, and then drop
it.
(since it uses methods therein)
Turns out that we had a bunch of things which were incorrectly importing
EventWorkerStore from events.py rather than events_worker.py, which broke once
I removed the import into events.py.
This fixes a bug in _delete_existing_rows_txn which was introduced in #3435
(though it's been on matrix-org-hotfixes for *years*). This code is only called
when there is some sort of conflict the first time we try to persist an event,
so it only happens rarely. Still, the exceptions are annoying.
We need to run the errback in the sentinel context to avoid losing our own
context.
Also: add logging to runInteraction to help identify where "Starting db
connection from sentinel context" warnings are coming from
We incorrectly asserted that all contexts must have a non None state
group without consider outliers. This would usually be fine as the
assertion would never be hit, as there is a shortcut during persistence
if the forward extremities don't change.
However, if the outlier is being persisted with non-outlier events, the
function would be called and the assertion would be hit.
Fixes#3601
We don't want to bother pulling out the current state from the DB since
until we know we have to. Checking the context for state is just an
optimisation.
This fixes#3518, and ensures that we get useful logs and metrics for lots of
things that happen in the background.
(There are certainly more things that happen in the background; these are just
the common ones I've found running a single-process synapse locally).
This is in preparation for using contexts that may or may not have the
current_state_ids set. This will allow us to avoid unnecessarily pulling
out state for an event on the master process when using workers.
We also add a check to see if the state groups of the old extremities
are the same as the new ones.
It turns out that most of the time we were calling have_events, we were only
using half of the result. Replace have_events with have_seen_events and
get_rejection_reasons, so that we can see what's going on a bit more clearly.
This will allow us to measure how often we calculate state deltas in
event persistence that we would have been able to calculate at the same
time we calculated the state for the event.
The race happens when the user joins a room at the same time as doing a
sync. We fetch the current token and then get the rooms the user is in.
If the join happens after the current token, but before we get the rooms
we end up sending down a partial room entry in the sync.
This is fixed by looking at the stream ordering of the membership
returned by get_rooms_for_user, and handling the case when that stream
ordering is after the current token.
* Split state group persist into seperate storage func
* Add per database engine code for state group id gen
* Move store_state_group to StateReadStore
This allows other workers to use it, and so resolve state.
* Hook up store_state_group
* Fix tests
* Rename _store_mult_state_groups_txn
* Rename StateGroupReadStore
* Remove redundant _have_persisted_state_group_txn
* Update comments
* Comment compute_event_context
* Set start val for state_group_id_seq
... otherwise we try to recreate old state groups
* Update comments
* Don't store state for outliers
* Update comment
* Update docstring as state groups are ints
1. use `deferred.errback()` instead of `deferred.errback(e)`, which means that
a Failure object will be constructed using the current exception state,
*including* its stack trace - so the stack trace is saved in the Failure,
leading to better exception reports.
2. Set `consumeErrors=True` on the ObservableDeferred, because we know that
there will always be at least one observer - which avoids a spurious "CRITICAL:
unhandled exception in Deferred" error in the logs