Commit Graph

4641 Commits (50c92f3a692a745d2b42f9731af4da493fa27715)

Author SHA1 Message Date
Erik Johnston e8318a4333
Handle the case of remote users leaving a partial join room for device lists (#13885) 2022-09-27 13:01:08 +01:00
Patrick Cloke 2fae1a3f78
Improve tests for get_unread_push_actions_for_user_in_range_*. (#13893)
* Adds a docstring.
* Reduces a small amount of duplicated code.
* Improves tests.
2022-09-26 18:28:12 +00:00
David Robertson 0a38c7ec6d
Snapshot schema 72 (#13873)
Including another batch of fixes to the schema dump script
2022-09-26 18:28:32 +01:00
Nick Mills-Barrett 6b4593a80f
Simplify cache invalidation after event persist txn (#13796)
This moves all the invalidations into a single place and de-duplicates
the code involved in invalidating caches for a given event by using
the base class method.
2022-09-26 16:26:35 +01:00
Eric Eastwood ac1a31740b
Only try to backfill event if we haven't tried before recently (#13635)
Only try to backfill event if we haven't tried before recently (exponential backoff). No need to keep trying the same backfill point that fails over and over.

Fix https://github.com/matrix-org/synapse/issues/13622
Fix https://github.com/matrix-org/synapse/issues/8451

Follow-up to https://github.com/matrix-org/synapse/pull/13589

Part of https://github.com/matrix-org/synapse/issues/13356
2022-09-23 14:01:29 -05:00
Sean Quah f49f73c0da
Faster room joins: Avoid blocking `/keys/changes` (#13888)
Part of the work for #12993.

Once #12993 is fully resolved, we expect `/keys/changes` to behave
sensibly when joined to a room with partial state.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-09-23 17:55:15 +01:00
Patrick Cloke efd108b45d
Accept & store thread IDs for receipts (implement MSC3771). (#13782)
Updates the `/receipts` endpoint and receipt EDU handler to parse a
`thread_id` from the body and insert it in the database.
2022-09-23 14:33:28 +00:00
Sean Quah 03c2bfb7f8
Send device list updates out to servers in partially joined rooms (#13874)
Use the provided list of servers in the room from the `/send_join`
response, since we will not know which users are in the room.  This
isn't sufficient to ensure that all remote servers receive the right
device list updates, since the `/send_join` response may be inaccurate
or we may calculate the membership state of new users in the room
incorrectly.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-09-23 13:44:03 +01:00
Patrick Cloke b7272b73aa
Properly paginate forward in the /relations API. (#13840)
This fixes a bug where the `/relations` API with `dir=f` would
skip the first item of each page (except the first page), causing
incomplete data to be returned to the client.
2022-09-22 12:47:49 +00:00
Brendan Abolivier ccca14140a
Track device IDs for pushers (#13831)
Second half of the MSC3881 implementation
2022-09-21 15:31:53 +00:00
Brendan Abolivier 8ae42ab8fa
Support enabling/disabling pushers (from MSC3881) (#13799)
Partial implementation of MSC3881
2022-09-21 14:39:01 +00:00
Mathieu Velten 6bd8763804
Add cache invalidation across workers to module API (#13667)
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
2022-09-21 15:32:01 +02:00
David Robertson fff9b955fa
Generate separate snapshots for logical databases (#13792)
* Generate separate snapshots for sqlite, postgres and common
* Cleanup postgres dbs in the TRAP
* Say which logical DB we're applying updates to
* Run background updates on the state DB
* Add new option for accepting a SCHEMA_NUMBER
2022-09-20 14:14:12 +01:00
Erik Johnston 42d261c32f
Port the push rule classes to Rust. (#13768) 2022-09-20 12:10:31 +01:00
Eric Eastwood 44be42338e
Add support to purge rows from MSC2716 and other tables when purging a room (#13825)
`event_failed_pull_attempts` added in https://github.com/matrix-org/synapse/pull/13589

MSC2716 related tables added in:

 - https://github.com/matrix-org/synapse/pull/10245/files#diff-3d42dfb44d02f7de3aada105e0bdc1cc9dd7f953cbf0f36c5d0f50827bf0320aR1
    - Renamed in https://github.com/matrix-org/synapse/pull/10838/files#diff-2730bfbe9e688b55e46f9371aefe67dac2bd2b2b7d9d6b92774eea1fcfae156dR1
 - https://github.com/matrix-org/synapse/pull/10498/files#diff-c52bbfbb5921a3f6f023b24343668479d966fac164f13b7c39d2197ce3afa7a5R1
2022-09-16 10:56:56 -05:00
Patrick Cloke b2b0c85279
Support providing an index predicate for upserts. (#13822)
This is useful to upsert against a table which has a unique
partial index while avoiding conflicts.
2022-09-15 18:28:48 +00:00
Eric Eastwood 957e3d74fc
Keep track when we try and fail to process a pulled event (#13589)
We can follow-up this PR with:

 1. Only try to backfill from an event if we haven't tried recently -> https://github.com/matrix-org/synapse/issues/13622
 1. When we decide to backfill that event again, process it in the background so it doesn't block and make `/messages` slow when we know it will probably fail again -> https://github.com/matrix-org/synapse/issues/13623
 1. Generally track failures everywhere we try and fail to pull an event over federation -> https://github.com/matrix-org/synapse/issues/13700

Fix https://github.com/matrix-org/synapse/issues/13621

Part of https://github.com/matrix-org/synapse/issues/13356

Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.qv7cj51sv9i5)
2022-09-14 13:57:50 -05:00
Patrick Cloke 666ae87729
Update event push action and receipt tables to support threads. (#13753)
Adds a `thread_id` column to the `event_push_actions`, `event_push_actions_staging`,
and `event_push_summary` tables. This will notifications to be segmented by the thread
in a future pull request. The `thread_id` column stores the root event ID or the special
value `"main"`.

The `thread_id` column for `event_push_actions` and `event_push_summary` is
backfilled with `"main"` for all existing rows. New entries into `event_push_actions`
and `event_push_actions_staging` will get the proper thread ID.

`receipts_linearized` and `receipts_graph` also gain a `thread_id` column, which is similar,
except `NULL` is a special value meaning the receipt is "unthreaded".

See MSC3771 and MSC3773 for where this data will be useful.
2022-09-14 17:11:16 +00:00
Patrick Cloke f2d12ccabe
Use partial indices on SQLIte. (#13802)
Partial indices have been supported since SQLite 3.8, but Synapse
now requires >= 3.27, so we can enable support for them.

This requires rebuilding previous indices which were partial on
PostgreSQL, but not on SQLite.
2022-09-14 12:01:42 -04:00
reivilibre 6302753012
Deduplicate `is_server_notices_room`. (#13780) 2022-09-14 15:53:18 +00:00
David Robertson 51a77e990b
Remove incorrect migration file from `state` logical DB (#13788)
* Remove incorrect migration file from `state` logical DB

The table `ex_outlier_stream` is part of the `main` logical DB; it
should not have been created in the `state` logical DB. We remove this
migration now as a tidy-up.

Note: we cannot `DROP TABLE IF EXISTS ex_outlier_stream` in a new
migration, because some (most) instances of Synapse host both of these
logical DBs on the same DB cluster.

* Changelog
2022-09-14 14:16:12 +01:00
Sean Quah c73774467e
Fix bug in device list caching when remote users leave rooms (#13749)
When a remote user leaves the last room shared with the homeserver, we
have to mark their device list as unsubscribed, otherwise we would hold
on to a stale device list in our cache. Crucially, the device list would
remain cached even after the remote user rejoined the room, which could
lead to E2EE failures until the next change to the remote user's device
list.

Fixes #13651.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-09-14 10:42:57 +01:00
Mathieu Velten 12dacecabd
Make sequence `cache_invalidation_stream_seq` begin at `2` (#13766)
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2022-09-13 16:14:28 +02:00
David Robertson b60d47ab2c
Updates to the schema dump script (#13770) 2022-09-13 10:53:11 +01:00
Nick Mills-Barrett cdbb641232
Add receipts event stream ordering (#13703) 2022-09-13 08:16:37 +01:00
Nick Mills-Barrett da41a7cd61
Remove check current state membership up to date (#13745)
* Remove checks for membership column in current_state_events
* Add schema script to force through the
  `current_state_events_membership` background job

Contributed by Nick @ Beeper (@fizzadar).
2022-09-12 12:58:33 +01:00
Patrick Cloke 3d9f82efcb
Use an upsert for `receipts_graph`. (#13752)
Instead of a delete, then insert.

This was previously done for `receipts_linearized` in
2dc430d36e (#7607).
2022-09-09 07:08:41 -04:00
David Robertson f2d2481e56
Require SQLite >= 3.27.0 (#13760) 2022-09-09 11:14:10 +01:00
Dirk Klimpel f799eac7ea
Add timestamp to user's consent (#13741)
Co-authored-by: reivilibre <olivier@librepush.net>
2022-09-08 15:41:48 +00:00
Sean Quah 906cead9ca
Update docstrings to explain the impact of partial state (#13750)
Update the docstrings for `get_users_in_room` and
`get_current_hosts_in_room` to explain the impact of partial state.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-09-08 15:55:29 +01:00
Sean Quah 89e8b98b65
Avoid raising errors due to malformed IDs in `get_current_hosts_in_room` (#13748)
Handle malformed user IDs with no colons in `get_current_hosts_in_room`.
It's not currently possible for a malformed user ID to join a room, so
this error would never be hit.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-09-08 15:55:03 +01:00
Eric Eastwood d4d3249ded
Instrument `get_metadata_for_events` for tracing (#13730)
When backfilling, `_get_state_ids_after_missing_prev_event` calls [`get_metadata_for_events`](26bc26586b/synapse/handlers/federation_event.py (L1133)). For `#matrix:matrix.org`, it's called with 77k `state_events` which means 77 calls to the database and takes 28 seconds.
2022-09-07 11:41:52 -05:00
reivilibre d3d9ca156e
Cancel the processing of key query requests when they time out. (#13680) 2022-09-07 12:03:32 +01:00
reivilibre c2fe48a6ff
Rename the `EventFormatVersions` enum values so that they line up with room version numbers. (#13706) 2022-09-07 11:08:20 +01:00
Patrick Cloke 48a5c47a9f
Add a schema delta to drop unstable private read receipts. (#13692)
Otherwise they'll be leaked due to the filtering code only respecting
the stable identifiers for private read receipts.
2022-09-01 14:57:47 -04:00
Erik Johnston 9d2823ab70
Cache `is_partial_state_room` (#13693)
Fixes #13613.
2022-09-01 16:07:01 +01:00
Šimon Brandner 0e99f07952
Remove support for unstable private read receipts (#13653)
Signed-off-by: Šimon Brandner <simon.bra.ag@gmail.com>
2022-09-01 13:31:54 +01:00
Nick Mills-Barrett 42b11d5565
Remove cached wrap on `_get_joined_users_from_context` method (#13569)
The method doesn't actually do any data fetching and the method that
does, `_get_joined_profile_from_event_id`, has its own cache.

Signed off by Nick @ Beeper (@Fizzadar).
2022-08-31 12:19:39 +01:00
David Robertson a160406d24
Fix admin List Room API return type on sqlite (#13509) 2022-08-31 10:38:16 +00:00
Eric Eastwood 92c5817e34
Give the correct next event when the message timestamps are the same - MSC3030 (#13658)
Discovered while working on https://github.com/matrix-org/synapse/pull/13589 and I had all the messages at the same timestamp in the tests.

Part of https://github.com/matrix-org/matrix-spec-proposals/pull/3030

Complement tests: https://github.com/matrix-org/complement/pull/457
2022-08-30 14:50:06 -05:00
Shay 20c76cecb9
Drop unused column `application_services_state.last_txn` (#13627) 2022-08-30 10:29:16 -07:00
Patrick Cloke 20df96a7a7
Speed up inserting `event_push_actions_staging`. (#13634)
By using `execute_values` instead of `execute_batch`.
2022-08-30 07:12:48 -04:00
Eric Eastwood 51d732db3b
Optimize how we calculate `likely_domains` during backfill (#13575)
Optimize how we calculate `likely_domains` during backfill because I've seen this take 17s in production just to `get_current_state` which is used to `get_domains_from_state` (see case [*2. Loading tons of events* in the `/messages` investigation issue](https://github.com/matrix-org/synapse/issues/13356)).

There are 3 ways we currently calculate hosts that are in the room:

 1. `get_current_state` -> `get_domains_from_state`
    - Used in `backfill` to calculate `likely_domains` and `/timestamp_to_event` because it was cargo-culted from `backfill`
    - This one is being eliminated in favor of `get_current_hosts_in_room` in this PR 🕳
 1. `get_current_hosts_in_room`
    - Used for other federation things like sending read receipts and typing indicators
 1. `get_hosts_in_room_at_events`
    - Used when pushing out events over federation to other servers in the `_process_event_queue_loop`

Fix https://github.com/matrix-org/synapse/issues/13626

Part of https://github.com/matrix-org/synapse/issues/13356

Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.2tvwz3yhcafh)


### Query performance

#### Before

The query from `get_current_state` sucks just because we have to get all 80k events. And we see almost the exact same performance locally trying to get all of these events (16s vs 17s):
```
synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 16035.612 ms (00:16.036)

synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 4243.237 ms (00:04.243)
```

But what about `get_current_hosts_in_room`: When there is 8M rows in the `current_state_events` table, the previous query in `get_current_hosts_in_room` took 13s from complete freshness (when the events were first added). But takes 930ms after a Postgres restart or 390ms if running back to back to back.

```sh
$ psql synapse
synapse=# \timing on
synapse=# SELECT COUNT(DISTINCT substring(state_key FROM '@[^:]*:(.*)$'))
FROM current_state_events
WHERE
    type = 'm.room.member'
    AND membership = 'join'
    AND room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
 count
-------
  4130
(1 row)

Time: 13181.598 ms (00:13.182)

synapse=# SELECT COUNT(*) from current_state_events where room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
 count
-------
 80814

synapse=# SELECT COUNT(*) from current_state_events;
  count
---------
 8162847

synapse=# SELECT pg_size_pretty( pg_total_relation_size('current_state_events') );
 pg_size_pretty
----------------
 4702 MB
```

#### After

I'm not sure how long it takes from complete freshness as I only really get that opportunity once (maybe restarting computer but that's cumbersome) and it's not really relevant to normal operating times. Maybe you get closer to the fresh times the more access variability there is so that Postgres caches aren't as exact. Update: The longest I've seen this run for is 6.4s and 4.5s after a computer restart.

After a Postgres restart, it takes 330ms and running back to back takes 260ms.

```sh
$ psql synapse
synapse=# \timing on
Timing is on.
synapse=# SELECT
    substring(c.state_key FROM '@[^:]*:(.*)$') as host
FROM current_state_events c
/* Get the depth of the event from the events table */
INNER JOIN events AS e USING (event_id)
WHERE
    c.type = 'm.room.member'
    AND c.membership = 'join'
    AND c.room_id = '!OGEhHVWSdvArJzumhm:matrix.org'
GROUP BY host
ORDER BY min(e.depth) ASC;
Time: 333.800 ms
```

#### Going further

To improve things further we could add a `limit` parameter to `get_current_hosts_in_room`. Realistically, we don't need 4k domains to choose from because there is no way we're going to query that many before we a) probably get an answer or b) we give up. 

Another thing we can do is optimize the query to use a index skip scan:

 - https://wiki.postgresql.org/wiki/Loose_indexscan
 - Index Skip Scan, https://commitfest.postgresql.org/37/1741/
 - https://www.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/
2022-08-30 01:38:14 -05:00
Eric Eastwood d58615c82c
Directly lookup local membership instead of getting all members in a room first (`get_users_in_room` mis-use) (#13608)
See https://github.com/matrix-org/synapse/pull/13575#discussion_r953023755
2022-08-24 14:13:12 -05:00
Eric Eastwood b93bd95e8a
When loading current ids, sort by `stream_id` to avoid incorrect overwrite and avoid errors caused by sorting alphabetical instance name which can be `null` (#13585)
When loading current ids, sort by stream ID so that we don't want to overwrite the `current_position` of an instance to a lower stream ID than we're actually at ([discussion](https://github.com/matrix-org/synapse/pull/13585#discussion_r951795379)). Previously, it sorted alphabetically by instance name which can be `null` and throw errors but more importantly, accomplishes nothing.

Fixes the following startup error which is why I started looking into this area:

```
$ poetry run synapse_homeserver --config-path homeserver.yaml
****************************************************************
 Error during initialisation:
    '<' not supported between instances of 'NoneType' and 'str'
 There may be more information in the logs.
****************************************************************
```

Somehow my database ended up looking like the following, notice the `instance_name` is `null` in the db, and we can't sort `NoneType` things. Another question is why do we see the `instance_name` as `null` sometimes instead of `master` in monolith mode?
```
$ psql synapse
synapse=# SELECT * FROM stream_positions;
   stream_name   | instance_name | stream_id
-----------------+---------------+-----------
 account_data    | master        |      1242
 events          | master        |      1787
 to_device       | master        |        58
 presence_stream | master        |    485638
 receipts        | master        |       341
 backfill        | master        |   -139106
(6 rows)
synapse=# SELECT instance_name, stream_id FROM receipts_linearized;
 instance_name | stream_id
---------------+-----------
               |       211
               |         3
               |         4
               |       212
               |       213
               |       224
               |       228
               |       164
               |       313
               |       253
               |        38
               |       321
               |       324
               |       189
               |       192
               |       193
               |       194
               |       195
               |       197
               |       198
               |       275
               |        79
               |       339
               |       340
               |        82
               |       341
               |        84
               |        85
               |        91
               |       119
```
2022-08-24 12:53:46 -05:00
Nick Mills-Barrett b687010f89
Rewrite get push actions queries (#13597) 2022-08-24 10:12:51 +01:00
Erik Johnston 05c9c7363b
Fix regression caused by #13573 (#13600)
Broke in #13573.
2022-08-23 14:14:05 +00:00
Erik Johnston aec87a0f93
Speed up fetching large numbers of push rules (#13592) 2022-08-23 13:15:43 +01:00
Nick Mills-Barrett 5e7847dc92
Cache user IDs instead of profile objects (#13573)
The profile objects are never used and increase cache size significantly.
2022-08-23 09:49:59 +00:00
Quentin Gliech 3dd175b628
`synapse.api.auth.Auth` cleanup: make permission-related methods use `Requester` instead of the `UserID` (#13024)
Part of #13019

This changes all the permission-related methods to rely on the Requester instead of the UserID. This is a first step towards enabling scoped access tokens at some point, since I expect the Requester to have scope-related informations in it.

It also changes methods which figure out the user/device/appservice out of the access token to return a Requester instead of something else. This avoids having store-related objects in the methods signatures.
2022-08-22 14:17:59 +01:00