Commit Graph

3880 Commits (091e9482af6171c19f4b967d3a973ec6c2088ab7)

Author SHA1 Message Date
Patrick Cloke 629a951b49
Move additional tasks to the background worker, part 4 (#8513) 2020-10-13 08:20:32 -04:00
Erik Johnston b2486f6656
Fix message duplication if something goes wrong after persisting the event (#8476)
Should fix #3365.
2020-10-13 12:07:56 +01:00
Erik Johnston 8de3703d21
Make event persisters periodically announce position over replication. (#8499)
Currently background proccesses stream the events stream use the "minimum persisted position" (i.e. `get_current_token()`) rather than the vector clock style tokens. This is broadly fine as it doesn't matter if the background processes lag a small amount. However, in extreme cases (i.e. SyTests) where we only write to one event persister the background processes will never make progress.

This PR changes it so that the `MultiWriterIDGenerator` keeps the current position of a given instance as up to date as possible (i.e using the latest token it sees if its not in the process of persisting anything), and then periodically announces that over replication. This then allows the "minimum persisted position" to advance, albeit with a small lag.
2020-10-12 15:51:41 +01:00
Erik Johnston 5009ffcaa4
Only send RDATA for instance local events. (#8496)
When pulling events out of the DB to send over replication we were not
filtering by instance name, and so we were sending events for other
instances.
2020-10-09 13:10:33 +01:00
Patrick Cloke fe0f4a3591
Move additional tasks to the background worker, part 3 (#8489) 2020-10-09 07:37:51 -04:00
Patrick Cloke a93f3121f8
Add type hints to some handlers (#8505) 2020-10-09 07:20:51 -04:00
Hubert Chathi a97cec18bb
Invalidate the cache when an olm fallback key is uploaded (#8501) 2020-10-08 13:24:46 -04:00
Patrick Cloke e4f72ddc44
Move additional tasks to the background worker (#8458) 2020-10-07 11:27:56 -04:00
Erik Johnston ae5b2a72c0
Reduce serialization errors in MultiWriterIdGen (#8456)
We call `_update_stream_positions_table_txn` a lot, which is an UPSERT
that can conflict in `REPEATABLE READ` isolation level. Instead of doing
a transaction consisting of a single query we may as well run it outside
of a transaction.
2020-10-07 15:15:57 +01:00
Erik Johnston 52a50e8686
Use vector clocks for room stream tokens. (#8439)
Currently when using multiple event persisters we (in the worst case) don't tell clients about events until all event persisters have persisted new events after the original event. This is a suboptimal, especially if one of the event persisters goes down.

To handle this, we encode the position of each event persister in the room tokens so that we can send events to clients immediately. To reduce the size of the token we do two things:

1. We create a unique immutable persistent mapping between instance names and a generated small integer ID, which we can encode in the tokens instead of the instance name; and
2. We encode the "persisted upto position" of the room token and then only explicitly include instances that have positions strictly greater than that.

The new tokens look something like: `m3478~1.3488~2.3489`, where the first number is the min position, and the subsequent `-` separated pairs are the instance ID to positions map. (We use `.` and `~` as separators as they're URL safe and not already used by `StreamToken`).
2020-10-07 15:15:33 +01:00
Patrick Cloke b460a088c6
Add typing information to the device handler. (#8407) 2020-10-07 08:58:21 -04:00
Hubert Chathi 4cb44a1585
Add support for MSC2697: Dehydrated devices (#8380)
This allows a user to store an offline device on the server and
then restore it at a subsequent login.
2020-10-07 08:00:17 -04:00
Hubert Chathi 3cd78bbe9e
Add support for MSC2732: olm fallback keys (#8312) 2020-10-06 13:26:29 -04:00
Richard van der Hoff f31f8e6319
Remove stream ordering from Metadata dict (#8452)
There's no need for it to be in the dict as well as the events table. Instead,
we store it in a separate attribute in the EventInternalMetadata object, and
populate that on load.

This means that we can rely on it being correctly populated for any event which
has been persited to the database.
2020-10-05 14:43:14 +01:00
Patrick Cloke c5251c6fbd
Do not assume that account data is of the correct form. (#8454)
This fixes a bug where `m.ignored_user_list` was assumed to be a dict,
leading to odd behavior for users who set it to something else.
2020-10-05 09:28:05 -04:00
Erik Johnston e3debf9682
Add logging on startup/shutdown (#8448)
This is so we can tell what is going on when things are taking a while to start up.

The main change here is to ensure that transactions that are created during startup get correctly logged like normal transactions.
2020-10-02 15:20:45 +01:00
Erik Johnston ec10bdd32b
Speed up unit tests when using PostgreSQL (#8450) 2020-10-02 15:09:31 +01:00
Patrick Cloke 62894673e6
Allow background tasks to be run on a separate worker. (#8369) 2020-10-02 08:23:15 -04:00
Richard van der Hoff 462e681c79 Synapse 1.21.0rc2 (2020-10-02)
==============================
 
 Features
 --------
 
 - Convert additional templates from inline HTML to Jinja2 templates. ([\#8444](https://github.com/matrix-org/synapse/issues/8444))
 
 Bugfixes
 --------
 
 - Fix a regression in v1.21.0rc1 which broke thumbnails of remote media. ([\#8438](https://github.com/matrix-org/synapse/issues/8438))
 - Do not expose the experimental `uk.half-shot.msc2778.login.application_service` flow in the login API, which caused a compatibility problem with Element iOS. ([\#8440](https://github.com/matrix-org/synapse/issues/8440))
 - Fix malformed log line in new federation "catch up" logic. ([\#8442](https://github.com/matrix-org/synapse/issues/8442))
 - Fix DB query on startup for negative streams which caused long start up times. Introduced in [\#8374](https://github.com/matrix-org/synapse/issues/8374). ([\#8447](https://github.com/matrix-org/synapse/issues/8447))
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl93FccACgkQOSor00I9
 eP9/Egf7B4YOF6tniyAXxZvmvFOwV1WNw4sbFmF+czUKHBTAwS/Ij9MbutulD4OB
 +yqHAvu15qUCQR/G+KGjyHBDtESEUtn5SRy8znLYlR2n3qfEdEpd5y6LJSq4s7sr
 NjFVNVI1g5L8PmbvvWCINfpPm2JSm8zyOdyxy4KZifex1B+8YgPILeQOB59sWL/H
 1maFbHCgepqO3jotsA8PUXQZx5oScABmqYYe92b4sLna00uFBq2NWp0NA654dRqK
 VRFlGzId1fZNWTy1jzfOY2sJKpBCy4cMrtfGJ/eqMtryHqbnBFT6hgB8FyTNg0h0
 oew+BLV/mcJLcvB0ALRMFS7xZHdoxQ==
 =+3N3
 -----END PGP SIGNATURE-----

Merge tag 'v1.21.0rc2' into develop

Synapse 1.21.0rc2 (2020-10-02)
==============================

Features
--------

- Convert additional templates from inline HTML to Jinja2 templates. ([\#8444](https://github.com/matrix-org/synapse/issues/8444))

Bugfixes
--------

- Fix a regression in v1.21.0rc1 which broke thumbnails of remote media. ([\#8438](https://github.com/matrix-org/synapse/issues/8438))
- Do not expose the experimental `uk.half-shot.msc2778.login.application_service` flow in the login API, which caused a compatibility problem with Element iOS. ([\#8440](https://github.com/matrix-org/synapse/issues/8440))
- Fix malformed log line in new federation "catch up" logic. ([\#8442](https://github.com/matrix-org/synapse/issues/8442))
- Fix DB query on startup for negative streams which caused long start up times. Introduced in [\#8374](https://github.com/matrix-org/synapse/issues/8374). ([\#8447](https://github.com/matrix-org/synapse/issues/8447))
2020-10-02 12:59:17 +01:00
Erik Johnston 695240d34a
Fix DB query on startup for negative streams. (#8447)
For negative streams we have to negate the internal stream ID before
querying the DB.

The effect of this bug was to query far too many rows, slowing start up
time, but we would correctly filter the results afterwards so there was
no ill effect.
2020-10-02 12:22:19 +01:00
Patrick Cloke 4ff0201e62
Enable mypy checking for unreachable code and fix instances. (#8432) 2020-10-01 08:09:18 -04:00
Erik Johnston 7941372ec8
Make token serializing/deserializing async (#8427)
The idea is that in future tokens will encode a mapping of instance to position. However, we don't want to include the full instance name in the string representation, so instead we'll have a mapping between instance name and an immutable integer ID in the DB that we can use instead. We'll then do the lookup when we serialize/deserialize the token (we could alternatively pass around an `Instance` type that includes both the name and ID, but that turns out to be a lot more invasive).
2020-09-30 20:29:19 +01:00
Richard van der Hoff 20e7c4de26 Add an improved "forward extremities" metric
Hopefully, N(extremities) * N(state_events) is a more realistic approximation
to "how big a problem is this room?".
2020-09-30 16:49:15 +01:00
Richard van der Hoff 6d2d42f8fb Rewrite BucketCollector
This was a bit unweildy for what I wanted: in particular, I wanted to assign
each measurement straight into a bucket, rather than storing an intermediate
Counter which didn't do any bucketing at all.

I've replaced it with something that is hopefully a bit easier to use.

(I'm not entirely sure what the difference between a HistogramMetricFamily and
a GaugeHistogramMetricFamily is, but given our counters can go down as well as
up the latter *sounds* more accurate?)
2020-09-30 16:49:15 +01:00
Erik Johnston ea70f1c362
Various clean ups to room stream tokens. (#8423) 2020-09-29 21:48:33 +01:00
Erik Johnston b1433bf231
Don't table scan events on worker startup (#8419)
* Fix table scan of events on worker startup.

This happened because we assumed "new" writers had an initial stream
position of 0, so the replication code tried to fetch all events written
by the instance between 0 and the current position.

Instead, set the initial position of new writers to the current
persisted up to position, on the assumption that new writers won't have
written anything before that point.

* Consider old writers coming back as "new".

Otherwise we'd try and fetch entries between the old stale token and the
current position, even though it won't have written any rows.

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2020-09-29 16:42:19 +01:00
Richard van der Hoff 2649d545a5
Mypy fixes for `synapse.handlers.federation` (#8422)
For some reason, an apparently unrelated PR upset mypy about this module. Here are a number of little fixes.
2020-09-29 15:57:36 +01:00
Will Hunt 8676d8ab2e
Filter out appservices from mau count (#8404)
This is an attempt to fix #8403.
2020-09-29 13:11:02 +01:00
Erik Johnston bd380d942f
Add checks for postgres sequence consistency (#8402) 2020-09-28 18:00:30 +01:00
Matthew Hodgson 4b3a1faa08 typo 2020-09-28 00:23:35 +01:00
Tdxdxoz abd04b6af0
Allow existing users to login via OpenID Connect. (#8345)
Co-authored-by: Benjamin Koch <bbbsnowball@gmail.com>

This adds configuration flags that will match a user to pre-existing users
when logging in via OpenID Connect. This is useful when switching to
an existing SSO system.
2020-09-25 07:01:45 -04:00
Erik Johnston 3e87d79e1c
Fix schema delta for servers that have not backfilled (#8396)
Fixes #8395.
2020-09-25 09:58:32 +01:00
Erik Johnston f112cfe5bb
Fix MultiWriteIdGenerator's handling of restarts. (#8374)
On startup `MultiWriteIdGenerator` fetches the maximum stream ID for
each instance from the table and uses that as its initial "current
position" for each writer. This is problematic as a) it involves either
a scan of events table or an index (neither of which is ideal), and b)
if rows are being persisted out of order elsewhere while the process
restarts then using the maximum stream ID is not correct. This could
theoretically lead to race conditions where e.g. events that are
persisted out of order are not sent down sync streams.

We fix this by creating a new table that tracks the current positions of
each writer to the stream, and update it each time we finish persisting
a new entry. This is a relatively small overhead when persisting events.
However for the cache invalidation stream this is a much bigger relative
overhead, so instead we note that for invalidation we don't actually
care about reliability over restarts (as there's no caches to
invalidate) and simply don't bother reading and writing to the new table
in that particular case.
2020-09-24 16:53:51 +01:00
Erik Johnston ac11fcbbb8
Add EventStreamPosition type (#8388)
The idea is to remove some of the places we pass around `int`, where it can represent one of two things:

1. the position of an event in the stream; or
2. a token that partitions the stream, used as part of the stream tokens.

The valid operations are then:

1. did a position happen before or after a token;
2. get all events that happened before or after a token; and
3. get all events between two tokens.

(Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)
2020-09-24 13:24:17 +01:00
Richard van der Hoff 302dc89f6a
Fix bug which caused failure on join with malformed membership events (#8385) 2020-09-23 16:42:14 +01:00
Erik Johnston cbabb312e0
Use `async with` for ID gens (#8383)
This will allow us to hit the DB after we've finished using the generated stream ID.
2020-09-23 16:11:18 +01:00
Mathieu Velten 916bb9d0d1
Don't push if an user account has expired (#8353) 2020-09-23 16:06:28 +01:00
Andrew Morgan 4325be1a52 Fix missing null character check on guest_access room state
When updating room_stats_state, we try to check for null bytes slipping
in to the
content for state events. It turns out we had added guest_access as a
field to
room_stats_state without including it in the null byte check.

Lo and behold, a null byte in a m.room.guest_access event then breaks
room_stats_state
updates.

This PR adds the check for guest_access. A further PR will improve this
function so that this hopefully does not happen again in future.
2020-09-22 19:39:29 +01:00
Dirk Klimpel 8998217540
Fixed a bug with reactivating users with the admin API (#8362)
Fixes: #8359 

Trying to reactivate a user with the admin API (`PUT /_synapse/admin/v2/users/<user_name>`) causes an internal server error.

Seems to be a regression in #8033.
2020-09-22 18:19:01 +01:00
Dirk Klimpel 4da01f9c61
Admin API for reported events (#8217)
Add an admin API to read entries of table `event_reports`. API: `GET /_synapse/admin/v1/event_reports`
2020-09-22 18:15:04 +01:00
Patrick Cloke 00db7786de Synapse 1.20.0rc5 (2020-09-18)
==============================
 
 In addition to the below, Synapse 1.20.0rc5 also includes the bug fix that was included in 1.19.3.
 
 Features
 --------
 
 - Add flags to the `/versions` endpoint for whether new rooms default to using E2EE. ([\#8343](https://github.com/matrix-org/synapse/issues/8343))
 
 Bugfixes
 --------
 
 - Fix rate limiting of federation `/send` requests. ([\#8342](https://github.com/matrix-org/synapse/issues/8342))
 - Fix a longstanding bug where back pagination over federation could get stuck if it failed to handle a received event. ([\#8349](https://github.com/matrix-org/synapse/issues/8349))
 
 Internal Changes
 ----------------
 
 - Blacklist [MSC2753](https://github.com/matrix-org/matrix-doc/pull/2753) SyTests until it is implemented. ([\#8285](https://github.com/matrix-org/synapse/issues/8285))
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEF3tZXk38tRDFVnUIM/xY9qcRMEgFAl9kzk8ACgkQM/xY9qcR
 MEim6A//aERkhyLGRlGpLd37lCyFQCeffTMH1rTvu04iIBQBaUZ6g7CYWOpK43zT
 U8kt379+5OShjdAXs/X4XP+ucdHVbrwsRSP3hBS/fFLiDT0fJgP8uiSf5QqO6NnT
 OqDyXYjcXvj/c6tMKglVtsdh8u4hFwNZjGPMGG68IzJu14uEhnD100cL9jSB9bLB
 ongWpsQzzdGBpJPSFRjv9dCUSeRbzyUdl1t0uqzrNqyN9s/JnzFTn7ZYo6y3lnSS
 dHGVMMo/12M2PkbBHnbJVvDY5Q/R7ZxyXlpz0gvSNOQIw8FqYFnuB0Niy5dQhXSR
 Sy5h4qbczLxqbql1x+lmzeQm4ZMORsW/Tl4C3z6yK6OYaOCJHIf9en4DplTSTqp1
 t+85JxWR2wH10d99YHBpaYKmkVovpwgchrO4YWrtXljUFAhhavzf+YiAdOHYT52s
 RDsDLsvjMbxEHsz4cHfycmshYhjzjb340wkoDXuQpj0zrO99d+Zd83xdK8pS0UQn
 OaljLRAd/5iBjTSyZPSrB1U5141OzlM3QZVJzaYAnP12yhR9eaX2twSCk+lPYOWd
 nhLJjNnj1B1XSGArthuE5NLyEiCPz6KyN2RhO0EOx5YjZN9TwH7LS9upyNFe1nN1
 GIhO5gz+jWLuBZE3xzRNjJyCx/I/LolpCwGMvKDu6638rpsbrPs=
 =tT5/
 -----END PGP SIGNATURE-----

Merge tag 'v1.20.0rc5' into develop

Synapse 1.20.0rc5 (2020-09-18)
==============================

In addition to the below, Synapse 1.20.0rc5 also includes the bug fix that was included in 1.19.3.

Features
--------

- Add flags to the `/versions` endpoint for whether new rooms default to using E2EE. ([\#8343](https://github.com/matrix-org/synapse/issues/8343))

Bugfixes
--------

- Fix rate limiting of federation `/send` requests. ([\#8342](https://github.com/matrix-org/synapse/issues/8342))
- Fix a longstanding bug where back pagination over federation could get stuck if it failed to handle a received event. ([\#8349](https://github.com/matrix-org/synapse/issues/8349))

Internal Changes
----------------

- Blacklist [MSC2753](https://github.com/matrix-org/matrix-doc/pull/2753) SyTests until it is implemented. ([\#8285](https://github.com/matrix-org/synapse/issues/8285))
2020-09-18 11:17:58 -04:00
reivilibre 36efbcaf51
Catch-up after Federation Outage (bonus): Catch-up on Synapse Startup (#8322)
Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Fix _set_destination_retry_timings

This came about because the code assumed that retry_interval
could not be NULL — which has been challenged by catch-up.
2020-09-18 14:59:13 +01:00
Patrick Cloke 8a4a4186de
Simplify super() calls to Python 3 syntax. (#8344)
This converts calls like super(Foo, self) -> super().

Generated with:

    sed -i "" -Ee 's/super\([^\(]+\)/super()/g' **/*.py
2020-09-18 09:56:44 -04:00
Erik Johnston 43f2b67e4d
Intelligently select extremities used in backfill. (#8349)
Instead of just using the most recent extremities let's pick the
ones that will give us results that the pagination request cares about,
i.e. pick extremities only if they have a smaller depth than the
pagination token.

This is useful when we fail to backfill an extremity, as we no longer
get stuck requesting that same extremity repeatedly.
2020-09-18 14:25:52 +01:00
Jonathan de Jong 837293c314
Remove obsolete __future__ imports (#8337) 2020-09-17 08:37:01 -04:00
Jonathan de Jong a3f124b821
Switch metaclass initialization to python 3-compatible syntax (#8326) 2020-09-16 15:15:55 -04:00
reivilibre 576bc37d31
Catch-up after Federation Outage (split, 4): catch-up loop (#8272) 2020-09-15 09:07:19 +01:00
Patrick Cloke aec294ee0d
Use slots in attrs classes where possible (#8296)
slots use less memory (and attribute access is faster) while slightly
limiting the flexibility of the class attributes. This focuses on objects
which are instantiated "often" and for short periods of time.
2020-09-14 12:50:06 -04:00
Tulir Asokan b82d68c0bd
Add the topic and avatar to the room details admin API (#8305) 2020-09-14 10:07:04 -04:00
Erik Johnston 04cc249b43
Add experimental support for sharding event persister. Again. (#8294)
This is *not* ready for production yet. Caveats:

1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
2020-09-14 10:16:41 +01:00