Commit Graph

15192 Commits (0c0c678d1c37e00bd7296bdafa0f3edb97b4224d)

Author SHA1 Message Date
Patrick Cloke 8a05d5de21
Batch look-ups to see if rooms are partial stated. (#14917)
* Batch look-ups to see if rooms are partial stated.

* Fix issues found in linting.

* Fix typo.

* Apply suggestions from code review

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

* Clarify comments.

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

* Also improve the cache size while we're at it

* is_partial_state_rooms -> is_partial_state_room_batched

* Run `black`

* Improve annotation for `simple_select_many_batch`

* Fix is_partial_state_room_batched impl

* Okay, _actually_ fix impl

* Update description.

* Update synapse/storage/databases/main/room.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Run black.

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: David Robertson <davidr@element.io>
2023-01-26 17:15:36 +00:00
Sean Quah cf66d712c6
Fix initialization of `_device_list_id_gen` (#14914)
On startup, the `_device_list_id_gen` stream id generator is initialized
using the maximum stream id seen in a list of tables. When we started
populating the `device_list_remote_pending` table in #13913, we forgot
to add it to the aforementioned list of tables, so the stream id
generator can hand out old stream ids after a restart. The end result is
that Synapse can fail to handle device list update EDUs after a restart
when a partial state join is in progress.

Add the `device_list_remote_pending` table to the list of tables to
consider when initializing the `_device_list_id_gen` stream id generator.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-26 10:38:49 +00:00
David Robertson 8e37ece015
Bump the client-side timeout for /state (#14912)
* Bump the client-side timeout for /state

to allow faster joins resyncs the chance to complete for large rooms.
We have seen this fair poorly (~90s for Matrix HQ's /state) in testing,
causing the resync to advance to another HS who hasn't seen our join yet.

* Changelog

* Milliseconds!!!!
2023-01-25 16:11:06 +00:00
Sean Quah a63d4cc9e9
Make sqlite database migrations transactional again (#14910)
#13873 introduced a regression which causes sqlite database migrations
to no longer run inside a transaction. Wrap them in a transaction again,
to avoid database corruption when migrations are interrupted.

Fixes #14909.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-25 13:38:53 +00:00
David Robertson 4607be0b7b
Request partial joins by default (#14905)
* Request partial joins by default

This is a little sloppy, but we are trying to gain confidence in faster
joins in the upcoming RC.

Admins can still opt out by adding the following to their Synapse
config:

```yaml
experimental:
    faster_joins: false
```

We may revert this change before the release proper, depending on how
testing in the wild goes.

* Changelog

* Try to fix the backfill test failures

* Upgrade notes

* Postgres compat?
2023-01-24 15:28:20 +00:00
David Robertson 80d44060c9
Faster joins: omit partial rooms from eager syncs until the resync completes (#14870)
* Allow `AbstractSet` in `StrCollection`

Or else frozensets are excluded. This will be useful in an upcoming
commit where I plan to change a function that accepts `List[str]` to
accept `StrCollection` instead.

* `rooms_to_exclude` -> `rooms_to_exclude_globally`

I am about to make use of this exclusion mechanism to exclude rooms for
a specific user and a specific sync. This rename helps to clarify the
distinction between the global config and the rooms to exclude for a
specific sync.

* Better function names for internal sync methods

* Track a list of excluded rooms on SyncResultBuilder

I plan to feed a list of partially stated rooms for this sync to ignore

* Exclude partial state rooms during eager sync

using the mechanism established in the previous commit

* Track un-partial-state stream in sync tokens

So that we can work out which rooms have become fully-stated during a
given sync period.

* Fix mutation of `@cached` return value

This was fouling up a complement test added alongside this PR.
Excluding a room would mean the set of forgotten rooms in the cache
would be extended. This means that room could be erroneously considered
forgotten in the future.

Introduced in #12310, Synapse 1.57.0. I don't think this had any
user-visible side effects (until now).

* SyncResultBuilder: track rooms to force as newly joined

Similar plan as before. We've omitted rooms from certain sync responses;
now we establish the mechanism to reintroduce them into future syncs.

* Read new field, to present rooms as newly joined

* Force un-partial-stated rooms to be newly-joined

for eager incremental syncs only, provided they're still fully stated

* Notify user stream listeners to wake up long polling syncs

* Changelog

* Typo fix

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

* Unnecessary list cast

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

* Rephrase comment

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

* Another comment

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

* Fixup merge(?)

* Poke notifier when receiving un-partial-stated msg over replication

* Fixup merge whoops

Thanks MV :)

Co-authored-by: Mathieu Velen <mathieuv@matrix.org>

Co-authored-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2023-01-23 15:44:39 +00:00
Patrick Cloke 82d3efa312
Skip processing stats for broken rooms. (#14873)
* Skip processing stats for broken rooms.

* Newsfragment

* Use a custom exception.
2023-01-23 11:36:20 +00:00
Sean Quah 2ec9c58496
Faster joins: Update room stats and the user directory on workers when finishing join (#14874)
* Faster joins: Update room stats and user directory on workers when done

When finishing a partial state join to a room, we update the current
state of the room without persisting additional events. Workers receive
notice of the current state update over replication, but neglect to wake
the room stats and user directory updaters, which then get incidentally
triggered the next time an event is persisted or an unrelated event
persister sends out a stream position update.

We wake the room stats and user directory updaters at the appropriate
time in this commit.

Part of #12814 and #12815.

Signed-off-by: Sean Quah <seanq@matrix.org>

* fixup comment

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-23 10:31:36 +00:00
reivilibre 22cc93afe3
Enable Faster Remote Room Joins against worker-mode Synapse. (#14752)
* Enable Complement tests for Faster Remote Room Joins on worker-mode

* (dangerous) Add an override to allow Complement to use FRRJ under workers

* Newsfile

Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>

* Fix race where we didn't send out replication notification

* MORE HACKS

* Fix get_un_partial_stated_rooms_token to take instance_name

* Fix bad merge

* Remove warning

* Correctly advance un_partial_stated_room_stream

* Fix merge

* Add another notify_replication

* Fixups

* Create a separate ReplicationNotifier

* Fix test

* Fix portdb

* Create a separate ReplicationNotifier

* Fix test

* Fix portdb

* Fix presence test

* Newsfile

* Apply suggestions from code review

* Update changelog.d/14752.misc

Co-authored-by: Erik Johnston <erik@matrix.org>

* lint

Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Erik Johnston <erik@matrix.org>
2023-01-22 21:10:11 +00:00
Sean Quah d329a566df
Faster joins: Fix incompatibility with restricted joins (#14882)
* Avoid clearing out forward extremities when doing a second remote join

When joining a restricted room where the local homeserver does not have
a user able to issue invites, we perform a second remote join. We want
to avoid clearing out forward extremities in this case because the
forward extremities we have are up to date and clearing out forward
extremities creates a window in which the room can get bricked if
Synapse crashes.

Signed-off-by: Sean Quah <seanq@matrix.org>

* Do a full join when doing a second remote join into a full state room

We cannot persist a partial state join event into a joined full state
room, so we perform a full state join for such rooms instead. As a
future optimization, we could always perform a partial state join and
compute or retrieve the full state ourselves if necessary.

Signed-off-by: Sean Quah <seanq@matrix.org>

* Add lock around partial state flag for rooms

Signed-off-by: Sean Quah <seanq@matrix.org>

* Preserve partial state info when doing a second partial state join

Signed-off-by: Sean Quah <seanq@matrix.org>

* Add newsfile

* Add a TODO(faster_joins) marker

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-22 19:19:31 +00:00
Erik Johnston 0ec12a3753
Reduce max time we wait for stream positions (#14881)
Now that we wait for stream positions whenever we do a HTTP replication
hit, we need to be less brutal in the case where we do timeout (as we
have bugs around this).
2023-01-20 21:04:33 +00:00
Erik Johnston 65d0386693
Always notify replication when a stream advances (#14877)
This ensures that all other workers are told about stream updates in a timely manner, without having to remember to manually poke replication.
2023-01-20 18:02:18 +00:00
Sean Quah cdea7c11d0
Faster joins: Avoid starting duplicate partial state syncs (#14844)
Currently, we will try to start a new partial state sync every time we
perform a remote join, which is undesirable if there is already one
running for a given room.

We intend to perform remote joins whenever additional local users wish
to join a partial state room, so let's ensure that we do not start more
than one concurrent partial state sync for any given room.

------------------------------------------------------------------------

There is a race condition where the homeserver leaves a room and later
rejoins while the partial state sync from the previous membership is
still running. There is no guarantee that the previous partial state
sync will process the latest join, so we restart it if needed.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-20 12:06:19 +00:00
Erik Johnston cdf2707678
Fix bug in wait for stream position (#14872)
This caused some requests to fail.

This caused some requests to fail.

This really only started causing issues due to #14856
2023-01-19 22:19:56 +00:00
Andrew Morgan a7b54ca8d8
Implement MSC3930: polls push rules (#14787) 2023-01-19 12:47:10 +00:00
Erik Johnston 9187fd940e
Wait for streams to catch up when processing HTTP replication. (#14820)
This should hopefully mitigate a class of races where data gets out of
sync due a HTTP replication request racing with the replication streams.
2023-01-18 19:35:29 +00:00
Catalan Lover e8f2bf5c40
Change default room version to 10. Implements MSC3904 (#14111)
* Change Documentation to have v10 as default room version

* Change Default Room version to 10

* Add changelog entry for default room version swap

* Add changelog entry for v10 default room version in docs

* Clarify doc changelog entry

Co-authored-by: David Robertson <david.m.robertson1@gmail.com>

* Improve Documentation changes.

Co-authored-by: David Robertson <david.m.robertson1@gmail.com>

* Update Changelog entry to have correct format

Co-authored-by: David Robertson <david.m.robertson1@gmail.com>

* Update Spec Version to 1.5

* Only need 1 changelog.

* Fix test.

* Update "Changed in" line

Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <patrickc@matrix.org>
2023-01-18 18:59:48 +00:00
Patrick Cloke 4d6b1d3c47
Properly check for frozendicts in event auth code. (#14864)
Check for for an instance of a mapping instead of a dict.

This only affects room version 10 when frozen events are enabled.
2023-01-18 09:27:57 -05:00
David Robertson 5b3af1c7d0
Stabilise serving partial join responses (#14839)
Serving partial join responses is no longer experimental. They will only be served under the stable identifier if the the undocumented config flag experimental.msc3706_enabled is set to true.

Synapse continues to request a partial join only if the undocumented config flag experimental.faster_joins is set to true; this setting remains present and unaffected.
2023-01-17 12:44:15 +00:00
Erik Johnston 316590d1ea
Fix bug in `wait_for_stream_position` (#14856)
We were incorrectly checking if the *local* token had been advanced, rather than the token for the remote instance.

In practice, I don't think this has caused any bugs due to where we use `wait_for_stream_position`, as critically we don't use it on instances that also write to the given streams (and so the local token will lag behind all remote tokens).
2023-01-17 09:58:22 +00:00
Erik Johnston 2b084c5b71
Merge device list replication streams (#14833) 2023-01-17 09:29:58 +00:00
Sean Quah db5145a31d
Add parameter to control whether we do a partial state join (#14843)
When the local homeserver is already joined to a room and wants to
perform another remote join, we may find it useful to do a non-partial
state join if we already have the full state for the room.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-16 23:15:17 +00:00
Erik Johnston 4db3331bb9
Add an early return when handling no-op presence updates. (#14855)
This stops us from incrementing the presence stream position for no-op updates.
2023-01-16 14:20:12 +00:00
Sean Quah a302d3ecf7
Remove unnecessary reactor reference from `_PerHostRatelimiter` (#14842)
Fix up #14812 to avoid introducing a reference to the reactor.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-16 13:16:19 +00:00
David Robertson 85a7a201fa
Also use stable name in SendJoinResponse struct (#14841)
* Also use stable name in SendJoinResponse struct

follow-up to #14832

* Changelog

* Fix a rename I missed

* Run black

* Update synapse/federation/federation_client.py

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2023-01-16 12:40:25 +00:00
Andrew Morgan 54cd90ea60
Implement MSC3890: Remotely silence local notifications (#14775) 2023-01-13 19:32:10 +00:00
David Robertson 52ae80dd1a
Use stable identifiers for faster joins (#14832)
* Use new query param when requesting a partial join

* Read new query param when serving partial join

* Provide new field names when serving partial joins

* Read new field names from partial join response

* Changelog
2023-01-13 17:58:53 +00:00
Erik Johnston 73ff493dfb
Merge account data streams (#14826) 2023-01-13 14:57:43 +00:00
Dirk Klimpel 8d5325ec0c
Drop unused table `presence` (#14825) 2023-01-13 14:17:03 +00:00
Andrew Morgan 3a125625e7
Add some clarifying comments and refactor a portion of the `Keyring` class for readability (#14804) 2023-01-13 12:37:28 +00:00
Sean Quah 772e8c2385
Fix stack overflow in `_PerHostRatelimiter` due to synchronous requests (#14812)
When there are many synchronous requests waiting on a
`_PerHostRatelimiter`, each request will be started recursively just
after the previous request has completed. Under the right conditions,
this leads to stack exhaustion.

A common way for requests to become synchronous is when the remote
client disconnects early, because the homeserver is overloaded and slow
to respond.

Avoid stack exhaustion under these conditions by deferring subsequent
requests until the next reactor tick.

Fixes #14480.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-13 00:16:21 +00:00
Richard van der Hoff 0f061f39f0 Merge remote-tracking branch 'origin/release-v1.75' into develop 2023-01-12 16:45:23 +00:00
Erik Johnston b50c008453
Re-enable some linting (#14821)
* Re-enable some linting

* Newsfile

* Remove comment
2023-01-12 10:52:07 +00:00
Erik Johnston 84ce93c12f
Fix race calling `/members?at=` (#14817)
Fixes #14814
2023-01-12 10:29:09 +00:00
Emelie Graven dd9e71dc7f
Add `set_displayname` to the module API (#14629) 2023-01-11 18:41:52 +00:00
reivilibre 5172c8c403
Faster remote room joins (worker mode): do not populate external hosts-in-room cache when sending events as this requires blocking for full state. (#14749)
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Sean Quah <seanq@matrix.org>
2023-01-11 13:21:53 +00:00
reivilibre d6bda5addd
Add index to improve performance of the `/timestamp_to_event` endpoint used for jumping to a specific date in the timeline of a room. (#14799) 2023-01-11 12:29:13 +00:00
Patrick Cloke 3952297f6f
Calculate rooms changed for device lists to work. (#14810)
Back-out some changes from 7e582a25f8
(#14786) which skipped necessary logic to calculate device lists properly.
2023-01-11 12:16:41 +00:00
Dirk Klimpel 73f097888e
Add listener `health` (#14747)
Fixes: #8780
2023-01-11 12:00:38 +00:00
Richard van der Hoff 06ab64f201
Implement MSC3925: changes to bundling of edits (#14811)
Two parts to this:

 * Bundle the whole of the replacement with any edited events. This is backwards-compatible so I haven't put it behind a flag.
 * Optionally, inhibit server-side replacement of edited events. This has scope to break things, so it is currently disabled by default.
2023-01-10 16:31:28 +00:00
reivilibre ba4ea7d13f
Batch up replication requests to request the resyncing of remote users's devices. (#14716) 2023-01-10 11:17:59 +00:00
Jeyachandran Rathnam 58d2adc3da
Remove undocumented device from pushrules (#14727)
* Remove undocumented device from pushrules

* Add changelog

* Update changelog.d/14727.misc

* Rename 14727.misc to 14727.bugfix

Co-authored-by: David Robertson <davidr@element.io>
2023-01-09 17:17:24 +00:00
Jeyachandran Rathnam babeeb4e7a
Unescape HTML entities in oEmbed titles. (#14781)
It doesn't seem valid that HTML entities should appear in
the title field of oEmbed responses, but a popular WordPress
plug-in seems to do it.

There should not be harm in unescaping these.
2023-01-09 14:22:02 +00:00
Patrick Cloke 7e582a25f8
Improve /sync performance of when passing filters with empty arrays. (#14786)
This has two related changes:

* It enables fast-path processing for an empty filter (`[]`) which was
  previously only used for wildcard not-filters (`["*"]`).
* It special cases a `/sync` filter with no-rooms to skip all room
  processing, previously we would partially skip processing, but would
  generally still calculate intermediate values for each room which were
  then unused.

Future changes might consider further optimizations:

* Skip calculating per-room account data when all rooms are filtered (currently
  this is thrown away).
* Make similar improvements to other endpoints which support filters.
2023-01-09 08:43:50 -05:00
Jeyachandran Rathnam 5e0888076f
Disable sending confirmation email when 3pid is disabled #14682 (#14725)
* Fixes #12277 :Disable sending confirmation email when 3pid is disabled

* Fix test_add_email_if_disabled test case to reflect changes to enable_3pid_changes flag

* Add changelog file

* Rename newsfragment.

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2023-01-09 11:12:03 +00:00
Patrick Cloke 630d0aeaf6
Support RFC7636 PKCE in the OAuth 2.0 flow. (#14750)
PKCE can protect against certain attacks and is enabled by default. Support
can be controlled manually by setting the pkce_method of each oidc_providers
entry to 'auto' (default), 'always', or 'never'.

This is required by Twitter OAuth 2.0 support.
2023-01-04 14:58:08 -05:00
Patrick Cloke 906dfaa2cf
Support non-OpenID compliant user info endpoints (#14753)
OpenID specifies the format of the user info endpoint and some
OAuth 2.0 IdPs do not follow it, e.g. NextCloud and Twitter.

This adds subject_template and picture_template options to the
default mapping provider for more flexibility in matching those user
info responses.
2023-01-04 08:26:10 -05:00
Nick Mills-Barrett db1cfe9c80
Update all stream IDs after processing replication rows (#14723)
This creates a new store method, `process_replication_position` that
is called after `process_replication_rows`. By moving stream ID advances
here this guarantees any relevant cache invalidations will have been
applied before the stream is advanced.

This avoids race conditions where Python switches between threads mid
way through processing the `process_replication_rows` method where stream
IDs may be advanced before caches are invalidated due to class resolution
ordering.

See this comment/issue for further discussion:
	https://github.com/matrix-org/synapse/issues/14158#issuecomment-1344048703
2023-01-04 11:49:26 +00:00
Andrew Morgan c4456114e1
Add experimental support for MSC3391: deleting account data (#14714) 2023-01-01 03:40:46 +00:00
Patrick Cloke 044fa1a1de
Actually use the picture_claim as configured in OIDC config. (#14751)
Previously it was only using the default value ("picture") when
fetching the picture from the user info.
2022-12-29 12:18:06 -05:00