MatrixSynapse

Commit Graph

Author	SHA1	Message	Date
David Robertson	91587d4cf9	Bulk-invalidate e2e cached queries after claiming keys (#16613 ) Co-authored-by: Patrick Cloke <patrickc@matrix.org>	2023-11-09 15:57:09 +00:00
Erik Johnston	8f35f8148e	Fix bug where a new writer advances their token too quickly (#16473 ) * Fix bug where a new writer advances their token too quickly When starting a new writer (e.g. for persisting events), the `MultiWriterIdGenerator` doesn't have a minimum token for it as there are no rows matching that new writer in the DB. This results in the first stream ID it acquired being announced as persisted before it actually finishes persisting, if another writer gets and persists a subsequent stream ID. This is due to the logic of setting the minimum persisted position to the minimum known position across all writers, and the new writer starts off not being considered. * Fix sending out POSITIONs when our token advances without update Broke in #14820 * For replication HTTP requests, only wait for minimal position	2023-10-23 16:57:30 +01:00
Patrick Cloke	02f74f3a99	Combine AbstractStreamIdTracker and AbstractStreamIdGenerator. (#15192 ) AbstractStreamIdTracker (now) has only a single sub-class: AbstractStreamIdGenerator, combine them to simplify some code and remove any direct references to AbstractStreamIdTracker.	2023-03-03 08:13:37 -05:00
Andrew Morgan	1eea662780	Add a `get_next_txn` method to `StreamIdGenerator` to match `MultiWriterIdGenerator` (#15191 )	2023-03-02 18:27:00 +00:00
Erik Johnston	65d0386693	Always notify replication when a stream advances (#14877 ) This ensures that all other workers are told about stream updates in a timely manner, without having to remember to manually poke replication.	2023-01-20 18:02:18 +00:00
Erik Johnston	9187fd940e	Wait for streams to catch up when processing HTTP replication. (#14820 ) This should hopefully mitigate a class of races where data gets out of sync due to an HTTP replication request racing with the replication streams.	2023-01-18 19:35:29 +00:00
David Robertson	115f0eb233	Reintroduce #14376 , with bugfix for monoliths (#14468 ) * Add tests for StreamIdGenerator * Drive-by: annotate all defs * Revert "Revert "Remove slaved id tracker (#14376)" (#14463)" This reverts commit `d63814fd73`, which in turn reverted `36097e88c4`. This restores the latter. * Fix StreamIdGenerator not handling unpersisted IDs Spotted by @erikjohnston. Closes #14456. * Changelog Co-authored-by: Nick Mills-Barrett <nick@fizzadar.com> Co-authored-by: Erik Johnston <erik@matrix.org>	2022-11-16 22:16:46 +00:00
Erik Johnston	d63814fd73	Revert "Remove slaved id tracker (#14376 )" (#14463 ) This reverts commit `36097e88c4`.	2022-11-16 13:50:07 +00:00
Nick Mills-Barrett	36097e88c4	Remove slaved id tracker (#14376 ) This matches the multi instance writer ID generator class which can both handle advancing the current token over replication and by calling the database.	2022-11-14 17:31:36 +00:00
reivilibre	d3d9ca156e	Cancel the processing of key query requests when they time out. (#13680 )	2022-09-07 12:03:32 +01:00
Eric Eastwood	b93bd95e8a	When loading current ids, sort by `stream_id` to avoid incorrect overwrite and avoid errors caused by sorting alphabetical instance name which can be `null` (#13585 ) When loading current ids, sort by stream ID so that we don't overwrite the `current_position` of an instance to a lower stream ID than we're actually at ([discussion](https://github.com/matrix-org/synapse/pull/13585#discussion_r951795379)). Previously, it sorted alphabetically by instance name which can be `null` and throw errors but more importantly, accomplishes nothing. Fixes the following startup error which is why I started looking into this area: ``` $ poetry run synapse_homeserver --config-path homeserver.yaml ************************************************************** Error during initialisation: '<' not supported between instances of 'NoneType' and 'str' There may be more information in the logs. ************************************************************** ``` Somehow my database ended up looking like the following, notice the `instance_name` is `null` in the db, and we can't sort `NoneType` things. Another question is why do we see the `instance_name` as `null` sometimes instead of `master` in monolith mode? ``` $ psql synapse synapse=# SELECT * FROM stream_positions; stream_name \| instance_name \| stream_id -----------------+---------------+----------- account_data \| master \| 1242 events \| master \| 1787 to_device \| master \| 58 presence_stream \| master \| 485638 receipts \| master \| 341 backfill \| master \| -139106 (6 rows) synapse=# SELECT instance_name, stream_id FROM receipts_linearized; instance_name \| stream_id ---------------+----------- \| 211 \| 3 \| 4 \| 212 \| 213 \| 224 \| 228 \| 164 \| 313 \| 253 \| 38 \| 321 \| 324 \| 189 \| 192 \| 193 \| 194 \| 195 \| 197 \| 198 \| 275 \| 79 \| 339 \| 340 \| 82 \| 341 \| 84 \| 85 \| 91 \| 119 ```	2022-08-24 12:53:46 -05:00
Eric Eastwood	0a4efbc1dd	Instrument the federation/backfill part of `/messages` (#13489 ) Instrument the federation/backfill part of `/messages` so it's easier to follow what's going on in Jaeger when viewing a trace. Split out from https://github.com/matrix-org/synapse/pull/13440 Follow-up from https://github.com/matrix-org/synapse/pull/13368 Part of https://github.com/matrix-org/synapse/issues/13356	2022-08-16 12:39:40 -05:00
Sean Quah	3f178332d6	Log the stack when waiting for an entire room to be un-partial stated (#13257 ) The stack is already logged when waiting for an event to be un-partial stated. Log the stack for rooms as well, to aid in debugging.	2022-07-12 18:57:38 +01:00
Erik Johnston	888a29f412	Wait for lazy join to complete when getting current state (#12872 )	2022-06-01 16:02:53 +01:00
Richard van der Hoff	f5668f0b4a	Await un-partial-stating after a partial-state join (#12399 ) When we join a room via the faster-joins mechanism, we end up with "partial state" at some points on the event DAG. Many parts of the codebase need to wait for the full state to load. So, we implement a mechanism to keep track of which events have partial state, and wait for them to be fully-populated.	2022-04-21 07:42:03 +01:00
Patrick Cloke	10a88ba91c	Use auto_attribs/native type hints for attrs classes. (#11692 )	2022-01-13 13:49:28 +00:00
Richard van der Hoff	ff7cc17b57	Improve log messages for stream ids (#11536 ) Somehow I'd managed to get my database in a pickle with stream ids. These changes were useful to debug.	2021-12-08 14:15:14 +00:00
Sean Quah	ffd858aa68	Add type hints to `synapse/storage/databases/main/events_worker.py` (#11411 ) Also refactor the stream ID trackers/generators a bit and try to document them better.	2021-11-26 18:41:31 +00:00
Patrick Cloke	64ef25391d	Add type hints to some storage classes (#11307 )	2021-11-11 08:47:31 -05:00
Erik Johnston	333d6f4e84	Fix race in `MultiWriterIdGenerator` (#11045 ) The race allowed the current position to advance too far when stream IDs are still being persisted. This happened when it received a new stream ID from a remote write between a new stream ID being allocated and it being added to the set of unpersisted stream IDs. Fixes #9424.	2021-10-12 14:27:09 +01:00
David Robertson	51a5da74cc	Annotate synapse.storage.util (#10892 ) Also mark `synapse.streams` as having no untyped defs Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>	2021-10-08 14:25:16 +00:00
Erik Johnston	92b6ac31b2	Speed up MultiWriterIdGenerator when lots of IDs are in flight. (#10755 )	2021-09-03 18:23:46 +01:00
Jonathan de Jong	bdfde6dca1	Use inline type hints in `http/federation/`, `storage/` and `util/` (#10381 )	2021-07-15 12:46:54 -04:00
Erik Johnston	d26d15ba3d	Fix bug when running presence off master (#10149 ) Hopefully fixes #10027.	2021-06-11 10:27:12 +01:00
Jonathan de Jong	4b965c862d	Remove redundant "coding: utf-8" lines (#9786 ) Part of #9744 Removes all redundant `# -- coding: utf-8 --` lines from files, as python 3 automatically reads source code as utf-8 now. `Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`	2021-04-14 15:34:27 +01:00
Jonathan de Jong	2ca4e349e9	Bugbear: Add Mutable Parameter fixes (#9682 ) Part of #9366 Adds in fixes for B006 and B008, both relating to mutable parameter lint errors. Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>	2021-04-08 22:38:54 +01:00
Erik Johnston	0b5c967813	Refactor to ensure we call check_consistency (#9470 ) The idea here is to stop people forgetting to call `check_consistency`. Folks can still just pass in `None` to the new args in `build_sequence_generator`, but hopefully they won't.	2021-02-24 10:13:53 +00:00
Eric Eastwood	0a00b7ff14	Update black, and run auto formatting over the codebase (#9381 ) - Update black version to the latest - Run black auto formatting over the codebase - Run autoformatting according to [`docs/code_style.md `](`80d6dc9783/docs/code_style.md`) - Update `code_style.md` docs around installing black to use the correct version	2021-02-16 22:32:34 +00:00
Patrick Cloke	7950aa8a27	Fix some typos.	2021-02-12 11:14:12 -05:00
Jonathan de Jong	d882fbca38	Update type hints for Cursor to match PEP 249. (#9299 )	2021-02-05 15:39:19 -05:00
Erik Johnston	758ed5f1bc	Speed up chain cover calculation (#9176 )	2021-01-21 17:00:12 +00:00
Erik Johnston	12ec55bfaa	Increase perf of handling concurrent use of StreamIDGenerators. (#9190 ) We have seen a failure mode here where if there are many in flight unfinished IDs then marking an ID as finished takes a lot of CPU (as calling deque.remove iterates over the list)	2021-01-21 16:31:51 +00:00
Erik Johnston	ccfafac882	Add schema update to fix existing DBs affected by #9193 (#9195 )	2021-01-21 16:03:25 +00:00
Erik Johnston	2506074ef0	Fix receipts or account data not being sent down sync (#9193 ) Introduced in #9104 This wasn't picked up by the tests as this is all fine the first time you run Synapse (after upgrading), but then when you restart the wrong value is pulled from `stream_positions`.	2021-01-21 15:09:09 +00:00
Erik Johnston	6633a4015a	Allow moving account data and receipts streams off master (#9104 )	2021-01-18 15:47:59 +00:00
Erik Johnston	659c415ed4	Fix chain cover background update to work with split out event persisters (#9115 )	2021-01-14 17:19:35 +00:00
Patrick Cloke	bd30cfe86a	Convert internal pusher dicts to attrs classes. (#8940 ) This improves type hinting and should use less memory.	2020-12-16 11:25:30 -05:00
Erik Johnston	618d405a32	Remove racey assertion in MultiWriterIDGenerator (#8530 ) We asserted that the IDs returned by the postgres sequence were greater than any we had seen, however this is technically racey as we may update the current positions out of order. We now assert that the sequences are correct on startup, so the assertion is no longer really required, so we remove it.	2020-10-14 15:40:06 +01:00
Erik Johnston	8de3703d21	Make event persisters periodically announce position over replication. (#8499 ) Currently background processes that stream the events stream use the "minimum persisted position" (i.e. `get_current_token()`) rather than the vector clock style tokens. This is broadly fine as it doesn't matter if the background processes lag a small amount. However, in extreme cases (i.e. SyTests) where we only write to one event persister the background processes will never make progress. This PR changes it so that the `MultiWriterIDGenerator` keeps the current position of a given instance as up to date as possible (i.e. using the latest token it sees if it's not in the process of persisting anything), and then periodically announces that over replication. This then allows the "minimum persisted position" to advance, albeit with a small lag.	2020-10-12 15:51:41 +01:00
Erik Johnston	ae5b2a72c0	Reduce serialization errors in MultiWriterIdGen (#8456 ) We call `_update_stream_positions_table_txn` a lot, which is an UPSERT that can conflict in `REPEATABLE READ` isolation level. Instead of doing a transaction consisting of a single query we may as well run it outside of a transaction.	2020-10-07 15:15:57 +01:00
Erik Johnston	e3debf9682	Add logging on startup/shutdown (#8448 ) This is so we can tell what is going on when things are taking a while to start up. The main change here is to ensure that transactions that are created during startup get correctly logged like normal transactions.	2020-10-02 15:20:45 +01:00
Richard van der Hoff	462e681c79	Merge tag 'v1.21.0rc2' into develop Synapse 1.21.0rc2 (2020-10-02) ============================== Features -------- - Convert additional templates from inline HTML to Jinja2 templates. ([\#8444](https://github.com/matrix-org/synapse/issues/8444)) Bugfixes -------- - Fix a regression in v1.21.0rc1 which broke thumbnails of remote media. ([\#8438](https://github.com/matrix-org/synapse/issues/8438)) - Do not expose the experimental `uk.half-shot.msc2778.login.application_service` flow in the login API, which caused a compatibility problem with Element iOS. ([\#8440](https://github.com/matrix-org/synapse/issues/8440)) - Fix malformed log line in new federation "catch up" logic. ([\#8442](https://github.com/matrix-org/synapse/issues/8442)) - Fix DB query on startup for negative streams which caused long start up times. Introduced in [\#8374](https://github.com/matrix-org/synapse/issues/8374). ([\#8447](https://github.com/matrix-org/synapse/issues/8447))	2020-10-02 12:59:17 +01:00
Erik Johnston	695240d34a	Fix DB query on startup for negative streams. (#8447 ) For negative streams we have to negate the internal stream ID before querying the DB. The effect of this bug was to query far too many rows, slowing start up time, but we would correctly filter the results afterwards so there was no ill effect.	2020-10-02 12:22:19 +01:00
Patrick Cloke	4ff0201e62	Enable mypy checking for unreachable code and fix instances. (#8432 )	2020-10-01 08:09:18 -04:00
Erik Johnston	b1433bf231	Don't table scan events on worker startup (#8419 ) * Fix table scan of events on worker startup. This happened because we assumed "new" writers had an initial stream position of 0, so the replication code tried to fetch all events written by the instance between 0 and the current position. Instead, set the initial position of new writers to the current persisted up to position, on the assumption that new writers won't have written anything before that point. * Consider old writers coming back as "new". Otherwise we'd try and fetch entries between the old stale token and the current position, even though it won't have written any rows. Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2020-09-29 16:42:19 +01:00
Erik Johnston	bd380d942f	Add checks for postgres sequence consistency (#8402 )	2020-09-28 18:00:30 +01:00
Erik Johnston	3e87d79e1c	Fix schema delta for servers that have not backfilled (#8396 ) Fixes #8395.	2020-09-25 09:58:32 +01:00
Erik Johnston	f112cfe5bb	Fix MultiWriteIdGenerator's handling of restarts. (#8374 ) On startup `MultiWriteIdGenerator` fetches the maximum stream ID for each instance from the table and uses that as its initial "current position" for each writer. This is problematic as a) it involves either a scan of events table or an index (neither of which is ideal), and b) if rows are being persisted out of order elsewhere while the process restarts then using the maximum stream ID is not correct. This could theoretically lead to race conditions where e.g. events that are persisted out of order are not sent down sync streams. We fix this by creating a new table that tracks the current positions of each writer to the stream, and update it each time we finish persisting a new entry. This is a relatively small overhead when persisting events. However for the cache invalidation stream this is a much bigger relative overhead, so instead we note that for invalidation we don't actually care about reliability over restarts (as there's no caches to invalidate) and simply don't bother reading and writing to the new table in that particular case.	2020-09-24 16:53:51 +01:00
Erik Johnston	cbabb312e0	Use `async with` for ID gens (#8383 ) This will allow us to hit the DB after we've finished using the generated stream ID.	2020-09-23 16:11:18 +01:00
Erik Johnston	04cc249b43	Add experimental support for sharding event persister. Again. (#8294 ) This is not ready for production yet. Caveats: 1. We should write some tests... 2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.	2020-09-14 10:16:41 +01:00
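
Several of the messages above (#8294, #8419, #8499, #16473) turn on the "minimum persisted position": the stream token is the minimum of the positions known for each writer. The following is a minimal sketch of why a writer with no recorded position matters, not Synapse's actual implementation. The function names, the `floor` parameter, and the example state are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable


def naive_minimum(positions: Dict[str, int]) -> int:
    """Minimum over only the writers that already have a recorded position."""
    return min(positions.values())


def safe_minimum(
    positions: Dict[str, int], writers: Iterable[str], floor: int
) -> int:
    """Minimum over every configured writer.

    A writer with no recorded position is assumed (for this sketch) to be
    at `floor`, the last position known to be fully persisted.
    """
    return min(positions.get(writer, floor) for writer in writers)


# Hypothetical state: writer "a" has persisted up to stream ID 10, while the
# new writer "b" is still persisting stream ID 8 and has no recorded position.
positions = {"a": 10}
writers = ["a", "b"]
last_known_persisted = 7

# Ignoring "b" lets the token jump to 10, announcing ID 8 too early.
naive = naive_minimum(positions)

# Counting "b" at the last persisted position holds the token back at 7.
safe = safe_minimum(positions, writers, last_known_persisted)
```

Treating an unseen writer as sitting at the last persisted position echoes the approach described in #8419; the trade-off noted in #8294 still applies, since an idle or slow writer holds the minimum back.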


95 Commits (8275953626cef84e33f820def22c8167daec5099)