MatrixSynapse

Commit Graph

Author	SHA1	Message	Date
Patrick Cloke	3e4af36bc8	Rename get_tcp_replication to get_replication_command_handler. (#12192 ) Since the object it returns is a ReplicationCommandHandler. This is clean-up from adding support to Redis where the command handler was added as an additional layer of abstraction from the TCP protocol.	2022-03-10 13:01:56 +00:00
Erik Johnston	423cca9efe	Spread out sending device lists to remote hosts (#12132 )	2022-03-04 11:48:15 +00:00
Richard van der Hoff	e24ff8ebe3	Remove `HomeServer.get_datastore()` (#12031 ) The presence of this method was confusing, and mostly present for backwards compatibility. Let's get rid of it. Part of #11733	2022-02-23 11:04:02 +00:00
Patrick Cloke	d0e78af35e	Add missing type hints to synapse.replication. (#11938 )	2022-02-08 11:03:08 -05:00
Brendan Abolivier	c7a5e49664	Implement an `on_new_event` callback (#11126 ) Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2021-10-26 15:17:36 +02:00
Patrick Cloke	f4b1a9a527	Require direct references to configuration variables. (#10985 ) This removes the magic allowing accessing configurable variables directly from the config object. It is now required that a specific configuration class is used (e.g. `config.foo` must be replaced with `config.server.foo`).	2021-10-06 10:47:41 -04:00
Patrick Cloke	01c88a09cd	Use direct references for some configuration variables (#10798 ) Instead of proxying through the magic getter of the RootConfig object. This should be more performant (and is more explicit).	2021-09-13 13:07:12 -04:00
Richard van der Hoff	d9cb658c78	Fix up type hints for Twisted 21.7 (#10490 ) Mostly this involves decorating a few Deferred declarations with extra type hints. We wrap the types in quotes to avoid runtime errors when running against older versions of Twisted that don't have generics on Deferred.	2021-07-28 12:04:11 +00:00
Šimon Brandner	c3b037795a	Support for MSC2285 (hidden read receipts) (#10413 ) Implementation of matrix-org/matrix-doc#2285	2021-07-28 10:05:11 +02:00
Jonathan de Jong	bf72d10dbf	Use inline type hints in various other places (in `synapse/`) (#10380 )	2021-07-15 11:02:43 +01:00
Richard van der Hoff	b378d98c8f	Add debug logging for issue #9533 (#9959 ) Hopefully this will help us track down where to-device messages are getting lost/delayed.	2021-05-11 11:04:03 +01:00
Erik Johnston	de0d088adc	Add presence federation stream (#9819 )	2021-04-20 14:11:24 +01:00
Erik Johnston	00a6db9676	Move some replication processing out of generic_worker (#9796 ) Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>	2021-04-14 17:06:06 +01:00
Jonathan de Jong	4b965c862d	Remove redundant "coding: utf-8" lines (#9786 ) Part of #9744 Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now. `Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`	2021-04-14 15:34:27 +01:00
Patrick Cloke	33a02f0f52	Fix additional type hints from Twisted upgrade. (#9518 )	2021-03-03 15:47:38 -05:00
Erik Johnston	a6ea1a957e	Don't pull event from DB when handling replication traffic. (#8669 ) I was trying to make it so that we didn't have to start a background task when handling RDATA, but that is a bigger job (due to all the code in `generic_worker`). However I still think not pulling the event from the DB may help reduce some DB usage due to replication, even if most workers will simply go and pull that event from the DB later anyway. Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>	2020-10-28 12:11:45 +00:00
Erik Johnston	8de3703d21	Make event persisters periodically announce position over replication. (#8499 ) Currently background processes that stream the events stream use the "minimum persisted position" (i.e. `get_current_token()`) rather than the vector clock style tokens. This is broadly fine as it doesn't matter if the background processes lag a small amount. However, in extreme cases (i.e. SyTests) where we only write to one event persister the background processes will never make progress. This PR changes it so that the `MultiWriterIDGenerator` keeps the current position of a given instance as up to date as possible (i.e. using the latest token it sees if it's not in the process of persisting anything), and then periodically announces that over replication. This then allows the "minimum persisted position" to advance, albeit with a small lag.	2020-10-12 15:51:41 +01:00
Erik Johnston	ea70f1c362	Various clean ups to room stream tokens. (#8423 )	2020-09-29 21:48:33 +01:00
Erik Johnston	ac11fcbbb8	Add EventStreamPosition type (#8388 ) The idea is to remove some of the places we pass around `int`, where it can represent one of two things: 1. the position of an event in the stream; or 2. a token that partitions the stream, used as part of the stream tokens. The valid operations are then: 1. did a position happen before or after a token; 2. get all events that happened before or after a token; and 3. get all events between two tokens. (Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)	2020-09-24 13:24:17 +01:00
Erik Johnston	5d3e306d9f	Clean up `Notifier.on_new_room_event` code path (#8288 ) The idea here is that we pass the `max_stream_id` to everything, and only use the stream ID of the particular event to figure out when the max stream position has caught up to the event and we can notify people about it. This is to maintain the distinction between the position of an item in the stream (i.e. event A has stream ID 513) and a token that can be used to partition the stream (i.e. give me all events after stream ID 352). This distinction becomes important when the tokens are more complicated than a single number, which they will be once we start tracking the position of multiple writers in the tokens. The valid operations here are: 1. Is a position before or after a token 2. Fetching all events between two tokens 3. Merging multiple tokens to get the "max", i.e. `C = max(A, B)` means that for all positions P where P is before A or before B, then P is before C. Future PR will change the token type to a dedicated type.	2020-09-10 13:24:43 +01:00
Erik Johnston	c9dbee50ae	Fixup pusher pool notifications (#8287 ) `pusher_pool.on_new_notifications` expected a min and max stream ID, however that was not what we were passing in. Instead, let's just pass it the current max stream ID and have it track the last stream ID it got passed. I believe that it mostly worked as we called the function for every event. However, it would break for events that got persisted out of order, i.e. that were persisted but the max stream ID wasn't incremented as not all preceding events had finished persisting, and push for that event would be delayed until another event got pushed to the affected users.	2020-09-09 16:56:08 +01:00
Erik Johnston	dc9dcdbd59	Revert "Fixup pusher pool notifications" This reverts commit `e7fd336a53`.	2020-09-09 16:19:22 +01:00
Erik Johnston	e7fd336a53	Fixup pusher pool notifications	2020-09-09 16:17:50 +01:00
Erik Johnston	3b4556cf87	Fix `wait_for_stream_position` for multiple waiters. (#8196 ) This fixes a bug where having multiple callers waiting on the same stream and position will cause it to try and compare two deferreds, which fails (due to the sorted list having an entry of `Tuple[int, Deferred]`).	2020-08-28 17:12:45 +01:00
Erik Johnston	84d099ae11	Fix typing replication not being handled on master (#7959 ) Handling of incoming typing stream updates from replication was not hooked up on master, affecting setups where typing was handled on a different worker. This is really only a problem if the master process is also handling sync requests, which is unlikely for those that are at the stage of moving typing off. The other observable effect is that if a worker restarts or a replication connection drops then the typing worker will issue a `POSITION typing`, triggering the master process to try and stream all typing updates from position 0. Fixes #7907	2020-07-27 14:10:53 +01:00
Will Hunt	62b1ce8539	isort 5 compatibility (#7786 ) The CI appears to use the latest version of isort, which is a problem when isort gets a major version bump. Rather than try to pin the version, I've done the necessary to make isort5 happy with synapse.	2020-07-05 16:32:02 +01:00
Patrick Cloke	f1e61ef85c	Typo fixes.	2020-06-05 08:43:21 -04:00
Erik Johnston	1531b214fc	Add ability to wait for replication streams (#7542 ) The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room). Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on. People probably want to look at this commit by commit.	2020-05-22 14:21:54 +01:00
Erik Johnston	4734a7bbe4	Move EventStream handling into default ReplicationDataHandler (#7493 ) This is so that the logic can happen on both master and workers when we move event persistence out.	2020-05-14 14:01:39 +01:00
Erik Johnston	d7983b63a6	Support any process writing to cache invalidation stream. (#7436 )	2020-05-07 13:51:08 +01:00
Erik Johnston	0e719f2398	Thread through instance name to replication client. (#7369 ) For in memory streams when fetching updates on workers we need to query the source of the stream, which currently is hard coded to be master. This PR threads through the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in memory streams.	2020-05-01 17:19:56 +01:00
Erik Johnston	3085cde577	Use `stream.current_token()` and remove `stream_positions()` (#7172 ) We move the processing of typing and federation replication traffic into their handlers so that `Stream.current_token()` points to a valid token. This allows us to remove `get_streams_to_replicate()` and `stream_positions()`.	2020-05-01 15:21:35 +01:00
Erik Johnston	51f7eaf908	Add ability to run replication protocol over redis. (#7040 ) This is configured via the `redis` config options.	2020-04-22 13:07:41 +01:00
Erik Johnston	5016b162fc	Move client command handling out of TCP protocol (#7185 ) The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`, a future PR will move the server paths too.	2020-04-06 09:58:42 +01:00
Erik Johnston	4f21c33be3	Remove usage of "conn_id" for presence. (#7128 ) * Remove `conn_id` usage for UserSyncCommand. Each tcp replication connection is assigned a "conn_id", which is used to give an ID to a remotely connected worker. In a redis world, there will no longer be a one to one mapping between connection and instance, so instead we need to replace such usages with an ID generated by the remote instances and included in the replication commands. This really only affects UserSyncCommand. * Add CLEAR_USER_SYNCS command that is sent on shutdown. This should help with the case where a synchrotron gets restarted gracefully, rather than relying on the 5 minute timeout.	2020-03-30 16:37:24 +01:00
Erik Johnston	4cff617df1	Move catchup of replication streams to worker. (#7024 ) This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.	2020-03-25 14:54:01 +00:00
Erik Johnston	c3d4ad8afd	Fix sending server up commands from workers (#6811 ) Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2020-01-30 16:42:11 +00:00
Erik Johnston	a8a50f5b57	Wake up transaction queue when remote server comes back online (#6706 ) This will be used to retry outbound transactions to a remote server if we think it might have come back up.	2020-01-17 10:27:19 +00:00
Erik Johnston	48c3a96886	Port synapse.replication.tcp to async/await (#6666 ) * Port synapse.replication.tcp to async/await * Newsfile * Correctly document type of on_<FOO> functions as async * Don't be overenthusiastic with the asyncing....	2020-01-16 09:16:12 +00:00
Erik Johnston	e8b68a4e4b	Fixup synapse.replication to pass mypy checks (#6667 )	2020-01-14 14:08:06 +00:00
Richard van der Hoff	6964ea095b	Reduce the reconnect time when replication fails. (#6617 )	2020-01-03 14:19:09 +00:00
Richard van der Hoff	cc6243b4c0	document the REPLICATE command a bit better (#6305 ) since I found myself wondering how it works	2019-11-04 12:40:18 +00:00
Andrew Morgan	54fef094b3	Remove usage of deprecated logger.warn method from codebase (#6271 ) Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.	2019-10-31 10:23:24 +00:00
Amber Brown	32e7c9e7f2	Run Black. (#5482 )	2019-06-20 19:32:02 +10:00
Richard van der Hoff	f570916a3e	Add parse_row method to replication stream class This will allow individual stream classes to override how a row is parsed.	2019-03-27 21:32:33 +00:00
Richard van der Hoff	acaa18f7dd	Fix/improve some docstrings in the replication code. (#4949 )	2019-03-27 21:12:36 +00:00
Erik Johnston	6870fc496f	Move connecting logic into ClientReplicationStreamProtocol	2019-02-27 10:23:51 +00:00
Erik Johnston	25814921f1	Increase the max delay between retry attempts Otherwise if you have many workers they can easily take out master with their connection attempts	2019-02-26 15:12:33 +00:00
Erik Johnston	313987187e	Fix tight loop over connecting to replication server If the client failed to process incoming commands during the initial set up of the replication connection it would immediately disconnect and reconnect, resulting in a tight loop. This can happen, for example, when subscribing to a stream that has a row that is too long in the backlog. The fix here is to not consider the connection successfully set up until the client has successfully subscribed and caught up with the streams. This ensures that the retry logic timers aren't reset until then, meaning that if an error does happen during start up the client will continue backing off before retrying again.	2019-02-26 15:05:41 +00:00
Amber Brown	c4b3698a80	Make the replication logger quieter (#4108 )	2018-10-29 22:59:44 +11:00

57 Commits (735e89bd3a0755883ef0a19649adf84192b5d9fc)