MatrixSynapse

Commit Graph

Author	SHA1	Message	Date
Richard van der Hoff	4182bb812f	move DeferredCache into its own module	2020-10-14 23:38:14 +01:00
Richard van der Hoff	9f87da0a84	Rename Cache->DeferredCache	2020-10-14 23:38:14 +01:00
Richard van der Hoff	7eff59ec91	Add some more type annotations to Cache	2020-10-14 23:38:14 +01:00
Erik Johnston	b2486f6656	Fix message duplication if something goes wrong after persisting the event (#8476 ) Should fix #3365.	2020-10-13 12:07:56 +01:00
Erik Johnston	8de3703d21	Make event persisters periodically announce position over replication. (#8499 ) Currently background processes that stream the events stream use the "minimum persisted position" (i.e. `get_current_token()`) rather than the vector clock style tokens. This is broadly fine as it doesn't matter if the background processes lag a small amount. However, in extreme cases (i.e. SyTests) where we only write to one event persister the background processes will never make progress. This PR changes it so that the `MultiWriterIDGenerator` keeps the current position of a given instance as up to date as possible (i.e. using the latest token it sees if it's not in the process of persisting anything), and then periodically announces that over replication. This then allows the "minimum persisted position" to advance, albeit with a small lag.	2020-10-12 15:51:41 +01:00
Patrick Cloke	1781bbe319	Add type hints to response cache. (#8507 )	2020-10-09 11:35:11 -04:00
Erik Johnston	5009ffcaa4	Only send RDATA for instance local events. (#8496 ) When pulling events out of the DB to send over replication we were not filtering by instance name, and so we were sending events for other instances.	2020-10-09 13:10:33 +01:00
Patrick Cloke	c9c0ad5e20	Remove the deprecated Handlers object (#8494 ) All handlers now available via get_*_handler() methods on the HomeServer.	2020-10-09 07:24:34 -04:00
Erik Johnston	6c5d5e507e	Add unit test for event persister sharding (#8433 )	2020-10-02 09:57:12 +01:00
Patrick Cloke	4ff0201e62	Enable mypy checking for unreachable code and fix instances. (#8432 )	2020-10-01 08:09:18 -04:00
Erik Johnston	ea70f1c362	Various clean ups to room stream tokens. (#8423 )	2020-09-29 21:48:33 +01:00
Richard van der Hoff	866c84da8d	Add metrics to track success/otherwise of replication requests (#8406 ) One hope is that this might provide some insights into #3365.	2020-09-29 11:06:11 +01:00
Erik Johnston	f112cfe5bb	Fix MultiWriteIdGenerator's handling of restarts. (#8374 ) On startup `MultiWriteIdGenerator` fetches the maximum stream ID for each instance from the table and uses that as its initial "current position" for each writer. This is problematic as a) it involves either a scan of events table or an index (neither of which is ideal), and b) if rows are being persisted out of order elsewhere while the process restarts then using the maximum stream ID is not correct. This could theoretically lead to race conditions where e.g. events that are persisted out of order are not sent down sync streams. We fix this by creating a new table that tracks the current positions of each writer to the stream, and update it each time we finish persisting a new entry. This is a relatively small overhead when persisting events. However for the cache invalidation stream this is a much bigger relative overhead, so instead we note that for invalidation we don't actually care about reliability over restarts (as there's no caches to invalidate) and simply don't bother reading and writing to the new table in that particular case.	2020-09-24 16:53:51 +01:00
Erik Johnston	ac11fcbbb8	Add EventStreamPosition type (#8388 ) The idea is to remove some of the places we pass around `int`, where it can represent one of two things: 1. the position of an event in the stream; or 2. a token that partitions the stream, used as part of the stream tokens. The valid operations are then: 1. did a position happen before or after a token; 2. get all events that happened before or after a token; and 3. get all events between two tokens. (Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)	2020-09-24 13:24:17 +01:00
Patrick Cloke	8a4a4186de	Simplify super() calls to Python 3 syntax. (#8344 ) This converts calls like super(Foo, self) -> super(). Generated with: sed -i "" -Ee 's/super\([^\(]+\)/super()/g' */.py	2020-09-18 09:56:44 -04:00
Jonathan de Jong	a3f124b821	Switch metaclass initialization to python 3-compatible syntax (#8326 )	2020-09-16 15:15:55 -04:00
Patrick Cloke	aec294ee0d	Use slots in attrs classes where possible (#8296 ) slots use less memory (and attribute access is faster) while slightly limiting the flexibility of the class attributes. This focuses on objects which are instantiated "often" and for short periods of time.	2020-09-14 12:50:06 -04:00
Patrick Cloke	d2a3eb04a4	Fix typos in comments.	2020-09-14 11:46:58 -04:00
Erik Johnston	04cc249b43	Add experimental support for sharding event persister. Again. (#8294 ) This is not ready for production yet. Caveats: 1. We should write some tests... 2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.	2020-09-14 10:16:41 +01:00
Erik Johnston	5d3e306d9f	Clean up `Notifier.on_new_room_event` code path (#8288 ) The idea here is that we pass the `max_stream_id` to everything, and only use the stream ID of the particular event to figure out when the max stream position has caught up to the event and we can notify people about it. This is to maintain the distinction between the position of an item in the stream (i.e. event A has stream ID 513) and a token that can be used to partition the stream (i.e. give me all events after stream ID 352). This distinction becomes important when the tokens are more complicated than a single number, which they will be once we start tracking the position of multiple writers in the tokens. The valid operations here are: 1. Is a position before or after a token 2. Fetching all events between two tokens 3. Merging multiple tokens to get the "max", i.e. `C = max(A, B)` means that for all positions P where P is before A or before B, then P is before C. Future PR will change the token type to a dedicated type.	2020-09-10 13:24:43 +01:00
Patrick Cloke	2ea1c68249	Remove some unused distributor signals (#8216 ) Removes the `user_joined_room` and stops calling it since there are no observers. Also cleans-up some other unused signals and related code.	2020-09-09 12:22:00 -04:00
Erik Johnston	c9dbee50ae	Fixup pusher pool notifications (#8287 ) `pusher_pool.on_new_notifications` expected a min and max stream ID, however that was not what we were passing in. Instead, let's just pass it the current max stream ID and have it track the last stream ID it got passed. I believe that it mostly worked as we called the function for every event. However, it would break for events that got persisted out of order, i.e., that were persisted but the max stream ID wasn't incremented as not all preceding events had finished persisting, and push for that event would be delayed until another event got pushed to the affected users.	2020-09-09 16:56:08 +01:00
Erik Johnston	dc9dcdbd59	Revert "Fixup pusher pool notifications" This reverts commit `e7fd336a53`.	2020-09-09 16:19:22 +01:00
Erik Johnston	e7fd336a53	Fixup pusher pool notifications	2020-09-09 16:17:50 +01:00
Patrick Cloke	c619253db8	Stop sub-classing object (#8249 )	2020-09-04 06:54:56 -04:00
Brendan Abolivier	9f8abdcc38	Revert "Add experimental support for sharding event persister. (#8170 )" (#8242 ) * Revert "Add experimental support for sharding event persister. (#8170)" This reverts commit `82c1ee1c22`. * Changelog	2020-09-04 10:19:42 +01:00
Erik Johnston	82c1ee1c22	Add experimental support for sharding event persister. (#8170 ) This is not ready for production yet. Caveats: 1. We should write some tests... 2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.	2020-09-02 15:48:37 +01:00
Richard van der Hoff	aa07c37cf0	Move and rename `get_devices_with_keys_by_user` (#8204 ) * Move `get_devices_with_keys_by_user` to `EndToEndKeyWorkerStore` this seems a better fit for it. This commit simply moves the existing code: no other changes at all. * Rename `get_devices_with_keys_by_user` to better reflect what it does. * get_device_stream_token abstract method To avoid referencing fields which are declared in the derived classes, make `get_device_stream_token` abstract, and define that in the classes which define `_device_list_id_gen`.	2020-09-01 12:41:21 +01:00
Erik Johnston	3b4556cf87	Fix `wait_for_stream_position` for multiple waiters. (#8196 ) This fixes a bug where having multiple callers waiting on the same stream and position will cause it to try and compare two deferreds, which fails (due to the sorted list having an entry of `Tuple[int, Deferred]`).	2020-08-28 17:12:45 +01:00
Erik Johnston	e3c91a3c55	Make SlavedIdTracker.advance have same interface as MultiWriterIDGenerator (#8171 )	2020-08-26 13:15:20 +01:00
Erik Johnston	c9c544cda5	Remove `ChainedIdGenerator`. (#8123 ) It's just a thin wrapper around two ID gens to make `get_current_token` and `get_next` return tuples. This can easily be replaced by calling the appropriate methods on the underlying ID gens directly.	2020-08-19 13:41:51 +01:00
Patrick Cloke	eebf52be06	Be stricter about JSON that is accepted by Synapse (#8106 )	2020-08-19 07:26:03 -04:00
Erik Johnston	76d21d14a0	Separate `get_current_token` into two. (#8113 ) The function is used for two purposes: 1) for subscribers of streams to get a token they can use to get further updates with, and 2) for replication to track position of the writers of the stream. For streams with a single writer the two scenarios produce the same result, however the situation becomes complicated for streams with multiple writers. The current `MultiWriterIdGenerator` does not correctly handle the first case (which is not an issue as it's only used for the `caches` stream which nothing subscribes to outside of replication).	2020-08-19 10:39:31 +01:00
Patrick Cloke	ac77cdb64e	Add a shadow-banned flag to users. (#8092 )	2020-08-14 12:37:59 -04:00
David Vo	4dd27e6d11	Reduce unnecessary whitespace in JSON. (#7372 )	2020-08-07 08:02:55 -04:00
Patrick Cloke	d4a7829b12	Convert synapse.api to async/await (#8031 )	2020-08-06 08:30:06 -04:00
Erik Johnston	a7bdf98d01	Rename database classes to make some sense (#8033 )	2020-08-05 21:38:57 +01:00
Patrick Cloke	3b415e23a5	Convert replication code to async/await. (#7987 )	2020-08-03 07:12:55 -04:00
Richard van der Hoff	349119a340	Merge tag 'v1.18.0rc2' into develop Synapse 1.18.0rc2 (2020-07-28) ============================== Bugfixes -------- - Fix an `AssertionError` exception introduced in v1.18.0rc1. ([\#7876](https://github.com/matrix-org/synapse/issues/7876)) - Fix experimental support for moving typing off master when worker is restarted, which is broken in v1.18.0rc1. ([\#7967](https://github.com/matrix-org/synapse/issues/7967)) Internal Changes ---------------- - Further optimise queueing of inbound replication commands. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))	2020-07-28 11:31:31 +01:00
Erik Johnston	a8f7ed28c6	Typing worker needs to handle stream update requests (#7967 ) IIRC this doesn't break tests because it's only hit on reconnection, or something. Basically, when a process needs to fetch missing updates for the `typing` stream it needs to query the writer instance via HTTP (as we don't write typing notifications to the DB). The problem was that the endpoint (`streams`) was only registered on master and specifically not on the typing writer worker.	2020-07-28 11:04:53 +01:00
Richard van der Hoff	f57b99af22	Handle replication commands synchronously where possible (#7876 ) Most of the stuff we do for replication commands can be done synchronously. There's no point spinning up background processes if we're not going to need them.	2020-07-27 18:54:43 +01:00
Patrick Cloke	8553f46498	Convert synapse.events to async/await. (#7949 )	2020-07-27 13:40:22 -04:00
Erik Johnston	84d099ae11	Fix typing replication not being handled on master (#7959 ) Handling of incoming typing stream updates from replication was not hooked up on master, affecting setups where typing was handled on a different worker. This is really only a problem if the master process is also handling sync requests, which is unlikely for those that are at the stage of moving typing off. The other observable effect is that if a worker restarts or a replication connection drops then the typing worker will issue a `POSITION typing`, triggering the master process to try and stream all typing updates from position 0. Fixes #7907	2020-07-27 14:10:53 +01:00
Richard van der Hoff	931b026844	Remove an unused prometheus metric (#7878 )	2020-07-22 00:40:55 +01:00
Richard van der Hoff	05060e0223	Track command processing as a background process (#7879 ) I'm going to be doing more stuff synchronously, and I don't want to lose the CPU metrics down the sofa.	2020-07-22 00:40:42 +01:00
Karthikeyan Singaravelan	a7b06a81f0	Fix deprecation warning: import ABC from collections.abc (#7892 )	2020-07-20 13:33:04 -04:00
Erik Johnston	2d2acc1cf2	Stop using 'device_max_stream_id' (#7882 ) It serves no purpose, and updating it every time we write to the device inbox stream means all such transactions will conflict, causing lots of transaction failures and retries.	2020-07-17 17:03:27 +01:00
Richard van der Hoff	e5300063ed	Optimise queueing of inbound replication commands (#7861 ) When we get behind on replication, we tend to stack up background processes behind a linearizer. Bg processes are heavy (particularly with respect to prometheus metrics) and linearizers aren't terribly efficient once the queue gets long either. A better approach is to maintain a queue of requests to be processed, and nominate a single process to work its way through the queue. Fixes: #7444	2020-07-16 15:49:37 +01:00
Erik Johnston	f2e38ca867	Allow moving typing off master (#7869 )	2020-07-16 15:12:54 +01:00
Erik Johnston	f299441cc6	Add ability to shard the federation sender (#7798 )	2020-07-10 18:26:36 +01:00
Patrick Cloke	38e1fac886	Fix some spelling mistakes / typos. (#7811 )	2020-07-09 09:52:58 -04:00
Richard van der Hoff	2ab0b021f1	Generate real events when we reject invites (#7804 ) Fixes #2181. The basic premise is that, when we fail to reject an invite via the remote server, we can generate our own out-of-band leave event and persist it as an outlier, so that we have something to send to the client.	2020-07-09 10:40:19 +01:00
Patrick Cloke	e7efd8f827	Do not use simplejson in Synapse. (#7800 )	2020-07-08 07:15:08 -04:00
Erik Johnston	67d7756fcf	Refactor getting replication updates from database v2. (#7740 )	2020-07-07 12:11:35 +01:00
Will Hunt	62b1ce8539	isort 5 compatibility (#7786 ) The CI appears to use the latest version of isort, which is a problem when isort gets a major version bump. Rather than try to pin the version, I've done the necessary to make isort5 happy with synapse.	2020-07-05 16:32:02 +01:00
Erik Johnston	5cdca53aa0	Merge different Resource implementation classes (#7732 )	2020-07-03 19:02:19 +01:00
Richard van der Hoff	f01e2ca039	Use symbolic names for replication stream names (#7768 ) This makes it much easier to find where streams are referenced.	2020-07-01 16:35:40 +01:00
Erik Johnston	f6f7511a4c	Refactor getting replication updates from database. (#7636 ) The aim here is to make it easier to reason about when streams are limited and when they're not, by moving the logic into the database functions themselves. This should mean we can kill off the `db_query_to_update_function` function.	2020-06-16 17:10:28 +01:00
Dagfinn Ilmari Mannsåker	a3f11567d9	Replace all remaining six usage with native Python 3 equivalents (#7704 )	2020-06-16 08:51:47 -04:00
Patrick Cloke	7d2532be36	Discard RDATA from already seen positions. (#7648 )	2020-06-15 08:44:54 -04:00
Erik Johnston	664409b169	Fix bug in account data replication stream. (#7656 ) * Ensure account data stream IDs are unique. The account data stream is shared between three tables, and the maximum allocated ID was tracked in a dedicated table. Updating the max ID happened outside the transaction that allocated the ID, leading to a race where if the server was restarted then the same ID could be allocated but the max ID failed to be updated, leading it to be reused. The ID generators have support for tracking across multiple tables, so we may as well use that instead of a dedicated table. * Fix bug in account data replication stream. If the same stream ID was used in both global and room account data then getting updates for the replication stream would fail due to `heapq.merge(..)` trying to compare a `str` with a `None`. (This is because you'd have two rows like `(534, '!room')` and `(534, None)` from the room and global account data tables). The fix is just to order by stream ID, since we don't rely on the ordering beyond that. The bug where stream IDs can be reused should be fixed now, so this case shouldn't happen going forward. Fixes #7617	2020-06-09 16:28:57 +01:00
Patrick Cloke	f1e61ef85c	Typo fixes.	2020-06-05 08:43:21 -04:00
Erik Johnston	9bac5d62b3	Ensure ReplicationStreamer is always started when replication enabled. (#7579 ) Fixes #7566.	2020-05-27 11:44:19 +01:00
Erik Johnston	e5c67d04db	Add option to move event persistence off master (#7517 )	2020-05-22 16:11:35 +01:00
Erik Johnston	1531b214fc	Add ability to wait for replication streams (#7542 ) The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room). Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on. People probably want to look at this commit by commit.	2020-05-22 14:21:54 +01:00
Erik Johnston	51055c8c44	Allow ReplicationRestResource to be added to workers (#7515 ) This allows workers to talk to each other over HTTP replication.	2020-05-18 12:24:48 +01:00
Richard van der Hoff	4d1afb1dfe	Merge pull request #7519 from matrix-org/rav/kill_py2_code Kill off some old python 2 code	2020-05-18 10:45:30 +01:00
Richard van der Hoff	91f51c611c	remove redundant `__func__` this is a no-op under python 3	2020-05-15 19:37:41 +01:00
Richard van der Hoff	6c1f7c722f	Fix limit logic for AccountDataStream (#7384 ) Make sure that the AccountDataStream presents complete updates, in the right order. This is much the same fix as #7337 and #7358, but applied to a different stream.	2020-05-15 19:03:25 +01:00
Erik Johnston	1f36ff69e8	Move event stream handling out of slave store. (#7491 ) This allows us to have the logic on both master and workers, which is necessary to move event persistence off master. We also combine the instantiation of ID generators from DataStore and slave stores to the base worker stores. This allows us to select which process writes events independently of the master/worker splits.	2020-05-15 16:43:59 +01:00
Erik Johnston	4734a7bbe4	Move EventStream handling into default ReplicationDataHandler (#7493 ) This is so that the logic can happen on both master and workers when we move event persistence out.	2020-05-14 14:01:39 +01:00
Erik Johnston	1de36407d1	Add `instance_map` config and route replication calls (#7495 )	2020-05-14 14:00:58 +01:00
Erik Johnston	7ee24c5674	Have all instances correctly respond to REPLICATE command. (#7475 ) Before all streams were only written to from master, so only master needed to respond to `REPLICATE` commands. Before all instances wrote to the cache invalidation stream, but didn't respond to `REPLICATE`. This was a bug, which could lead to missed rows from cache invalidation stream if an instance is restarted, however all the caches would be empty in that case so it wasn't a problem.	2020-05-13 10:27:02 +01:00
Erik Johnston	8ca79613e6	Fix Redis reconnection logic (#7482 ) Proactively send out `POSITION` commands (as if we had just received a `REPLICATE`) when we connect to Redis. This is important as other instances won't notice we've connected to issue a `REPLICATE` command (unlike for direct TCP connections). This is only currently an issue if master process reconnects without restarting (if it restarts then it won't have written anything and so other instances probably won't have missed anything).	2020-05-13 09:57:15 +01:00
Amber Brown	7cb8b4bc67	Allow configuration of Synapse's cache without using synctl or environment variables (#6391 )	2020-05-11 18:45:23 +01:00
Andrew Morgan	5cf758cdd6	Merge branch 'release-v1.13.0' into develop * release-v1.13.0: Don't UPGRADE database rows RST indenting Put rollback instructions in upgrade notes Fix changelog typo Oh yeah, RST Absolute URL it is then Fix upgrade notes link Provide summary of upgrade issues in changelog. Fix ) Move next version notes from changelog to upgrade notes Changelog fixes 1.13.0rc1 Documentation on setting up redis (#7446) Rework UI Auth session validation for registration (#7455) Fix errors from malformed log line (#7454) Drop support for redis.dbid (#7450)	2020-05-11 16:46:33 +01:00
Richard van der Hoff	aa5aa6f96a	Fix errors from malformed log line (#7454 )	2020-05-07 19:51:38 +01:00
Richard van der Hoff	da9b2db3af	Drop support for redis.dbid (#7450 ) Since we only use pubsub, the dbid is irrelevant.	2020-05-07 16:46:15 +01:00
Erik Johnston	d7983b63a6	Support any process writing to cache invalidation stream. (#7436 )	2020-05-07 13:51:08 +01:00
Richard van der Hoff	62ee862119	Merge branch 'release-v1.13.0' into develop	2020-05-06 15:56:03 +01:00
Richard van der Hoff	2e0c46ca07	Merge branch 'release-v1.13.0' into develop	2020-05-06 11:58:31 +01:00
Richard van der Hoff	a8c17da245	Merge branch 'release-v1.13.0' into rav/fix_dropped_messages	2020-05-05 23:01:12 +01:00
Richard van der Hoff	1242267316	Merge branch 'release-v1.13.0' into rav/fix_dropped_messages	2020-05-05 22:38:44 +01:00
Richard van der Hoff	7f7eedbebb	Wait for a POSITION on the right connection before accepting RDATA ... otherwise we can believe we're up to date when we're not.	2020-05-05 22:38:16 +01:00
Brendan Abolivier	5b8023dc7f	Move logs about discarded RDATA to debug (#7421 )	2020-05-05 21:07:33 +02:00
Richard van der Hoff	d78265af0c	Wait to subscribe before sending REPLICATE	2020-05-05 19:31:37 +01:00
Richard van der Hoff	d5aa7d93ed	Fix catchup-on-reconnect for the Federation Stream (#7374 ) looks like we managed to break this during the refactorathon.	2020-05-05 14:15:57 +01:00
Erik Johnston	350421e058	Fix redis password support. (#7401 ) We forgot to set the password on the subscriber connection, as well as not calling super methods for overridden connectionMade/connectionLost functions.	2020-05-04 14:04:09 +01:00
Erik Johnston	0e719f2398	Thread through instance name to replication client. (#7369 ) For in memory streams when fetching updates on workers we need to query the source of the stream, which currently is hard coded to be master. This PR threads through the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in memory streams.	2020-05-01 17:19:56 +01:00
Erik Johnston	3085cde577	Use `stream.current_token()` and remove `stream_positions()` (#7172 ) We move the processing of typing and federation replication traffic into their handlers so that `Stream.current_token()` points to a valid token. This allows us to remove `get_streams_to_replicate()` and `stream_positions()`.	2020-05-01 15:21:35 +01:00
Richard van der Hoff	b2dba06079	Workaround for assertion errors from db_query_to_update_function (#7378 ) Hopefully this is no worse than what we have on master...	2020-05-01 09:25:16 +01:00
Erik Johnston	37f6823f5b	Add instance name to RDATA/POSITION commands (#7364 ) This is primarily for allowing us to send those commands from workers, but for now simply allows us to ignore echoed RDATA/POSITION commands that we sent (we get echoes of sent commands when using redis). Currently we log a WARNING on the master process every time we receive an echoed RDATA.	2020-04-29 16:23:08 +01:00
Erik Johnston	3eab76ad43	Don't relay REMOTE_SERVER_UP cmds to same conn. (#7352 ) For direct TCP connections we need the master to relay REMOTE_SERVER_UP commands to the other connections so that all instances get notified about it. The old implementation just relayed to all connections, assuming that sending back to the original sender of the command was safe. This is not true for redis, where commands sent get echoed back to the sender, which was causing master to effectively infinite loop sending and then re-receiving REMOTE_SERVER_UP commands that it sent. The fix is to ensure that we only relay to other connections and not to the connection we received the notification from. Fixes #7334.	2020-04-29 14:10:59 +01:00
Richard van der Hoff	c2e1a2110f	Fix limit logic for EventsStream (#7358 ) * Factor out functions for injecting events into database I want to add some more flexibility to the tools for injecting events into the database, and I don't want to clutter up HomeserverTestCase with them, so let's factor them out to a new file. * Rework TestReplicationDataHandler This wasn't very easy to work with: the mock wrapping was largely superfluous, and it's useful to be able to inspect the received rows, and clear out the received list. * Fix AssertionErrors being thrown by EventsStream Part of the problem was that there was an off-by-one error in the assertion, but also the limit logic was too simple. Fix it all up and add some tests.	2020-04-29 12:30:36 +01:00
Erik Johnston	38919b521e	Run replication streamers on workers (#7146 ) Currently we never write to streams from workers, but that will change soon	2020-04-28 13:34:12 +01:00
Richard van der Hoff	ce428a1abe	Fix EventsStream raising assertions when it falls behind Figuring out how to correctly limit updates from this stream without dropping entries is far more complicated than just counting the number of rows being returned. We need to consider each query separately and, if any one query hits the limit, truncate the results from the others. I think this also fixes some potentially long-standing bugs where events or state changes could get missed if we hit the limit on either query.	2020-04-24 13:59:21 +01:00
Richard van der Hoff	9cbdfb3a2f	Make it clear that the limit for an update_function is a target	2020-04-23 15:45:12 +01:00
Richard van der Hoff	23b28266ac	Remove 'limit' param from `get_repl_stream_updates` API there doesn't seem to be much point in passing this limit all around, since both sides agree it's meant to be 100.	2020-04-23 15:44:35 +01:00
Richard van der Hoff	71a1abb8a1	Stop the master relaying USER_SYNC for other workers (#7318 ) Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication. In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits. Fixes (I hope) #7257.	2020-04-22 22:39:04 +01:00
Erik Johnston	841c581c40	Fix replication metrics when using redis (#7325 )	2020-04-22 16:26:19 +01:00
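
Note on 664409b169: the `heapq.merge(..)` failure described in that commit message can be reproduced in isolation. The following is a minimal sketch, not Synapse's actual code: the row values, the variable names, and the helper functions `merge_naive` and `merge_by_stream_id` are hypothetical, and passing `key=` is only one way to "order by stream ID".

```python
import heapq
from typing import List, Optional, Tuple

# Hypothetical rows of (stream_id, room_id); global account data has no room.
Row = Tuple[int, Optional[str]]

room_rows: List[Row] = [(533, "!other"), (534, "!room")]
global_rows: List[Row] = [(534, None), (535, None)]


def merge_naive(a: List[Row], b: List[Row]) -> List[Row]:
    """Merge by comparing whole tuples; a tie on stream ID compares room IDs."""
    return list(heapq.merge(a, b))


def merge_by_stream_id(a: List[Row], b: List[Row]) -> List[Row]:
    """Merge on the stream ID only, so room IDs are never compared."""
    return list(heapq.merge(a, b, key=lambda row: row[0]))


try:
    merge_naive(room_rows, global_rows)
    naive_failed = False
except TypeError:
    # Comparing (534, "!room") with (534, None) falls through to str < None.
    naive_failed = True

merged = merge_by_stream_id(room_rows, global_rows)
```

With a shared stream ID the naive merge raises `TypeError`, while the keyed merge returns the rows in stream ID order without ever comparing the room IDs.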


547 Commits (c14f99be461d8ac9a36ad548e8e463feeda6394c)
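
Note on 8de3703d21 and 04cc249b43: both messages describe the "minimum persisted position" stalling when one event persister is idle. The following is a rough sketch of that idea under stated assumptions, not the `MultiWriterIDGenerator` implementation: the function names, the instance names, and the dictionary of positions are all hypothetical.

```python
from typing import Dict


def minimum_persisted_position(positions: Dict[str, int]) -> int:
    """Position that every writer is known to have reached."""
    return min(positions.values())


def announce_if_idle(
    positions: Dict[str, int],
    instance: str,
    latest_seen: int,
    is_persisting: bool,
) -> None:
    """Advance an idle writer to the latest token it has seen.

    A writer that is in the middle of persisting keeps its old position,
    since it may still produce rows below `latest_seen`.
    """
    if not is_persisting:
        positions[instance] = max(positions[instance], latest_seen)


# Hypothetical state: persister2 has not written anything for a while.
positions = {"persister1": 120, "persister2": 7}
stalled = minimum_persisted_position(positions)

# persister2 is idle, so it can announce the latest token it has seen.
announce_if_idle(positions, "persister2", latest_seen=120, is_persisting=False)
advanced = minimum_persisted_position(positions)
```

Here the minimum is held back by the idle writer until that writer announces a newer position, which matches the small lag the commit message mentions.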