MatrixSynapse

Commit Graph

Author	SHA1	Message	Date
Erik Johnston	33548f37aa	Improve tracing for to device messages (#9686 )	2021-04-01 17:08:21 +01:00
Patrick Cloke	da75d2ea1f	Add type hints for the federation sender. (#9681 ) Includes an abstract base class which both the FederationSender and the FederationRemoteSendQueue must implement.	2021-03-29 11:43:20 -04:00
Erik Johnston	c602ba8336	Fixed undefined variable error in catchup (#9664 ) Broke in #9640 Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>	2021-03-24 16:12:47 +00:00
Erik Johnston	dd71eb0f8a	Make federation catchup send last event from any server. (#9640 ) Currently federation catchup will send the last local event that we failed to send to the remote. This can cause issues for large rooms where lots of servers have sent events while the remote server was down, as when it comes back up again it'll be flooded with events from various points in the DAG. Instead, let's make it so that all the servers send the most recent events, even if it's not theirs. The remote should deduplicate the events, so there shouldn't be much overhead in doing this. Alternatively, the servers could only send local events if they were also extremities and hope that the other server will send the event over, but that is a bit risky.	2021-03-18 15:52:26 +00:00
Erik Johnston	026503fa3b	Don't go into federation catch up mode so easily (#9561 ) Federation catch up mode is very inefficient if the number of events that the remote server has missed is small, since handling gaps can be very expensive, c.f. #9492. Instead of going into catch up mode whenever we see an error, we instead do so only if we've backed off from trying the remote for more than an hour (the assumption being that in such a case it is more than a transient failure).	2021-03-15 14:42:40 +00:00
Richard van der Hoff	8a4b3738f3	Replace `last_*_pdu_age` metrics with timestamps (#9540 ) Following the advice at https://prometheus.io/docs/practices/instrumentation/#timestamps-not-time-since, it's preferable to export unix timestamps, not ages. There doesn't seem to be any particular naming convention for timestamp metrics.	2021-03-04 16:40:18 +00:00
Andrew Morgan	8bcfc2eaad	Be smarter about which hosts to send presence to when processing room joins (#9402 ) This PR attempts to eliminate unnecessary presence sending work when your local server joins a room, or when a remote server joins a room your server is participating in by processing state deltas in chunks rather than individually. --- When your server joins a room for the first time, it requests the historical state as well. This chunk of new state is passed to the presence handler which, after filtering that state down to only membership joins, will send presence updates to homeservers for each join processed. It turns out that we were being a bit naive and processing each event individually, and sending out presence updates for every one of those joins. Even if many different joins were users on the same server (hello IRC bridges), we'd send presence to that same homeserver for every remote user join we saw. This PR attempts to deduplicate all of that by processing the entire batch of state deltas at once, instead of only doing each join individually. We process the joins and note down which servers need which presence: * If it was a local user join, send that user's latest presence to all servers in the room * If it was a remote user join, send the presence for all local users in the room to that homeserver We deduplicate by inserting all of those pending updates into a dictionary of the form: ``` { server_name1: {presence_update1, ...}, server_name2: {presence_update1, presence_update2, ...} } ``` Only after building this dict do we then start sending out presence updates.	2021-02-19 11:37:29 +00:00
Eric Eastwood	0a00b7ff14	Update black, and run auto formatting over the codebase (#9381 ) - Update black version to the latest - Run black auto formatting over the codebase - Run autoformatting according to [`docs/code_style.md`](80d6dc9783/docs/code_style.md) - Update `code_style.md` docs around installing black to use the correct version	2021-02-16 22:32:34 +00:00
Erik Johnston	dd8da8c5f6	Precompute joined hosts and store in Redis (#9198 )	2021-01-26 13:57:31 +00:00
Erik Johnston	921a3f8a59	Fix not sending events over federation when using sharded event persisters (#8536 ) * Fix outbound federation with multiple event persisters. We incorrectly notified federation senders that the minimum persisted stream position had advanced when we got an `RDATA` from an event persister. Notifying of federation senders already correctly happens in the notifier, so we just delete the offending line. * Change some interfaces to use RoomStreamToken. By enforcing use of `RoomStreamTokens` we make it less likely that people pass in random ints that they got from somewhere random.	2020-10-14 13:27:51 +01:00
Richard van der Hoff	f31f8e6319	Remove stream ordering from Metadata dict (#8452 ) There's no need for it to be in the dict as well as the events table. Instead, we store it in a separate attribute in the EventInternalMetadata object, and populate that on load. This means that we can rely on it being correctly populated for any event which has been persisted to the database.	2020-10-05 14:43:14 +01:00
Richard van der Hoff	3bd3707cb9	Fix malformed log line in new federation "catch up" logic (#8442 )	2020-10-02 11:05:29 +01:00
Richard van der Hoff	c1ef579b63	Add prometheus metrics to track federation delays (#8430 ) Add a pair of federation metrics to track the delays in sending PDUs to/from particular servers.	2020-10-01 11:09:12 +01:00
reivilibre	36efbcaf51	Catch-up after Federation Outage (bonus): Catch-up on Synapse Startup (#8322 ) Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net> Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com> * Fix _set_destination_retry_timings This came about because the code assumed that retry_interval could not be NULL — which has been challenged by catch-up.	2020-09-18 14:59:13 +01:00
reivilibre	576bc37d31	Catch-up after Federation Outage (split, 4): catch-up loop (#8272 )	2020-09-15 09:07:19 +01:00
reivilibre	17fa4c7ca7	Catch up after Federation Outage (split, 2): Track last successful stream ordering after transmission (#8247 ) Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>	2020-09-04 15:06:51 +01:00
reivilibre	58f61f10f7	Catch-up after Federation Outage (split, 1) (#8230 ) Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>	2020-09-04 12:22:23 +01:00
Patrick Cloke	c619253db8	Stop sub-classing object (#8249 )	2020-09-04 06:54:56 -04:00
reivilibre	4535e849d7	Remove obsolete order field in `send_new_transaction` (#8245 ) Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>	2020-09-03 19:23:07 +01:00
Patrick Cloke	5758dcf30c	Add type hints for state. (#8140 )	2020-08-24 14:25:27 -04:00
Patrick Cloke	eebf52be06	Be stricter about JSON that is accepted by Synapse (#8106 )	2020-08-19 07:26:03 -04:00
Patrick Cloke	ad6190c925	Convert stream database to async/await. (#8074 )	2020-08-17 07:24:46 -04:00
reivilibre	ff0e894656	Drop federation transmission queues during a significant remote outage. (#7864 ) * Empty federation transmission queues when we are backing off. Fixes #7828. Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net> * Address feedback Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net> * Reword newsfile	2020-08-13 12:35:04 +01:00
Erik Johnston	9d1e4942ab	Fix typing for notifier (#8064 )	2020-08-12 14:03:08 +01:00
Olivier Wilkinson (reivilibre)	3aa36b782c	Merge branch 'master' into develop	2020-07-30 15:18:36 +01:00
Patrick Cloke	c978f6c451	Convert federation client to async/await. (#7975 )	2020-07-30 08:01:33 -04:00
Erik Johnston	2c1b9d6763	Update worker docs with recent enhancements (#7969 )	2020-07-29 23:22:13 +01:00
Patrick Cloke	b975fa2e99	Convert state resolution to async/await (#7942 )	2020-07-24 10:59:51 -04:00
Patrick Cloke	fefe9943ef	Convert presence handler helpers to async/await. (#7939 )	2020-07-23 16:47:36 -04:00
Erik Johnston	649a7ead5c	Add ability to run multiple pusher instances (#7855 ) This reuses the same scheme as federation sender sharding	2020-07-16 14:06:28 +01:00
Olivier Wilkinson (reivilibre)	12528dc42f	Remove obsolete comment. It was correct at the time of our friend Jorik writing it (checking git blame), but the world has moved now and it is no longer a generator. Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>	2020-07-16 11:12:48 +01:00
Erik Johnston	f299441cc6	Add ability to shard the federation sender (#7798 )	2020-07-10 18:26:36 +01:00
Patrick Cloke	38e1fac886	Fix some spelling mistakes / typos. (#7811 )	2020-07-09 09:52:58 -04:00
Erik Johnston	1e03513f9a	Fix new metric where we used ms instead of seconds (#7771 ) Introduced in #7755, not yet released.	2020-07-01 15:23:58 +01:00
Erik Johnston	a99658074d	Add some metrics for inbound and outbound federation processing times (#7755 )	2020-06-30 16:58:06 +01:00
Patrick Cloke	bd6dc17221	Replace iteritems/itervalues/iterkeys with native versions. (#7692 )	2020-06-15 07:03:36 -04:00
Richard van der Hoff	075375bbc9	add a comment	2020-05-21 13:25:41 +01:00
Richard van der Hoff	d5aa7d93ed	Fix catchup-on-reconnect for the Federation Stream (#7374 ) looks like we managed to break this during the refactorathon.	2020-05-05 14:15:57 +01:00
Erik Johnston	4cff617df1	Move catchup of replication streams to worker. (#7024 ) This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.	2020-03-25 14:54:01 +00:00
Erik Johnston	b08b0a22d5	Add typing to synapse.federation.sender (#6871 )	2020-02-07 13:56:38 +00:00
Erik Johnston	a8a50f5b57	Wake up transaction queue when remote server comes back online (#6706 ) This will be used to retry outbound transactions to a remote server if we think it might have come back up.	2020-01-17 10:27:19 +00:00
Erik Johnston	d386f2f339	Add StateMap type alias (#6715 )	2020-01-16 13:31:22 +00:00
Andrew Morgan	3916e1b97a	Clean up newline quote marks around the codebase (#6362 )	2019-11-21 12:00:14 +00:00
Hubert Chathi	6f4bc6d01d	Merge branch 'develop' into cross-signing_federation	2019-10-31 22:38:21 -04:00
Amber Brown	020add5099	Update black to 19.10b0 (#6304 ) * update version of black and also fix the mypy config being overridden	2019-11-01 02:43:24 +11:00
Andrew Morgan	54fef094b3	Remove usage of deprecated logger.warn method from codebase (#6271 ) Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.	2019-10-31 10:23:24 +00:00
Hubert Chathi	bb6cec27a5	rename get_devices_by_remote to get_device_updates_by_remote	2019-10-30 14:57:34 -04:00
Hubert Chathi	c40d7244f8	Merge branch 'develop' into cross-signing_federation	2019-10-24 22:31:25 -04:00
Hubert Chathi	8d3542a64e	implement federation parts of cross-signing	2019-10-22 19:04:35 -04:00
Erik Johnston	c66a06ac6b	Move storage classes into a main "data store". This is in preparation for having multiple data stores that offer different functionality, e.g. splitting out state or event storage.	2019-10-21 16:05:06 +01:00
Richard van der Hoff	66537e10ce	add some metrics on the federation sender (#6160 )	2019-10-03 17:47:20 +01:00
Jorik Schellekens	ef20aa52eb	use access methods (duh..) Co-Authored-By: Erik Johnston <erik@matrix.org>	2019-09-05 15:07:17 +01:00
Jorik Schellekens	1d65292e94	Link the send loop with the edus contexts The contexts were being filtered too early so the send loop wasn't being linked to them unless the destination was whitelisted.	2019-09-05 14:42:37 +01:00
Jorik Schellekens	8767b63a82	Propagate opentracing contexts through EDUs (#5852 ) Propagate opentracing contexts through EDUs Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>	2019-08-22 18:21:10 +01:00
Amber Brown	4806651744	Replace returnValue with return (#5736 )	2019-07-23 23:00:55 +10:00
Richard van der Hoff	a6a776f3d8	remove dead transaction persist code (#5622 ) this hasn't done anything for years	2019-07-05 12:59:42 +01:00
Amber Brown	463b072b12	Move logging utilities out of the side drawer of util/ and into logging/ (#5606 )	2019-07-04 00:07:04 +10:00
Amber Brown	32e7c9e7f2	Run Black. (#5482 )	2019-06-20 19:32:02 +10:00
Erik Johnston	b42f90470f	Add experimental option to reduce extremities. Adds new config option `cleanup_extremities_with_dummy_events` which periodically sends dummy events to rooms with more than 10 extremities. THIS IS REALLY EXPERIMENTAL.	2019-06-18 15:02:18 +01:00
Richard van der Hoff	5c15039e06	Clean up code for sending federation EDUs. (#5381 ) This code confused the hell out of me today. Split _get_new_device_messages into its two (unrelated) parts.	2019-06-13 13:52:08 +01:00
Andrew Morgan	2d1d7b7e6f	Prevent multiple device list updates from breaking a batch send (#5156 ) fixes #5153	2019-06-06 23:54:00 +01:00
Richard van der Hoff	130f932cbc	Run `black` on per_destination_queue ... mostly to fix pep8 fails	2019-05-09 16:27:02 +01:00
Quentin Dufour	11ea16777f	Limit the number of EDUs in transactions to 100 as expected by receiver (#5138 ) Fixes #3951.	2019-05-09 11:01:41 +01:00
Erik Johnston	197fae1639	Use event streams to calculate presence Primarily this fixes a bug in the handling of remote users joining a room where the server sent out the presence for all local users in the room to all servers in the room. We also change to using the state delta stream, rather than the distributor, as it will make it easier to split processing out of the master process (as well as being more flexible). Finally, when sending presence states to newly joined servers we filter out old presence states to reduce the number sent. Initially we filter out states that are offline and have a last active more than a week ago, though this can be changed down the line. Fixes #3962	2019-03-27 13:41:36 +00:00
Richard van der Hoff	a902d13180	Batch up outgoing read-receipts to reduce federation traffic. (#4890 ) Rate-limit outgoing read-receipts as per #4730.	2019-03-20 16:02:25 +00:00
Richard van der Hoff	02e23b36bc	Rename and move the classes	2019-03-13 20:02:56 +00:00

116 Commits (1e571cd66437ea2455c203dafb94c20ba48cdcc1)
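
The batching described in commit 8bcfc2eaad (#9402) can be illustrated with a minimal sketch. This is not Synapse's actual code: `build_presence_batches`, `server_of`, the `(user_id, is_local)` join tuples and the string presence labels are all hypothetical stand-ins chosen for illustration. It only shows the idea from the commit message, namely collecting pending presence updates into a per-server dictionary of sets before anything is sent, so that many joins from the same server collapse into one entry.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical, simplified model: a join is (user_id, is_local) and a
# presence update is represented by a plain string label.
Join = Tuple[str, bool]


def server_of(user_id: str) -> str:
    """Return the server part of a Matrix user ID such as '@alice:example.org'."""
    return user_id.split(":", 1)[1]


def build_presence_batches(
    joins: Iterable[Join],
    local_users_in_room: Iterable[str],
    remote_servers_in_room: Iterable[str],
) -> Dict[str, Set[str]]:
    """Collect pending presence updates per destination server.

    Using a set per server means repeated joins from the same server
    (for example many bridged users) produce each update only once.
    """
    pending: Dict[str, Set[str]] = defaultdict(set)
    local_users = list(local_users_in_room)
    remote_servers = list(remote_servers_in_room)

    for user_id, is_local in joins:
        if is_local:
            # Local join: that user's presence goes to every server in the room.
            for server in remote_servers:
                pending[server].add(f"presence:{user_id}")
        else:
            # Remote join: presence of all local users goes to the joiner's server.
            for local_user in local_users:
                pending[server_of(user_id)].add(f"presence:{local_user}")

    # Only after the whole batch is processed would sending begin.
    return dict(pending)
```

In this sketch, two remote joins from the same server yield a single pending update for that server rather than two, which is the deduplication the commit message describes.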