Commit Graph

498 Commits (437a99fb99f42e64a0488ffe0a394cbc60921254)

Author SHA1 Message Date
Erik Johnston 841c581c40
Fix replication metrics when using redis (#7325) 2020-04-22 16:26:19 +01:00
Richard van der Hoff 82d8b1dd1f
Another go at fixing one-word commands (#7326)
I messed this up last time I tried (#7239 / e13c6c7).
2020-04-22 14:34:31 +01:00
Erik Johnston 51f7eaf908
Add ability to run replication protocol over redis. (#7040)
This is configured via the `redis` config options.
2020-04-22 13:07:41 +01:00
Richard van der Hoff 0f8f02bc39
On catchup, process each row with its own stream id (#7286)
Other parts of the code (such as the StreamChangeCache) assume that there will
not be multiple changes with the same stream id.

This code was introduced in #7024, and I hope this fixes #7206.
2020-04-20 11:43:29 +01:00
Richard van der Hoff 67ff7b8ba0
Improve type checking in `replication.tcp.Stream` (#7291)
The general idea here is to get rid of the type: ignore annotations on all of the current_token and update_function assignments, which would have caught #7290.

After a bit of experimentation, it seems like the least-awful way to do this is to pass the offending functions in as parameters to the Stream constructor. Unfortunately that means that the concrete implementations no longer have the same constructor signature as Stream itself, which means that it gets hard to correctly annotate STREAMS_MAP.

I've also introduced a couple of new types, to take out some duplication.
2020-04-17 14:49:55 +01:00
Richard van der Hoff d7d42387f5
Fix 'generator object is not subscriptable' error (#7290)
Some of the query functions return generators rather than lists, so we can't
index into the result. Happily we already have a copy of the results.

(think this was introduced in #7024)
2020-04-16 14:37:06 +01:00
Richard van der Hoff e13c6c7a96 Handle one-word replication commands correctly
`REPLICATE` is now a valid command, and it's nice if you can issue it from the
console without remembering to call it `REPLICATE ` with a trailing space.
2020-04-07 17:43:46 +01:00
Richard van der Hoff c3e4b4edb2 Fix warnings about not calling superclass constructor
Separate `SimpleCommand` from `Command`, so that things which don't want to use
the `data` property don't have to, and thus fix the warnings PyCharm was giving
me about not calling `__init__` in the base class.
2020-04-07 17:40:22 +01:00
Richard van der Hoff 6a519a0ca0 Remove vestigal references to SYNC replication command
We've ripped pretty much all of this out: let's remove the remains.
2020-04-07 17:40:07 +01:00
Erik Johnston ce72355d7f
Fix race in replication (#7226)
Fixes a race between handling `POSITION` and `RDATA` commands. We do this by simply linearizing handling of them.
2020-04-07 11:01:04 +01:00
Erik Johnston 82498ee901
Move server command handling out of TCP protocol (#7187)
This completes the merging of server and client command processing.
2020-04-07 10:51:07 +01:00
Erik Johnston 5016b162fc
Move client command handling out of TCP protocol (#7185)
The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`, a future PR will move the server paths too.
2020-04-06 09:58:42 +01:00
Erik Johnston dfa0782254
Remove connections per replication stream metric. (#7195)
This broke in a recent PR (#7024) and is no longer useful due to all
replication clients implicitly subscribing to all streams, so let's
just remove it.
2020-04-01 10:40:46 +01:00
Erik Johnston 4f21c33be3
Remove usage of "conn_id" for presence. (#7128)
* Remove `conn_id` usage for UserSyncCommand.

Each tcp replication connection is assigned a "conn_id", which is used
to give an ID to a remotely connected worker. In a redis world, there
will no longer be a one to one mapping between connection and instance,
so instead we need to replace such usages with an ID generated by the
remote instances and included in the replicaiton commands.

This really only effects UserSyncCommand.

* Add CLEAR_USER_SYNCS command that is sent on shutdown.

This should help with the case where a synchrotron gets restarted
gracefully, rather than rely on 5 minute timeout.
2020-03-30 16:37:24 +01:00
Erik Johnston 4cff617df1
Move catchup of replication streams to worker. (#7024)
This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
2020-03-25 14:54:01 +00:00
Richard van der Hoff a564b92d37
Convert `*StreamRow` classes to inner classes (#7116)
This just helps keep the rows closer to their streams, so that it's easier to
see what the format of each stream is.
2020-03-23 13:59:11 +00:00
Richard van der Hoff b3cee0ce67
Fix processing of `groups` stream, and use symbolic names for streams (#7117)
`groups` != `receipts`

Introduced in #6964
2020-03-23 11:39:36 +00:00
Erik Johnston fdb1344716
Remove concept of a non-limited stream. (#7011) 2020-03-20 14:40:47 +00:00
Erik Johnston a319cb1dd1
Change device list streams to have one row per ID (#7010)
* Add 'device_lists_outbound_pokes' as extra table.

This makes sure we check all the relevant tables to get the current max
stream ID.

Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.

* Change device lists stream to have one row per id.

This will make it possible to process the streams more incrementally,
avoiding having to process large chunks at once.

* Change device list replication to match new semantics.

Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).

* Newsfile

* Remove handling of multiple rows per ID

* Fix worker handling

* Comments from review
2020-03-19 11:36:53 +00:00
Erik Johnston 6e6476ef07 Comments from review 2020-03-18 10:13:55 +00:00
Richard van der Hoff 78a15b1f9d
Store room_versions in EventBase objects (#6875)
This is a bit fiddly because it all has to be done on one fell swoop:

* Wherever we create a new event, pass in the room version (and check it matches the format version)
* When we prune an event, use the room version of the unpruned event to create the pruned version.
* When we pass an event over the replication protocol, pass the room version over alongside it, and use it when deserialising the event again.
2020-03-05 15:46:44 +00:00
Erik Johnston 9ce4e344a8 Change device list replication to match new semantics.
Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).
2020-02-28 11:25:34 +00:00
Erik Johnston c3c6c0e622 Add 'device_lists_outbound_pokes' as extra table.
This makes sure we check all the relevant tables to get the current max
stream ID.

Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.
2020-02-28 11:15:11 +00:00
Richard van der Hoff 3e99528f2b
Store room version on invite (#6983)
When we get an invite over federation, store the room version in the rooms table.

The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
2020-02-26 16:58:33 +00:00
Erik Johnston 1f773eec91
Port PresenceHandler to async/await (#6991) 2020-02-26 15:33:26 +00:00
Erik Johnston bbf8886a05
Merge worker apps into one. (#6964) 2020-02-25 16:56:55 +00:00
Erik Johnston 0bd8cf435e Increase MAX_EVENTS_BEHIND for replication clients 2020-02-21 09:04:33 +00:00
Erik Johnston de2d267375
Allow moving group read APIs to workers (#6866) 2020-02-07 11:14:19 +00:00
Erik Johnston c3d4ad8afd
Fix sending server up commands from workers (#6811)
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2020-01-30 16:42:11 +00:00
Erik Johnston e17a110661
Detect unknown remote devices and mark cache as stale (#6776)
We just mark the fact that the cache may be stale in the database for
now.
2020-01-28 14:43:21 +00:00
Erik Johnston d5275fc55f
Propagate cache invalidates from workers to other workers. (#6748)
Currently if a worker invalidates a cache it will be streamed to master, which then didn't forward those to other workers.
2020-01-27 13:47:50 +00:00
Erik Johnston 5d7a6ad223
Allow streaming cache invalidate all to workers. (#6749) 2020-01-22 10:37:00 +00:00
Erik Johnston a8a50f5b57
Wake up transaction queue when remote server comes back online (#6706)
This will be used to retry outbound transactions to a remote server if
we think it might have come back up.
2020-01-17 10:27:19 +00:00
Erik Johnston 48c3a96886
Port synapse.replication.tcp to async/await (#6666)
* Port synapse.replication.tcp to async/await

* Newsfile

* Correctly document type of on_<FOO> functions as async

* Don't be overenthusiastic with the asyncing....
2020-01-16 09:16:12 +00:00
Erik Johnston 28c98e51ff
Add `local_current_membership` table (#6655)
Currently we rely on `current_state_events` to figure out what rooms a
user was in and their last membership event in there. However, if the
server leaves the room then the table may be cleaned up and that
information is lost. So lets add a table that separately holds that
information.
2020-01-15 14:59:33 +00:00
Erik Johnston e8b68a4e4b
Fixup synapse.replication to pass mypy checks (#6667) 2020-01-14 14:08:06 +00:00
Richard van der Hoff 6964ea095b
Reduce the reconnect time when replication fails. (#6617) 2020-01-03 14:19:09 +00:00
Erik Johnston fa780e9721
Change EventContext to use the Storage class (#6564) 2019-12-20 10:32:02 +00:00
Erik Johnston 9a4fb457cf Change DataStores to accept 'database' param. 2019-12-06 13:30:06 +00:00
Erik Johnston a7f20500ff _CURRENT_STATE_CACHE_NAME is public 2019-12-04 15:45:42 +00:00
Erik Johnston 1056d6885a Move cache invalidation to main data store 2019-12-04 15:21:14 +00:00
Erik Johnston 2173785f0d Propagate reason in remotely rejected invites 2019-11-28 11:31:56 +00:00
Andrew Morgan a8175d0f96
Prevent account_data content from being sent over TCP replication (#6333) 2019-11-26 13:58:39 +00:00
Erik Johnston f9f1c8acbb
Merge pull request #6332 from matrix-org/erikj/query_devices_fix
Fix caching devices for remote servers in worker.
2019-11-26 12:56:05 +00:00
Erik Johnston 35f9165e96 Fixup docs 2019-11-26 12:04:48 +00:00
Andrew Morgan cd96b4586f lint 2019-11-08 15:45:45 +00:00
Andrew Morgan c4bdf2d785 Remove content from being sent for account data rdata stream 2019-11-08 15:44:02 +00:00
Andrew Morgan 1fe3cc2c9c Address review comments 2019-11-06 14:54:24 +00:00
Andrew Morgan 4059d61e26 Don't forget to ratelimit calls outside of RegistrationHandler 2019-11-06 12:01:54 +00:00
Erik Johnston c16e192e2f Fix caching devices for remote servers in worker.
When the `/keys/query` API is hit on client_reader worker Synapse may
decide that it needs to resync some remote deivces. Usually this happens
on master, and then gets cached. However, that fails on workers and so
it falls back to fetching devices from remotes directly, which may in
turn fail if the remote is down.
2019-11-05 15:49:43 +00:00
Richard van der Hoff cc6243b4c0
document the REPLICATE command a bit better (#6305)
since I found myself wonder how it works
2019-11-04 12:40:18 +00:00
Hubert Chathi 9c94b48bf1 Merge branch 'develop' into uhoreg/cross_signing_fix_workers_notify 2019-10-31 12:32:07 -04:00
Hubert Chathi f7e4a582ef clean up code a bit 2019-10-31 12:01:00 -04:00
Andrew Morgan 54fef094b3
Remove usage of deprecated logger.warn method from codebase (#6271)
Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.
2019-10-31 10:23:24 +00:00
Hubert Chathi 998f7fe7d4 make user signatures a separate stream 2019-10-30 17:22:52 -04:00
Hubert Chathi 670972c0e1 Merge branch 'develop' into uhoreg/cross_signing_fix_workers_notify 2019-10-30 16:46:31 -04:00
Erik Johnston e577a4b2ad Port replication http server endpoints to async/await 2019-10-29 13:00:51 +00:00
Hubert Chathi 8ac766c44a make notification of signatures work with workers 2019-10-24 22:14:58 -04:00
Erik Johnston bb6264be0b Merge branch 'develop' of github.com:matrix-org/synapse into erikj/refactor_stores 2019-10-22 10:41:18 +01:00
Erik Johnston c66a06ac6b Move storage classes into a main "data store".
This is in preparation for having multiple data stores that offer
different functionality, e.g. splitting out state or event storage.
2019-10-21 16:05:06 +01:00
Hubert Chathi 8e86f5b65c Merge branch 'develop' into uhoreg/e2e_cross-signing_merged 2019-09-07 13:20:34 -04:00
Jorik Schellekens f7c873a643
Trace how long it takes for the send trasaction to complete, including retrys (#5986) 2019-09-05 17:44:55 +01:00
Jorik Schellekens 909827b422
Add opentracing to all client servlets (#5983) 2019-09-05 14:46:04 +01:00
Hubert Chathi a22d58c96c add user signature stream change cache to slaved device store 2019-09-04 19:32:35 -04:00
Andrew Morgan b736c6cd3a
Remove bind_email and bind_msisdn (#5964)
Removes the `bind_email` and `bind_msisdn` parameters from the `/register` C/S API endpoint as per [MSC2140: Terms of Service for ISes and IMs](https://github.com/matrix-org/matrix-doc/pull/2140/files#diff-c03a26de5ac40fb532de19cb7fc2aaf7R107).
2019-09-04 18:24:23 +01:00
Andrew Morgan 4548d1f87e
Remove unnecessary parentheses around return statements (#5931)
Python will return a tuple whether there are parentheses around the returned values or not.

I'm just sick of my editor complaining about this all over the place :)
2019-08-30 16:28:26 +01:00
Jorik Schellekens 812ed6b0d5
Opentracing across workers (#5771)
Propagate opentracing contexts across workers


Also includes some Convenience modifications to opentracing for servlets, notably:
- Add boolean to skip the whitelisting check on inject
  extract methods. - useful when injecting into carriers
  locally. Otherwise we'd always have to include our
  own servername and whitelist our servername
- start_active_span_from_request instead of header
- Add boolean to decide whether to extract context
  from a request to a servlet
2019-08-22 18:08:07 +01:00
Brendan Abolivier 1c5b8c6222 Revert "Add "require_consent" parameter for registration"
This reverts commit 3320aaab3a.
2019-08-22 14:47:34 +01:00
Half-Shot 3320aaab3a Add "require_consent" parameter for registration 2019-08-22 14:21:54 +01:00
Andrew Morgan baf081cd3b Bugfixes
--------
 
 - Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAl04Ur0THGFuZHJld0Bh
 bW9yZ2FuLnh5egAKCRCIhIgNLv5f9F4oD/0TY6S/SEd2uAmzor64ojmbX5BOwPzf
 j/wzUTrfvuf40EvkNPDpnejNZSvy/ysbaGQaQusv0SQKlV3xrvdn4RuMvnOWVWck
 kBsO+lvzOaUTR0KHDxN4y9F5eI2NdPbub4847PPVzyqSIHAd+kolxXS8kSBBhwpL
 yfaICWV/AOy5L7xN+JZ9IQpnegVAvUj5DmgXzDHd6VdeiHDVJuARaBgrR5uCkwVS
 ZoLRqZ95XV/qiguMAUvPOwyEqht2mwO64989MswP16YYm8oMkB5QA6I5nYnACsTP
 qk9YcN/oNvEfQXUhttku6MxK1/4yUMPUhEoDBDH7ebc0440QDtWN+IHTdA6oPVZB
 IuStL9YGY16m7Ltx37ZUA4URfNMiSeLHo3zKc/mCAcwxN4HyOjJewtxbG5zKQAOZ
 SMs8UcDwGR4zL1hnt8ZDNYtWwfzJBQIdGjoHvjXJEY7/1csTv2lmAwewFTXiqSAr
 30GW5ews94kotqBK53zZT6V0F5gHNqgGHniOz1ZpqLLxYLqO3LSAGe97CrqlWUdX
 GkhA9tZyweknociD9fyyBmKdcFJ4mL4a+oGI5CMnSMph8UvCY8Y5XMb1T+iYEABI
 tA9G3mBvgkLPj+5V+8QggNkBafSigW2Q4FX7enGsDmiiskZOtfeKrAcVkapD4ooi
 3I7IW5aetZr2IQ==
 =+JBn
 -----END PGP SIGNATURE-----

Merge tag 'v1.2.0rc2' into develop

Bugfixes
--------

- Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))
2019-07-24 13:47:51 +01:00
Jorik Schellekens cf2972c818
Fix servlet metric names (#5734)
* Fix servlet metric names

Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>

* Remove redundant check

* Cover all return paths
2019-07-24 13:07:35 +01:00
Amber Brown 4806651744
Replace returnValue with return (#5736) 2019-07-23 23:00:55 +10:00
Richard van der Hoff 824707383b
Remove access-token support from RegistrationHandler.register (#5641)
Nothing uses this now, so we can remove the dead code, and clean up the
API.

Since we're changing the shape of the return value anyway, we take the
opportunity to give the method a better name.
2019-07-08 19:01:08 +01:00
Richard van der Hoff 80cc82a445
Remove support for invite_3pid_guest. (#5625)
This has never been documented, and I'm not sure it's ever been used outside
sytest.

It's quite a lot of poorly-maintained code, so I'd like to get rid of it.

For now I haven't removed the database table; I suggest we leave that for a
future clearout.
2019-07-05 16:47:58 +01:00
Amber Brown 463b072b12
Move logging utilities out of the side drawer of util/ and into logging/ (#5606) 2019-07-04 00:07:04 +10:00
Amber Brown 32e7c9e7f2
Run Black. (#5482) 2019-06-20 19:32:02 +10:00
Erik Johnston 6745b7de6d Handle failing to talk to master over replication 2019-06-07 10:47:31 +01:00
Erik Johnston 5dbff34509 Fixup bsaed on review comments 2019-05-17 15:48:04 +01:00
Erik Johnston d46aab3fa8 Add basic editing support 2019-05-16 16:54:45 +01:00
Erik Johnston b5c62c6b26 Fix relations in worker mode 2019-05-16 10:38:13 +01:00
Richard van der Hoff f50efcb65d Replace SlavedKeyStore with a shim
since we're pulling everything out of KeyStore anyway, we may as well simplify
it.
2019-04-08 23:59:07 +01:00
Richard van der Hoff 3352baac4b
Remove unused server_tls_certificates functions (#5028)
These have been unused since #4120, and with the demise of perspectives, it is
unlikely that they will ever be used again.
2019-04-08 21:50:18 +01:00
Neil Johnson e8419554ff
Remove presence lists (#4989)
Remove presence list support as per MSC 1819
2019-04-03 11:11:15 +01:00
Richard van der Hoff 297bf2547e
Fix sync bug when accepting invites (#4956)
Hopefully this time we really will fix #4422.

We need to make sure that the cache on
`get_rooms_for_user_with_stream_ordering` is invalidated *before* the
SyncHandler is notified for the new events, and we can now do so reliably via
the `events` stream.
2019-04-02 12:42:39 +01:00
Richard van der Hoff 4b91c313a9 Combine the CurrentStateDeltaStream into the EventStream 2019-03-27 22:07:05 +00:00
Richard van der Hoff 1f6d6f918a Make EventStream rows have a type
... as a precursor to combining it with the CurrentStateDelta stream.
2019-03-27 22:07:05 +00:00
Richard van der Hoff 015b3622eb Skip building a ROW_TYPE when building updates
We're about to turn it straight into a JSON object anyway so building a
ROW_TYPE is a bit pointless, and reduces flexibility in the update_function.
2019-03-27 21:58:03 +00:00
Richard van der Hoff f570916a3e Add parse_row method to replication stream class
This will allow individual stream classes to override how a row is parsed.
2019-03-27 21:32:33 +00:00
Richard van der Hoff 71dcb275f1 move FederationStream out to its own file 2019-03-27 21:13:14 +00:00
Richard van der Hoff aa1e017864 move EventsStream out to its own file 2019-03-27 21:13:14 +00:00
Richard van der Hoff a5798de067 Move replication.tcp.streams into a package 2019-03-27 21:13:14 +00:00
Richard van der Hoff acaa18f7dd
Fix/improve some docstrings in the replication code. (#4949) 2019-03-27 21:12:36 +00:00
Richard van der Hoff 8cbbedaa2b
Fix ClientReplicationStreamProtocol.__str__ (#4929)
`__str__` depended on `self.addr`, which was absent from
ClientReplicationStreamProtocol, so attempting to call str on such an object
would raise an exception.

We can calculate the peer addr from the transport, so there is no need for addr
anyway.
2019-03-25 16:41:51 +00:00
Richard van der Hoff 9bde730ef8
Fix bug where read-receipts lost their timestamps (#4927)
Make sure that they are sent correctly over the replication stream.

Fixes: #4898
2019-03-25 16:38:05 +00:00
Richard van der Hoff cdb8036161
Add a config option for torture-testing worker replication. (#4902)
Setting this to 50 or so makes a bunch of sytests fail in worker mode.
2019-03-20 16:04:35 +00:00
Erik Johnston face0c5b3c Prefill client IPs cache on workers 2019-03-06 17:39:32 +00:00
Andrew Morgan 7b8a157b79
Merge pull request #4792 from matrix-org/anoa/replication_tokens
Support batch updates in the worker sender
2019-03-06 15:48:29 +00:00
Brendan Abolivier a4c3a361b7
Add rate-limiting on registration (#4735)
* Rate-limiting for registration

* Add unit test for registration rate limiting

* Add config parameters for rate limiting on auth endpoints

* Doc

* Fix doc of rate limiting function

Co-Authored-By: babolivier <contact@brendanabolivier.com>

* Incorporate review

* Fix config parsing

* Fix linting errors

* Set default config for auth rate limiting

* Fix tests

* Add changelog

* Advance reactor instead of mocked clock

* Move parameters to registration specific config and give them more sensible default values

* Remove unused config options

* Don't mock the rate limiter un MAU tests

* Rename _register_with_store into register_with_store

* Make CI happy

* Remove unused import

* Update sample config

* Fix ratelimiting test for py2

* Add non-guest test
2019-03-05 14:25:33 +00:00
Andrew Morgan b9f6163092 Simplify token replication logic 2019-03-05 13:58:30 +00:00
Erik Johnston a84b8d56c2 Fixup slave stores 2019-03-04 18:04:57 +00:00
Andrew Morgan fe7bd23a85 Clean up logic and add comments 2019-03-04 15:08:15 +00:00
Andrew Morgan 9f7cdf3da1 Clearer branching, fix missing list clear 2019-03-04 14:36:52 +00:00
Andrew Morgan 5f0c449dd5 Prevent replication wedging 2019-03-04 14:03:18 +00:00
Erik Johnston 1e315017d3 When presence is enabled don't send over replication 2019-02-27 13:53:46 +00:00
Erik Johnston 7590e9fa28
Merge pull request #4749 from matrix-org/erikj/replication_connection_backoff
Fix tightloop over connecting to replication server
2019-02-27 11:00:59 +00:00
Erik Johnston 6bb1c028f1 Limit cache invalidation replication line length (#4748) 2019-02-27 10:28:37 +00:00
Erik Johnston 6870fc496f Move connecting logic into ClientReplicationStreamProtocol 2019-02-27 10:23:51 +00:00
Erik Johnston 25814921f1 Increase the max delay between retry attempts
Otherwise if you have many workers they can easily take out master with
their connection attempts
2019-02-26 15:12:33 +00:00
Erik Johnston 313987187e Fix tightloop over connecting to replication server
If the client failed to process incoming commands during the initial set
up of the replication connection it would immediately disconnect and
reconnect, resulting in a tightloop.

This can happen, for example, when subscribing to a stream that has a
row that is too long in the backlog.

The fix here is to not consider the connection successfully set up until
the client has succesfully subscribed and caught up with the streams.
This ensures that the retry logic timers aren't reset until then,
meaning that if an error does happen during start up the client will
continue backing off before retrying again.
2019-02-26 15:05:41 +00:00
Erik Johnston 80467bbac3 Fix state cache invalidation on workers 2019-02-22 14:38:14 +00:00
Erik Johnston dbdc565dfd Fix registration on workers (#4682)
* Move RegistrationHandler init to HomeServer

* Move post registration actions to RegistrationHandler

* Add post regisration replication endpoint

* Newsfile
2019-02-20 18:47:31 +11:00
Erik Johnston a9b5ea6fc1 Batch cache invalidation over replication
Currently whenever the current state changes in a room invalidate a lot
of caches, which cause *a lot* of traffic over replication. Instead,
lets batch up all those invalidations and send a single poke down
the replication streams.

Hopefully this will reduce load on the master process by substantially
reducing traffic.
2019-02-18 17:53:31 +00:00
Erik Johnston af691e415c Move register_device into handler 2019-02-18 16:49:38 +00:00
Erik Johnston eb2b8523ae Split out registration to worker
This allows registration to be handled by a worker, though the actual
write to the database still happens on master.

Note: due to the in-memory session map all registration requests must be
handled by the same worker.
2019-02-18 12:12:57 +00:00
Erik Johnston a4f52a33fe Fix replication for room v3 (#4523)
* Fix replication for room v3

We were not correctly quoting the path fragments over http replication,
which meant that it exploded when the event IDs had a slash in them

* Newsfile
2019-01-30 14:19:52 +00:00
Erik Johnston b6b73a0bcf Fix receiving events from federation via a worker
This bug was introduced in PR #4470, commit 678a92cb56
2019-01-29 10:30:26 +00:00
Erik Johnston 678a92cb56 Replace missed usages of FrozenEvent 2019-01-25 10:32:30 +00:00
Erik Johnston be6a7e47fa
Revert "Require event format version to parse or create events" 2019-01-25 10:23:51 +00:00
Erik Johnston e8c9f15397 Replace missed usages of FrozenEvent 2019-01-24 11:14:07 +00:00
Erik Johnston a163b748a5 Don't truncate command name in metrics 2018-10-29 17:34:21 +00:00
Amber Brown c4b3698a80
Make the replication logger quieter (#4108) 2018-10-29 22:59:44 +11:00
Amber Brown 381d2cfdf0
Make workers work on Py3 (#4027) 2018-10-13 00:14:08 +11:00
Travis Ralston f1a7264663
Fix minor typo in exception 2018-09-13 11:51:12 -06:00
Amber Brown 7c27c4d51c
merge (#3576) 2018-09-14 03:11:11 +10:00
Erik Johnston 3e242dc149 Remove conn_id 2018-09-04 11:45:52 +01:00
Erik Johnston b13836da7f Remove conn_id from repl prometheus metrics
`conn_id` gets set to a random string, and so we end up filling up
prometheus with tonnes of data series, which is bad.
2018-09-03 17:22:49 +01:00
Erik Johnston 2aa7cc6a46
Merge pull request #3713 from matrix-org/erikj/fixup_fed_logging
Fix logging bug in EDU handling over replication
2018-08-20 10:51:45 +01:00
Erik Johnston 3b2dcfff78 Fix logging bug in EDU handling over replication 2018-08-17 11:11:06 +01:00
Richard van der Hoff 0e8d78f6aa Logcontexts for replication command handlers
Run the handlers for replication commands as background processes. This should
improve the visibility in our metrics, and reduce the number of "running db
transaction from sentinel context" warnings.

Ideally it means converting the things that fire off deferreds into the night
into things that actually return a Deferred when they are done. I've made a bit
of a stab at this, but it will probably be leaky.
2018-08-17 00:43:43 +01:00
Erik Johnston 488ffe6fdb Use federation handler function rather than duplicate
This involves renaming _persist_events to be a public function.
2018-08-15 14:17:18 +01:00
Erik Johnston 773db62a22 Rename slave TransactionStore to SlaveTransactionStore 2018-08-15 14:17:06 +01:00
Erik Johnston b179537f2a Move clean_room_for_join to master 2018-08-09 10:37:38 +01:00
Erik Johnston 72d1902bbe Fixup doc comments 2018-08-09 10:23:49 +01:00
Erik Johnston 5785b93711 Merge branch 'develop' of github.com:matrix-org/synapse into erikj/split_federation 2018-08-09 10:16:16 +01:00
Erik Johnston 2bdafaf3c1
Merge pull request #3632 from matrix-org/erikj/refactor_repl_servlet
Add helper base class for generating new replication endpoints
2018-08-09 10:06:23 +01:00
Erik Johnston 62564797f5 Fixup wording and remove dead code 2018-08-09 09:56:10 +01:00
Erik Johnston bebe325e6c Rename POST param to METHOD 2018-08-08 10:36:18 +01:00
Erik Johnston 5011417632 Fixup logging and docstrings 2018-08-08 10:29:58 +01:00
Erik Johnston 1e2bed9656 Import all functions from TransactionStore 2018-08-06 15:23:38 +01:00
Erik Johnston a3f5bf79a0 Add EDU/query handling over replication 2018-08-06 15:23:31 +01:00
Erik Johnston e26dbd82ef Add replication APIs for persisting federation events 2018-08-06 15:02:28 +01:00
Erik Johnston 051a99c400 Fix isort 2018-08-06 14:29:31 +01:00
Richard van der Hoff 0ca459ea33 Basic support for room versioning
This is the first tranche of support for room versioning. It includes:
 * setting the default room version in the config file
 * new room_version param on the createRoom API
 * storing the version of newly-created rooms in the m.room.create event
 * fishing the version of existing rooms out of the m.room.create event
2018-08-03 16:08:32 +01:00
Erik Johnston cb298ff623 Merge branch 'develop' of github.com:matrix-org/synapse into erikj/refactor_repl_servlet 2018-08-03 09:25:15 +01:00
Richard van der Hoff 01e93f48ed Kill off MatrixCodeMessageException
This code brings the SimpleHttpClient into line with the
MatrixFederationHttpClient by having it raise HttpResponseExceptions when a
request fails (rather than trying to parse for matrix errors and maybe raising
MatrixCodeMessageException).

Then, whenever we were checking for MatrixCodeMessageException and turning them
into SynapseErrors, we now need to check for HttpResponseExceptions and call
to_synapse_error.
2018-08-01 16:02:46 +01:00
Erik Johnston 443da003bc Use new helper base class for membership requests 2018-07-31 14:32:23 +01:00
Erik Johnston 729b672823 Use new helper base class for ReplicationSendEventRestServlet 2018-07-31 14:32:23 +01:00
Erik Johnston d81602b75a Add helper base class for generating new replication endpoints
This will hopefully reduce the boiler plate required to implement new
internal HTTP requests.
2018-07-31 14:32:20 +01:00
Richard van der Hoff f59be4eb0e Fix unit tests
on_notifier_poke no longer runs synchonously, so we have to do a different hack
to make sure that the replication data has been sent. Let's actually listen for
its arrival.
2018-07-25 10:30:36 +01:00
Richard van der Hoff 371da42ae4 Wrap a number of things that run in the background
This will reduce the number of "Starting db connection from sentinel context"
warnings, and will help with our metrics.
2018-07-25 09:41:12 +01:00