MatrixSynapse

Commit Graph

Author	SHA1	Message	Date
Richard van der Hoff	642199570c	Improve the logging when handling a federation transaction (#3904) Let's try to rationalise the logging that happens when we are processing an incoming transaction, to make it easier to figure out what is going wrong when they take ages. In particular: - make everything start with a [room_id event_id] prefix - make sure we log a warning when catching exceptions rather than just turning them into other, more cryptic, exceptions.	2018-09-19 17:28:18 +01:00
Erik Johnston	9407bcf37a	Replace custom DeferredTimeoutError with defer.TimeoutError	2018-09-19 11:07:29 +01:00
Erik Johnston	6c48aa0256	Run canceller first to allow it to generate correct error	2018-09-19 11:07:27 +01:00
Erik Johnston	a334e1cace	Update to use new timeout function everywhere. The existing deferred timeout helper function (and the one into twisted) suffer from a bug when a deferred's canceller throws an exception, #3842. The new helper function doesn't suffer from this problem.	2018-09-19 10:39:40 +01:00
Erik Johnston	24efb2a70d	Fix timeout function Turns out deferred.cancel sometimes throws, so we do that last to ensure that we always do resolve the new deferred.	2018-09-15 11:38:39 +01:00
Erik Johnston	fcfe7a850d	Add an awful secondary timeout to fix wedged requests This is an attempt to mitigate #3842 by adding yet-another-timeout	2018-09-14 19:23:07 +01:00
Erik Johnston	0a81038ea0	Add in flight real time metrics for Measure blocks	2018-09-14 15:08:37 +01:00
Erik Johnston	9e05c8d309	Change the manhole SSH key to have more bits Newer versions of openssh client refuse to connect to the old key due to its length.	2018-09-11 10:42:10 +01:00
Richard van der Hoff	be6527325a	Fix exceptions when a connection is closed before we read the headers This fixes bugs introduced in #3700, by making sure that we behave sanely when an incoming connection is closed before the headers are read.	2018-08-20 18:21:10 +01:00
Richard van der Hoff	55e6bdf287	Robustness fix for logcontext filter Make the logcontext filter not explode if it somehow ends up with a logcontext of None, since that infinite-loops the whole logging system.	2018-08-20 18:20:07 +01:00
Amber Brown	324525f40c	Port over enough to get some sytests running on Python 3 (#3668)	2018-08-20 23:54:49 +10:00
Richard van der Hoff	c31793a784	Merge branch 'rav/fix_linearizer_cancellation' into develop	2018-08-10 14:57:27 +01:00
Amber Brown	b37c472419	Rename async to async_helpers because `async` is a keyword on Python 3.7 (#3678)	2018-08-10 23:50:21 +10:00
Richard van der Hoff	638d35ef08	Fix linearizer cancellation on twisted < 18.7 Turns out that cancellation of inlineDeferreds didn't really work properly until Twisted 18.7. This commit refactors Linearizer.queue to avoid inlineCallbacks.	2018-08-10 10:59:09 +01:00
Amber Brown	da7785147d	Python 3: Convert some unicode/bytes uses (#3569)	2018-08-02 00:54:06 +10:00
Richard van der Hoff	a8cbce0ced	fix invalidation	2018-07-27 16:17:17 +01:00
Richard van der Hoff	f102c05856	Rewrite cache list decorator Because it was complicated and annoyed me. I suspect this will be more efficient too.	2018-07-27 13:47:04 +01:00
Richard van der Hoff	03751a6420	Fix some looping_call calls which were broken in #3604 It turns out that looping_call does check the deferred returned by its callback, and (at least in the case of client_ips), we were relying on this, and I broke it in #3604. Update run_as_background_process to return the deferred, and make sure we return it to clock.looping_call.	2018-07-26 11:48:08 +01:00
Richard van der Hoff	3d6df84658	Test and fix support for cancellation in Linearizer	2018-07-20 13:59:55 +01:00
Richard van der Hoff	7c712f95bb	Combine Limiter and Linearizer Linearizer was effectively a Limiter with max_count=1, so rather than maintaining two sets of code, let's combine them.	2018-07-20 13:11:43 +01:00
Richard van der Hoff	8462c26485	Improvements to the Limiter * give them names, to improve logging * use a deque rather than a list for efficiency	2018-07-20 12:50:27 +01:00
Richard van der Hoff	d7275eecf3	Add a sleep to the Limiter to fix stack overflows. Fixes #3570	2018-07-20 12:37:12 +01:00
Amber Brown	95ccb6e2ec	Don't spew errors because we can't save metrics (#3563)	2018-07-19 20:58:18 +10:00
Richard van der Hoff	8c69b735e3	Make Distributor run its processes as a background process This is more involved than it might otherwise be, because the current implementation just drops its logcontexts and runs everything in the sentinel context. It turns out that we aren't actually using a bunch of the functionality here (notably suppress_failures and the fact that Distributor.fire returns a deferred), so the easiest way to fix this is actually by simplifying a bunch of code.	2018-07-18 20:55:05 +01:00
Richard van der Hoff	667fba68f3	Run things as background processes This fixes #3518, and ensures that we get useful logs and metrics for lots of things that happen in the background. (There are certainly more things that happen in the background; these are just the common ones I've found running a single-process synapse locally).	2018-07-18 20:55:05 +01:00
Erik Johnston	b2aa05a8d6	Use efficient .intersection	2018-07-17 11:07:04 +01:00
Erik Johnston	547b1355d3	Fix perf regression in PR #3530 The get_entities_changed function was changed to return all changed entities since the given stream position, rather than only those changed from a given list of entities. This resulted in the function incorrectly returning large numbers of entities that, for example, caused large increases in database usage.	2018-07-17 10:27:51 +01:00
Amber Brown	3fe0938b76	Merge pull request #3530 from matrix-org/erikj/stream_cache Don't return unknown entities in get_entities_changed	2018-07-17 13:44:46 +10:00
Richard van der Hoff	33b40d0a25	Make FederationRateLimiter queue requests properly popitem removes the most recent item by default [1]. We want the oldest. Fixes #3524 [1]: https://docs.python.org/2/library/collections.html#collections.OrderedDict.popitem	2018-07-13 16:19:40 +01:00
Erik Johnston	77b692e65d	Don't return unknown entities in get_entities_changed The stream cache keeps track of all entities that have changed since a particular stream position, so get_entities_changed does not need to return unknown entities when given a larger stream position. This makes it consistent with the behaviour of has_entity_changed.	2018-07-13 15:26:10 +01:00
Richard van der Hoff	fa5c2bc082	Reduce set building in get_entities_changed This line shows up as about 5% of cpu time on a synchrotron: not_known_entities = set(entities) - set(self._entity_to_key) Presumably the problem here is that _entity_to_key can be largeish, and building a set for its keys every time this function is called is slow. Here we rewrite the logic to avoid building so many sets.	2018-07-12 11:37:44 +01:00
Richard van der Hoff	c3c29aa196	Attempt to include db threads in cpu usage stats (#3496) Let's try to include time spent in the DB threads in the per-request/block cpu usage metrics.	2018-07-10 16:12:36 +01:00
Richard van der Hoff	55370331da	Refactor logcontext resource usage tracking (#3501) Factor the resource usage tracking out to a separate object, which can be passed around and copied independently of the logcontext itself.	2018-07-10 13:56:07 +01:00
Amber Brown	49af402019	run isort	2018-07-09 16:09:20 +10:00
Amber Brown	6350bf925e	Attempt to be more performant on PyPy (#3462)	2018-06-28 14:49:57 +01:00
Amber Brown	72d2143ea8	Revert "Revert "Try to not use as much CPU in the StreamChangeCache"" (#3454)	2018-06-28 11:04:18 +01:00
Matthew Hodgson	8057489b26	Revert "Try to not use as much CPU in the StreamChangeCache"	2018-06-26 18:09:01 +01:00
Amber Brown	1202508067	fixes	2018-06-26 17:29:01 +01:00
Amber Brown	bd3d329c88	fixes	2018-06-26 17:28:12 +01:00
Amber Brown	abfe4b2957	try and make loading items from the cache faster	2018-06-26 17:25:34 +01:00
Amber Brown	07cad26d65	Remove all global reactor imports & pass it around explicitly (#3424)	2018-06-25 14:08:28 +01:00
Richard van der Hoff	43e02c409d	Disable partial state group caching for wildcard lookups When _get_state_for_groups is given a wildcard filter, just do a complete lookup. Hopefully this will give us the best of both worlds by not filling up the ram if we only need one or two keys, but also making the cache still work for the federation reader usecase.	2018-06-22 11:52:07 +01:00
Richard van der Hoff	70e6501913	Merge pull request #3419 from matrix-org/rav/events_per_request Log number of events fetched from DB	2018-06-22 11:17:56 +01:00
Richard van der Hoff	0495fe0035	Indirect evt_count updates via method call so that we can stub it for the sentinel and not have a billion failing UTs	2018-06-22 10:42:28 +01:00
Amber Brown	77ac14b960	Pass around the reactor explicitly (#3385)	2018-06-22 09:37:10 +01:00
Richard van der Hoff	b088aafcae	Log number of events fetched from DB When we finish processing a request, log the number of events we fetched from the database to handle it. [I'm trying to figure out which requests are responsible for large amounts of event cache churn. It may turn out to be more helpful to add counts to the prometheus per-request/block metrics, but that is an extension to this code anyway.]	2018-06-21 06:15:03 +01:00
Amber Brown	a61738b316	Remove run_on_reactor (#3395)	2018-06-14 18:27:37 +10:00
Amber Brown	f7869f8f8b	Port to sortedcontainers (with tests!) (#3332)	2018-06-06 00:13:57 +10:00
Erik Johnston	042eedfa2b	Add hacky cache factor override system	2018-06-04 15:39:28 +01:00
Amber Brown	c936a52a9e	Consistently use six's iteritems and wrap lazy keys/values in list() if they're not meant to be lazy (#3307)	2018-05-31 19:03:47 +10:00
Amber Brown	debff7ae09	Merge pull request #3281 from NotAFile/py3-six-isinstance remaining isinstance fixes	2018-05-30 12:44:46 +10:00
Adrian Tschira	7873cde526	pep8	2018-05-29 17:35:55 +02:00
Amber Brown	57ad76fa4a	fix up tests	2018-05-28 19:51:53 +10:00
Amber Brown	3ef5cd74a6	update to more consistently use seconds in any metrics or logging	2018-05-28 19:39:27 +10:00
Amber Brown	357c74a50f	add comment about why unreg	2018-05-28 19:14:41 +10:00
Amber Brown	754826a830	Merge remote-tracking branch 'origin/develop' into 3218-official-prom	2018-05-28 18:57:23 +10:00
Adrian Tschira	4ee4450d66	fix recursion error	2018-05-24 21:44:10 +02:00
Adrian Tschira	dd068ca979	remaining isinstance fixes Signed-off-by: Adrian Tschira <nota@notafile.com>	2018-05-24 20:55:08 +02:00
Amber Brown	36501068d8	Merge pull request #3247 from NotAFile/py3-misc Misc Python3 fixes	2018-05-24 12:58:37 -05:00
Amber Brown	2aff6eab6d	Merge pull request #3245 from NotAFile/batch-iter Add batch_iter to utils	2018-05-24 12:54:12 -05:00
Amber Brown	53cc2cde1f	cleanup	2018-05-22 17:32:57 -05:00
Amber Brown	071206304d	cleanup pep8 errors	2018-05-22 16:54:22 -05:00
Amber Brown	85ba83eb51	fixes	2018-05-22 16:28:23 -05:00
Amber Brown	a8990fa2ec	Merge remote-tracking branch 'origin/develop' into 3218-official-prom	2018-05-22 10:50:26 -05:00
Erik Johnston	7948ecf234	Comment	2018-05-22 11:39:43 +01:00
Erik Johnston	020377a550	Fix logcontext resource usage tracking	2018-05-22 11:16:07 +01:00
Amber Brown	df9f72d9e5	replacing portions	2018-05-21 19:47:37 -05:00
Adrian Tschira	45b55e23d3	Add batch_iter to utils There's a frequent idiom I noticed where an iterable is split up into a number of chunks/batches. Unfortunately that method does not work with iterators like dict.keys() in python3. This implementation works with iterators. Signed-off-by: Adrian Tschira <nota@notafile.com>	2018-05-19 17:48:30 +02:00
Adrian Tschira	73cbdef5f7	fix py3 intern and remove unnecessary py3 encode Signed-off-by: Adrian Tschira <nota@notafile.com>	2018-05-19 17:35:31 +02:00
Richard van der Hoff	093d8c415a	Merge remote-tracking branch 'origin/develop' into rav/warn_on_logcontext_fail	2018-05-03 14:59:29 +01:00
Richard van der Hoff	a7fe62f0cb	Fix logcontext leaks in rate limiter	2018-05-03 12:31:59 +01:00
Richard van der Hoff	415c6b672e	Merge branch 'develop' into rav/more_logcontext_leaks	2018-05-02 16:16:01 +01:00
Richard van der Hoff	f22e7cda2c	Fix a class of logcontext leaks So, it turns out that if you have a first `Deferred` `D1`, you can add a callback which returns another `Deferred` `D2`, and `D2` must then complete before any further callbacks on `D1` will execute (and later callbacks on `D1` get the result of `D2` rather than `D2` itself). So, `D1` might have `called=True` (as in, it has started running its callbacks), but any new callbacks added to `D1` won't get run until `D2` completes - so if you `yield D1` in an `inlineCallbacks` function, your `yield` will 'block'. In conclusion: some of our assumptions in `logcontext` were invalid. We need to make sure that we don't optimise out the logcontext juggling when this situation happens. Fortunately, it is easy to detect by checking `D1.paused`.	2018-05-02 11:58:00 +01:00
Richard van der Hoff	e482f8cd85	Fix incorrect reference to StringIO This was introduced in `4f2f5171`	2018-05-02 09:12:26 +01:00
Richard van der Hoff	fdb6849b81	Merge pull request #3144 from matrix-org/rav/run_in_background_exception_handling Trap exceptions thrown within run_in_background	2018-04-30 10:23:02 +01:00
Richard van der Hoff	db75c86e84	Merge branch 'develop' into py3-xrange-1	2018-04-30 01:02:25 +01:00
Richard van der Hoff	049b0b5af2	Merge pull request #3154 from NotAFile/py3-stringio Replace stringIO imports with six	2018-04-30 00:59:04 +01:00
Richard van der Hoff	dbf6f28d64	Merge pull request #3155 from NotAFile/py3-bytes-1 more bytes strings	2018-04-30 00:38:21 +01:00
Richard van der Hoff	aab2e4da60	Merge pull request #3140 from matrix-org/rav/use_run_in_background Use run_in_background in preference to preserve_fn	2018-04-30 00:34:28 +01:00
Adrian Tschira	e9143b6593	more bytes strings Signed-off-by: Adrian Tschira <nota@notafile.com>	2018-04-29 00:13:57 +02:00
Adrian Tschira	d82b6ea9e6	Move more xrange to six plus a bonus next() Signed-off-by: Adrian Tschira <nota@notafile.com>	2018-04-28 13:57:00 +02:00
Adrian Tschira	4f2f5171b7	replace stringIO imports	2018-04-28 13:46:23 +02:00
Richard van der Hoff	fc149b4eeb	Merge remote-tracking branch 'origin/develop' into rav/use_run_in_background	2018-04-27 14:31:23 +01:00
Richard van der Hoff	6146332387	Merge remote-tracking branch 'origin/develop' into rav/deferred_timeout	2018-04-27 14:18:00 +01:00
Richard van der Hoff	2a13af23bc	Use run_in_background in preference to preserve_fn While I was going through uses of preserve_fn for other PRs, I converted places which only use the wrapped function once to use run_in_background, to avoid creating the function object.	2018-04-27 12:55:51 +01:00
Richard van der Hoff	9d2c1b8429	Backport deferred.addTimeout Twisted 16.0 doesn't have addTimeout, so let's backport it.	2018-04-27 12:52:30 +01:00
Richard van der Hoff	13843f771e	Trap exceptions thrown within run_in_background Turn any exceptions that get thrown synchronously within run_in_background into Failures instead.	2018-04-27 12:17:13 +01:00
Richard van der Hoff	9255a6cb17	Improve exception handling for background processes There were a bunch of places where we fire off a process to happen in the background, but don't have any exception handling on it - instead relying on the unhandled error being logged when the relevant deferred gets garbage-collected. This is unsatisfactory for a number of reasons: - logging on garbage collection is best-effort and may happen some time after the error, if at all - it can be hard to figure out where the error actually happened. - it is logged as a scary CRITICAL error which (a) I always forget to grep for and (b) it's not really CRITICAL if a background process we don't care about fails. So this is an attempt to add exception handling to everything we fire off into the background.	2018-04-27 11:07:40 +01:00
Richard van der Hoff	1ea904b9f0	Use deferred.addTimeout instead of time_bound_deferred This doesn't feel like a wheel we need to reinvent.	2018-04-23 00:53:18 +01:00
Richard van der Hoff	8dc4a6144b	Merge pull request #3107 from NotAFile/py3-bool-nonzero add __bool__ alias to __nonzero__ methods	2018-04-20 15:43:39 +01:00
Richard van der Hoff	c09a6daf09	Merge pull request #3110 from NotAFile/py3-six-queue Replace Queue with six.moves.queue	2018-04-20 15:35:00 +01:00
Richard van der Hoff	11a67b7c9d	Merge pull request #3093 from matrix-org/rav/response_cache_wrap Refactor ResponseCache usage	2018-04-20 11:31:17 +01:00
Adrian Tschira	878995e660	Replace Queue with six.moves.queue and a six.range change which I missed the last time Signed-off-by: Adrian Tschira <nota@notafile.com>	2018-04-16 00:46:21 +02:00
Adrian Tschira	f63ff73c7f	add __bool__ alias to __nonzero__ methods Signed-off-by: Adrian Tschira <nota@notafile.com>	2018-04-15 20:40:47 +02:00
Richard van der Hoff	d3347ad485	Revert "Use sortedcontainers instead of blist" This reverts commit `9fbe70a7dc`. It turns out that sortedcontainers.SortedDict is not an exact match for blist.sorteddict; in particular, `popitem()` removes things from the opposite end of the dict. This is trivial to fix, but I want to add some unit tests, and potentially some more thought about it, before we do so.	2018-04-13 11:16:43 +01:00
Richard van der Hoff	60f6014bb7	ResponseCache: fix handling of completed results Turns out that ObservableDeferred.observe doesn't return a deferred if the result is already completed. Fix handling and improve documentation.	2018-04-13 07:32:29 +01:00
Richard van der Hoff	b78395b7fe	Refactor ResponseCache usage Adds a `.wrap` method to ResponseCache which wraps up the boilerplate of a (get, set) pair, and then use it throughout the codebase. This will be largely non-functional, but does include the following functional changes: * federation_server.on_context_state_request: drops use of _server_linearizer which looked redundant and could cause incorrect cache misses by yielding between the get and the set. * RoomListHandler.get_remote_public_room_list(): fixes logcontext leaks * the wrap function includes some logging. I'm hoping this won't be too noisy on production.	2018-04-12 13:02:15 +01:00
Richard van der Hoff	d5c74b9f6c	Merge pull request #3092 from matrix-org/rav/response_cache_metrics Add metrics for ResponseCache	2018-04-12 12:59:36 +01:00
Richard van der Hoff	261124396e	Merge pull request #3059 from matrix-org/rav/doc_response_cache Document the behaviour of ResponseCache	2018-04-12 11:22:30 +01:00
Richard van der Hoff	b3384232a0	Add metrics for ResponseCache	2018-04-10 23:14:47 +01:00
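
Commits 7c712f95bb, 8462c26485 and 3d6df84658 describe the Linearizer as a Limiter with `max_count=1` whose waiters are kept in a deque, and then fix its handling of cancellation. Synapse's own implementation is built on Twisted Deferreds and is not shown in this listing. The sketch below is a hypothetical asyncio analogue (`KeyedLimiter`, `acquire` and `release` are illustrative names, not Synapse's API): at most `max_count` holders run per key, further callers are woken in FIFO order, and waiters that were cancelled while queued are skipped rather than handed the slot.

```python
import asyncio
from collections import deque


class KeyedLimiter:
    """Hypothetical asyncio analogue of a per-key limiter.

    At most `max_count` holders per key run concurrently; later callers wait
    in FIFO order. With max_count=1 this behaves as a linearizer.
    """

    def __init__(self, max_count=1):
        self.max_count = max_count
        # key -> [number of active holders, deque of waiting futures]
        self._state = {}

    async def acquire(self, key):
        entry = self._state.setdefault(key, [0, deque()])
        if entry[0] < self.max_count:
            entry[0] += 1
            return
        fut = asyncio.get_running_loop().create_future()
        entry[1].append(fut)
        try:
            # release() hands its slot over by resolving this future, so the
            # active count already includes us when we resume.
            await fut
        except asyncio.CancelledError:
            if fut.done() and not fut.cancelled():
                # We were handed the slot but cancelled before resuming: pass it on.
                self.release(key)
            raise

    def release(self, key):
        entry = self._state[key]
        waiters = entry[1]
        while waiters:
            fut = waiters.popleft()  # O(1) on a deque, unlike list.pop(0)
            if not fut.done():  # skip waiters cancelled while queued
                fut.set_result(None)
                return
        entry[0] -= 1
        if entry[0] == 0:
            del self._state[key]


async def demo():
    limiter = KeyedLimiter(max_count=1)
    order = []

    async def worker(name):
        await limiter.acquire("room1")
        try:
            order.append(name + ":start")
            await asyncio.sleep(0.01)
            order.append(name + ":end")
        finally:
            limiter.release("room1")

    await asyncio.gather(worker("a"), worker("b"))
    return order
```

Running `asyncio.run(demo())` shows the two workers on the same key executing one after the other rather than interleaving. Handing the slot directly to the next waiter, instead of decrementing and letting everyone race, is what keeps the ordering FIFO.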
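
Commit 33b40d0a25 turns on a detail of `collections.OrderedDict.popitem`: by default (`last=True`) it removes the most recently inserted item, whereas a queue should remove the oldest. The snippet below is a standalone illustration with a made-up queue of request ids, not the FederationRateLimiter code itself.

```python
from collections import OrderedDict

# Hypothetical pending-request queue keyed by request id; insertion order is arrival order.
queue = OrderedDict()
for request_id in ("req-1", "req-2", "req-3"):
    queue[request_id] = "deferred-for-" + request_id

# Default popitem() is LIFO: it removes the most recently inserted entry.
newest_id, _ = queue.popitem()

# popitem(last=False) is FIFO: it removes the oldest entry, which is what a fair queue needs.
oldest_id, _ = queue.popitem(last=False)

print(newest_id, oldest_id)  # req-3 req-1
```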
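
Commit fa5c2bc082 reports that `set(entities) - set(self._entity_to_key)` accounted for about 5% of CPU time on a synchrotron, because it rebuilds a set of every key in a possibly large dict on each call. One way to get the same answer without that set, assuming only that `entity_to_key` is a dict, is to test membership against the dict directly. The function names are hypothetical and this is not necessarily the exact rewrite made in the commit.

```python
def not_known_entities_slow(entities, entity_to_key):
    # Builds a set of *every* key in entity_to_key on each call:
    # work proportional to the size of the whole cache.
    return set(entities) - set(entity_to_key)


def not_known_entities_fast(entities, entity_to_key):
    # One average-case O(1) dict membership test per requested entity:
    # work proportional to the number of entities asked about.
    return {e for e in entities if e not in entity_to_key}
```

Both functions return the same set; the second simply avoids materialising the dict's keys when the caller only asks about a handful of entities.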
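
Commit 45b55e23d3 adds a `batch_iter` helper that splits an iterable into chunks and, unlike slicing, also works on Python 3 iterators such as `dict.keys()`. The listing does not include its code, so the following is a sketch of one common standard-library approach; the `(iterable, size)` signature is an assumption.

```python
from itertools import islice


def batch_iter(iterable, size):
    """Yield successive tuples of at most `size` items from any iterable.

    Works on one-shot iterators (generators, dict views) because it only
    pulls items forward with islice and never indexes or slices the input.
    """
    source = iter(iterable)
    # iter(callable, sentinel) keeps calling the lambda until it returns ();
    # islice yields nothing once `source` is exhausted, producing that sentinel.
    return iter(lambda: tuple(islice(source, size)), ())
```

For example, `list(batch_iter(range(7), 3))` gives `[(0, 1, 2), (3, 4, 5), (6,)]`, and the same call works on `d.keys()`, which in Python 3 cannot be sliced.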
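
Commit f63ff73c7f aliases `__bool__` to existing `__nonzero__` methods. Python 2 calls `__nonzero__` for truth testing, while Python 3 calls `__bool__`; without the alias, a Python 3 object that defines neither `__bool__` nor `__len__` is always truthy. A minimal example on a hypothetical class:

```python
class ResultSet(object):
    """Hypothetical container used only to illustrate the alias."""

    def __init__(self, items):
        self.items = list(items)

    def __nonzero__(self):
        # Python 2 consults __nonzero__ for truth testing.
        return bool(self.items)

    # Python 3 consults __bool__ instead, so point it at the same function.
    __bool__ = __nonzero__
```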

519 Commits (43d175d17a96663081521b2215d1e26473135d1e)