Commit Graph

62 Commits (0f5e51f726756318f355d988856730a9930e2d2f)

Author SHA1 Message Date
Richard van der Hoff 5c445114d3
Correctly account for cpu usage by background threads (#4074)
Wrap calls to deferToThread() in a thing which uses a child logcontext to
attribute CPU usage to the right request.

While we're in the area, remove the logcontext_tracer stuff, which is never
used, and afaik doesn't work.

Fixes #4064
2018-10-23 13:12:32 +01:00
Richard van der Hoff 4a15a3e4d5
Include eventid in log lines when processing incoming federation transactions (#3959)
when processing incoming transactions, it can be hard to see what's going on,
because we process a bunch of stuff in parallel, and because we may end up
recursively working our way through a chain of three or four events.

This commit creates a way to use logcontexts to add the relevant event ids to
the log lines.
2018-09-27 11:25:34 +01:00
Richard van der Hoff be6527325a Fix exceptions when a connection is closed before we read the headers
This fixes bugs introduced in #3700, by making sure that we behave sanely
when an incoming connection is closed before the headers are read.
2018-08-20 18:21:10 +01:00
Richard van der Hoff 55e6bdf287 Robustness fix for logcontext filter
Make the logcontext filter not explode if it somehow ends up with a logcontext
of None, since that infinite-loops the whole logging system.
2018-08-20 18:20:07 +01:00
Amber Brown b37c472419
Rename async to async_helpers because `async` is a keyword on Python 3.7 (#3678) 2018-08-10 23:50:21 +10:00
Amber Brown 95ccb6e2ec
Don't spew errors because we can't save metrics (#3563) 2018-07-19 20:58:18 +10:00
Richard van der Hoff c3c29aa196
Attempt to include db threads in cpu usage stats (#3496)
Let's try to include time spent in the DB threads in the per-request/block cpu
usage metrics.
2018-07-10 16:12:36 +01:00
Richard van der Hoff 55370331da
Refactor logcontext resource usage tracking (#3501)
Factor out the resource usage tracking out to a separate object, which can be
passed around and copied independently of the logcontext itself.
2018-07-10 13:56:07 +01:00
Amber Brown 49af402019 run isort 2018-07-09 16:09:20 +10:00
Richard van der Hoff 0495fe0035 Indirect evt_count updates via method call
so that we can stub it for the sentinel and not have a billion failing UTs
2018-06-22 10:42:28 +01:00
Richard van der Hoff b088aafcae Log number of events fetched from DB
When we finish processing a request, log the number of events we fetched from
the database to handle it.

[I'm trying to figure out which requests are responsible for large amounts of
event cache churn. It may turn out to be more helpful to add counts to the
prometheus per-request/block metrics, but that is an extension to this code
anyway.]
2018-06-21 06:15:03 +01:00
Amber Brown 3ef5cd74a6 update to more consistently use seconds in any metrics or logging 2018-05-28 19:39:27 +10:00
Erik Johnston 7948ecf234 Comment 2018-05-22 11:39:43 +01:00
Erik Johnston 020377a550 Fix logcontext resource usage tracking 2018-05-22 11:16:07 +01:00
Richard van der Hoff 093d8c415a Merge remote-tracking branch 'origin/develop' into rav/warn_on_logcontext_fail 2018-05-03 14:59:29 +01:00
Richard van der Hoff f22e7cda2c Fix a class of logcontext leaks
So, it turns out that if you have a first `Deferred` `D1`, you can add a
callback which returns another `Deferred` `D2`, and `D2` must then complete
before any further callbacks on `D1` will execute (and later callbacks on `D1`
get the *result* of `D2` rather than `D2` itself).

So, `D1` might have `called=True` (as in, it has started running its
callbacks), but any new callbacks added to `D1` won't get run until `D2`
completes - so if you `yield D1` in an `inlineCallbacks` function, your `yield`
will 'block'.

In conclusion: some of our assumptions in `logcontext` were invalid. We need to
make sure that we don't optimise out the logcontext juggling when this
situation happens. Fortunately, it is easy to detect by checking `D1.paused`.
2018-05-02 11:58:00 +01:00
Richard van der Hoff fdb6849b81
Merge pull request #3144 from matrix-org/rav/run_in_background_exception_handling
Trap exceptions thrown within run_in_background
2018-04-30 10:23:02 +01:00
Richard van der Hoff fc149b4eeb Merge remote-tracking branch 'origin/develop' into rav/use_run_in_background 2018-04-27 14:31:23 +01:00
Richard van der Hoff 2a13af23bc Use run_in_background in preference to preserve_fn
While I was going through uses of preserve_fn for other PRs, I converted places
which only use the wrapped function once to use run_in_background, to avoid
creating the function object.
2018-04-27 12:55:51 +01:00
Richard van der Hoff 13843f771e Trap exceptions thrown within run_in_background
Turn any exceptions that get thrown synchronously within run_in_background into
Failures instead.
2018-04-27 12:17:13 +01:00
Richard van der Hoff 9255a6cb17 Improve exception handling for background processes
There were a bunch of places where we fire off a process to happen in the
background, but don't have any exception handling on it - instead relying on
the unhandled error being logged when the relevent deferred gets
garbage-collected.

This is unsatisfactory for a number of reasons:
 - logging on garbage collection is best-effort and may happen some time after
   the error, if at all
 - it can be hard to figure out where the error actually happened.
 - it is logged as a scary CRITICAL error which (a) I always forget to grep for
   and (b) it's not really CRITICAL if a background process we don't care about
   fails.

So this is an attempt to add exception handling to everything we fire off into
the background.
2018-04-27 11:07:40 +01:00
Adrian Tschira f63ff73c7f add __bool__ alias to __nonzero__ methods
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-15 20:40:47 +02:00
Richard van der Hoff 5a6e54264d Make 'unexpected logging context' into warnings
I think we've now fixed enough of these that the rest can be logged at
warning.
2018-03-15 18:40:38 +00:00
Richard van der Hoff 20f40348d4 Factor run_in_background out from preserve_fn
It annoys me that we create temporary function objects when there's really no
need for it. Let's factor the gubbins out of preserve_fn and start using it.
2018-03-08 11:50:11 +00:00
Richard van der Hoff 3a75de923b Rewrite make_deferred_yieldable avoiding inlineCallbacks
... because (a) it's actually simpler (b) it might be marginally more
performant?
2018-03-01 12:40:05 +00:00
Richard van der Hoff 3d12d97415 Track DB scheduling delay per-request
For each request, track the amount of time spent waiting for a db
connection. This entails adding it to the LoggingContext and we may as well add
metrics for it while we are passing.
2018-01-16 17:23:32 +00:00
Richard van der Hoff 6324b65f08 Track db txn time in millisecs
... to reduce the amount of floating-point foo we do.
2018-01-16 15:53:18 +00:00
Richard van der Hoff 44a498418c Optimise LoggingContext creation and copying
It turns out that the only thing we use the __dict__ of LoggingContext for is
`request`, and given we create lots of LoggingContexts and then copy them every
time we do a db transaction or log line, using the __dict__ seems a bit
redundant. Let's try to optimise things by making the request attribute
explicit.
2018-01-16 15:49:42 +00:00
Richard van der Hoff b2cd6accf5 Remove __PreservingContextDeferred too 2017-11-14 23:00:10 +00:00
Richard van der Hoff 7e6fa29cb5 Remove preserve_context_over_{fn, deferred}
Both of these functions ae known to leak logcontexts. Replace the remaining
calls to them and kill them off.
2017-11-14 11:22:42 +00:00
Richard van der Hoff eaaabc6c4f replace 'except:' with 'except Exception:'
what could possibly go wrong
2017-10-23 15:52:32 +01:00
Richard van der Hoff e2eebf1696 Fix fixme in preserve_fn
`preserve_fn` is no longer used as a decorator anywhere, so we can safely fix a
fixme therein.
2017-04-03 15:38:02 +01:00
Richard van der Hoff f9b4bb05e0 Fix the logcontext handling in the cache wrappers (#2077)
The cache wrappers had a habit of leaking the logcontext into the reactor while
the lookup function was running, and then not restoring it correctly when the
lookup function had completed. It's all the fault of
`preserve_context_over_{fn,deferred}` which are basically a bit broken.
2017-03-30 13:22:24 +01:00
Richard van der Hoff bd08ee7a46 Merge pull request #2026 from matrix-org/rav/logcontext_docs
Logcontext docs
2017-03-20 12:05:21 +00:00
Richard van der Hoff f40c2db05a Stop preserve_fn leaking context into the reactor
Fix a bug in ``logcontext.preserve_fn`` which made it leak context into the
reactor, and add a test for it.

Also, get rid of ``logcontext.reset_context_after_deferred``, which tried to do
the same thing but had its own, different, set of bugs.
2017-03-18 00:07:43 +00:00
Richard van der Hoff d2d146a314 Logcontext docs 2017-03-17 23:59:28 +00:00
Richard van der Hoff b5d1c68beb Implement reset_context_after_deferred
to correctly reset the context when we fire off a deferred we aren't going to
wait for.
2017-03-15 02:21:07 +00:00
Erik Johnston 9219139351 Preserve some logcontexts 2016-08-24 11:58:40 +01:00
Erik Johnston ca8abfbf30 Clean up TransactionQueue 2016-08-10 16:24:16 +01:00
Erik Johnston 4eb8f9ca8a Remove comment 2016-02-10 11:29:21 +00:00
Erik Johnston 00c9ad49df s/parent_context/previous_context/ 2016-02-10 11:25:19 +00:00
Erik Johnston 9777c5f49a Set parent context on instansiation 2016-02-10 11:23:32 +00:00
Erik Johnston 3e2fcd67b2 Don't bother copying records on parent context 2016-02-09 10:50:31 +00:00
Erik Johnston c486b7b41c Change logcontext warns to debug 2016-02-09 09:20:38 +00:00
Erik Johnston 2c1fbea531 Fix up logcontexts 2016-02-08 14:26:45 +00:00
Erik Johnston d4f72a5bfb Allowing tagging log contexts 2016-02-03 13:52:27 +00:00
Matthew Hodgson 6c28ac260c copyrights 2016-01-07 04:26:29 +00:00
Mark Haines 6a5ff5f223 Track the time spent in the database per request.
and track the number of transactions that request started.
2015-12-07 17:56:11 +00:00
Mark Haines 3dd1630848 Add a setter for the current log context.
Move the resource tracking inside that setter so that it is easier
to make sure that the resource tracking isn't double counting the
resource usage.
2015-12-07 10:51:18 +00:00
Mark Haines 5231737369 Add comments to explain why we are hardcoding RUSAGE_THREAD 2015-12-04 11:53:38 +00:00