Commit Graph

194 Commits (258b5285b6b486526dffef9431c2ab063913f42b)

Author SHA1 Message Date
David Robertson 2bb2c32e8e
Avoid incrementing bg process utime/stime counters by negative durations (#14323) 2022-10-31 13:02:07 +00:00
David Robertson 1fa2e58772
Catch BrokenPipeError from metrics server, and log as a warning (#14072) 2022-10-07 13:35:44 +01:00
reivilibre cf11919ddd
Fix cache metrics not being updated when not using the legacy exposition module. (#13717) 2022-09-08 15:30:48 +01:00
reivilibre 526f84bc2e
Fix Prometheus recording rules to not use legacy metric names. (#13718) 2022-09-08 15:01:42 +01:00
reivilibre b455c2a5ec
Update Grafana dashboard to not use legacy metric names. (#13714) 2022-09-06 12:21:21 +01:00
Brendan Abolivier 898fef2789
Share some metrics between the Prometheus exporter and the phone home stats (#13671) 2022-09-05 10:26:43 +00:00
reivilibre be4250c7a8
Add experimental configuration option to allow disabling legacy Prometheus metric names. (#13540)
Co-authored-by: David Robertson <davidr@element.io>
2022-08-24 11:35:54 +00:00
Patrick Cloke 50122754c8
Add missing types to opentracing. (#13345)
After this change `synapse.logging` is fully typed.
2022-07-21 12:01:52 +00:00
David Robertson f30bcbd84a
Fix Synapse git info missing in version strings (#12973) 2022-06-07 15:24:11 +01:00
David Robertson 3503f42741
Easy type hints in synapse.logging.opentracing (#12894) 2022-05-27 11:17:33 +01:00
Shay cde8af9a49
Add config flags to allow for cache auto-tuning (#12701) 2022-05-13 12:32:39 -07:00
David Robertson 2607b3e181
Update mypy to 0.950 and fix complaints (#12650) 2022-05-06 12:35:20 +00:00
Richard van der Hoff ae01a7edd3
Update type annotations for compatiblity with prometheus_client 0.14 (#12389)
Principally, `prometheus_client.REGISTRY.register` now requires its argument to
extend `prometheus_client.Collector`.

Additionally, `Gauge.set` is now annotated so that passing `Optional[int]`
causes an error.
2022-04-06 12:59:04 +00:00
David Robertson 4ae956c8bb
Use version string helper from matrix-common (#11979)
* Require latest matrix-common
* Use the common function
2022-02-14 13:12:22 +00:00
reivilibre 41818cda1f
Fix type errors introduced by new annotations in the Prometheus Client library. (#11832)
Co-authored-by: David Robertson <davidr@element.io>
2022-02-02 16:51:00 +00:00
Patrick Cloke c072c0b829
Fix mypy for platforms without epoll support. (#11771) 2022-01-19 16:50:09 +00:00
Richard van der Hoff 6a78ede569
Improve `reactor_tick_time` metric (#11724)
The existing implementation of the `python_twisted_reactor_tick_time` metric is pretty useless, because it *only* 
measures the time taken to execute timed calls and callbacks from threads. That neglects everything that 
happens off the back of I/O, which is obviously quite a lot for us.

To improve this, I've hooked into a different place in the reactor - in particular, where it calls `epoll`. That call is 
the only place it should wait for something to happen - the rest of the loop *should* be quick.

I've also removed `python_twisted_reactor_pending_calls`, because I don't believe anyone ever looks at it, and
it's a nuisance to populate.
2022-01-17 12:14:40 +00:00
Richard van der Hoff 20c6d85c6e
Simplify GC prometheus metrics (#11723)
Rather than hooking into the reactor loop, just add a timed task that runs every 100 ms to do the garbage collection.

Part 1 of a quest to simplify the reactor monkey-patching.
2022-01-13 14:35:52 +00:00
Patrick Cloke 10a88ba91c
Use auto_attribs/native type hints for attrs classes. (#11692) 2022-01-13 13:49:28 +00:00
Sean Quah 84fac0f814
Add type annotations to `synapse.metrics` (#10847) 2021-11-17 19:07:02 +00:00
Erik Johnston 82d2168a15
Add metrics to the threadpools (#11178) 2021-11-01 11:21:36 +00:00
David Robertson 1bfd141205
Type hints for the remaining two files in `synapse.http`. (#11164)
* Teach MyPy that the sentinel context is False

This means that if `ctx: LoggingContextOrSentinel`
then `bool(ctx)` narrows us to `ctx:LoggingContext`, which is a really
neat find!

* Annotate RequestMetrics

- Raise errors for sentry if we use the sentinel context
- Ensure we don't raise an error and carry on, but not recording stats
- Include stack trace in the error case to lower Sean's blood pressure

* Make mypy pass for synapse.http.request_metrics

* Make synapse.http.connectproxyclient pass mypy

Co-authored-by: reivilibre <oliverw@matrix.org>
2021-10-28 14:14:42 +01:00
David Robertson 797ee7812d
Relax `ignore-missing-imports` for modules that have stubs now and update mypy (#11006)
Updating mypy past version 0.9 means that third-party stubs are no-longer distributed with typeshed. See http://mypy-lang.blogspot.com/2021/06/mypy-0900-released.html for details.
We therefore pull in stub packages in setup.py

Additionally, some modules that we were previously ignoring import failures for now have stubs. So let's use them.

The rest of this change consists of fixups to make the newer mypy + stubs pass CI.

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2021-10-08 14:49:41 +01:00
Jonathan de Jong 95e47b2e78
[pyupgrade] `synapse/` (#10348)
This PR is tantamount to running 
```
pyupgrade --py36-plus --keep-percent-format `find synapse/ -type f -name "*.py"`
```

Part of #9744
2021-07-19 15:28:05 +01:00
Jonathan de Jong bf72d10dbf
Use inline type hints in various other places (in `synapse/`) (#10380) 2021-07-15 11:02:43 +01:00
Richard van der Hoff b2557cbf42
opentracing: use a consistent name for background processes (#10135)
... otherwise we tend to get a namespace clash between the bg process and the
functions that it calls.
2021-06-07 17:57:49 +01:00
Richard van der Hoff ed53bf314f
Set opentracing priority before setting other tags (#10092)
... because tags on spans which aren't being sampled get thrown away.
2021-05-28 16:14:08 +01:00
Erik Johnston 8771b1337d
Export jemalloc stats to prometheus when used (#9882) 2021-05-06 15:54:07 +01:00
Erik Johnston 1fb9a2d0bf
Limit how often GC happens by time. (#9902)
Synapse can be quite memory intensive, and unless care is taken to tune
the GC thresholds it can end up thrashing, causing noticable performance
problems for large servers. We fix this by limiting how often we GC a
given generation, regardless of current counts/thresholds.

This does not help with the reverse problem where the thresholds are set
too high, but that should only happen in situations where they've been
manually configured.

Adds a `gc_min_seconds_between` config option to override the defaults.

Fixes #9890.
2021-05-05 16:53:45 +01:00
Andrew Morgan 4b2217ace2 Merge branch 'master' into develop 2021-04-21 14:55:06 +01:00
Richard van der Hoff 5d281c10dd
Stop BackgroundProcessLoggingContext making new prometheus timeseries (#9854)
This undoes part of b076bc276e.
2021-04-21 10:03:31 +01:00
Andrew Morgan 6982db9651 Merge branch 'master' into develop 2021-04-20 14:55:16 +01:00
Patrick Cloke b076bc276e
Always use the name as the log ID. (#9829)
As far as I can tell our logging contexts are meant to log the request ID, or sometimes the request ID followed by a suffix (this is generally stored in the name field of LoggingContext). There's also code to log the name@memory location, but I'm not sure this is ever used.

This simplifies the code paths to require every logging context to have a name and use that in logging. For sub-contexts (created via nested_logging_contexts, defer_to_threadpool, Measure) we use the current context's str (which becomes their name or the string "sentinel") and then potentially modify that (e.g. add a suffix).
2021-04-20 14:19:00 +01:00
Jonathan de Jong 4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Patrick Cloke 48d44ab142
Record more information into structured logs. (#9654)
Records additional request information into the structured logs,
e.g. the requester, IP address, etc.
2021-04-08 08:01:14 -04:00
Andrew Morgan 0d87c6bd12
Don't report anything from GaugeBucketCollector metrics until data is present (#8926)
This PR modifies `GaugeBucketCollector` to only report data once it has been updated, rather than initially reporting a value of 0. Fixes zero values being reported for some metrics on startup until a background job to update the metric's value runs later.
2021-04-06 16:32:04 +01:00
Patrick Cloke 33a02f0f52
Fix additional type hints from Twisted upgrade. (#9518) 2021-03-03 15:47:38 -05:00
Eric Eastwood 0a00b7ff14
Update black, and run auto formatting over the codebase (#9381)
- Update black version to the latest
 - Run black auto formatting over the codebase
    - Run autoformatting according to [`docs/code_style.md
`](80d6dc9783/docs/code_style.md)
 - Update `code_style.md` docs around installing black to use the correct version
2021-02-16 22:32:34 +00:00
Patrick Cloke 1619802228
Various clean-ups to the logging context code (#8935) 2020-12-14 14:19:47 -05:00
David Teller f14428b25c
Allow spam-checker modules to be provide async methods. (#8890)
Spam checker modules can now provide async methods. This is implemented
in a backwards-compatible manner.
2020-12-11 14:05:15 -05:00
Erik Johnston 427ede619f
Add metrics for tracking 3PID /requestToken requests. (#8712)
The main use case is to see how many requests are being made, and how
many are second/third/etc attempts. If there are large number of retries
then that likely indicates a delivery problem.
2020-11-13 12:03:51 +00:00
Erik Johnston 2b7c180879
Start fewer opentracing spans (#8640)
#8567 started a span for every background process. This is good as it means all Synapse code that gets run should be in a span (unless in the sentinel logging context), but it means we generate about 15x the number of spans as we did previously.

This PR attempts to reduce that number by a) not starting one for send commands to Redis, and b) deferring starting background processes until after we're sure they're necessary.

I don't really know how much this will help.
2020-10-26 09:30:19 +00:00
Patrick Cloke 34a5696f93
Fix typos and spelling errors. (#8639) 2020-10-23 12:38:40 -04:00
Erik Johnston 1fcdbeb3ab
Start an opentracing span for background processes. (#8567)
This should reduce the number of `There was no active span` errors we
see.

Fixes #8510.
2020-10-19 12:26:26 +01:00
Richard van der Hoff 6d2d42f8fb Rewrite BucketCollector
This was a bit unweildy for what I wanted: in particular, I wanted to assign
each measurement straight into a bucket, rather than storing an intermediate
Counter which didn't do any bucketing at all.

I've replaced it with something that is hopefully a bit easier to use.

(I'm not entirely sure what the difference between a HistogramMetricFamily and
a GaugeHistogramMetricFamily is, but given our counters can go down as well as
up the latter *sounds* more accurate?)
2020-09-30 16:49:15 +01:00
Richard van der Hoff 1c8ca2c543 Fix _exposition.py to stop stripping samples
Our hacked-up `_exposition.py` was stripping out some samples it shouldn't
have been. Put them back in, to more closely match the upstream
`exposition.py`.
2020-09-30 16:45:43 +01:00
Richard van der Hoff ceafb5a1c6
Drop support for ancient prometheus_client (#8426)
Drop compatibility hacks for prometheus-client pre 0.4.0. Debian stretch and
Fedora 31 both have newer versions, so hopefully this will be ok.
2020-09-30 16:42:05 +01:00
Patrick Cloke aec294ee0d
Use slots in attrs classes where possible (#8296)
slots use less memory (and attribute access is faster) while slightly
limiting the flexibility of the class attributes. This focuses on objects
which are instantiated "often" and for short periods of time.
2020-09-14 12:50:06 -04:00
Patrick Cloke c619253db8
Stop sub-classing object (#8249) 2020-09-04 06:54:56 -04:00
Patrick Cloke d89692ea84
Convert runWithConnection to async. (#8121) 2020-08-19 07:09:24 -04:00