Merge remote-tracking branch 'origin/develop' into matrix-org-hotfixes

2022-08-18 16:27:31 +01:00 · 2022-08-18 16:27:31 +01:00 · d20c92d2c2
parent e91a929049 b251cff819
commit d20c92d2c2
16 changed files with 570 additions and 419 deletions
--- a/README.rst
+++ b/README.rst
@ -2,152 +2,70 @@
 Synapse |support| |development| |documentation| |license| |pypi| |python|
 =========================================================================

+Synapse is an open-source `Matrix <https://matrix.org/>`_ homeserver written and
+maintained by the Matrix.org Foundation. We began rapid development in 2014,
+reaching v1.0.0 in 2019. Development on Synapse and the Matrix protocol itself continues
+in earnest today.
+
+Briefly, Matrix is an open standard for communications on the internet, supporting
+federation, encryption and VoIP. Matrix.org has more to say about the `goals of the
+Matrix project <https://matrix.org/docs/guides/introduction>`_, and the `formal specification
+<https://spec.matrix.org/>`_ describes the technical details.
+
 .. contents::

-Introduction
-============
+Installing and configuration
+============================

-Matrix is an ambitious new ecosystem for open federated Instant Messaging and
-VoIP.  The basics you need to know to get up and running are:
-
- Everything in Matrix happens in a room.  Rooms are distributed and do not
-  exist on any single server.  Rooms can be located using convenience aliases
-  like ``#matrix:matrix.org`` or ``#test:localhost:8448``.
-
- Matrix user IDs look like ``@matthew:matrix.org`` (although in the future
-  you will normally refer to yourself and others using a third party identifier
-  (3PID): email address, phone number, etc rather than manipulating Matrix user IDs)
-
-The overall architecture is::
-
-      client <----> homeserver <=====================> homeserver <----> client
-             https://somewhere.org/_matrix      https://elsewhere.net/_matrix
-
-``#matrix:matrix.org`` is the official support room for Matrix, and can be
-accessed by any client from https://matrix.org/docs/projects/try-matrix-now.html or
-via IRC bridge at irc://irc.libera.chat/matrix.
-
-Synapse is currently in rapid development, but as of version 0.5 we believe it
-is sufficiently stable to be run as an internet-facing service for real usage!
-
-About Matrix
-============
-
-Matrix specifies a set of pragmatic RESTful HTTP JSON APIs as an open standard,
-which handle:
-
- Creating and managing fully distributed chat rooms with no
-  single points of control or failure
- Eventually-consistent cryptographically secure synchronisation of room
-  state across a global open network of federated servers and services
- Sending and receiving extensible messages in a room with (optional)
-  end-to-end encryption
- Inviting, joining, leaving, kicking, banning room members
- Managing user accounts (registration, login, logout)
- Using 3rd Party IDs (3PIDs) such as email addresses, phone numbers,
-  Facebook accounts to authenticate, identify and discover users on Matrix.
- Placing 1:1 VoIP and Video calls
-
-These APIs are intended to be implemented on a wide range of servers, services
-and clients, letting developers build messaging and VoIP functionality on top
-of the entirely open Matrix ecosystem rather than using closed or proprietary
-solutions. The hope is for Matrix to act as the building blocks for a new
-generation of fully open and interoperable messaging and VoIP apps for the
-internet.
-
-Synapse is a Matrix "homeserver" implementation developed by the matrix.org core
-team, written in Python 3/Twisted.
-
-In Matrix, every user runs one or more Matrix clients, which connect through to
-a Matrix homeserver. The homeserver stores all their personal chat history and
-user account information - much as a mail client connects through to an
-IMAP/SMTP server. Just like email, you can either run your own Matrix
-homeserver and control and own your own communications and history or use one
-hosted by someone else (e.g. matrix.org) - there is no single point of control
-or mandatory service provider in Matrix, unlike WhatsApp, Facebook, Hangouts,
-etc.
-
-We'd like to invite you to join #matrix:matrix.org (via
-https://matrix.org/docs/projects/try-matrix-now.html), run a homeserver, take a look
-at the `Matrix spec <https://matrix.org/docs/spec>`_, and experiment with the
-`APIs <https://matrix.org/docs/api>`_ and `Client SDKs
-<https://matrix.org/docs/projects/try-matrix-now.html#client-sdks>`_.
-
-Thanks for using Matrix!
-
-Support
-=======
-
-For support installing or managing Synapse, please join |room|_ (from a matrix.org
-account if necessary) and ask questions there. We do not use GitHub issues for
-support requests, only for bug reports and feature requests.
-
-Synapse's documentation is `nicely rendered on GitHub Pages <https://matrix-org.github.io/synapse>`_,
-with its source available in |docs|_.
-
-.. |room| replace:: ``#synapse:matrix.org``
-.. _room: https://matrix.to/#/#synapse:matrix.org
-
-.. |docs| replace:: ``docs``
-.. _docs: docs
-
-Synapse Installation
-====================
+The Synapse documentation describes `how to install Synapse <https://matrix-org.github.io/synapse/latest/setup/installation.html>`_. We recommend using
+`Docker images <https://matrix-org.github.io/synapse/latest/setup/installation.html#docker-images-and-ansible-playbooks>`_ or `Debian packages from Matrix.org
+<https://matrix-org.github.io/synapse/latest/setup/installation.html#matrixorg-packages>`_.

 .. _federation:

-* For details on how to install synapse, see
-  `Installation Instructions <https://matrix-org.github.io/synapse/latest/setup/installation.html>`_.
-* For specific details on how to configure Synapse for federation see `docs/federate.md <docs/federate.md>`_
+Synapse has a variety of `config options
+<https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html>`_
+which can be used to customise its behaviour after installation.
+There are additional details on how to `configure Synapse for federation here
+<https://matrix-org.github.io/synapse/latest/federate.html>`_.
+
+.. _reverse-proxy:
+
+Using a reverse proxy with Synapse
+----------------------------------
+
+It is recommended to put a reverse proxy such as
+`nginx <https://nginx.org/en/docs/http/ngx_http_proxy_module.html>`_,
+`Apache <https://httpd.apache.org/docs/current/mod/mod_proxy_http.html>`_,
+`Caddy <https://caddyserver.com/docs/quick-starts/reverse-proxy>`_,
+`HAProxy <https://www.haproxy.org/>`_ or
+`relayd <https://man.openbsd.org/relayd.8>`_ in front of Synapse. One advantage of
+doing so is that you can expose the default HTTPS port (443) to
+Matrix clients without needing to run Synapse with root privileges.
+For information on configuring one, see `the reverse proxy docs
+<https://matrix-org.github.io/synapse/latest/reverse_proxy.html>`_.
+
+Upgrading an existing Synapse
+-----------------------------
+
+The instructions for upgrading Synapse are in `the upgrade notes`_.
+Please check these instructions as upgrading may require extra steps for some
+versions of Synapse.
+
+.. _the upgrade notes: https://matrix-org.github.io/synapse/develop/upgrade.html


-Connecting to Synapse from a client
-===================================
+Platform dependencies
+---------------------

-The easiest way to try out your new Synapse installation is by connecting to it
-from a web client.
+Synapse uses a number of platform dependencies such as Python and PostgreSQL,
+and aims to follow supported upstream versions. See the
+`deprecation policy <https://matrix-org.github.io/synapse/latest/deprecation_policy.html>`_
+for more details.

-Unless you are running a test instance of Synapse on your local machine, in
-general, you will need to enable TLS support before you can successfully
-connect from a client: see
-`TLS certificates <https://matrix-org.github.io/synapse/latest/setup/installation.html#tls-certificates>`_.
-
-An easy way to get started is to login or register via Element at
-https://app.element.io/#/login or https://app.element.io/#/register respectively.
-You will need to change the server you are logging into from ``matrix.org``
-and instead specify a Homeserver URL of ``https://<server_name>:8448``
-(or just ``https://<server_name>`` if you are using a reverse proxy).
-If you prefer to use another client, refer to our
-`client breakdown <https://matrix.org/docs/projects/clients-matrix>`_.
-
-If all goes well you should at least be able to log in, create a room, and
-start sending messages.
-
-.. _`client-user-reg`:
-
-Registering a new user from a client
------------------------------------
-
-By default, registration of new users via Matrix clients is disabled. To enable
-it, specify ``enable_registration: true`` in ``homeserver.yaml``. (It is then
-recommended to also set up CAPTCHA - see `<docs/CAPTCHA_SETUP.md>`_.)
-
-Once ``enable_registration`` is set to ``true``, it is possible to register a
-user via a Matrix client.
-
-Your new user name will be formed partly from the ``server_name``, and partly
-from a localpart you specify when you create the account. Your name will take
-the form of::
-
-    @localpart:my.domain.name
-
-(pronounced "at localpart on my dot domain dot name").
-
-As when logging in, you will need to specify a "Custom server".  Specify your
-desired ``localpart`` in the 'User name' box.

 Security note
-=============
+-------------

 Matrix serves raw, user-supplied data in some APIs -- specifically the `content
 repository endpoints`_.
@ -187,30 +105,76 @@ Following this advice ensures that even if an XSS is found in Synapse, the
 impact to other applications will be minimal.


-Upgrading an existing Synapse
-=============================
+Testing a new installation
+==========================

-The instructions for upgrading synapse are in `the upgrade notes`_.
-Please check these instructions as upgrading may require extra steps for some
-versions of synapse.
+The easiest way to try out your new Synapse installation is by connecting to it
+from a web client.

-.. _the upgrade notes: https://matrix-org.github.io/synapse/develop/upgrade.html
+Unless you are running a test instance of Synapse on your local machine, in
+general, you will need to enable TLS support before you can successfully
+connect from a client: see
+`TLS certificates <https://matrix-org.github.io/synapse/latest/setup/installation.html#tls-certificates>`_.

-.. _reverse-proxy:
+An easy way to get started is to log in or register via Element at
+https://app.element.io/#/login or https://app.element.io/#/register respectively.
+You will need to change the server you are logging into from ``matrix.org``
+and instead specify a Homeserver URL of ``https://<server_name>:8448``
+(or just ``https://<server_name>`` if you are using a reverse proxy).
+If you prefer to use another client, refer to our
+`client breakdown <https://matrix.org/docs/projects/clients-matrix>`_.

-Using a reverse proxy with Synapse
-==================================
+If all goes well you should at least be able to log in, create a room, and
+start sending messages.

-It is recommended to put a reverse proxy such as
-`nginx <https://nginx.org/en/docs/http/ngx_http_proxy_module.html>`_,
-`Apache <https://httpd.apache.org/docs/current/mod/mod_proxy_http.html>`_,
-`Caddy <https://caddyserver.com/docs/quick-starts/reverse-proxy>`_,
-`HAProxy <https://www.haproxy.org/>`_ or
-`relayd <https://man.openbsd.org/relayd.8>`_ in front of Synapse. One advantage of
-doing so is that it means that you can expose the default https port (443) to
-Matrix clients without needing to run Synapse with root privileges.
+.. _`client-user-reg`:

-For information on configuring one, see `<docs/reverse_proxy.md>`_.
+Registering a new user from a client
+------------------------------------
+
+By default, registration of new users via Matrix clients is disabled. To enable
+it:
+
+1. In the
+   `registration config section <https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#registration>`_
+   set ``enable_registration: true`` in ``homeserver.yaml``.
+2. Then **either**:
+
+   a. set up a `CAPTCHA <https://matrix-org.github.io/synapse/latest/CAPTCHA_SETUP.html>`_, or
+   b. set ``enable_registration_without_verification: true`` in ``homeserver.yaml``.
+
+We **strongly** recommend using a CAPTCHA, particularly if your homeserver is exposed to
+the public internet. Without it, anyone can freely register accounts on your homeserver.
+This can be exploited by attackers to create spambots targeting the rest of the Matrix
+federation.
+
+Your new user name will be formed partly from the ``server_name``, and partly
+from a localpart you specify when you create the account. Your name will take
+the form of::
+
+    @localpart:my.domain.name
+
+(pronounced "at localpart on my dot domain dot name").
+
+As when logging in, you will need to specify a "Custom server".  Specify your
+desired ``localpart`` in the 'User name' box.
+
+Troubleshooting and support
+===========================
+
+The `Admin FAQ <https://matrix-org.github.io/synapse/latest/usage/administration/admin_faq.html>`_
+includes tips on dealing with some common problems. For more details, see
+`Synapse's wider documentation <https://matrix-org.github.io/synapse/latest/>`_.
+
+For additional support installing or managing Synapse, please ask in the community
+support room |room|_ (from a matrix.org account if necessary). We do not use GitHub
+issues for support requests, only for bug reports and feature requests.
+
+.. |room| replace:: ``#synapse:matrix.org``
+.. _room: https://matrix.to/#/#synapse:matrix.org
+
+.. |docs| replace:: ``docs``
+.. _docs: docs

 Identity Servers
 ================
@ -242,34 +206,15 @@ an email address with your account, or send an invite to another user via their
 email address.


-Password reset
-==============
-
-Users can reset their password through their client. Alternatively, a server admin
-can reset a users password using the `admin API <docs/admin_api/user_admin_api.md#reset-password>`_
-or by directly editing the database as shown below.
-
-First calculate the hash of the new password::
-
-    $ ~/synapse/env/bin/hash_password
-    Password:
-    Confirm password:
-    $2a$12$xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
-
-Then update the ``users`` table in the database::
-
-    UPDATE users SET password_hash='$2a$12$xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
-        WHERE name='@test:test.com';
-
-
-Synapse Development
-===================
+Development
+===========

+We welcome contributions to Synapse from the community!
 The best place to get started is our
 `guide for contributors <https://matrix-org.github.io/synapse/latest/development/contributing_guide.html>`_.
 This is part of our larger `documentation <https://matrix-org.github.io/synapse/latest>`_, which includes
-information for synapse developers as well as synapse administrators.

+information for Synapse developers as well as Synapse administrators.
 Developers might be particularly interested in:

 * `Synapse's database schema <https://matrix-org.github.io/synapse/latest/development/database_schema.html>`_,
@ -280,187 +225,6 @@ Alongside all that, join our developer community on Matrix:
 `#synapse-dev:matrix.org <https://matrix.to/#/#synapse-dev:matrix.org>`_, featuring real humans!


-Quick start
-----------
-
-Before setting up a development environment for synapse, make sure you have the
-system dependencies (such as the python header files) installed - see
-`Platform-specific prerequisites <https://matrix-org.github.io/synapse/latest/setup/installation.html#platform-specific-prerequisites>`_.
-
-To check out a synapse for development, clone the git repo into a working
-directory of your choice::
-
-    git clone https://github.com/matrix-org/synapse.git
-    cd synapse
-
-Synapse has a number of external dependencies. We maintain a fixed development
-environment using `Poetry <https://python-poetry.org/>`_. First, install poetry. We recommend::
-
-    pip install --user pipx
-    pipx install poetry
-
-as described `here <https://python-poetry.org/docs/#installing-with-pipx>`_.
-(See `poetry's installation docs <https://python-poetry.org/docs/#installation>`_
-for other installation methods.) Then ask poetry to create a virtual environment
-from the project and install Synapse's dependencies::
-
-    poetry install --extras "all test"
-
-This will run a process of downloading and installing all the needed
-dependencies into a virtual env.
-
-We recommend using the demo which starts 3 federated instances running on ports `8080` - `8082`::
-
-    poetry run ./demo/start.sh
-
-(to stop, you can use ``poetry run ./demo/stop.sh``)
-
-See the `demo documentation <https://matrix-org.github.io/synapse/develop/development/demo.html>`_
-for more information.
-
-If you just want to start a single instance of the app and run it directly::
-
-    # Create the homeserver.yaml config once
-    poetry run synapse_homeserver \
-      --server-name my.domain.name \
-      --config-path homeserver.yaml \
-      --generate-config \
-      --report-stats=[yes|no]
-
-    # Start the app
-    poetry run synapse_homeserver --config-path homeserver.yaml
-
-
-Running the unit tests
----------------------
-
-After getting up and running, you may wish to run Synapse's unit tests to
-check that everything is installed correctly::
-
-    poetry run trial tests
-
-This should end with a 'PASSED' result (note that exact numbers will
-differ)::
-
-    Ran 1337 tests in 716.064s
-
-    PASSED (skips=15, successes=1322)
-
-For more tips on running the unit tests, like running a specific test or
-to see the logging output, see the `CONTRIBUTING doc <CONTRIBUTING.md#run-the-unit-tests>`_.
-
-
-Running the Integration Tests
-----------------------------
-
-Synapse is accompanied by `SyTest <https://github.com/matrix-org/sytest>`_,
-a Matrix homeserver integration testing suite, which uses HTTP requests to
-access the API as a Matrix client would. It is able to run Synapse directly from
-the source tree, so installation of the server is not required.
-
-Testing with SyTest is recommended for verifying that changes related to the
-Client-Server API are functioning correctly. See the `SyTest installation
-instructions <https://github.com/matrix-org/sytest#installing>`_ for details.
-
-
-Platform dependencies
-=====================
-
-Synapse uses a number of platform dependencies such as Python and PostgreSQL,
-and aims to follow supported upstream versions. See the
-`<docs/deprecation_policy.md>`_ document for more details.
-
-
-Troubleshooting
-===============
-
-Need help? Join our community support room on Matrix:
-`#synapse:matrix.org <https://matrix.to/#/#synapse:matrix.org>`_
-
-Running out of File Handles
---------------------------
-
-If synapse runs out of file handles, it typically fails badly - live-locking
-at 100% CPU, and/or failing to accept new TCP connections (blocking the
-connecting client).  Matrix currently can legitimately use a lot of file handles,
-thanks to busy rooms like #matrix:matrix.org containing hundreds of participating
-servers.  The first time a server talks in a room it will try to connect
-simultaneously to all participating servers, which could exhaust the available
-file descriptors between DNS queries & HTTPS sockets, especially if DNS is slow
-to respond. (We need to improve the routing algorithm used to be better than
-full mesh, but as of March 2019 this hasn't happened yet).
-
-If you hit this failure mode, we recommend increasing the maximum number of
-open file handles to be at least 4096 (assuming a default of 1024 or 256).
-This is typically done by editing ``/etc/security/limits.conf``
-
-Separately, Synapse may leak file handles if inbound HTTP requests get stuck
-during processing - e.g. blocked behind a lock or talking to a remote server etc.
-This is best diagnosed by matching up the 'Received request' and 'Processed request'
-log lines and looking for any 'Processed request' lines which take more than
-a few seconds to execute. Please let us know at #synapse:matrix.org if
-you see this failure mode so we can help debug it, however.
-
-Help!! Synapse is slow and eats all my RAM/CPU!
-----------------------------------------------
-
-First, ensure you are running the latest version of Synapse, using Python 3
-with a PostgreSQL database.
-
-Synapse's architecture is quite RAM hungry currently - we deliberately
-cache a lot of recent room data and metadata in RAM in order to speed up
-common requests. We'll improve this in the future, but for now the easiest
-way to either reduce the RAM usage (at the risk of slowing things down)
-is to set the almost-undocumented ``SYNAPSE_CACHE_FACTOR`` environment
-variable. The default is 0.5, which can be decreased to reduce RAM usage
-in memory constrained enviroments, or increased if performance starts to
-degrade.
-
-However, degraded performance due to a low cache factor, common on
-machines with slow disks, often leads to explosions in memory use due
-backlogged requests. In this case, reducing the cache factor will make
-things worse. Instead, try increasing it drastically. 2.0 is a good
-starting value.
-
-Using `libjemalloc <http://jemalloc.net/>`_ can also yield a significant
-improvement in overall memory use, and especially in terms of giving back
-RAM to the OS. To use it, the library must simply be put in the
-LD_PRELOAD environment variable when launching Synapse. On Debian, this
-can be done by installing the ``libjemalloc1`` package and adding this
-line to ``/etc/default/matrix-synapse``::
-
-    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
-
-This can make a significant difference on Python 2.7 - it's unclear how
-much of an improvement it provides on Python 3.x.
-
-If you're encountering high CPU use by the Synapse process itself, you
-may be affected by a bug with presence tracking that leads to a
-massive excess of outgoing federation requests (see `discussion
-<https://github.com/matrix-org/synapse/issues/3971>`_). If metrics
-indicate that your server is also issuing far more outgoing federation
-requests than can be accounted for by your users' activity, this is a
-likely cause. The misbehavior can be worked around by setting
-the following in the Synapse config file:
-
-.. code-block:: yaml
-
-   presence:
-       enabled: false
-
-People can't accept room invitations from me
--------------------------------------------
-
-The typical failure mode here is that you send an invitation to someone
-to join a room or direct chat, but when they go to accept it, they get an
-error (typically along the lines of "Invalid signature"). They might see
-something like the following in their logs::
-
-    2019-09-11 19:32:04,271 - synapse.federation.transport.server - 288 - WARNING - GET-11752 - authenticate_request failed: 401: Invalid signature for server <server> with key ed25519:a_EqML: Unable to verify signature for <server>
-
-This is normally caused by a misconfiguration in your reverse-proxy. See
-`<docs/reverse_proxy.md>`_ and double-check that your settings are correct.
-
 .. |support| image:: https://img.shields.io/matrix/synapse:matrix.org?label=support&logo=matrix
  :alt: (get support on #synapse:matrix.org)
  :target: https://matrix.to/#/#synapse:matrix.org
--- a/changelog.d/13477.misc
+++ b/changelog.d/13477.misc
@ -0,0 +1 @@
+Faster room joins: Avoid blocking lazy-loading `/sync`s during partial joins due to remote memberships. Pull remote memberships from auth events instead of the room state.
--- a/changelog.d/13491.doc
+++ b/changelog.d/13491.doc
@ -0,0 +1 @@
+Tidy up Synapse's README.
--- a/changelog.d/13525.bugfix
+++ b/changelog.d/13525.bugfix
@ -0,0 +1 @@
+Fix a bug in the `/event_reports` Admin API which meant that the total count could be larger than the number of results you can actually query for.
--- a/changelog.d/13534.misc
+++ b/changelog.d/13534.misc
@ -0,0 +1 @@
+Add metrics to track how the rate limiter is affecting requests (sleep/reject).
--- a/changelog.d/13537.bugfix
+++ b/changelog.d/13537.bugfix
@ -0,0 +1 @@
+Add support for compression to federation responses.
--- a/changelog.d/13541.misc
+++ b/changelog.d/13541.misc
@ -0,0 +1 @@
+Add metrics to track how the rate limiter is affecting requests (sleep/reject).
--- a/changelog.d/13554.misc
+++ b/changelog.d/13554.misc
@ -0,0 +1 @@
+Instrument `FederationStateIdsServlet` (`/state_ids`) for understandable traces in Jaeger.
--- a/docs/usage/administration/admin_faq.md
+++ b/docs/usage/administration/admin_faq.md
@ -2,9 +2,9 @@

 How do I become a server admin?
 ---
-If your server already has an admin account you should use the user admin API to promote other accounts to become admins. See [User Admin API](../../admin_api/user_admin_api.md#Change-whether-a-user-is-a-server-administrator-or-not)
+If your server already has an admin account you should use the [User Admin API](../../admin_api/user_admin_api.md#Change-whether-a-user-is-a-server-administrator-or-not) to promote other accounts to become admins.

-If you don't have any admin accounts yet you won't be able to use the admin API so you'll have to edit the database manually. Manually editing the database is generally not recommended so once you have an admin account, use the admin APIs to make further changes.
+If you don't have any admin accounts yet you won't be able to use the admin API, so you'll have to edit the database manually. Manually editing the database is generally not recommended, so once you have an admin account, use the admin APIs to make further changes.

 ```sql
 UPDATE users SET admin = 1 WHERE name = '@foo:bar.com';
@ -32,9 +32,11 @@ What users are registered on my server?
 SELECT NAME from users;
 ```

-Manually resetting passwords:
+Manually resetting passwords
 ---
-See https://github.com/matrix-org/synapse/blob/master/README.rst#password-reset
+Users can reset their password through their client. Alternatively, a server admin
+can reset a user's password using the [admin API](../../admin_api/user_admin_api.md#reset-password).
+

 I have a problem with my server. Can I just delete my database and start again?
 ---
@ -101,3 +103,83 @@ LIMIT 10;

 You can also use the [List Room API](../../admin_api/rooms.md#list-room-api)
 and `order_by` `state_events`.
+
+
+People can't accept room invitations from me
+---
+
+The typical failure mode here is that you send an invitation to someone
+to join a room or direct chat, but when they go to accept it, they get an
+error (typically along the lines of "Invalid signature"). They might see
+something like the following in their logs:
+
+    2019-09-11 19:32:04,271 - synapse.federation.transport.server - 288 - WARNING - GET-11752 - authenticate_request failed: 401: Invalid signature for server <server> with key ed25519:a_EqML: Unable to verify signature for <server>
+
+This is normally caused by a misconfiguration in your reverse-proxy. See [the reverse proxy docs](../../reverse_proxy.md) and double-check that your settings are correct.
+
+
+Help!! Synapse is slow and eats all my RAM/CPU!
+-----------------------------------------------
+
+First, ensure you are running the latest version of Synapse, using Python 3
+with a [PostgreSQL database](../../postgres.md).
+
+Synapse's architecture is quite RAM hungry currently - we deliberately
+cache a lot of recent room data and metadata in RAM in order to speed up
+common requests. We'll improve this in the future, but for now the easiest
+way to reduce the RAM usage (at the risk of slowing things down)
+is to set the almost-undocumented `SYNAPSE_CACHE_FACTOR` environment
+variable. The default is 0.5, which can be decreased to reduce RAM usage
+in memory constrained environments, or increased if performance starts to
+degrade.
+
+However, degraded performance due to a low cache factor, common on
+machines with slow disks, often leads to explosions in memory use due to
+backlogged requests. In this case, reducing the cache factor will make
+things worse. Instead, try increasing it drastically. 2.0 is a good
+starting value.
+
+Using [libjemalloc](https://jemalloc.net) can also yield a significant
+improvement in overall memory use, and especially in terms of giving back
+RAM to the OS. To use it, the library must simply be put in the
+LD_PRELOAD environment variable when launching Synapse. On Debian, this
+can be done by installing the `libjemalloc1` package and adding this
+line to `/etc/default/matrix-synapse`:
+
+    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
+
+This made a significant difference on Python 2.7 - it's unclear how
+much of an improvement it provides on Python 3.x.
+
+If you're encountering high CPU use by the Synapse process itself, you
+may be affected by a bug with presence tracking that leads to a
+massive excess of outgoing federation requests (see [discussion](https://github.com/matrix-org/synapse/issues/3971)). If metrics
+indicate that your server is also issuing far more outgoing federation
+requests than can be accounted for by your users' activity, this is a
+likely cause. The misbehavior can be worked around by disabling presence
+in the Synapse config file: [see here](../configuration/config_documentation.md#presence).
+
+
+Running out of File Handles
+---------------------------
+
+If Synapse runs out of file handles, it typically fails badly - live-locking
+at 100% CPU, and/or failing to accept new TCP connections (blocking the
+connecting client).  Matrix currently can legitimately use a lot of file handles,
+thanks to busy rooms like `#matrix:matrix.org` containing hundreds of participating
+servers.  The first time a server talks in a room it will try to connect
+simultaneously to all participating servers, which could exhaust the available
+file descriptors between DNS queries & HTTPS sockets, especially if DNS is slow
+to respond. (We need to improve the routing algorithm used to be better than
+full mesh, but as of March 2019 this hasn't happened yet).
+
+If you hit this failure mode, we recommend increasing the maximum number of
+open file handles to be at least 4096 (assuming a default of 1024 or 256).
+This is typically done by editing `/etc/security/limits.conf`.
+
+Separately, Synapse may leak file handles if inbound HTTP requests get stuck
+during processing - e.g. blocked behind a lock or talking to a remote server etc.
+This is best diagnosed by matching up the 'Received request' and 'Processed request'
+log lines and looking for any 'Processed request' lines which take more than
+a few seconds to execute. Please let us know at [`#synapse:matrix.org`](https://matrix.to/#/#synapse:matrix.org) if
+you see this failure mode so we can help debug it.
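
Note on the "Running out of File Handles" entry: the limit it discusses can be inspected from Python with the standard `resource` module (Unix only). The sketch below is an illustration, not part of Synapse; the helper name `check_nofile_limit` is hypothetical and the 4096 threshold is taken from the FAQ text. It reports the limits of the process that runs it, so it only reflects Synapse's limits if run in the same environment (same user and service settings).

```python
import resource

RECOMMENDED_MIN = 4096  # threshold taken from the FAQ text; adjust as needed


def check_nofile_limit(minimum: int = RECOMMENDED_MIN) -> bool:
    """Return True if this process's soft open-file limit is at least `minimum`."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft limit: {soft}, hard limit: {hard}")
    # RLIM_INFINITY means no limit is enforced at all.
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= minimum


if __name__ == "__main__":
    if not check_nofile_limit():
        print("Consider raising the limit, e.g. via /etc/security/limits.conf")
```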
--- a/docs/usage/configuration/config_documentation.md
+++ b/docs/usage/configuration/config_documentation.md
@ -444,7 +444,7 @@ Sub-options for each listener include:
   * `names`: a list of names of HTTP resources. See below for a list of valid resource names.

   * `compress`: set to true to enable gzip compression on HTTP bodies for this resource. This is currently only supported with the
-     `client`, `consent` and `metrics` resources.
+     `client`, `consent`, `metrics` and `federation` resources.

 * `additional_resources`: Only valid for an 'http' listener. A map of
   additional endpoints which should be loaded via dynamic modules.
--- a/synapse/app/homeserver.py
+++ b/synapse/app/homeserver.py
@ -220,7 +220,10 @@ class SynapseHomeServer(HomeServer):
            resources.update({"/_matrix/consent": consent_resource})

        if name == "federation":
-            resources.update({FEDERATION_PREFIX: TransportLayerServer(self)})
+            federation_resource: Resource = TransportLayerServer(self)
+            if compress:
+                federation_resource = gz_wrap(federation_resource)
+            resources.update({FEDERATION_PREFIX: federation_resource})

        if name == "openid":
            resources.update(
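
The hunk above wraps the federation resource only when the listener's `compress` option is set. The effect of such a flag can be sketched with the standard library alone. This is not how `gz_wrap` is implemented; `encode_body` and its `accept_encoding` parameter are hypothetical, and the sketch assumes that compression should also depend on the client advertising gzip support.

```python
import gzip
from typing import Dict, Tuple


def encode_body(
    body: bytes, compress: bool, accept_encoding: str = ""
) -> Tuple[bytes, Dict[str, str]]:
    """Gzip `body` only when compression is enabled and the client accepts gzip.

    Returns the (possibly compressed) body and any extra response headers.
    """
    if compress and "gzip" in accept_encoding.lower():
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    # Compression disabled, or the client did not ask for it: pass through.
    return body, {}
```

Federation responses are JSON and often repetitive, so they tend to compress well; with the flag off, the body is returned untouched.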
--- a/synapse/handlers/sync.py
+++ b/synapse/handlers/sync.py
@ -16,9 +16,11 @@ import logging
 from typing import (
    TYPE_CHECKING,
    Any,
+    Collection,
    Dict,
    FrozenSet,
    List,
+    Mapping,
    Optional,
    Sequence,
    Set,
@ -517,10 +519,17 @@ class SyncHandler:
                # ensure that we always include current state in the timeline
                current_state_ids: FrozenSet[str] = frozenset()
                if any(e.is_state() for e in recents):
+                    # FIXME(faster_joins): We use the partial state here as
+                    # we don't want to block `/sync` on finishing a lazy join.
+                    # Which should be fine once
+                    # https://github.com/matrix-org/synapse/issues/12989 is resolved,
+                    # since we shouldn't reach here anymore?
+                    # Note that we use the current state as a whitelist for filtering
+                    # `recents`, so partial state is only a problem when a membership
+                    # event turns up in `recents` but has not made it into the current
+                    # state.
                    current_state_ids_map = (
-                        await self._state_storage_controller.get_current_state_ids(
-                            room_id
-                        )
+                        await self.store.get_partial_current_state_ids(room_id)
                    )
                    current_state_ids = frozenset(current_state_ids_map.values())

@ -589,7 +598,13 @@ class SyncHandler:
                if any(e.is_state() for e in loaded_recents):
                    # FIXME(faster_joins): We use the partial state here as
                    # we don't want to block `/sync` on finishing a lazy join.
-                    # Is this the correct way of doing it?
+                    # Which should be fine once
+                    # https://github.com/matrix-org/synapse/issues/12989 is resolved,
+                    # since we shouldn't reach here anymore?
+                    # Note that we use the current state as a whitelist for filtering
+                    # `loaded_recents`, so partial state is only a problem when a
+                    # membership event turns up in `loaded_recents` but has not made it
+                    # into the current state.
                    current_state_ids_map = (
                        await self.store.get_partial_current_state_ids(room_id)
                    )
@ -637,7 +652,10 @@ class SyncHandler:
        )

    async def get_state_after_event(
-        self, event_id: str, state_filter: Optional[StateFilter] = None
+        self,
+        event_id: str,
+        state_filter: Optional[StateFilter] = None,
+        await_full_state: bool = True,
    ) -> StateMap[str]:
        """
        Get the room state after the given event
@ -645,9 +663,14 @@ class SyncHandler:
        Args:
            event_id: event of interest
            state_filter: The state filter used to fetch state from the database.
+            await_full_state: if `True`, will block if we do not yet have complete state
+                at the event and `state_filter` is not satisfied by partial state.
+                Defaults to `True`.
        """
        state_ids = await self._state_storage_controller.get_state_ids_for_event(
-            event_id, state_filter=state_filter or StateFilter.all()
+            event_id,
+            state_filter=state_filter or StateFilter.all(),
+            await_full_state=await_full_state,
        )

        # using get_metadata_for_events here (instead of get_event) sidesteps an issue
@ -670,6 +693,7 @@ class SyncHandler:
        room_id: str,
        stream_position: StreamToken,
        state_filter: Optional[StateFilter] = None,
+        await_full_state: bool = True,
    ) -> StateMap[str]:
        """Get the room state at a particular stream position

@ -677,6 +701,9 @@ class SyncHandler:
            room_id: room for which to get state
            stream_position: point at which to get state
            state_filter: The state filter used to fetch state from the database.
+            await_full_state: if `True`, will block if we do not yet have complete state
+                at the last event in the room before `stream_position` and
+                `state_filter` is not satisfied by partial state. Defaults to `True`.
        """
        # FIXME: This gets the state at the latest event before the stream ordering,
        # which might not be the same as the "current state" of the room at the time
@ -688,7 +715,9 @@ class SyncHandler:

        if last_event_id:
            state = await self.get_state_after_event(
-                last_event_id, state_filter=state_filter or StateFilter.all()
+                last_event_id,
+                state_filter=state_filter or StateFilter.all(),
+                await_full_state=await_full_state,
            )

        else:
@ -891,7 +920,15 @@ class SyncHandler:
        with Measure(self.clock, "compute_state_delta"):
            # The memberships needed for events in the timeline.
            # Only calculated when `lazy_load_members` is on.
-            members_to_fetch = None
+            members_to_fetch: Optional[Set[str]] = None
+
+            # A dictionary mapping user IDs to the first event in the timeline sent by
+            # them. Only calculated when `lazy_load_members` is on.
+            first_event_by_sender_map: Optional[Dict[str, EventBase]] = None
+
+            # The contribution to the room state from state events in the timeline.
+            # Only contains the last event for any given state key.
+            timeline_state: StateMap[str]

            lazy_load_members = sync_config.filter_collection.lazy_load_members()
            include_redundant_members = (
@ -902,10 +939,23 @@ class SyncHandler:
                # We only request state for the members needed to display the
                # timeline:

-                members_to_fetch = {
-                    event.sender  # FIXME: we also care about invite targets etc.
-                    for event in batch.events
-                }
+                timeline_state = {}
+
+                members_to_fetch = set()
+                first_event_by_sender_map = {}
+                for event in batch.events:
+                    # Build the map from user IDs to the first timeline event they sent.
+                    if event.sender not in first_event_by_sender_map:
+                        first_event_by_sender_map[event.sender] = event
+
+                    # We need the event's sender, unless their membership was in a
+                    # previous timeline event.
+                    if (EventTypes.Member, event.sender) not in timeline_state:
+                        members_to_fetch.add(event.sender)
+                    # FIXME: we also care about invite targets etc.
+
+                    if event.is_state():
+                        timeline_state[(event.type, event.state_key)] = event.event_id

                if full_state:
                    # always make sure we LL ourselves so we know we're in the room
@ -915,16 +965,21 @@ class SyncHandler:
                    members_to_fetch.add(sync_config.user.to_string())

                state_filter = StateFilter.from_lazy_load_member_list(members_to_fetch)
-            else:
-                state_filter = StateFilter.all()

-            # The contribution to the room state from state events in the timeline.
-            # Only contains the last event for any given state key.
-            timeline_state = {
-                (event.type, event.state_key): event.event_id
-                for event in batch.events
-                if event.is_state()
-            }
+                # We are happy to use partial state to compute the `/sync` response.
+                # Since partial state may not include the lazy-loaded memberships we
+                # require, we fix up the state response afterwards with memberships from
+                # auth events.
+                await_full_state = False
+            else:
+                timeline_state = {
+                    (event.type, event.state_key): event.event_id
+                    for event in batch.events
+                    if event.is_state()
+                }
+
+                state_filter = StateFilter.all()
+                await_full_state = True

            # Now calculate the state to return in the sync response for the room.
            # This is more or less the change in state between the end of the previous
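
The lazy-loading branch in the two hunks above makes a single pass over the timeline. A standalone sketch of that pass follows; `Event` is a hypothetical stand-in for Synapse's `EventBase`, and `scan_timeline` is an illustrative name rather than a function in the codebase.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

MEMBER = "m.room.member"


@dataclass
class Event:  # hypothetical stand-in for EventBase
    event_id: str
    sender: str
    type: str
    state_key: Optional[str] = None

    def is_state(self) -> bool:
        return self.state_key is not None


def scan_timeline(
    events: List[Event],
) -> Tuple[Set[str], Dict[str, Event], Dict[Tuple[str, str], str]]:
    timeline_state: Dict[Tuple[str, str], str] = {}
    members_to_fetch: Set[str] = set()
    first_event_by_sender: Dict[str, Event] = {}
    for event in events:
        # Remember the first timeline event sent by each user.
        first_event_by_sender.setdefault(event.sender, event)
        # The sender's membership is needed unless an earlier timeline
        # event already carried it.
        if (MEMBER, event.sender) not in timeline_state:
            members_to_fetch.add(event.sender)
        # Later state events overwrite earlier ones for the same key.
        if event.is_state():
            timeline_state[(event.type, event.state_key)] = event.event_id
    return members_to_fetch, first_event_by_sender, timeline_state
```

The ordering matters: the membership check happens before the event's own state is recorded, so a user's own join event still adds them to `members_to_fetch`.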
@ -936,19 +991,26 @@ class SyncHandler:
                if batch:
                    state_at_timeline_end = (
                        await self._state_storage_controller.get_state_ids_for_event(
-                            batch.events[-1].event_id, state_filter=state_filter
+                            batch.events[-1].event_id,
+                            state_filter=state_filter,
+                            await_full_state=await_full_state,
                        )
                    )

                    state_at_timeline_start = (
                        await self._state_storage_controller.get_state_ids_for_event(
-                            batch.events[0].event_id, state_filter=state_filter
+                            batch.events[0].event_id,
+                            state_filter=state_filter,
+                            await_full_state=await_full_state,
                        )
                    )

                else:
                    state_at_timeline_end = await self.get_state_at(
-                        room_id, stream_position=now_token, state_filter=state_filter
+                        room_id,
+                        stream_position=now_token,
+                        state_filter=state_filter,
+                        await_full_state=await_full_state,
                    )

                    state_at_timeline_start = state_at_timeline_end
@ -964,14 +1026,19 @@ class SyncHandler:
                if batch:
                    state_at_timeline_start = (
                        await self._state_storage_controller.get_state_ids_for_event(
-                            batch.events[0].event_id, state_filter=state_filter
+                            batch.events[0].event_id,
+                            state_filter=state_filter,
+                            await_full_state=await_full_state,
                        )
                    )
                else:
                    # We can get here if the user has ignored the senders of all
                    # the recent events.
                    state_at_timeline_start = await self.get_state_at(
-                        room_id, stream_position=now_token, state_filter=state_filter
+                        room_id,
+                        stream_position=now_token,
+                        state_filter=state_filter,
+                        await_full_state=await_full_state,
                    )

                # for now, we disable LL for gappy syncs - see
@ -993,20 +1060,28 @@ class SyncHandler:
                # is indeed the case.
                assert since_token is not None
                state_at_previous_sync = await self.get_state_at(
-                    room_id, stream_position=since_token, state_filter=state_filter
+                    room_id,
+                    stream_position=since_token,
+                    state_filter=state_filter,
+                    await_full_state=await_full_state,
                )

                if batch:
                    state_at_timeline_end = (
                        await self._state_storage_controller.get_state_ids_for_event(
-                            batch.events[-1].event_id, state_filter=state_filter
+                            batch.events[-1].event_id,
+                            state_filter=state_filter,
+                            await_full_state=await_full_state,
                        )
                    )
                else:
                    # We can get here if the user has ignored the senders of all
                    # the recent events.
                    state_at_timeline_end = await self.get_state_at(
-                        room_id, stream_position=now_token, state_filter=state_filter
+                        room_id,
+                        stream_position=now_token,
+                        state_filter=state_filter,
+                        await_full_state=await_full_state,
                    )

                state_ids = _calculate_state(
@ -1036,8 +1111,23 @@ class SyncHandler:
                                (EventTypes.Member, member)
                                for member in members_to_fetch
                            ),
+                            await_full_state=False,
                        )

+            # If we only have partial state for the room, `state_ids` may be missing the
+            # memberships we wanted. We attempt to find some by digging through the auth
+            # events of timeline events.
+            if lazy_load_members and await self.store.is_partial_state_room(room_id):
+                assert members_to_fetch is not None
+                assert first_event_by_sender_map is not None
+
+                additional_state_ids = (
+                    await self._find_missing_partial_state_memberships(
+                        room_id, members_to_fetch, first_event_by_sender_map, state_ids
+                    )
+                )
+                state_ids = {**state_ids, **additional_state_ids}
+
            # At this point, if `lazy_load_members` is enabled, `state_ids` includes
            # the memberships of all event senders in the timeline. This is because we
            # may not have sent the memberships in a previous sync.
@ -1086,6 +1176,99 @@ class SyncHandler:
            if e.type != EventTypes.Aliases  # until MSC2261 or alternative solution
        }

+    async def _find_missing_partial_state_memberships(
+        self,
+        room_id: str,
+        members_to_fetch: Collection[str],
+        events_with_membership_auth: Mapping[str, EventBase],
+        found_state_ids: StateMap[str],
+    ) -> StateMap[str]:
+        """Finds missing memberships from a set of auth events and returns them as a
+        state map.
+
+        Args:
+            room_id: The partial state room to find the remaining memberships for.
+            members_to_fetch: The memberships to find.
+            events_with_membership_auth: A mapping from user IDs to events whose auth
+                events are known to contain their membership.
+            found_state_ids: A dict from (type, state_key) -> state_event_id, containing
+                memberships that have been previously found. Entries in
+                `members_to_fetch` that have a membership in `found_state_ids` are
+                ignored.
+
+        Returns:
+            A dict from ("m.room.member", state_key) -> state_event_id, containing the
+            memberships missing from `found_state_ids`.
+
+        Raises:
+            KeyError: if `events_with_membership_auth` does not have an entry for a
+                missing membership. Memberships in `found_state_ids` do not need an
+                entry in `events_with_membership_auth`.
+        """
+        additional_state_ids: MutableStateMap[str] = {}
+
+        # Tracks the missing members for logging purposes.
+        missing_members = set()
+
+        # Identify memberships missing from `found_state_ids` and pick out the auth
+        # events in which to look for them.
+        auth_event_ids: Set[str] = set()
+        for member in members_to_fetch:
+            if (EventTypes.Member, member) in found_state_ids:
+                continue
+
+            missing_members.add(member)
+            event_with_membership_auth = events_with_membership_auth[member]
+            auth_event_ids.update(event_with_membership_auth.auth_event_ids())
+
+        auth_events = await self.store.get_events(auth_event_ids)
+
+        # Run through the missing memberships once more, picking out the memberships
+        # from the pile of auth events we have just fetched.
+        for member in members_to_fetch:
+            if (EventTypes.Member, member) in found_state_ids:
+                continue
+
+            event_with_membership_auth = events_with_membership_auth[member]
+
+            # Dig through the auth events to find the desired membership.
+            for auth_event_id in event_with_membership_auth.auth_event_ids():
+                # We only store events once we have all their auth events,
+                # so the auth event must be in the pile we have just
+                # fetched.
+                auth_event = auth_events[auth_event_id]
+
+                if (
+                    auth_event.type == EventTypes.Member
+                    and auth_event.state_key == member
+                ):
+                    missing_members.remove(member)
+                    additional_state_ids[
+                        (EventTypes.Member, member)
+                    ] = auth_event.event_id
+                    break
+
+        if missing_members:
+            # There really shouldn't be any missing memberships now. Either:
+            #  * we couldn't find an auth event, which shouldn't happen because we do
+            #    not persist events without persisting their auth events first, or
+            #  * the set of auth events did not contain a membership we wanted, which
+            #    means our caller didn't compute the events in `members_to_fetch`
+            #    correctly, or we somehow accepted an event whose auth events were
+            #    dodgy.
+            logger.error(
+                "Failed to find memberships for %s in partial state room "
+                "%s in the auth events of %s.",
+                missing_members,
+                room_id,
+                [
+                    events_with_membership_auth[member].event_id
+                    for member in missing_members
+                ],
+            )
+
+        return additional_state_ids
+
    async def unread_notifs_for_room_id(
        self, room_id: str, sync_config: SyncConfig
    ) -> NotifCounts:
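
The lookup performed by `_find_missing_partial_state_memberships` can be illustrated without the storage layer. The sketch below is synchronous and replaces `self.store.get_events` with a plain dict; `Event`, `find_missing_memberships` and the `store` parameter are hypothetical simplifications, and the error logging for members that remain missing is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Mapping, Tuple

MEMBER = "m.room.member"
StateMap = Dict[Tuple[str, str], str]


@dataclass
class Event:  # hypothetical stand-in for EventBase
    event_id: str
    type: str
    state_key: str = ""
    auth_event_ids: List[str] = field(default_factory=list)


def find_missing_memberships(
    members_to_fetch: Iterable[str],
    events_with_membership_auth: Mapping[str, Event],
    found_state_ids: StateMap,
    store: Mapping[str, Event],
) -> StateMap:
    additional: StateMap = {}
    for member in members_to_fetch:
        if (MEMBER, member) in found_state_ids:
            continue  # already have this membership
        # Dig through the auth events of an event this member sent.
        for auth_id in events_with_membership_auth[member].auth_event_ids:
            auth_event = store[auth_id]
            if auth_event.type == MEMBER and auth_event.state_key == member:
                additional[(MEMBER, member)] = auth_event.event_id
                break
    return additional
```

As in the real method, members already present in `found_state_ids` need no entry in `events_with_membership_auth`, and the caller merges the result over the existing state map.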
@ -1730,7 +1913,11 @@ class SyncHandler:
                continue

            if room_id in sync_result_builder.joined_room_ids or has_join:
-                old_state_ids = await self.get_state_at(room_id, since_token)
+                old_state_ids = await self.get_state_at(
+                    room_id,
+                    since_token,
+                    state_filter=StateFilter.from_types([(EventTypes.Member, user_id)]),
+                )
                old_mem_ev_id = old_state_ids.get((EventTypes.Member, user_id), None)
                old_mem_ev = None
                if old_mem_ev_id:
@ -1756,7 +1943,13 @@ class SyncHandler:
                    newly_left_rooms.append(room_id)
                else:
                    if not old_state_ids:
-                        old_state_ids = await self.get_state_at(room_id, since_token)
+                        old_state_ids = await self.get_state_at(
+                            room_id,
+                            since_token,
+                            state_filter=StateFilter.from_types(
+                                [(EventTypes.Member, user_id)]
+                            ),
+                        )
                        old_mem_ev_id = old_state_ids.get(
                            (EventTypes.Member, user_id), None
                        )
--- a/synapse/storage/controllers/state.py
+++ b/synapse/storage/controllers/state.py
@ -234,6 +234,7 @@ class StateStorageController:
        self,
        event_ids: Collection[str],
        state_filter: Optional[StateFilter] = None,
+        await_full_state: bool = True,
    ) -> Dict[str, StateMap[str]]:
        """
        Get the state dicts corresponding to a list of events, containing the event_ids
@ -242,6 +243,9 @@ class StateStorageController:
        Args:
            event_ids: events whose state should be returned
            state_filter: The state filter used to fetch state from the database.
+            await_full_state: if `True`, will block if we do not yet have complete state
+                at these events and `state_filter` is not satisfied by partial state.
+                Defaults to `True`.

        Returns:
            A dict from event_id -> (type, state_key) -> event_id
@ -250,8 +254,12 @@ class StateStorageController:
            RuntimeError if we don't have a state group for one or more of the events
                (ie they are outliers or unknown)
        """
-        await_full_state = True
-        if state_filter and not state_filter.must_await_full_state(self._is_mine_id):
+        if (
+            await_full_state
+            and state_filter
+            and not state_filter.must_await_full_state(self._is_mine_id)
+        ):
+            # Full state is not required if the state filter is restrictive enough.
            await_full_state = False

        event_to_groups = await self.get_state_group_for_events(
@ -294,7 +302,10 @@ class StateStorageController:

    @trace
    async def get_state_ids_for_event(
-        self, event_id: str, state_filter: Optional[StateFilter] = None
+        self,
+        event_id: str,
+        state_filter: Optional[StateFilter] = None,
+        await_full_state: bool = True,
    ) -> StateMap[str]:
        """
        Get the state dict corresponding to a particular event
@ -302,6 +313,9 @@ class StateStorageController:
        Args:
            event_id: event whose state should be returned
            state_filter: The state filter used to fetch state from the database.
+            await_full_state: if `True`, will block if we do not yet have complete state
+                at the event and `state_filter` is not satisfied by partial state.
+                Defaults to `True`.

        Returns:
            A dict from (type, state_key) -> state_event_id
@ -311,7 +325,9 @@ class StateStorageController:
                outlier or is unknown)
        """
        state_map = await self.get_state_ids_for_events(
-            [event_id], state_filter or StateFilter.all()
+            [event_id],
+            state_filter or StateFilter.all(),
+            await_full_state=await_full_state,
        )
        return state_map[event_id]
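
The new `await_full_state` parameter interacts with the state filter as follows: a caller can opt out of waiting outright, and even when it does not, a sufficiently restrictive filter removes the need to wait. A minimal sketch of that decision, with the hypothetical `filter_must_await` argument standing in for the result of `state_filter.must_await_full_state(...)` (or `None` when no filter is given):

```python
from typing import Optional


def resolve_await_full_state(
    await_full_state: bool, filter_must_await: Optional[bool]
) -> bool:
    """Decide whether a state lookup should block until full state is available."""
    if (
        await_full_state
        and filter_must_await is not None
        and not filter_must_await
    ):
        # Full state is not required if the state filter is restrictive enough.
        return False
    return await_full_state
```

Passing `await_full_state=False` therefore always wins, which is what lets the lazy-loading `/sync` path proceed on partial state.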

--- a/synapse/storage/databases/main/room.py
+++ b/synapse/storage/databases/main/room.py
@ -2001,9 +2001,15 @@ class RoomStore(RoomBackgroundUpdateStore, RoomWorkerStore):

            where_clause = "WHERE " + " AND ".join(filters) if len(filters) > 0 else ""

+            # We join on room_stats_state despite not using any columns from it
+            # because the join can influence the number of rows returned;
+            # e.g. a room that doesn't have state, maybe because it was deleted.
+            # The query returning the total count should be consistent with
+            # the query returning the results.
            sql = """
                SELECT COUNT(*) as total_event_reports
                FROM event_reports AS er
+                JOIN room_stats_state ON room_stats_state.room_id = er.room_id
                {}
                """.format(
                where_clause
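
The reason for the extra join can be reproduced with an in-memory SQLite database. The schema below is a heavily simplified assumption (only the columns needed for the join), and `make_db`, `list_reports` and `count_reports` are hypothetical helpers, not Synapse functions.

```python
import sqlite3
from typing import List, Tuple


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE event_reports (id INTEGER PRIMARY KEY, room_id TEXT);
        CREATE TABLE room_stats_state (room_id TEXT PRIMARY KEY, name TEXT);
        INSERT INTO event_reports (room_id) VALUES ('!a:x'), ('!a:x'), ('!b:x');
        -- '!b:x' deliberately has no stats row, as if the room was deleted.
        INSERT INTO room_stats_state (room_id, name) VALUES ('!a:x', 'Room A');
        """
    )
    return conn


def list_reports(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    # The results query joins on room_stats_state, dropping orphaned reports.
    return conn.execute(
        "SELECT er.id, rss.name FROM event_reports AS er "
        "JOIN room_stats_state AS rss ON rss.room_id = er.room_id "
        "ORDER BY er.id"
    ).fetchall()


def count_reports(conn: sqlite3.Connection, join_stats: bool) -> int:
    join = (
        "JOIN room_stats_state AS rss ON rss.room_id = er.room_id"
        if join_stats
        else ""
    )
    (total,) = conn.execute(
        f"SELECT COUNT(*) FROM event_reports AS er {join}"
    ).fetchone()
    return total
```

Without the join, the count includes the report for the room with no stats row and so exceeds the number of rows the results query can return; with the join, the two agree.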
--- a/synapse/util/ratelimitutils.py
+++ b/synapse/util/ratelimitutils.py
@ -18,6 +18,8 @@ import logging
 import typing
 from typing import Any, DefaultDict, Iterator, List, Set

+from prometheus_client.core import Counter
+
 from twisted.internet import defer

 from synapse.api.errors import LimitExceededError
@ -28,7 +30,7 @@ from synapse.logging.context import (
    run_in_background,
 )
 from synapse.logging.opentracing import start_active_span
-from synapse.metrics import Histogram
+from synapse.metrics import Histogram, LaterGauge
 from synapse.util import Clock

 if typing.TYPE_CHECKING:
@ -37,6 +39,9 @@ if typing.TYPE_CHECKING:
 logger = logging.getLogger(__name__)


+# Track how much the ratelimiter is affecting requests
+rate_limit_sleep_counter = Counter("synapse_rate_limit_sleep", "")
+rate_limit_reject_counter = Counter("synapse_rate_limit_reject", "")
 queue_wait_timer = Histogram(
    "synapse_rate_limit_queue_wait_time_seconds",
    "sec",
@ -69,6 +74,27 @@ class FederationRateLimiter:
            str, "_PerHostRatelimiter"
        ] = collections.defaultdict(new_limiter)

+        # We track the number of affected hosts per time-period so we can
+        # differentiate one really noisy homeserver from a general
+        # ratelimit tuning problem across the federation.
+        LaterGauge(
+            "synapse_rate_limit_sleep_affected_hosts",
+            "Number of hosts that had requests put to sleep",
+            [],
+            lambda: sum(
+                ratelimiter.should_sleep() for ratelimiter in self.ratelimiters.values()
+            ),
+        )
+        LaterGauge(
+            "synapse_rate_limit_reject_affected_hosts",
+            "Number of hosts that had requests rejected",
+            [],
+            lambda: sum(
+                ratelimiter.should_reject()
+                for ratelimiter in self.ratelimiters.values()
+            ),
+        )
+
    def ratelimit(self, host: str) -> "_GeneratorContextManager[defer.Deferred[None]]":
        """Used to ratelimit an incoming request from a given host

@ -84,7 +110,7 @@ class FederationRateLimiter:
        Returns:
            context manager which returns a deferred.
        """
-        return self.ratelimiters[host].ratelimit()
+        return self.ratelimiters[host].ratelimit(host)


 class _PerHostRatelimiter:
@ -119,19 +145,42 @@ class _PerHostRatelimiter:
        self.request_times: List[int] = []

    @contextlib.contextmanager
-    def ratelimit(self) -> "Iterator[defer.Deferred[None]]":
+    def ratelimit(self, host: str) -> "Iterator[defer.Deferred[None]]":
        # `contextlib.contextmanager` takes a generator and turns it into a
        # context manager. The generator should only yield once with a value
        # to be returned by manager.
        # Exceptions will be reraised at the yield.

+        self.host = host
+
        request_id = object()
-        ret = self._on_enter(request_id)
+        # Ideally we'd use `Deferred.fromCoroutine()` here, to save on redundant
+        # type-checking, but we'd need Twisted >= 21.2.
+        ret = defer.ensureDeferred(self._on_enter_with_tracing(request_id))
        try:
            yield ret
        finally:
            self._on_exit(request_id)

+    def should_reject(self) -> bool:
+        """
+        Whether to reject the request if we already have too many queued up
+        (either sleeping or in the ready queue).
+        """
+        queue_size = len(self.ready_request_queue) + len(self.sleeping_requests)
+        return queue_size > self.reject_limit
+
+    def should_sleep(self) -> bool:
+        """
+        Whether to sleep the request if we already have too many requests coming
+        through within the window.
+        """
+        return len(self.request_times) > self.sleep_limit
+
+    async def _on_enter_with_tracing(self, request_id: object) -> None:
+        with start_active_span("ratelimit wait"), queue_wait_timer.time():
+            await self._on_enter(request_id)
+
    def _on_enter(self, request_id: object) -> "defer.Deferred[None]":
        time_now = self.clock.time_msec()

@ -142,8 +191,9 @@ class _PerHostRatelimiter:

        # reject the request if we already have too many queued up (either
        # sleeping or in the ready queue).
-        queue_size = len(self.ready_request_queue) + len(self.sleeping_requests)
-        if queue_size > self.reject_limit:
+        if self.should_reject():
+            logger.debug("Ratelimiter(%s): rejecting request", self.host)
+            rate_limit_reject_counter.inc()
            raise LimitExceededError(
                retry_after_ms=int(self.window_size / self.sleep_limit)
            )
@ -155,7 +205,8 @@ class _PerHostRatelimiter:
                queue_defer: defer.Deferred[None] = defer.Deferred()
                self.ready_request_queue[request_id] = queue_defer
                logger.info(
-                    "Ratelimiter: queueing request (queue now %i items)",
+                    "Ratelimiter(%s): queueing request (queue now %i items)",
+                    self.host,
                    len(self.ready_request_queue),
                )

@ -164,19 +215,28 @@ class _PerHostRatelimiter:
                return defer.succeed(None)

        logger.debug(
-            "Ratelimit [%s]: len(self.request_times)=%d",
+            "Ratelimit(%s) [%s]: len(self.request_times)=%d",
+            self.host,
            id(request_id),
            len(self.request_times),
        )

-        if len(self.request_times) > self.sleep_limit:
-            logger.debug("Ratelimiter: sleeping request for %f sec", self.sleep_sec)
+        if self.should_sleep():
+            logger.debug(
+                "Ratelimiter(%s) [%s]: sleeping request for %f sec",
+                self.host,
+                id(request_id),
+                self.sleep_sec,
+            )
+            rate_limit_sleep_counter.inc()
            ret_defer = run_in_background(self.clock.sleep, self.sleep_sec)

            self.sleeping_requests.add(request_id)

            def on_wait_finished(_: Any) -> "defer.Deferred[None]":
-                logger.debug("Ratelimit [%s]: Finished sleeping", id(request_id))
+                logger.debug(
+                    "Ratelimit(%s) [%s]: Finished sleeping", self.host, id(request_id)
+                )
                self.sleeping_requests.discard(request_id)
                queue_defer = queue_request()
                return queue_defer
@ -186,7 +246,9 @@ class _PerHostRatelimiter:
            ret_defer = queue_request()

        def on_start(r: object) -> object:
-            logger.debug("Ratelimit [%s]: Processing req", id(request_id))
+            logger.debug(
+                "Ratelimit(%s) [%s]: Processing req", self.host, id(request_id)
+            )
            self.current_processing.add(request_id)
            return r

@ -201,23 +263,14 @@ class _PerHostRatelimiter:
            # Ensure that we've properly cleaned up.
            self.sleeping_requests.discard(request_id)
            self.ready_request_queue.pop(request_id, None)
-            wait_span_scope.__exit__(None, None, None)
-            wait_timer_cm.__exit__(None, None, None)
            return r

-        # Tracing
-        wait_span_scope = start_active_span("ratelimit wait")
-        wait_span_scope.__enter__()
-        # Metrics
-        wait_timer_cm = queue_wait_timer.time()
-        wait_timer_cm.__enter__()
-
        ret_defer.addCallbacks(on_start, on_err)
        ret_defer.addBoth(on_both)
        return make_deferred_yieldable(ret_defer)

    def _on_exit(self, request_id: object) -> None:
-        logger.debug("Ratelimit [%s]: Processed req", id(request_id))
+        logger.debug("Ratelimit(%s) [%s]: Processed req", self.host, id(request_id))
        self.current_processing.discard(request_id)
        try:
            # start processing the next item on the queue.
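
The two `LaterGauge` callbacks above count hosts by summing booleans over the per-host limiters. A dependency-free sketch of that idea follows; `PerHostLimiter` is a simplified stand-in for `_PerHostRatelimiter` (its single `queued` counter replaces the separate ready and sleeping collections), and `affected_hosts` is a hypothetical helper.

```python
from collections import defaultdict
from typing import List, Mapping, Tuple


class PerHostLimiter:  # simplified stand-in for _PerHostRatelimiter
    def __init__(self, sleep_limit: int, reject_limit: int) -> None:
        self.sleep_limit = sleep_limit
        self.reject_limit = reject_limit
        self.request_times: List[int] = []  # requests seen within the window
        self.queued = 0  # sleeping plus ready-queue requests

    def should_sleep(self) -> bool:
        return len(self.request_times) > self.sleep_limit

    def should_reject(self) -> bool:
        return self.queued > self.reject_limit


def affected_hosts(limiters: Mapping[str, PerHostLimiter]) -> Tuple[int, int]:
    """Return (hosts currently being slept, hosts currently being rejected)."""
    sleeping = sum(lim.should_sleep() for lim in limiters.values())
    rejecting = sum(lim.should_reject() for lim in limiters.values())
    return sleeping, rejecting
```

Reading the per-host gauges alongside the global sleep and reject counters distinguishes one noisy homeserver (high counters, one affected host) from a limit that is too tight in general (many affected hosts).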
--- a/tests/rest/admin/test_event_reports.py
+++ b/tests/rest/admin/test_event_reports.py
@ -410,6 +410,33 @@ class EventReportsTestCase(unittest.HomeserverTestCase):
            self.assertIn("score", c)
            self.assertIn("reason", c)

+    def test_count_correct_despite_table_deletions(self) -> None:
+        """
+        Tests that the count matches the number of rows, even if rows in joined tables
+        are missing.
+        """
+
+        # Delete rows from room_stats_state for one of our rooms.
+        self.get_success(
+            self.hs.get_datastores().main.db_pool.simple_delete(
+                "room_stats_state", {"room_id": self.room_id1}, desc="_"
+            )
+        )
+
+        channel = self.make_request(
+            "GET",
+            self.url,
+            access_token=self.admin_user_tok,
+        )
+
+        self.assertEqual(200, channel.code, msg=channel.json_body)
+        # The 'total' field is 10 because only 10 reports will actually
+        # be retrievable since we deleted the rows in the room_stats_state
+        # table.
+        self.assertEqual(channel.json_body["total"], 10)
+        # This is consistent with the number of rows actually returned.
+        self.assertEqual(len(channel.json_body["event_reports"]), 10)
+

 class EventReportDetailTestCase(unittest.HomeserverTestCase):
    servlets = [
@ -0,0 +1 @@
Faster room joins: Avoid blocking lazy-loading `/sync`s during partial joins due to remote memberships. Pull remote memberships from auth events instead of the room state.
@ -0,0 +1 @@
Fix a bug in the `/event_reports` Admin API which meant that the total count could be larger than the number of results you can actually query for.
@ -0,0 +1 @@
Add metrics to track how the rate limiter is affecting requests (sleep/reject).
@ -0,0 +1 @@
Add support for compression to federation responses.
@ -0,0 +1 @@
Instrument `FederationStateIdsServlet` (`/state_ids`) for understandable traces in Jaeger.