Fix sharded federation sender sometimes using 100% CPU.

We pull all destinations requiring catchup from the DB in batches. However, if all those destinations get filtered out (due to the federation sender being sharded), then the `last_processed` destination doesn't get updated, and we keep requesting the same set repeatedly.
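
The failure mode can be reproduced outside Synapse with a small simulation. This is only a sketch: `catch_up_queries`, `advance_before_filter`, and `max_queries` are hypothetical names, and a sorted in-memory list stands in for the real database query.

```python
from typing import Callable, List, Optional


def catch_up_queries(
    destinations: List[str],
    should_handle: Callable[[str], bool],
    batch_size: int,
    advance_before_filter: bool,
    max_queries: int = 20,
) -> int:
    """Count the batch queries issued while walking all destinations.

    Returns max_queries if the loop never terminates on its own.
    (Hypothetical helper for illustration, not Synapse code.)
    """
    ordered = sorted(destinations)
    last_processed: Optional[str] = None
    queries = 0

    while queries < max_queries:
        queries += 1
        # Stand-in for the DB query: next batch strictly after the cursor.
        batch = [
            d for d in ordered if last_processed is None or d > last_processed
        ][:batch_size]
        if not batch:
            break  # nothing left to catch up

        if advance_before_filter:
            # Fixed behaviour: advance the cursor from the unfiltered batch.
            last_processed = batch[-1]

        for destination in filter(should_handle, batch):
            if not advance_before_filter:
                # Old behaviour: the cursor only moves when a destination
                # survives the shard filter.
                last_processed = destination
            # The real code would wake the destination here.

    return queries


destinations = ["a.example", "b.example", "c.example", "d.example"]
# Assume this shard only handles destinations starting with "d".
only_d = lambda d: d.startswith("d")

fixed = catch_up_queries(destinations, only_d, batch_size=2, advance_before_filter=True)
stuck = catch_up_queries(destinations, only_d, batch_size=2, advance_before_filter=False)
```

With the cursor advanced before filtering, the loop finishes after three queries. With the old ordering, the first batch is filtered to nothing, the cursor never moves, and the loop only stops at the `max_queries` safety cap.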
2021-04-08 17:30:01 +01:00 · 2021-04-08 17:30:01 +01:00 · 3a569fb200
parent 48d44ab142
commit 3a569fb200
2 changed files with 5 additions and 2 deletions
--- a/changelog.d/9770.bugfix
+++ b/changelog.d/9770.bugfix
@@ -0,0 +1 @@
+Fix bug where sharded federation senders could get stuck repeatedly querying the DB in a loop, using lots of CPU.
--- a/synapse/federation/sender/__init__.py
+++ b/synapse/federation/sender/__init__.py
@@ -734,16 +734,18 @@ class FederationSender(AbstractFederationSender):
                self._catchup_after_startup_timer = None
                break

+            last_processed = destinations_to_wake[-1]
+
            destinations_to_wake = [
                d
                for d in destinations_to_wake
                if self._federation_shard_config.should_handle(self._instance_name, d)
            ]

-            for last_processed in destinations_to_wake:
+            for destination in destinations_to_wake:
                logger.info(
                    "Destination %s has outstanding catch-up, waking up.",
                    last_processed,
                )
-                self.wake_destination(last_processed)
+                self.wake_destination(destination)
                await self.clock.sleep(CATCH_UP_STARTUP_INTERVAL_SEC)