Remove configuration options for direct TCP replication. (#13647)

Removes the ability to configure legacy direct TCP replication. Workers now require Redis to run.
Patrick Cloke 2022-09-06 03:50:02 -04:00 committed by GitHub
parent 8edf3f66d5
commit 32fc3b7ba4
GPG Key ID: 4AEE18F83AFDEB23
12 changed files with 63 additions and 78 deletions

@@ -204,7 +204,6 @@ jobs:
 POSTGRES: ${{ matrix.job.postgres && 1}}
 MULTI_POSTGRES: ${{ (matrix.job.postgres == 'multi-postgres') && 1}}
 WORKERS: ${{ matrix.job.workers && 1 }}
-REDIS: 1
 BLACKLIST: ${{ matrix.job.workers && 'synapse-blacklist-with-workers' }}
 TOP: ${{ github.workspace }}

@@ -0,0 +1 @@
+Remove the ability to use direct TCP replication with workers. Direct TCP replication was deprecated in Synapse v1.18.0. Workers now require using Redis.

@@ -91,6 +91,21 @@ process, for example:
 # Upgrading to v1.67.0
+## Direct TCP replication is no longer supported: migrate to Redis
+Redis support was added in v1.13.0 with it becoming the recommended method in
+v1.18.0. It replaced the old direct TCP connections (which was deprecated as of
+v1.18.0) to the main process. With Redis, rather than all the workers connecting
+to the main process, all the workers and the main process connect to Redis,
+which relays replication commands between processes. This can give a significant
+CPU saving on the main process and is a prerequisite for upcoming
+performance improvements.
+To migrate to Redis add the [`redis` config](./workers.md#shared-configuration),
+and remove the TCP `replication` listener from config of the master and
+`worker_replication_port` from worker config. Note that a HTTP listener with a
+`replication` resource is still required.
 ## Minimum version of Poetry is now v1.2.0
 The minimum supported version of poetry is now 1.2. This should only affect
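The upgrade steps above amount to a small transformation of the parsed configuration. The Python sketch below assumes the main and worker configs have already been loaded into plain dicts (for example with a YAML loader). The helpers `migrate_main_config`, `migrate_worker_config` and `has_http_replication_listener` are hypothetical and are not part of Synapse, and the port numbers are illustrative. Redis `host` and `port` are deliberately left unset, so whatever defaults Synapse applies would be used.

```python
from typing import Any, Dict, List


def migrate_main_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: drop `replication` listeners and enable Redis."""
    migrated = dict(config)  # shallow copy; the input dict is left untouched
    listeners: List[Dict[str, Any]] = [
        listener
        for listener in config.get("listeners", [])
        if listener.get("type") != "replication"
    ]
    migrated["listeners"] = listeners
    redis = dict(config.get("redis") or {})
    redis["enabled"] = True  # host/port intentionally not set in this sketch
    migrated["redis"] = redis
    return migrated


def migrate_worker_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: drop the removed `worker_replication_port` option."""
    return {k: v for k, v in config.items() if k != "worker_replication_port"}


def has_http_replication_listener(config: Dict[str, Any]) -> bool:
    """True if an `http` listener serves the `replication` resource (still required)."""
    return any(
        "replication" in resource.get("names", [])
        for listener in config.get("listeners", [])
        if listener.get("type") == "http"
        for resource in listener.get("resources", [])
    )


# Illustrative configs; ports are example values only.
main_config = {
    "listeners": [
        {"port": 9092, "type": "replication"},
        {"port": 9093, "type": "http", "resources": [{"names": ["replication"]}]},
    ],
}
worker_config = {
    "worker_app": "synapse.app.generic_worker",
    "worker_replication_host": "127.0.0.1",
    "worker_replication_port": 9092,
    "worker_replication_http_port": 9093,
}

new_main = migrate_main_config(main_config)
new_worker = migrate_worker_config(worker_config)
```

The HTTP check is kept separate from the rewrite because, per the upgrade note, that listener must survive the migration rather than be created by it.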

@@ -431,8 +431,6 @@ Sub-options for each listener include:
 * `metrics`: (see the docs [here](../../metrics-howto.md)),
-* `replication`: (deprecated as of Synapse 1.18, see the docs [here](../../workers.md)).
 * `tls`: set to true to enable TLS for this listener. Will use the TLS key/cert specified in tls_private_key_path / tls_certificate_path.
 * `x_forwarded`: Only valid for an 'http' listener. Set to true to use the X-Forwarded-For header as the client IP. Useful when Synapse is

@@ -32,13 +32,8 @@ stream between all configured Synapse processes. Additionally, processes may
 make HTTP requests to each other, primarily for operations which need to wait
 for a reply ─ such as sending an event.
-Redis support was added in v1.13.0 with it becoming the recommended method in
-v1.18.0. It replaced the old direct TCP connections (which is deprecated as of
-v1.18.0) to the main process. With Redis, rather than all the workers connecting
-to the main process, all the workers and the main process connect to Redis,
-which relays replication commands between processes. This can give a significant
-cpu saving on the main process and will be a prerequisite for upcoming
-performance improvements.
+All the workers and the main process connect to Redis, which relays replication
+commands between processes.
 If Redis support is enabled Synapse will use it as a shared cache, as well as a
 pub/sub mechanism.
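To illustrate the topology described in the workers documentation, here is a toy in-memory model in Python. `ToyBroker` stands in for Redis and `ToyProcess` for a Synapse process; both are hypothetical classes, the channel and command strings are made up, and the model ignores networking, command encoding and the shared-cache role entirely. What it shows is that no process talks to another directly: each one publishes to, and subscribes through, the broker, which fans every message out to all subscribers.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class ToyBroker:
    """Hypothetical stand-in for Redis: relays each published message to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        handlers = self._subscribers[channel]
        for handler in handlers:
            handler(message)
        # Like Redis PUBLISH, return how many subscribers received the message.
        return len(handlers)


class ToyProcess:
    """Hypothetical Synapse process (main or worker) that talks only to the broker."""

    def __init__(self, name: str, broker: ToyBroker, channel: str) -> None:
        self.name = name
        self.received: List[str] = []
        self._broker = broker
        self._channel = channel
        broker.subscribe(channel, self.received.append)

    def send(self, command: str) -> int:
        return self._broker.publish(self._channel, f"{self.name}: {command}")


broker = ToyBroker()
main, worker1, worker2 = (
    ToyProcess(name, broker, "example-channel")
    for name in ("main", "worker1", "worker2")
)
delivered = worker1.send("example-command")
```

In this toy the sender also receives its own message, as any subscriber on the same Redis channel would.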
@@ -330,7 +325,6 @@ effects of bursts of events from that bridge on events sent by normal users.
 Additionally, the writing of specific streams (such as events) can be moved off
 of the main process to a particular worker.
-(This is only supported with Redis-based replication.)
 To enable this, the worker must have a HTTP replication listener configured,
 have a `worker_name` and be listed in the `instance_map` config. The same worker
@@ -600,15 +594,9 @@ equivalent to `synapse.app.generic_worker`:
 ## Migration from old config
-There are two main independent changes that have been made: introducing Redis
-support and merging apps into `synapse.app.generic_worker`. Both these changes
-are backwards compatible and so no changes to the config are required, however
-server admins are encouraged to plan to migrate to Redis as the old style direct
-TCP replication config is deprecated.
-To migrate to Redis add the `redis` config as above, and optionally remove the
-TCP `replication` listener from master and `worker_replication_port` from worker
-config.
+A main change that has occurred is the merging of worker apps into
+`synapse.app.generic_worker`. This change is backwards compatible and so no
+changes to the config are required.
 To migrate apps to use `synapse.app.generic_worker` simply update the
 `worker_app` option in the worker configs, and where worker are started (e.g.

@@ -57,7 +57,6 @@ from synapse.http.site import SynapseSite
 from synapse.logging.context import LoggingContext
 from synapse.metrics import METRICS_PREFIX, MetricsResource, RegistryProxy
 from synapse.replication.http import REPLICATION_PREFIX, ReplicationRestResource
-from synapse.replication.tcp.resource import ReplicationStreamProtocolFactory
 from synapse.rest import ClientRestResource
 from synapse.rest.admin import AdminRestResource
 from synapse.rest.health import HealthResource
@@ -290,16 +289,6 @@ class SynapseHomeServer(HomeServer):
 manhole_settings=self.config.server.manhole_settings,
 manhole_globals={"hs": self},
 )
-elif listener.type == "replication":
-services = listen_tcp(
-listener.bind_addresses,
-listener.port,
-ReplicationStreamProtocolFactory(self),
-)
-for s in services:
-self.get_reactor().addSystemEventTrigger(
-"before", "shutdown", s.stopListening
-)
 elif listener.type == "metrics":
 if not self.config.metrics.enable_metrics:
 logger.warning(

@@ -36,6 +36,12 @@ from ._util import validate_config
 logger = logging.Logger(__name__)
+DIRECT_TCP_ERROR = """
+Using direct TCP replication for workers is no longer supported.
+Please see https://matrix-org.github.io/synapse/latest/upgrade.html#direct-tcp-replication-is-no-longer-supported-migrate-to-redis
+"""
 # by default, we attempt to listen on both '::' *and* '0.0.0.0' because some OSes
 # (Windows, macOS, other BSD/Linux where net.ipv6.bindv6only is set) will only listen
 # on IPv6 when '::' is set.
@@ -165,7 +171,6 @@ KNOWN_LISTENER_TYPES = {
 "http",
 "metrics",
 "manhole",
-"replication",
 }
 KNOWN_RESOURCES = {
@@ -515,7 +520,9 @@ class ServerConfig(Config):
 ):
 raise ConfigError("allowed_avatar_mimetypes must be a list")
-self.listeners = [parse_listener_def(x) for x in config.get("listeners", [])]
+self.listeners = [
+parse_listener_def(i, x) for i, x in enumerate(config.get("listeners", []))
+]
 # no_tls is not really supported any more, but let's grandfather it in
 # here.
@@ -880,9 +887,12 @@ def read_gc_thresholds(
 )
-def parse_listener_def(listener: Any) -> ListenerConfig:
+def parse_listener_def(num: int, listener: Any) -> ListenerConfig:
 """parse a listener config from the config file"""
 listener_type = listener["type"]
+# Raise a helpful error if direct TCP replication is still configured.
+if listener_type == "replication":
+raise ConfigError(DIRECT_TCP_ERROR, ("listeners", str(num), "type"))
 port = listener.get("port")
 if not isinstance(port, int):
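The new `num` argument to `parse_listener_def` lets the error point at the offending listener entry. A standalone Python sketch of the same idea follows. `ToyConfigError`, `check_listener` and `check_listeners` are hypothetical stand-ins written for this example, not Synapse's `ConfigError` or parser; only the `("listeners", str(num), "type")` path shape and the `enumerate` call pattern are taken from the diff.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


class ToyConfigError(Exception):
    """Hypothetical config error that carries a path into the config."""

    def __init__(self, msg: str, path: Optional[Iterable[str]] = None) -> None:
        super().__init__(msg)
        self.msg = msg
        self.path: Optional[Tuple[str, ...]] = tuple(path) if path is not None else None


def check_listener(num: int, listener: Dict[str, Any]) -> str:
    """Validate one listener; `num` is its index in the `listeners` list."""
    listener_type = listener["type"]
    if listener_type == "replication":
        # Same path shape as the diff: ("listeners", "<index>", "type").
        raise ToyConfigError(
            "Using direct TCP replication for workers is no longer supported.",
            ("listeners", str(num), "type"),
        )
    return listener_type


def check_listeners(listeners: List[Dict[str, Any]]) -> List[str]:
    # enumerate() supplies the index that ends up in the error path.
    return [check_listener(i, x) for i, x in enumerate(listeners)]


bad_path: Optional[Tuple[str, ...]] = None
try:
    check_listeners([{"type": "http"}, {"type": "replication"}])
except ToyConfigError as e:
    bad_path = e.path
```

Passing the index in explicitly, rather than having the parser search for it, keeps the single-listener function usable on its own, which is also why the updated tests simply pass `0`.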

@@ -27,7 +27,7 @@ from ._base import (
 RoutableShardedWorkerHandlingConfig,
 ShardedWorkerHandlingConfig,
 )
-from .server import ListenerConfig, parse_listener_def
+from .server import DIRECT_TCP_ERROR, ListenerConfig, parse_listener_def
 _FEDERATION_SENDER_WITH_SEND_FEDERATION_ENABLED_ERROR = """
 The send_federation config option must be disabled in the main
@@ -128,7 +128,8 @@ class WorkerConfig(Config):
 self.worker_app = None
 self.worker_listeners = [
-parse_listener_def(x) for x in config.get("worker_listeners", [])
+parse_listener_def(i, x)
+for i, x in enumerate(config.get("worker_listeners", []))
 ]
 self.worker_daemonize = bool(config.get("worker_daemonize"))
 self.worker_pid_file = config.get("worker_pid_file")
@@ -142,7 +143,8 @@ class WorkerConfig(Config):
 self.worker_replication_host = config.get("worker_replication_host", None)
 # The port on the main synapse for TCP replication
-self.worker_replication_port = config.get("worker_replication_port", None)
+if "worker_replication_port" in config:
+raise ConfigError(DIRECT_TCP_ERROR, ("worker_replication_port",))
 # The port on the main synapse for HTTP replication endpoint
 self.worker_replication_http_port = config.get("worker_replication_http_port")
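The worker-side check tests whether `worker_replication_port` is present at all, not whether it holds a value. A minimal Python sketch of the difference, assuming the worker config is a plain dict; both helper functions are hypothetical. A YAML line such as `worker_replication_port:` with nothing after it loads as `None`, so only the presence check flags such a leftover.

```python
from typing import Any, Dict

LEGACY_KEY = "worker_replication_port"


def rejected_by_presence(worker_config: Dict[str, Any]) -> bool:
    # Same test as the diff: the key merely being present is an error.
    return LEGACY_KEY in worker_config


def rejected_by_value(worker_config: Dict[str, Any]) -> bool:
    # Hypothetical alternative that would let a key with an empty (None) value through.
    return worker_config.get(LEGACY_KEY) is not None


# What `worker_replication_port:` with no value loads as from YAML.
empty_value = {LEGACY_KEY: None}
```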

@@ -35,7 +35,6 @@ from twisted.internet.protocol import ReconnectingClientFactory
 from synapse.metrics import LaterGauge
 from synapse.metrics.background_process_metrics import run_as_background_process
-from synapse.replication.tcp.client import DirectTcpReplicationClientFactory
 from synapse.replication.tcp.commands import (
 ClearUserSyncsCommand,
 Command,
@@ -332,46 +331,31 @@ class ReplicationCommandHandler:
 def start_replication(self, hs: "HomeServer") -> None:
 """Helper method to start replication."""
-if hs.config.redis.redis_enabled:
-from synapse.replication.tcp.redis import (
-RedisDirectTcpReplicationClientFactory,
-)
+from synapse.replication.tcp.redis import RedisDirectTcpReplicationClientFactory
 # First let's ensure that we have a ReplicationStreamer started.
 hs.get_replication_streamer()
 # We need two connections to redis, one for the subscription stream and
 # one to send commands to (as you can't send further redis commands to a
 # connection after SUBSCRIBE is called).
 # First create the connection for sending commands.
 outbound_redis_connection = hs.get_outbound_redis_connection()
 # Now create the factory/connection for the subscription stream.
 self._factory = RedisDirectTcpReplicationClientFactory(
 hs,
 outbound_redis_connection,
 channel_names=self._channels_to_subscribe_to,
 )
 hs.get_reactor().connectTCP(
 hs.config.redis.redis_host,
 hs.config.redis.redis_port,
 self._factory,
 timeout=30,
 bindAddress=None,
 )
-else:
-client_name = hs.get_instance_name()
-self._factory = DirectTcpReplicationClientFactory(hs, client_name, self)
-host = hs.config.worker.worker_replication_host
-port = hs.config.worker.worker_replication_port
-hs.get_reactor().connectTCP(
-host,
-port,
-self._factory,
-timeout=30,
-bindAddress=None,
-)
 def get_streams(self) -> Dict[str, Stream]:
 """Get a map from stream name to all streams."""

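The comment in `start_replication` explains why two Redis connections are opened: once a connection has issued `SUBSCRIBE`, Redis only accepts subscription-related commands on it, so publishing has to go over a separate outbound connection. The Python sketch below models just that rule. `ToyRedisConnection` is a hypothetical class, its allowed-command set is a subset of what Redis documents for the subscribed state, and the channel and command names are made up.

```python
from typing import FrozenSet, List, Set


class ToyRedisConnection:
    """Hypothetical stand-in for one Redis connection; models only the subscribed-state rule."""

    # A subset of the commands Redis accepts once a connection has subscribed (RESP2).
    ALLOWED_WHEN_SUBSCRIBED: FrozenSet[str] = frozenset(
        {"SUBSCRIBE", "PSUBSCRIBE", "UNSUBSCRIBE", "PUNSUBSCRIBE", "PING", "QUIT"}
    )

    def __init__(self) -> None:
        self.channels: Set[str] = set()
        self.sent: List[str] = []

    def execute(self, command: str, *args: str) -> None:
        name = command.upper()
        if self.channels and name not in self.ALLOWED_WHEN_SUBSCRIBED:
            raise RuntimeError(f"{name} is not allowed after SUBSCRIBE")
        if name == "SUBSCRIBE":
            self.channels.update(args)
        elif name == "UNSUBSCRIBE":
            # With no arguments, Redis unsubscribes from every channel.
            if args:
                self.channels.difference_update(args)
            else:
                self.channels.clear()
        self.sent.append(" ".join((name,) + args))


# One connection for the subscription stream, one for outbound commands.
subscription = ToyRedisConnection()
outbound = ToyRedisConnection()
subscription.execute("SUBSCRIBE", "example-channel")
outbound.execute("PUBLISH", "example-channel", "example-command")
```

Trying to publish on `subscription` instead raises, which is the situation the second connection in `start_replication` avoids.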
@@ -61,7 +61,7 @@ class FederationReaderOpenIDListenerTests(HomeserverTestCase):
 }
 # Listen with the config
-self.hs._listen_http(parse_listener_def(config))
+self.hs._listen_http(parse_listener_def(0, config))
 # Grab the resource from the site that was told to listen
 site = self.reactor.tcpServers[0][1]
@@ -109,7 +109,7 @@ class SynapseHomeserverOpenIDListenerTests(HomeserverTestCase):
 }
 # Listen with the config
-self.hs._listener_http(self.hs.config, parse_listener_def(config))
+self.hs._listener_http(self.hs.config, parse_listener_def(0, config))
 # Grab the resource from the site that was told to listen
 site = self.reactor.tcpServers[0][1]

@@ -228,7 +228,7 @@ class OptionsResourceTests(unittest.TestCase):
 site = SynapseSite(
 "test",
 "site_tag",
-parse_listener_def({"type": "http", "port": 0}),
+parse_listener_def(0, {"type": "http", "port": 0}),
 self.resource,
 "1.0",
 max_request_body_size=4096,

@@ -135,7 +135,6 @@ def default_config(
 "enable_registration_captcha": False,
 "macaroon_secret_key": "not even a little secret",
 "password_providers": [],
-"worker_replication_url": "",
 "worker_app": None,
 "block_non_admin_invites": False,
 "federation_domain_whitelist": None,