Table of Contents

- Background
- Which workers are meaningful?
  - analysing old logs
    - synapse.app.synchrotron
    - synapse.app.federation_reader
    - synapse.app.media_repository
    - synapse.app.client_reader
    - synapse.app.user_dir
    - synapse.app.frontend_proxy
    - synapse.app.event_creator
  - results
- Setting up synchrotron worker(s)
  - homeserver.yaml
  - workers configuration
  - Starting the worker
    - systemd
- Nginx config
  - upstream synchrotrons
  - reverse proxy the endpoints
    - federation_reader
    - event_creator
    - media_repository
- Issues
WIP
The actual documentation for setting up workers is not really easy to follow:
- https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers
- https://github.com/matrix-org/synapse/blob/master/docs/workers.md
This is how I changed my setup to use workers. It seems to work for me now.

**WARNING: SHOULD BE REVIEWED! WIP!** Look at the issues below first.

I assume you already have a working Synapse configuration; I'm not putting whole config files here.
## Background
- My setup has around 400 users, mostly around 300 concurrent connections in day time, and 4500 local rooms. Some big federated rooms too.
- The server runs in VMware with 16 CPUs and 32 GB RAM (half of it for PostgreSQL).
- The DB is 14 GB big.
- nginx is used as a reverse proxy.
- The Synapse homeserver process hammers at 100-120% CPU all day long, but never uses more of the available CPUs.
- My nginx graph gives an average of 140 requests/s in working hours.
- I'm using the Debian packages from matrix.org and starting Synapse with systemd.
## Which workers are meaningful?

### analysing old logs
First, I wanted to check which endpoints are requested the most in my installation. I grepped the endpoints of every worker, as described in https://github.com/matrix-org/synapse/blob/master/docs/workers.md, in my nginx access log over 24 hours. Below are the greps I used for the different workers' endpoints.
#### synapse.app.synchrotron

```shell
grep -E '(/_matrix/client/(v2_alpha|r0)/sync|/_matrix/client/(api/v1|v2_alpha|r0)/events|/_matrix/client/(api/v1|r0)/initialSync|/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync)'
```
#### synapse.app.federation_reader

```shell
grep -E '(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/publicRooms|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/send/|/_matrix/federation/v1/get_groups_publicised|/_matrix/key/v2/query|/_matrix/federation/v1/groups/)'
```
#### synapse.app.media_repository

```shell
grep -E '(/_matrix/media/|/_synapse/admin/v1/purge_media_cache|/_synapse/admin/v1/room/.*/media.*|/_synapse/admin/v1/user/.*/media.*|/_synapse/admin/v1/media/.*|/_synapse/admin/v1/quarantine_media/.*)'
```
#### synapse.app.client_reader

```shell
grep -E '(/_matrix/client/(api/v1|r0|unstable)/publicRooms|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/joined_members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/context/.*|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state|/_matrix/client/(api/v1|r0|unstable)/login|/_matrix/client/(api/v1|r0|unstable)/account/3pid|/_matrix/client/(api/v1|r0|unstable)/keys/query|/_matrix/client/(api/v1|r0|unstable)/keys/changes|/_matrix/client/versions|/_matrix/client/(api/v1|r0|unstable)/voip/turnServer|/_matrix/client/(api/v1|r0|unstable)/joined_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups/|/_matrix/client/(api/v1|r0|unstable)/pushrules/.*|/_matrix/client/(api/v1|r0|unstable)/groups/.*|/_matrix/client/(r0|unstable)/register|/_matrix/client/(r0|unstable)/auth/.*/fallback/web)'
```
Note: I didn't include `/_matrix/client/(api/v1|r0|unstable)/rooms/.*/messages` — without it the count was 175576 requests, with `/messages` it was 9998816 (not sure why).
#### synapse.app.user_dir

```shell
grep -E '/_matrix/client/(api/v1|r0|unstable)/user_directory/search'
```
#### synapse.app.frontend_proxy

```shell
grep -E '/_matrix/client/(api/v1|r0|unstable)/keys/upload'
```
#### synapse.app.event_creator

```shell
grep -E '(/_matrix/client/(api/v1|r0|unstable)/rooms/.*/send|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state/|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/(join|invite|leave|ban|unban|kick)|/_matrix/client/(api/v1|r0|unstable)/join/|/_matrix/client/(api/v1|r0|unstable)/profile/)'
```
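To turn those patterns into per-worker counts, run each grep with `-c` against the access log. A runnable sketch of the idea — it builds a tiny sample log so the commands work as-is; the real log path (`/var/log/nginx/matrix-access.log`) and log line shape are assumptions about your setup:

```shell
# Count matching requests per worker pattern in an nginx access log.
# A tiny sample log is generated here so this runs as-is; point
# ACCESS_LOG at your real log (e.g. /var/log/nginx/matrix-access.log).
ACCESS_LOG=$(mktemp)
cat > "$ACCESS_LOG" <<'EOF'
1.2.3.4 - - [30/Mar/2020:10:00:00 +0200] "GET /_matrix/client/r0/sync?timeout=30000 HTTP/1.1" 200
1.2.3.4 - - [30/Mar/2020:10:00:01 +0200] "GET /_matrix/client/r0/sync HTTP/1.1" 200
5.6.7.8 - - [30/Mar/2020:10:00:02 +0200] "GET /_matrix/client/versions HTTP/1.1" 200
EOF

# the synchrotron pattern from above (shortened to the /sync part here)
SYNC_RE='/_matrix/client/(v2_alpha|r0)/sync'
total=$(wc -l < "$ACCESS_LOG")
sync=$(grep -cE "$SYNC_RE" "$ACCESS_LOG")
echo "synchrotron: $sync of $total requests"
rm -f "$ACCESS_LOG"
```

Repeating this with each worker's full pattern gives the table below.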
### results

| worker's endpoints | requests/day | percent |
|---|---:|---:|
| synchrotron | 9017921 | 90.19% |
| federation_reader | 321413 | 3.21% |
| media_repository | 115749 | 1.16% |
| client_reader | 175576 | 1.76% |
| user_dir | 1341 | 0.01% |
| frontend_proxy | 6936 | 0.07% |
| event_creator | 26876 | 0.27% |
| total | 9665812 | 96.67% |
| total requests | 9998816 | 100.00% |
| others | 333004 | 3.33% |
So the synchrotron makes the most sense for me (and since I think my setup is fairly standard, I guess it's almost always like this).
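The percent column is simply each worker's count divided by the total request count; a quick check of the arithmetic for a few rows:

```shell
# Each percentage in the table is count / total requests * 100.
total=9998816
awk -v total="$total" '{ printf "%s %.2f%%\n", $1, 100 * $2 / total }' <<'EOF'
synchrotron 9017921
federation_reader 321413
client_reader 175576
EOF
```

This prints `synchrotron 90.19%`, `federation_reader 3.21%` and `client_reader 1.76%`, matching the table.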
## Setting up synchrotron worker(s)

**WARNING:** I broke parts of my setup a lot while trying to do this on a live server.

### homeserver.yaml

Just add this to the existing `listeners` part of the config:
```yaml
listeners:
  # The TCP replication port
  - port: 9092
    bind_address: '127.0.0.1'
    type: replication
  # The HTTP replication port
  - port: 9093
    bind_address: '127.0.0.1'
    type: http
    resources:
      - names: [replication]
```
Also add this to homeserver.yaml:

```yaml
worker_app: synapse.app.homeserver
daemonize: false
```

Restart your Synapse to check it's still working:

```shell
# systemctl restart matrix-synapse
```
### workers configuration

Note: if you work as root, take care to give the config files to the matrix-synapse user after creating them.

I used the systemd instructions from https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers, but I changed them to be able to start multiple synchrotron workers.

```shell
mkdir /etc/matrix-synapse/workers
```

`/etc/matrix-synapse/workers/synchrotron-1.yaml`:
```yaml
worker_app: synapse.app.synchrotron

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8083
    resources:
      - names:
          - client

worker_daemonize: False
worker_pid_file: /var/run/synchrotron1.pid
worker_log_config: /etc/matrix-synapse/synchrotron1-log.yaml

# This is needed until https://github.com/matrix-org/synapse/pull/7133 lands
send_federation: False
update_user_directory: False
start_pushers: False
notify_appservices: False
```
If you want to run multiple synchrotrons, create further configs like this:

```shell
sed -e 's/synchrotron1/synchrotron2/g' -e 's/8083/8084/g' \
    /etc/matrix-synapse/workers/synchrotron-1.yaml > /etc/matrix-synapse/workers/synchrotron-2.yaml
```

Don't forget to create a log config file as well for each worker.
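A sketch that generates the extra configs in one go. It is demonstrated on a temp directory with a minimal config so it runs as-is; to do it for real, set WORKERS to /etc/matrix-synapse/workers (and remember to chown the results to matrix-synapse):

```shell
# Generate synchrotron-2.yaml, synchrotron-3.yaml, ... from the first config.
# Demonstrated on a temp copy; set WORKERS=/etc/matrix-synapse/workers
# to run it against the real files.
WORKERS=$(mktemp -d)
cat > "$WORKERS/synchrotron-1.yaml" <<'EOF'
worker_app: synapse.app.synchrotron
worker_listeners:
  - type: http
    port: 8083
worker_pid_file: /var/run/synchrotron1.pid
worker_log_config: /etc/matrix-synapse/synchrotron1-log.yaml
EOF

for i in 2 3; do
  # bump the port (8083 -> 8084, 8085) and every synchrotron1 reference
  sed -e "s/synchrotron1/synchrotron$i/g" \
      -e "s/8083/$((8082 + i))/g" \
      "$WORKERS/synchrotron-1.yaml" > "$WORKERS/synchrotron-$i.yaml"
done

grep 'port:' "$WORKERS"/synchrotron-*.yaml
```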
`/etc/matrix-synapse/synchrotron1-log.yaml`:

This config makes the worker log to /var/log/matrix-synapse/synchrotron1.log. It could probably be trimmed down...
```yaml
version: 1

formatters:
  precise:
    format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s- %(message)s'

filters:
  context:
    (): synapse.util.logcontext.LoggingContextFilter
    request: ""

handlers:
  file:
    class: logging.handlers.RotatingFileHandler
    formatter: precise
    filename: /var/log/matrix-synapse/synchrotron1.log
    maxBytes: 104857600
    backupCount: 10
    filters: [context]
    encoding: utf8
    level: DEBUG
  console:
    class: logging.StreamHandler
    formatter: precise
    level: WARN

loggers:
  synapse:
    level: WARN
  synapse.storage.SQL:
    level: INFO
  synapse.app.synchrotron:
    level: DEBUG

root:
  level: WARN
  handlers: [file, console]
```
### Starting the worker

I tried to start the worker with synctl, but I had to change the config to include /etc/matrix-synapse/conf.d/* because it wasn't reading those files. Since I use systemd to start Synapse in production, it's better to set up the workers to start with systemd directly.

#### systemd

I followed this: https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers

And created an extra systemd template service to be able to have multiple synchrotrons.
`/etc/systemd/system/matrix-synapse-worker-synchrotron@.service`:

```ini
[Unit]
Description=Synapse Matrix Worker
After=matrix-synapse.service
BindsTo=matrix-synapse.service

[Service]
Type=notify
NotifyAccess=main
User=matrix-synapse
WorkingDirectory=/var/lib/matrix-synapse
EnvironmentFile=/etc/default/matrix-synapse
ExecStart=/opt/venvs/matrix-synapse/bin/python -m synapse.app.synchrotron --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matrix-synapse/conf.d/ --config-path=/etc/matrix-synapse/workers/synchrotron-%i.yaml
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=3
SyslogIdentifier=matrix-synapse-synchrotron-%i

[Install]
WantedBy=matrix-synapse.service
```
- Reload the systemd config:

```shell
systemctl daemon-reload
```

- Start synchrotron1:

```shell
systemctl start matrix-synapse-worker-synchrotron@1.service
```

- Check the logs:

```shell
journalctl -xe -f -u matrix-synapse-worker-synchrotron@1.service
```

If this worked, you should now have an extra python process for synchrotron1, but it doesn't handle any traffic yet.
## Nginx config

Some extras: add this to your default_server, somewhere in the server { } block:

```nginx
location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    allow ::1;
    deny all;
}
```
You can then get some idea of the requests you get with:

```shell
$ curl http://127.0.0.1/nginx_status
Active connections: 270
server accepts handled requests
 172758 172758 3500311
Reading: 0 Writing: 126 Waiting: 144
```
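If you want to graph these numbers over time, they are easy to scrape. A sketch parsing the sample output above — on a live server you would feed it `curl -s http://127.0.0.1/nginx_status` instead of the hard-coded string:

```shell
# Pull numbers out of the stub_status output shown above.
# On a live server, replace the literal string with:
#   status=$(curl -s http://127.0.0.1/nginx_status)
status='Active connections: 270
server accepts handled requests
 172758 172758 3500311
Reading: 0 Writing: 126 Waiting: 144'

active=$(printf '%s\n' "$status" | awk '/^Active connections/ { print $3 }')
waiting=$(printf '%s\n' "$status" | awk '/^Reading/ { print $6 }')
echo "active=$active waiting=$waiting"
```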
### upstream synchrotrons

First, I set up a pool for the synchrotrons (look at the ports configured in the workers). This way, I can scale out when there is too much load. I also added a log format to be able to trace in nginx which worker is handling which request (stolen from somewhere, I don't remember where).

Place this in your nginx config (I put it in my vhost config, outside of server { }):

```nginx
log_format backend '$remote_addr - $remote_user - [$time_local] $upstream_addr: $request $status URT:$upstream_response_time request_time $request_time';

upstream synchrotron {
    # ip_hash; # this might help in some cases, not in mine
    # server 127.0.0.1:8008; # main synapse process, to roll back when it goes wrong (reacted strangely)
    server 127.0.0.1:8083; # synchrotron1
    # server 127.0.0.1:8084; # synchrotron2
    # server 127.0.0.1:8085; # synchrotron3
}
```
Then you can change the default log format of your vhost:

```nginx
server {
    #[...]
    access_log /var/log/nginx/matrix-access.log backend;
    #[...]
}
```
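With the `backend` format in place, you can check how requests spread across the pool: `$upstream_addr` ends up as the 7th whitespace-separated field (the timestamp takes two fields). A sketch on sample log lines — point the awk at /var/log/nginx/matrix-access.log for real numbers:

```shell
# Count requests per upstream worker from the 'backend' log format.
# Sample lines are used here so it runs as-is; use your real access log.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
1.2.3.4 - alice - [30/Mar/2020:10:00:00 +0200] 127.0.0.1:8083: GET /_matrix/client/r0/sync HTTP/1.1 200 URT:0.050 request_time 0.051
1.2.3.4 - alice - [30/Mar/2020:10:00:01 +0200] 127.0.0.1:8084: GET /_matrix/client/r0/sync HTTP/1.1 200 URT:0.040 request_time 0.042
5.6.7.8 - bob - [30/Mar/2020:10:00:02 +0200] 127.0.0.1:8083: GET /_matrix/client/r0/sync HTTP/1.1 200 URT:0.060 request_time 0.061
EOF

# $7 is "$upstream_addr:" in this log format
counts=$(awk '{ n[$7]++ } END { for (u in n) print u, n[u] }' "$LOG")
echo "$counts"
rm -f "$LOG"
```

This makes it easy to spot a worker that gets no traffic (broken location block) or far more than its share.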
### reverse proxy the endpoints

In my server { } section I set multiple locations (to avoid one very big regexp):

```nginx
location ~ ^/_matrix/client/(v2_alpha|r0)/sync$ {
    proxy_pass http://synchrotron$request_uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}

location ~ ^/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync$ {
    proxy_pass http://synchrotron$request_uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}

location ~ ^/_matrix/client/(api/v1|r0)/initialSync$ {
    proxy_pass http://synchrotron$request_uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}

location ~ ^/_matrix/client/(api/v1|v2_alpha|r0)/events$ {
    proxy_pass http://synchrotron$request_uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
```
Reload the nginx config, and your synchrotron worker should start to receive traffic.
#### federation_reader

`/etc/matrix-synapse/workers/federation_reader.yaml`:

synapse.app.federation_reader listens on port 8011.

```yaml
worker_app: synapse.app.federation_reader

worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8011
    resources:
      - names: [federation]

worker_pid_file: "/var/run/app.federation_reader.pid"
worker_daemonize: False
worker_log_config: /etc/matrix-synapse/federation-reader-log.yaml

# This is needed until https://github.com/matrix-org/synapse/pull/7133 lands
send_federation: False
update_user_directory: False
start_pushers: False
notify_appservices: False
```
Here I separated the `^/_matrix/federation/v1/send/` endpoint, since it's documented that it must not be handled by multiple instances:

```nginx
location ~ ^/_matrix/federation/v1/send/ {
    proxy_pass http://127.0.0.1:8011$request_uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}

# and a big regex for the rest
location ~ ^(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/get_groups_publicised$|/_matrix/key/v2/query|/_matrix/federation/v1/groups/) {
    proxy_pass http://127.0.0.1:8011$request_uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
```
#### event_creator

`/etc/matrix-synapse/workers/event_creator.yaml`:

```yaml
worker_app: synapse.app.event_creator

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8102
    resources:
      - names:
          - client

worker_daemonize: False
worker_pid_file: /var/run/event_creator.pid
worker_log_config: /etc/matrix-synapse/event_creator-log.yaml

# This is needed until https://github.com/matrix-org/synapse/pull/7133 lands
send_federation: False
update_user_directory: False
start_pushers: False
notify_appservices: False
```
Nginx:

```nginx
# event_creator
location ~ ^/_matrix/client/(api/v1|r0|unstable)(/rooms/.*/send|/rooms/.*/state/|/rooms/.*/(join|invite|leave|ban|unban|kick)$|/join/|/profile/) {
    proxy_pass http://127.0.0.1:8102$request_uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
```
#### media_repository

`/etc/matrix-synapse/workers/media_repository.yaml`:

```yaml
worker_app: synapse.app.media_repository

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8101
    resources:
      - names:
          - media

worker_daemonize: False
worker_pid_file: /var/run/media_repository.pid
worker_log_config: /etc/matrix-synapse/media_repository-log.yaml

# This is needed until https://github.com/matrix-org/synapse/pull/7133 lands
send_federation: False
update_user_directory: False
start_pushers: False
notify_appservices: False
```
Nginx:

```nginx
# media_repository
location ~ (^/_matrix/media/|^/_synapse/admin/v1/purge_media_cache$|^/_synapse/admin/v1/room/.*/media.*$|^/_synapse/admin/v1/user/.*/media.*$|^/_synapse/admin/v1/media/.*$|^/_synapse/admin/v1/quarantine_media/.*$) {
    proxy_pass http://127.0.0.1:8101$request_uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
```
## Issues

These are the issues I've met so far (they might also have been related to some big federated rooms):

- CPU usage was getting high on all synchrotron workers, more than with a single synapse process: using proxy_pass with $request_uri instead of the normalized $uri solved this
- a lot of clients were disconnecting all the time: this came from the problem above
- some old notifications were popping up on desktop and mobile all the time: SOLVED
- media_repository was breaking thumbnails: I guess it's solved with the $request_uri trick
- https://github.com/matrix-org/synapse/issues/7154 (c816072d47)
- https://github.com/matrix-org/synapse/issues/7130