181 lines
		
	
	
		
			7.9 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
			
		
		
	
	
			181 lines
		
	
	
		
			7.9 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
| How to monitor Synapse metrics using Prometheus
 | |
| ===============================================
 | |
| 
 | |
| 1. Install Prometheus:
 | |
| 
 | |
|    Follow instructions at http://prometheus.io/docs/introduction/install/
 | |
| 
 | |
| 2. Enable Synapse metrics:
 | |
| 
 | |
|    There are two methods of enabling metrics in Synapse.
 | |
| 
 | |
|    The first serves the metrics as a part of the usual web server and can be
 | |
|    enabled by adding the "metrics" resource to the existing listener as such::
 | |
| 
 | |
|      resources:
 | |
|        - names:
 | |
|          - client
 | |
|          - metrics
 | |
| 
 | |
|    This provides a simple way of adding metrics to your Synapse installation,
 | |
|    and serves under ``/_synapse/metrics``. If you do not wish your metrics be
 | |
|    publicly exposed, you will need to either filter it out at your load
 | |
|    balancer, or use the second method.
 | |
| 
 | |
|    The second method runs the metrics server on a different port, in a
 | |
|    different thread to Synapse. This can make it more resilient to heavy load
 | |
|    meaning metrics cannot be retrieved, and can be exposed to just internal
 | |
|    networks easier. The served metrics are available over HTTP only, and will
 | |
|    be available at ``/``.
 | |
| 
 | |
|    Add a new listener to homeserver.yaml::
 | |
| 
 | |
|      listeners:
 | |
|        - type: metrics
 | |
|          port: 9000
 | |
|          bind_addresses:
 | |
|            - '0.0.0.0'
 | |
| 
 | |
|    For both options, you will need to ensure that ``enable_metrics`` is set to
 | |
|    ``True``.
 | |
| 
 | |
|    Restart Synapse.
 | |
| 
 | |
| 3. Add a Prometheus target for Synapse.
 | |
| 
 | |
|    It needs to set the ``metrics_path`` to a non-default value (under ``scrape_configs``)::
 | |
| 
 | |
|     - job_name: "synapse"
 | |
|       metrics_path: "/_synapse/metrics"
 | |
|       static_configs:
 | |
|         - targets: ["my.server.here:9092"]
 | |
| 
 | |
|    If your prometheus is older than 1.5.2, you will need to replace
 | |
|    ``static_configs`` in the above with ``target_groups``.
 | |
| 
 | |
|    Restart Prometheus.
 | |
| 
 | |
| 
 | |
| Removal of deprecated metrics & time based counters becoming histograms in 0.31.0
 | |
| ---------------------------------------------------------------------------------
 | |
| 
 | |
| The duplicated metrics deprecated in Synapse 0.27.0 have been removed.
 | |
| 
 | |
| All time duration-based metrics have been changed to be seconds. This affects:
 | |
| 
 | |
| +----------------------------------+
 | |
| | msec -> sec metrics              |
 | |
| +==================================+
 | |
| | python_gc_time                   |
 | |
| +----------------------------------+
 | |
| | python_twisted_reactor_tick_time |
 | |
| +----------------------------------+
 | |
| | synapse_storage_query_time       |
 | |
| +----------------------------------+
 | |
| | synapse_storage_schedule_time    |
 | |
| +----------------------------------+
 | |
| | synapse_storage_transaction_time |
 | |
| +----------------------------------+
 | |
| 
 | |
| Several metrics have been changed to be histograms, which sort entries into
 | |
| buckets and allow better analysis. The following metrics are now histograms:
 | |
| 
 | |
| +-------------------------------------------+
 | |
| | Altered metrics                           |
 | |
| +===========================================+
 | |
| | python_gc_time                            |
 | |
| +-------------------------------------------+
 | |
| | python_twisted_reactor_pending_calls      |
 | |
| +-------------------------------------------+
 | |
| | python_twisted_reactor_tick_time          |
 | |
| +-------------------------------------------+
 | |
| | synapse_http_server_response_time_seconds |
 | |
| +-------------------------------------------+
 | |
| | synapse_storage_query_time                |
 | |
| +-------------------------------------------+
 | |
| | synapse_storage_schedule_time             |
 | |
| +-------------------------------------------+
 | |
| | synapse_storage_transaction_time          |
 | |
| +-------------------------------------------+
 | |
| 
 | |
| 
 | |
| Block and response metrics renamed for 0.27.0
 | |
| ---------------------------------------------
 | |
| 
 | |
| Synapse 0.27.0 begins the process of rationalising the duplicate ``*:count``
 | |
| metrics reported for the resource tracking for code blocks and HTTP requests.
 | |
| 
 | |
| At the same time, the corresponding ``*:total`` metrics are being renamed, as
 | |
| the ``:total`` suffix no longer makes sense in the absence of a corresponding
 | |
| ``:count`` metric.
 | |
| 
 | |
| To enable a graceful migration path, this release just adds new names for the
 | |
| metrics being renamed. A future release will remove the old ones.
 | |
| 
 | |
| The following table shows the new metrics, and the old metrics which they are
 | |
| replacing.
 | |
| 
 | |
| ==================================================== ===================================================
 | |
| New name                                             Old name
 | |
| ==================================================== ===================================================
 | |
| synapse_util_metrics_block_count                     synapse_util_metrics_block_timer:count
 | |
| synapse_util_metrics_block_count                     synapse_util_metrics_block_ru_utime:count
 | |
| synapse_util_metrics_block_count                     synapse_util_metrics_block_ru_stime:count
 | |
| synapse_util_metrics_block_count                     synapse_util_metrics_block_db_txn_count:count
 | |
| synapse_util_metrics_block_count                     synapse_util_metrics_block_db_txn_duration:count
 | |
| 
 | |
| synapse_util_metrics_block_time_seconds              synapse_util_metrics_block_timer:total
 | |
| synapse_util_metrics_block_ru_utime_seconds          synapse_util_metrics_block_ru_utime:total
 | |
| synapse_util_metrics_block_ru_stime_seconds          synapse_util_metrics_block_ru_stime:total
 | |
| synapse_util_metrics_block_db_txn_count              synapse_util_metrics_block_db_txn_count:total
 | |
| synapse_util_metrics_block_db_txn_duration_seconds   synapse_util_metrics_block_db_txn_duration:total
 | |
| 
 | |
| synapse_http_server_response_count                   synapse_http_server_requests
 | |
| synapse_http_server_response_count                   synapse_http_server_response_time:count
 | |
| synapse_http_server_response_count                   synapse_http_server_response_ru_utime:count
 | |
| synapse_http_server_response_count                   synapse_http_server_response_ru_stime:count
 | |
| synapse_http_server_response_count                   synapse_http_server_response_db_txn_count:count
 | |
| synapse_http_server_response_count                   synapse_http_server_response_db_txn_duration:count
 | |
| 
 | |
| synapse_http_server_response_time_seconds            synapse_http_server_response_time:total
 | |
| synapse_http_server_response_ru_utime_seconds        synapse_http_server_response_ru_utime:total
 | |
| synapse_http_server_response_ru_stime_seconds        synapse_http_server_response_ru_stime:total
 | |
| synapse_http_server_response_db_txn_count            synapse_http_server_response_db_txn_count:total
 | |
| synapse_http_server_response_db_txn_duration_seconds synapse_http_server_response_db_txn_duration:total
 | |
| ==================================================== ===================================================
 | |
| 
 | |
| 
 | |
| Standard Metric Names
 | |
| ---------------------
 | |
| 
 | |
| As of synapse version 0.18.2, the format of the process-wide metrics has been
 | |
| changed to fit prometheus standard naming conventions. Additionally the units
 | |
| have been changed to seconds, from miliseconds.
 | |
| 
 | |
| ================================== =============================
 | |
| New name                           Old name
 | |
| ================================== =============================
 | |
| process_cpu_user_seconds_total     process_resource_utime / 1000
 | |
| process_cpu_system_seconds_total   process_resource_stime / 1000
 | |
| process_open_fds (no 'type' label) process_fds
 | |
| ================================== =============================
 | |
| 
 | |
| The python-specific counts of garbage collector performance have been renamed.
 | |
| 
 | |
| =========================== ======================
 | |
| New name                    Old name
 | |
| =========================== ======================
 | |
| python_gc_time              reactor_gc_time
 | |
| python_gc_unreachable_total reactor_gc_unreachable
 | |
| python_gc_counts            reactor_gc_counts
 | |
| =========================== ======================
 | |
| 
 | |
| The twisted-specific reactor metrics have been renamed.
 | |
| 
 | |
| ==================================== =====================
 | |
| New name                             Old name
 | |
| ==================================== =====================
 | |
| python_twisted_reactor_pending_calls reactor_pending_calls
 | |
| python_twisted_reactor_tick_time     reactor_tick_time
 | |
| ==================================== =====================
 |