Add documentation for cancellation of request processing (#12761)
Signed-off-by: Sean Quah <seanq@matrix.org>pull/12790/head
							parent
							
								
									50ae4eafe1
								
							
						
					
					
						commit
						3d8839c30c
					
				|  | @ -0,0 +1 @@ | |||
| Add documentation for cancellation of request processing. | ||||
|  | @ -89,6 +89,7 @@ | |||
|   - [Database Schemas](development/database_schema.md) | ||||
|   - [Experimental features](development/experimental_features.md) | ||||
|   - [Synapse Architecture]() | ||||
|     - [Cancellation](development/synapse_architecture/cancellation.md) | ||||
|     - [Log Contexts](log_contexts.md) | ||||
|     - [Replication](replication.md) | ||||
|     - [TCP Replication](tcp_replication.md) | ||||
|  |  | |||
|  | @ -0,0 +1,392 @@ | |||
| # Cancellation | ||||
| Sometimes, requests take a long time to service and clients disconnect | ||||
| before Synapse produces a response. To avoid wasting resources, Synapse | ||||
| can cancel request processing for select endpoints marked with the | ||||
| `@cancellable` decorator. | ||||
| 
 | ||||
| Synapse makes use of Twisted's `Deferred.cancel()` feature to make | ||||
| cancellation work. The `@cancellable` decorator does nothing by itself | ||||
| and merely acts as a flag, signalling to developers and other code alike | ||||
| that a method can be cancelled. | ||||
| 
 | ||||
| ## Enabling cancellation for an endpoint | ||||
| 1. Check that the endpoint method, and any `async` functions in its call | ||||
|    tree handle cancellation correctly. See | ||||
|    [Handling cancellation correctly](#handling-cancellation-correctly) | ||||
|    for a list of things to look out for. | ||||
| 2. Add the `@cancellable` decorator to the `on_GET/POST/PUT/DELETE` | ||||
|    method. It's not recommended to make non-`GET` methods cancellable, | ||||
|    since cancellation midway through some database updates is less | ||||
|    likely to be handled correctly. | ||||
| 
 | ||||
| ## Mechanics | ||||
| There are two stages to cancellation: downward propagation of a | ||||
| `cancel()` call, followed by upwards propagation of a `CancelledError` | ||||
| out of a blocked `await`. | ||||
| Both Twisted and asyncio have a cancellation mechanism. | ||||
| 
 | ||||
| |               | Method              | Exception                               | Exception inherits from | | ||||
| |---------------|---------------------|-----------------------------------------|-------------------------| | ||||
| | Twisted       | `Deferred.cancel()` | `twisted.internet.defer.CancelledError` | `Exception` (!)         | | ||||
| | asyncio       | `Task.cancel()`     | `asyncio.CancelledError`                | `BaseException`         | | ||||
| 
 | ||||
| ### Deferred.cancel() | ||||
| When Synapse starts handling a request, it runs the async method | ||||
| responsible for handling it using `defer.ensureDeferred`, which returns | ||||
| a `Deferred`. For example: | ||||
| 
 | ||||
| ```python | ||||
| def do_something() -> Deferred[None]: | ||||
|     ... | ||||
| 
 | ||||
| @cancellable | ||||
| async def on_GET() -> Tuple[int, JsonDict]: | ||||
|     d = make_deferred_yieldable(do_something()) | ||||
|     await d | ||||
|     return 200, {} | ||||
| 
 | ||||
| request = defer.ensureDeferred(on_GET()) | ||||
| ``` | ||||
| 
 | ||||
| When a client disconnects early, Synapse checks for the presence of the | ||||
| `@cancellable` decorator on `on_GET`. Since `on_GET` is cancellable, | ||||
| `Deferred.cancel()` is called on the `Deferred` from | ||||
| `defer.ensureDeferred`, ie. `request`. Twisted knows which `Deferred` | ||||
| `request` is waiting on and passes the `cancel()` call on to `d`. | ||||
| 
 | ||||
| The `Deferred` being waited on, `d`, may have its own handling for | ||||
| `cancel()` and pass the call on to other `Deferred`s. | ||||
| 
 | ||||
| Eventually, a `Deferred` handles the `cancel()` call by resolving itself | ||||
| with a `CancelledError`. | ||||
| 
 | ||||
| ### CancelledError | ||||
| The `CancelledError` gets raised out of the `await` and bubbles up, as | ||||
| per normal Python exception handling. | ||||
| 
 | ||||
| ## Handling cancellation correctly | ||||
| In general, when writing code that might be subject to cancellation, two | ||||
| things must be considered: | ||||
|  * The effect of `CancelledError`s raised out of `await`s. | ||||
|  * The effect of `Deferred`s being `cancel()`ed. | ||||
| 
 | ||||
| Examples of code that handles cancellation incorrectly include: | ||||
|  * `try-except` blocks which swallow `CancelledError`s. | ||||
|  * Code that shares the same `Deferred`, which may be cancelled, between | ||||
|    multiple requests. | ||||
|  * Code that starts some processing that's exempt from cancellation, but | ||||
|    uses a logging context from cancellable code. The logging context | ||||
|    will be finished upon cancellation, while the uncancelled processing | ||||
|    is still using it. | ||||
| 
 | ||||
| Some common patterns are listed below in more detail. | ||||
| 
 | ||||
| ### `async` function calls | ||||
| Most functions in Synapse are relatively straightforward from a | ||||
| cancellation standpoint: they don't do anything with `Deferred`s and | ||||
| purely call and `await` other `async` functions. | ||||
| 
 | ||||
| An `async` function handles cancellation correctly if its own code | ||||
| handles cancellation correctly and all the async function it calls | ||||
| handle cancellation correctly. For example: | ||||
| ```python | ||||
| async def do_two_things() -> None: | ||||
|     check_something() | ||||
|     await do_something() | ||||
|     await do_something_else() | ||||
| ``` | ||||
| `do_two_things` handles cancellation correctly if `do_something` and | ||||
| `do_something_else` handle cancellation correctly. | ||||
| 
 | ||||
| That is, when checking whether a function handles cancellation | ||||
| correctly, its implementation and all its `async` function calls need to | ||||
| be checked, recursively. | ||||
| 
 | ||||
| As `check_something` is not `async`, it does not need to be checked. | ||||
| 
 | ||||
| ### CancelledErrors | ||||
| Because Twisted's `CancelledError`s are `Exception`s, it's easy to | ||||
| accidentally catch and suppress them. Care must be taken to ensure that | ||||
| `CancelledError`s are allowed to propagate upwards. | ||||
| 
 | ||||
| <table width="100%"> | ||||
| <tr> | ||||
| <td width="50%" valign="top"> | ||||
| 
 | ||||
| **Bad**: | ||||
| ```python | ||||
| try: | ||||
|     await do_something() | ||||
| except Exception: | ||||
|     # `CancelledError` gets swallowed here. | ||||
|     logger.info(...) | ||||
| ``` | ||||
| </td> | ||||
| <td width="50%" valign="top"> | ||||
| 
 | ||||
| **Good**: | ||||
| ```python | ||||
| try: | ||||
|     await do_something() | ||||
| except CancelledError: | ||||
|     raise | ||||
| except Exception: | ||||
|     logger.info(...) | ||||
| ``` | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td width="50%" valign="top"> | ||||
| 
 | ||||
| **OK**: | ||||
| ```python | ||||
| try: | ||||
|     check_something() | ||||
|     # A `CancelledError` won't ever be raised here. | ||||
| except Exception: | ||||
|     logger.info(...) | ||||
| ``` | ||||
| </td> | ||||
| <td width="50%" valign="top"> | ||||
| 
 | ||||
| **Good**: | ||||
| ```python | ||||
| try: | ||||
|     await do_something() | ||||
| except ValueError: | ||||
|     logger.info(...) | ||||
| ``` | ||||
| </td> | ||||
| </tr> | ||||
| </table> | ||||
| 
 | ||||
| #### defer.gatherResults | ||||
| `defer.gatherResults` produces a `Deferred` which: | ||||
|  * broadcasts `cancel()` calls to every `Deferred` being waited on. | ||||
|  * wraps the first exception it sees in a `FirstError`. | ||||
| 
 | ||||
| Together, this means that `CancelledError`s will be wrapped in | ||||
| a `FirstError` unless unwrapped. Such `FirstError`s are liable to be | ||||
| swallowed, so they must be unwrapped. | ||||
| 
 | ||||
| <table width="100%"> | ||||
| <tr> | ||||
| <td width="50%" valign="top"> | ||||
| 
 | ||||
| **Bad**: | ||||
| ```python | ||||
| async def do_something() -> None: | ||||
|     await make_deferred_yieldable( | ||||
|         defer.gatherResults([...], consumeErrors=True) | ||||
|     ) | ||||
| 
 | ||||
| try: | ||||
|     await do_something() | ||||
| except CancelledError: | ||||
|     raise | ||||
| except Exception: | ||||
|     # `FirstError(CancelledError)` gets swallowed here. | ||||
|     logger.info(...) | ||||
| ``` | ||||
| 
 | ||||
| </td> | ||||
| <td width="50%" valign="top"> | ||||
| 
 | ||||
| **Good**: | ||||
| ```python | ||||
| async def do_something() -> None: | ||||
|     await make_deferred_yieldable( | ||||
|         defer.gatherResults([...], consumeErrors=True) | ||||
|     ).addErrback(unwrapFirstError) | ||||
| 
 | ||||
| try: | ||||
|     await do_something() | ||||
| except CancelledError: | ||||
|     raise | ||||
| except Exception: | ||||
|     logger.info(...) | ||||
| ``` | ||||
| </td> | ||||
| </tr> | ||||
| </table> | ||||
| 
 | ||||
| ### Creation of `Deferred`s | ||||
| If a function creates a `Deferred`, the effect of cancelling it must be considered. `Deferred`s that get shared are likely to have unintended behaviour when cancelled. | ||||
| 
 | ||||
| <table width="100%"> | ||||
| <tr> | ||||
| <td width="50%" valign="top"> | ||||
| 
 | ||||
| **Bad**: | ||||
| ```python | ||||
| cache: Dict[str, Deferred[None]] = {} | ||||
| 
 | ||||
| def wait_for_room(room_id: str) -> Deferred[None]: | ||||
|     deferred = cache.get(room_id) | ||||
|     if deferred is None: | ||||
|         deferred = Deferred() | ||||
|         cache[room_id] = deferred | ||||
|     # `deferred` can have multiple waiters. | ||||
|     # All of them will observe a `CancelledError` | ||||
|     # if any one of them is cancelled. | ||||
|     return make_deferred_yieldable(deferred) | ||||
| 
 | ||||
| # Request 1 | ||||
| await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org") | ||||
| # Request 2 | ||||
| await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org") | ||||
| ``` | ||||
| </td> | ||||
| <td width="50%" valign="top"> | ||||
| 
 | ||||
| **Good**: | ||||
| ```python | ||||
| cache: Dict[str, Deferred[None]] = {} | ||||
| 
 | ||||
| def wait_for_room(room_id: str) -> Deferred[None]: | ||||
|     deferred = cache.get(room_id) | ||||
|     if deferred is None: | ||||
|         deferred = Deferred() | ||||
|         cache[room_id] = deferred | ||||
|     # `deferred` will never be cancelled now. | ||||
|     # A `CancelledError` will still come out of | ||||
|     # the `await`. | ||||
|     # `delay_cancellation` may also be used. | ||||
|     return make_deferred_yieldable(stop_cancellation(deferred)) | ||||
| 
 | ||||
| # Request 1 | ||||
| await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org") | ||||
| # Request 2 | ||||
| await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org") | ||||
| ``` | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td width="50%" valign="top"> | ||||
| </td> | ||||
| <td width="50%" valign="top"> | ||||
| 
 | ||||
| **Good**: | ||||
| ```python | ||||
| cache: Dict[str, List[Deferred[None]]] = {} | ||||
| 
 | ||||
| def wait_for_room(room_id: str) -> Deferred[None]: | ||||
|     if room_id not in cache: | ||||
|         cache[room_id] = [] | ||||
|     # Each request gets its own `Deferred` to wait on. | ||||
|     deferred = Deferred() | ||||
|     cache[room_id]].append(deferred) | ||||
|     return make_deferred_yieldable(deferred) | ||||
| 
 | ||||
| # Request 1 | ||||
| await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org") | ||||
| # Request 2 | ||||
| await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org") | ||||
| ``` | ||||
| </td> | ||||
| </table> | ||||
| 
 | ||||
| ### Uncancelled processing | ||||
| Some `async` functions may kick off some `async` processing which is | ||||
| intentionally protected from cancellation, by `stop_cancellation` or | ||||
| other means. If the `async` processing inherits the logcontext of the | ||||
| request which initiated it, care must be taken to ensure that the | ||||
| logcontext is not finished before the `async` processing completes. | ||||
| 
 | ||||
| <table width="100%"> | ||||
| <tr> | ||||
| <td width="50%" valign="top"> | ||||
| 
 | ||||
| **Bad**: | ||||
| ```python | ||||
| cache: Optional[ObservableDeferred[None]] = None | ||||
| 
 | ||||
| async def do_something_else( | ||||
|     to_resolve: Deferred[None] | ||||
| ) -> None: | ||||
|     await ... | ||||
|     logger.info("done!") | ||||
|     to_resolve.callback(None) | ||||
| 
 | ||||
| async def do_something() -> None: | ||||
|     if not cache: | ||||
|         to_resolve = Deferred() | ||||
|         cache = ObservableDeferred(to_resolve) | ||||
|         # `do_something_else` will never be cancelled and | ||||
|         # can outlive the `request-1` logging context. | ||||
|         run_in_background(do_something_else, to_resolve) | ||||
| 
 | ||||
|     await make_deferred_yieldable(cache.observe()) | ||||
| 
 | ||||
| with LoggingContext("request-1"): | ||||
|     await do_something() | ||||
| ``` | ||||
| </td> | ||||
| <td width="50%" valign="top"> | ||||
| 
 | ||||
| **Good**: | ||||
| ```python | ||||
| cache: Optional[ObservableDeferred[None]] = None | ||||
| 
 | ||||
| async def do_something_else( | ||||
|     to_resolve: Deferred[None] | ||||
| ) -> None: | ||||
|     await ... | ||||
|     logger.info("done!") | ||||
|     to_resolve.callback(None) | ||||
| 
 | ||||
| async def do_something() -> None: | ||||
|     if not cache: | ||||
|         to_resolve = Deferred() | ||||
|         cache = ObservableDeferred(to_resolve) | ||||
|         run_in_background(do_something_else, to_resolve) | ||||
|         # We'll wait until `do_something_else` is | ||||
|         # done before raising a `CancelledError`. | ||||
|         await make_deferred_yieldable( | ||||
|             delay_cancellation(cache.observe()) | ||||
|         ) | ||||
|     else: | ||||
|         await make_deferred_yieldable(cache.observe()) | ||||
| 
 | ||||
| with LoggingContext("request-1"): | ||||
|     await do_something() | ||||
| ``` | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td width="50%"> | ||||
| 
 | ||||
| **OK**: | ||||
| ```python | ||||
| cache: Optional[ObservableDeferred[None]] = None | ||||
| 
 | ||||
| async def do_something_else( | ||||
|     to_resolve: Deferred[None] | ||||
| ) -> None: | ||||
|     await ... | ||||
|     logger.info("done!") | ||||
|     to_resolve.callback(None) | ||||
| 
 | ||||
| async def do_something() -> None: | ||||
|     if not cache: | ||||
|         to_resolve = Deferred() | ||||
|         cache = ObservableDeferred(to_resolve) | ||||
|         # `do_something_else` will get its own independent | ||||
|         # logging context. `request-1` will not count any | ||||
|         # metrics from `do_something_else`. | ||||
|         run_as_background_process( | ||||
|             "do_something_else", | ||||
|             do_something_else, | ||||
|             to_resolve, | ||||
|         ) | ||||
| 
 | ||||
|     await make_deferred_yieldable(cache.observe()) | ||||
| 
 | ||||
| with LoggingContext("request-1"): | ||||
|     await do_something() | ||||
| ``` | ||||
| </td> | ||||
| <td width="50%"> | ||||
| </td> | ||||
| </tr> | ||||
| </table> | ||||
		Loading…
	
		Reference in New Issue
	
	 Sean Quah
						Sean Quah