393 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Markdown
		
	
	
			
		
		
	
	
			393 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Markdown
		
	
	
# Cancellation
 | 
						|
Sometimes, requests take a long time to service and clients disconnect
 | 
						|
before Synapse produces a response. To avoid wasting resources, Synapse
 | 
						|
can cancel request processing for select endpoints marked with the
 | 
						|
`@cancellable` decorator.
 | 
						|
 | 
						|
Synapse makes use of Twisted's `Deferred.cancel()` feature to make
 | 
						|
cancellation work. The `@cancellable` decorator does nothing by itself
 | 
						|
and merely acts as a flag, signalling to developers and other code alike
 | 
						|
that a method can be cancelled.
 | 
						|
 | 
						|
## Enabling cancellation for an endpoint
 | 
						|
1. Check that the endpoint method, and any `async` functions in its call
 | 
						|
   tree handle cancellation correctly. See
 | 
						|
   [Handling cancellation correctly](#handling-cancellation-correctly)
 | 
						|
   for a list of things to look out for.
 | 
						|
2. Add the `@cancellable` decorator to the `on_GET/POST/PUT/DELETE`
 | 
						|
   method. It's not recommended to make non-`GET` methods cancellable,
 | 
						|
   since cancellation midway through some database updates is less
 | 
						|
   likely to be handled correctly.
 | 
						|
 | 
						|
## Mechanics
 | 
						|
There are two stages to cancellation: downward propagation of a
 | 
						|
`cancel()` call, followed by upwards propagation of a `CancelledError`
 | 
						|
out of a blocked `await`.
 | 
						|
Both Twisted and asyncio have a cancellation mechanism.
 | 
						|
 | 
						|
|               | Method              | Exception                               | Exception inherits from |
 | 
						|
|---------------|---------------------|-----------------------------------------|-------------------------|
 | 
						|
| Twisted       | `Deferred.cancel()` | `twisted.internet.defer.CancelledError` | `Exception` (!)         |
 | 
						|
| asyncio       | `Task.cancel()`     | `asyncio.CancelledError`                | `BaseException`         |
 | 
						|
 | 
						|
### Deferred.cancel()
 | 
						|
When Synapse starts handling a request, it runs the async method
 | 
						|
responsible for handling it using `defer.ensureDeferred`, which returns
 | 
						|
a `Deferred`. For example:
 | 
						|
 | 
						|
```python
 | 
						|
def do_something() -> Deferred[None]:
 | 
						|
    ...
 | 
						|
 | 
						|
@cancellable
 | 
						|
async def on_GET() -> Tuple[int, JsonDict]:
 | 
						|
    d = make_deferred_yieldable(do_something())
 | 
						|
    await d
 | 
						|
    return 200, {}
 | 
						|
 | 
						|
request = defer.ensureDeferred(on_GET())
 | 
						|
```
 | 
						|
 | 
						|
When a client disconnects early, Synapse checks for the presence of the
 | 
						|
`@cancellable` decorator on `on_GET`. Since `on_GET` is cancellable,
 | 
						|
`Deferred.cancel()` is called on the `Deferred` from
 | 
						|
`defer.ensureDeferred`, ie. `request`. Twisted knows which `Deferred`
 | 
						|
`request` is waiting on and passes the `cancel()` call on to `d`.
 | 
						|
 | 
						|
The `Deferred` being waited on, `d`, may have its own handling for
 | 
						|
`cancel()` and pass the call on to other `Deferred`s.
 | 
						|
 | 
						|
Eventually, a `Deferred` handles the `cancel()` call by resolving itself
 | 
						|
with a `CancelledError`.
 | 
						|
 | 
						|
### CancelledError
 | 
						|
The `CancelledError` gets raised out of the `await` and bubbles up, as
 | 
						|
per normal Python exception handling.
 | 
						|
 | 
						|
## Handling cancellation correctly
 | 
						|
In general, when writing code that might be subject to cancellation, two
 | 
						|
things must be considered:
 | 
						|
 * The effect of `CancelledError`s raised out of `await`s.
 | 
						|
 * The effect of `Deferred`s being `cancel()`ed.
 | 
						|
 | 
						|
Examples of code that handles cancellation incorrectly include:
 | 
						|
 * `try-except` blocks which swallow `CancelledError`s.
 | 
						|
 * Code that shares the same `Deferred`, which may be cancelled, between
 | 
						|
   multiple requests.
 | 
						|
 * Code that starts some processing that's exempt from cancellation, but
 | 
						|
   uses a logging context from cancellable code. The logging context
 | 
						|
   will be finished upon cancellation, while the uncancelled processing
 | 
						|
   is still using it.
 | 
						|
 | 
						|
Some common patterns are listed below in more detail.
 | 
						|
 | 
						|
### `async` function calls
 | 
						|
Most functions in Synapse are relatively straightforward from a
 | 
						|
cancellation standpoint: they don't do anything with `Deferred`s and
 | 
						|
purely call and `await` other `async` functions.
 | 
						|
 | 
						|
An `async` function handles cancellation correctly if its own code
 | 
						|
handles cancellation correctly and all the async function it calls
 | 
						|
handle cancellation correctly. For example:
 | 
						|
```python
 | 
						|
async def do_two_things() -> None:
 | 
						|
    check_something()
 | 
						|
    await do_something()
 | 
						|
    await do_something_else()
 | 
						|
```
 | 
						|
`do_two_things` handles cancellation correctly if `do_something` and
 | 
						|
`do_something_else` handle cancellation correctly.
 | 
						|
 | 
						|
That is, when checking whether a function handles cancellation
 | 
						|
correctly, its implementation and all its `async` function calls need to
 | 
						|
be checked, recursively.
 | 
						|
 | 
						|
As `check_something` is not `async`, it does not need to be checked.
 | 
						|
 | 
						|
### CancelledErrors
 | 
						|
Because Twisted's `CancelledError`s are `Exception`s, it's easy to
 | 
						|
accidentally catch and suppress them. Care must be taken to ensure that
 | 
						|
`CancelledError`s are allowed to propagate upwards.
 | 
						|
 | 
						|
<table width="100%">
 | 
						|
<tr>
 | 
						|
<td width="50%" valign="top">
 | 
						|
 | 
						|
**Bad**:
 | 
						|
```python
 | 
						|
try:
 | 
						|
    await do_something()
 | 
						|
except Exception:
 | 
						|
    # `CancelledError` gets swallowed here.
 | 
						|
    logger.info(...)
 | 
						|
```
 | 
						|
</td>
 | 
						|
<td width="50%" valign="top">
 | 
						|
 | 
						|
**Good**:
 | 
						|
```python
 | 
						|
try:
 | 
						|
    await do_something()
 | 
						|
except CancelledError:
 | 
						|
    raise
 | 
						|
except Exception:
 | 
						|
    logger.info(...)
 | 
						|
```
 | 
						|
</td>
 | 
						|
</tr>
 | 
						|
<tr>
 | 
						|
<td width="50%" valign="top">
 | 
						|
 | 
						|
**OK**:
 | 
						|
```python
 | 
						|
try:
 | 
						|
    check_something()
 | 
						|
    # A `CancelledError` won't ever be raised here.
 | 
						|
except Exception:
 | 
						|
    logger.info(...)
 | 
						|
```
 | 
						|
</td>
 | 
						|
<td width="50%" valign="top">
 | 
						|
 | 
						|
**Good**:
 | 
						|
```python
 | 
						|
try:
 | 
						|
    await do_something()
 | 
						|
except ValueError:
 | 
						|
    logger.info(...)
 | 
						|
```
 | 
						|
</td>
 | 
						|
</tr>
 | 
						|
</table>
 | 
						|
 | 
						|
#### defer.gatherResults
 | 
						|
`defer.gatherResults` produces a `Deferred` which:
 | 
						|
 * broadcasts `cancel()` calls to every `Deferred` being waited on.
 | 
						|
 * wraps the first exception it sees in a `FirstError`.
 | 
						|
 | 
						|
Together, this means that `CancelledError`s will be wrapped in
 | 
						|
a `FirstError` unless unwrapped. Such `FirstError`s are liable to be
 | 
						|
swallowed, so they must be unwrapped.
 | 
						|
 | 
						|
<table width="100%">
 | 
						|
<tr>
 | 
						|
<td width="50%" valign="top">
 | 
						|
 | 
						|
**Bad**:
 | 
						|
```python
 | 
						|
async def do_something() -> None:
 | 
						|
    await make_deferred_yieldable(
 | 
						|
        defer.gatherResults([...], consumeErrors=True)
 | 
						|
    )
 | 
						|
 | 
						|
try:
 | 
						|
    await do_something()
 | 
						|
except CancelledError:
 | 
						|
    raise
 | 
						|
except Exception:
 | 
						|
    # `FirstError(CancelledError)` gets swallowed here.
 | 
						|
    logger.info(...)
 | 
						|
```
 | 
						|
 | 
						|
</td>
 | 
						|
<td width="50%" valign="top">
 | 
						|
 | 
						|
**Good**:
 | 
						|
```python
 | 
						|
async def do_something() -> None:
 | 
						|
    await make_deferred_yieldable(
 | 
						|
        defer.gatherResults([...], consumeErrors=True)
 | 
						|
    ).addErrback(unwrapFirstError)
 | 
						|
 | 
						|
try:
 | 
						|
    await do_something()
 | 
						|
except CancelledError:
 | 
						|
    raise
 | 
						|
except Exception:
 | 
						|
    logger.info(...)
 | 
						|
```
 | 
						|
</td>
 | 
						|
</tr>
 | 
						|
</table>
 | 
						|
 | 
						|
### Creation of `Deferred`s
 | 
						|
If a function creates a `Deferred`, the effect of cancelling it must be considered. `Deferred`s that get shared are likely to have unintended behaviour when cancelled.
 | 
						|
 | 
						|
<table width="100%">
 | 
						|
<tr>
 | 
						|
<td width="50%" valign="top">
 | 
						|
 | 
						|
**Bad**:
 | 
						|
```python
 | 
						|
cache: Dict[str, Deferred[None]] = {}
 | 
						|
 | 
						|
def wait_for_room(room_id: str) -> Deferred[None]:
 | 
						|
    deferred = cache.get(room_id)
 | 
						|
    if deferred is None:
 | 
						|
        deferred = Deferred()
 | 
						|
        cache[room_id] = deferred
 | 
						|
    # `deferred` can have multiple waiters.
 | 
						|
    # All of them will observe a `CancelledError`
 | 
						|
    # if any one of them is cancelled.
 | 
						|
    return make_deferred_yieldable(deferred)
 | 
						|
 | 
						|
# Request 1
 | 
						|
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
 | 
						|
# Request 2
 | 
						|
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
 | 
						|
```
 | 
						|
</td>
 | 
						|
<td width="50%" valign="top">
 | 
						|
 | 
						|
**Good**:
 | 
						|
```python
 | 
						|
cache: Dict[str, Deferred[None]] = {}
 | 
						|
 | 
						|
def wait_for_room(room_id: str) -> Deferred[None]:
 | 
						|
    deferred = cache.get(room_id)
 | 
						|
    if deferred is None:
 | 
						|
        deferred = Deferred()
 | 
						|
        cache[room_id] = deferred
 | 
						|
    # `deferred` will never be cancelled now.
 | 
						|
    # A `CancelledError` will still come out of
 | 
						|
    # the `await`.
 | 
						|
    # `delay_cancellation` may also be used.
 | 
						|
    return make_deferred_yieldable(stop_cancellation(deferred))
 | 
						|
 | 
						|
# Request 1
 | 
						|
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
 | 
						|
# Request 2
 | 
						|
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
 | 
						|
```
 | 
						|
</td>
 | 
						|
</tr>
 | 
						|
<tr>
 | 
						|
<td width="50%" valign="top">
 | 
						|
</td>
 | 
						|
<td width="50%" valign="top">
 | 
						|
 | 
						|
**Good**:
 | 
						|
```python
 | 
						|
cache: Dict[str, List[Deferred[None]]] = {}
 | 
						|
 | 
						|
def wait_for_room(room_id: str) -> Deferred[None]:
 | 
						|
    if room_id not in cache:
 | 
						|
        cache[room_id] = []
 | 
						|
    # Each request gets its own `Deferred` to wait on.
 | 
						|
    deferred = Deferred()
 | 
						|
    cache[room_id]].append(deferred)
 | 
						|
    return make_deferred_yieldable(deferred)
 | 
						|
 | 
						|
# Request 1
 | 
						|
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
 | 
						|
# Request 2
 | 
						|
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
 | 
						|
```
 | 
						|
</td>
 | 
						|
</table>
 | 
						|
 | 
						|
### Uncancelled processing
 | 
						|
Some `async` functions may kick off some `async` processing which is
 | 
						|
intentionally protected from cancellation, by `stop_cancellation` or
 | 
						|
other means. If the `async` processing inherits the logcontext of the
 | 
						|
request which initiated it, care must be taken to ensure that the
 | 
						|
logcontext is not finished before the `async` processing completes.
 | 
						|
 | 
						|
<table width="100%">
 | 
						|
<tr>
 | 
						|
<td width="50%" valign="top">
 | 
						|
 | 
						|
**Bad**:
 | 
						|
```python
 | 
						|
cache: Optional[ObservableDeferred[None]] = None
 | 
						|
 | 
						|
async def do_something_else(
 | 
						|
    to_resolve: Deferred[None]
 | 
						|
) -> None:
 | 
						|
    await ...
 | 
						|
    logger.info("done!")
 | 
						|
    to_resolve.callback(None)
 | 
						|
 | 
						|
async def do_something() -> None:
 | 
						|
    if not cache:
 | 
						|
        to_resolve = Deferred()
 | 
						|
        cache = ObservableDeferred(to_resolve)
 | 
						|
        # `do_something_else` will never be cancelled and
 | 
						|
        # can outlive the `request-1` logging context.
 | 
						|
        run_in_background(do_something_else, to_resolve)
 | 
						|
 | 
						|
    await make_deferred_yieldable(cache.observe())
 | 
						|
 | 
						|
with LoggingContext("request-1"):
 | 
						|
    await do_something()
 | 
						|
```
 | 
						|
</td>
 | 
						|
<td width="50%" valign="top">
 | 
						|
 | 
						|
**Good**:
 | 
						|
```python
 | 
						|
cache: Optional[ObservableDeferred[None]] = None
 | 
						|
 | 
						|
async def do_something_else(
 | 
						|
    to_resolve: Deferred[None]
 | 
						|
) -> None:
 | 
						|
    await ...
 | 
						|
    logger.info("done!")
 | 
						|
    to_resolve.callback(None)
 | 
						|
 | 
						|
async def do_something() -> None:
 | 
						|
    if not cache:
 | 
						|
        to_resolve = Deferred()
 | 
						|
        cache = ObservableDeferred(to_resolve)
 | 
						|
        run_in_background(do_something_else, to_resolve)
 | 
						|
        # We'll wait until `do_something_else` is
 | 
						|
        # done before raising a `CancelledError`.
 | 
						|
        await make_deferred_yieldable(
 | 
						|
            delay_cancellation(cache.observe())
 | 
						|
        )
 | 
						|
    else:
 | 
						|
        await make_deferred_yieldable(cache.observe())
 | 
						|
 | 
						|
with LoggingContext("request-1"):
 | 
						|
    await do_something()
 | 
						|
```
 | 
						|
</td>
 | 
						|
</tr>
 | 
						|
<tr>
 | 
						|
<td width="50%">
 | 
						|
 | 
						|
**OK**:
 | 
						|
```python
 | 
						|
cache: Optional[ObservableDeferred[None]] = None
 | 
						|
 | 
						|
async def do_something_else(
 | 
						|
    to_resolve: Deferred[None]
 | 
						|
) -> None:
 | 
						|
    await ...
 | 
						|
    logger.info("done!")
 | 
						|
    to_resolve.callback(None)
 | 
						|
 | 
						|
async def do_something() -> None:
 | 
						|
    if not cache:
 | 
						|
        to_resolve = Deferred()
 | 
						|
        cache = ObservableDeferred(to_resolve)
 | 
						|
        # `do_something_else` will get its own independent
 | 
						|
        # logging context. `request-1` will not count any
 | 
						|
        # metrics from `do_something_else`.
 | 
						|
        run_as_background_process(
 | 
						|
            "do_something_else",
 | 
						|
            do_something_else,
 | 
						|
            to_resolve,
 | 
						|
        )
 | 
						|
 | 
						|
    await make_deferred_yieldable(cache.observe())
 | 
						|
 | 
						|
with LoggingContext("request-1"):
 | 
						|
    await do_something()
 | 
						|
```
 | 
						|
</td>
 | 
						|
<td width="50%">
 | 
						|
</td>
 | 
						|
</tr>
 | 
						|
</table>
 |