Add documentation for cancellation of request processing (#12761)
Signed-off-by: Sean Quah <seanq@matrix.org>pull/12790/head
parent
50ae4eafe1
commit
3d8839c30c
|
@ -0,0 +1 @@
|
|||
Add documentation for cancellation of request processing.
|
|
@ -89,6 +89,7 @@
|
|||
- [Database Schemas](development/database_schema.md)
|
||||
- [Experimental features](development/experimental_features.md)
|
||||
- [Synapse Architecture]()
|
||||
- [Cancellation](development/synapse_architecture/cancellation.md)
|
||||
- [Log Contexts](log_contexts.md)
|
||||
- [Replication](replication.md)
|
||||
- [TCP Replication](tcp_replication.md)
|
||||
|
|
|
@ -0,0 +1,392 @@
|
|||
# Cancellation
|
||||
Sometimes, requests take a long time to service and clients disconnect
|
||||
before Synapse produces a response. To avoid wasting resources, Synapse
|
||||
can cancel request processing for select endpoints marked with the
|
||||
`@cancellable` decorator.
|
||||
|
||||
Synapse makes use of Twisted's `Deferred.cancel()` feature to make
|
||||
cancellation work. The `@cancellable` decorator does nothing by itself
|
||||
and merely acts as a flag, signalling to developers and other code alike
|
||||
that a method can be cancelled.
|
||||
|
||||
## Enabling cancellation for an endpoint
|
||||
1. Check that the endpoint method, and any `async` functions in its call
|
||||
tree handle cancellation correctly. See
|
||||
[Handling cancellation correctly](#handling-cancellation-correctly)
|
||||
for a list of things to look out for.
|
||||
2. Add the `@cancellable` decorator to the `on_GET/POST/PUT/DELETE`
|
||||
method. It's not recommended to make non-`GET` methods cancellable,
|
||||
since cancellation midway through some database updates is less
|
||||
likely to be handled correctly.
|
||||
|
||||
## Mechanics
|
||||
There are two stages to cancellation: downward propagation of a
|
||||
`cancel()` call, followed by upwards propagation of a `CancelledError`
|
||||
out of a blocked `await`.
|
||||
Both Twisted and asyncio have a cancellation mechanism.
|
||||
|
||||
| | Method | Exception | Exception inherits from |
|
||||
|---------------|---------------------|-----------------------------------------|-------------------------|
|
||||
| Twisted | `Deferred.cancel()` | `twisted.internet.defer.CancelledError` | `Exception` (!) |
|
||||
| asyncio | `Task.cancel()` | `asyncio.CancelledError` | `BaseException` |
|
||||
|
||||
### Deferred.cancel()
|
||||
When Synapse starts handling a request, it runs the async method
|
||||
responsible for handling it using `defer.ensureDeferred`, which returns
|
||||
a `Deferred`. For example:
|
||||
|
||||
```python
|
||||
def do_something() -> Deferred[None]:
|
||||
...
|
||||
|
||||
@cancellable
|
||||
async def on_GET() -> Tuple[int, JsonDict]:
|
||||
d = make_deferred_yieldable(do_something())
|
||||
await d
|
||||
return 200, {}
|
||||
|
||||
request = defer.ensureDeferred(on_GET())
|
||||
```
|
||||
|
||||
When a client disconnects early, Synapse checks for the presence of the
|
||||
`@cancellable` decorator on `on_GET`. Since `on_GET` is cancellable,
|
||||
`Deferred.cancel()` is called on the `Deferred` from
|
||||
`defer.ensureDeferred`, ie. `request`. Twisted knows which `Deferred`
|
||||
`request` is waiting on and passes the `cancel()` call on to `d`.
|
||||
|
||||
The `Deferred` being waited on, `d`, may have its own handling for
|
||||
`cancel()` and pass the call on to other `Deferred`s.
|
||||
|
||||
Eventually, a `Deferred` handles the `cancel()` call by resolving itself
|
||||
with a `CancelledError`.
|
||||
|
||||
### CancelledError
|
||||
The `CancelledError` gets raised out of the `await` and bubbles up, as
|
||||
per normal Python exception handling.
|
||||
|
||||
## Handling cancellation correctly
|
||||
In general, when writing code that might be subject to cancellation, two
|
||||
things must be considered:
|
||||
* The effect of `CancelledError`s raised out of `await`s.
|
||||
* The effect of `Deferred`s being `cancel()`ed.
|
||||
|
||||
Examples of code that handles cancellation incorrectly include:
|
||||
* `try-except` blocks which swallow `CancelledError`s.
|
||||
* Code that shares the same `Deferred`, which may be cancelled, between
|
||||
multiple requests.
|
||||
* Code that starts some processing that's exempt from cancellation, but
|
||||
uses a logging context from cancellable code. The logging context
|
||||
will be finished upon cancellation, while the uncancelled processing
|
||||
is still using it.
|
||||
|
||||
Some common patterns are listed below in more detail.
|
||||
|
||||
### `async` function calls
|
||||
Most functions in Synapse are relatively straightforward from a
|
||||
cancellation standpoint: they don't do anything with `Deferred`s and
|
||||
purely call and `await` other `async` functions.
|
||||
|
||||
An `async` function handles cancellation correctly if its own code
|
||||
handles cancellation correctly and all the async function it calls
|
||||
handle cancellation correctly. For example:
|
||||
```python
|
||||
async def do_two_things() -> None:
|
||||
check_something()
|
||||
await do_something()
|
||||
await do_something_else()
|
||||
```
|
||||
`do_two_things` handles cancellation correctly if `do_something` and
|
||||
`do_something_else` handle cancellation correctly.
|
||||
|
||||
That is, when checking whether a function handles cancellation
|
||||
correctly, its implementation and all its `async` function calls need to
|
||||
be checked, recursively.
|
||||
|
||||
As `check_something` is not `async`, it does not need to be checked.
|
||||
|
||||
### CancelledErrors
|
||||
Because Twisted's `CancelledError`s are `Exception`s, it's easy to
|
||||
accidentally catch and suppress them. Care must be taken to ensure that
|
||||
`CancelledError`s are allowed to propagate upwards.
|
||||
|
||||
<table width="100%">
|
||||
<tr>
|
||||
<td width="50%" valign="top">
|
||||
|
||||
**Bad**:
|
||||
```python
|
||||
try:
|
||||
await do_something()
|
||||
except Exception:
|
||||
# `CancelledError` gets swallowed here.
|
||||
logger.info(...)
|
||||
```
|
||||
</td>
|
||||
<td width="50%" valign="top">
|
||||
|
||||
**Good**:
|
||||
```python
|
||||
try:
|
||||
await do_something()
|
||||
except CancelledError:
|
||||
raise
|
||||
except Exception:
|
||||
logger.info(...)
|
||||
```
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="50%" valign="top">
|
||||
|
||||
**OK**:
|
||||
```python
|
||||
try:
|
||||
check_something()
|
||||
# A `CancelledError` won't ever be raised here.
|
||||
except Exception:
|
||||
logger.info(...)
|
||||
```
|
||||
</td>
|
||||
<td width="50%" valign="top">
|
||||
|
||||
**Good**:
|
||||
```python
|
||||
try:
|
||||
await do_something()
|
||||
except ValueError:
|
||||
logger.info(...)
|
||||
```
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
#### defer.gatherResults
|
||||
`defer.gatherResults` produces a `Deferred` which:
|
||||
* broadcasts `cancel()` calls to every `Deferred` being waited on.
|
||||
* wraps the first exception it sees in a `FirstError`.
|
||||
|
||||
Together, this means that `CancelledError`s will be wrapped in
|
||||
a `FirstError` unless unwrapped. Such `FirstError`s are liable to be
|
||||
swallowed, so they must be unwrapped.
|
||||
|
||||
<table width="100%">
|
||||
<tr>
|
||||
<td width="50%" valign="top">
|
||||
|
||||
**Bad**:
|
||||
```python
|
||||
async def do_something() -> None:
|
||||
await make_deferred_yieldable(
|
||||
defer.gatherResults([...], consumeErrors=True)
|
||||
)
|
||||
|
||||
try:
|
||||
await do_something()
|
||||
except CancelledError:
|
||||
raise
|
||||
except Exception:
|
||||
# `FirstError(CancelledError)` gets swallowed here.
|
||||
logger.info(...)
|
||||
```
|
||||
|
||||
</td>
|
||||
<td width="50%" valign="top">
|
||||
|
||||
**Good**:
|
||||
```python
|
||||
async def do_something() -> None:
|
||||
await make_deferred_yieldable(
|
||||
defer.gatherResults([...], consumeErrors=True)
|
||||
).addErrback(unwrapFirstError)
|
||||
|
||||
try:
|
||||
await do_something()
|
||||
except CancelledError:
|
||||
raise
|
||||
except Exception:
|
||||
logger.info(...)
|
||||
```
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
### Creation of `Deferred`s
|
||||
If a function creates a `Deferred`, the effect of cancelling it must be considered. `Deferred`s that get shared are likely to have unintended behaviour when cancelled.
|
||||
|
||||
<table width="100%">
|
||||
<tr>
|
||||
<td width="50%" valign="top">
|
||||
|
||||
**Bad**:
|
||||
```python
|
||||
cache: Dict[str, Deferred[None]] = {}
|
||||
|
||||
def wait_for_room(room_id: str) -> Deferred[None]:
|
||||
deferred = cache.get(room_id)
|
||||
if deferred is None:
|
||||
deferred = Deferred()
|
||||
cache[room_id] = deferred
|
||||
# `deferred` can have multiple waiters.
|
||||
# All of them will observe a `CancelledError`
|
||||
# if any one of them is cancelled.
|
||||
return make_deferred_yieldable(deferred)
|
||||
|
||||
# Request 1
|
||||
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
|
||||
# Request 2
|
||||
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
|
||||
```
|
||||
</td>
|
||||
<td width="50%" valign="top">
|
||||
|
||||
**Good**:
|
||||
```python
|
||||
cache: Dict[str, Deferred[None]] = {}
|
||||
|
||||
def wait_for_room(room_id: str) -> Deferred[None]:
|
||||
deferred = cache.get(room_id)
|
||||
if deferred is None:
|
||||
deferred = Deferred()
|
||||
cache[room_id] = deferred
|
||||
# `deferred` will never be cancelled now.
|
||||
# A `CancelledError` will still come out of
|
||||
# the `await`.
|
||||
# `delay_cancellation` may also be used.
|
||||
return make_deferred_yieldable(stop_cancellation(deferred))
|
||||
|
||||
# Request 1
|
||||
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
|
||||
# Request 2
|
||||
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
|
||||
```
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="50%" valign="top">
|
||||
</td>
|
||||
<td width="50%" valign="top">
|
||||
|
||||
**Good**:
|
||||
```python
|
||||
cache: Dict[str, List[Deferred[None]]] = {}
|
||||
|
||||
def wait_for_room(room_id: str) -> Deferred[None]:
|
||||
if room_id not in cache:
|
||||
cache[room_id] = []
|
||||
# Each request gets its own `Deferred` to wait on.
|
||||
deferred = Deferred()
|
||||
cache[room_id]].append(deferred)
|
||||
return make_deferred_yieldable(deferred)
|
||||
|
||||
# Request 1
|
||||
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
|
||||
# Request 2
|
||||
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
|
||||
```
|
||||
</td>
|
||||
</table>
|
||||
|
||||
### Uncancelled processing
|
||||
Some `async` functions may kick off some `async` processing which is
|
||||
intentionally protected from cancellation, by `stop_cancellation` or
|
||||
other means. If the `async` processing inherits the logcontext of the
|
||||
request which initiated it, care must be taken to ensure that the
|
||||
logcontext is not finished before the `async` processing completes.
|
||||
|
||||
<table width="100%">
|
||||
<tr>
|
||||
<td width="50%" valign="top">
|
||||
|
||||
**Bad**:
|
||||
```python
|
||||
cache: Optional[ObservableDeferred[None]] = None
|
||||
|
||||
async def do_something_else(
|
||||
to_resolve: Deferred[None]
|
||||
) -> None:
|
||||
await ...
|
||||
logger.info("done!")
|
||||
to_resolve.callback(None)
|
||||
|
||||
async def do_something() -> None:
|
||||
if not cache:
|
||||
to_resolve = Deferred()
|
||||
cache = ObservableDeferred(to_resolve)
|
||||
# `do_something_else` will never be cancelled and
|
||||
# can outlive the `request-1` logging context.
|
||||
run_in_background(do_something_else, to_resolve)
|
||||
|
||||
await make_deferred_yieldable(cache.observe())
|
||||
|
||||
with LoggingContext("request-1"):
|
||||
await do_something()
|
||||
```
|
||||
</td>
|
||||
<td width="50%" valign="top">
|
||||
|
||||
**Good**:
|
||||
```python
|
||||
cache: Optional[ObservableDeferred[None]] = None
|
||||
|
||||
async def do_something_else(
|
||||
to_resolve: Deferred[None]
|
||||
) -> None:
|
||||
await ...
|
||||
logger.info("done!")
|
||||
to_resolve.callback(None)
|
||||
|
||||
async def do_something() -> None:
|
||||
if not cache:
|
||||
to_resolve = Deferred()
|
||||
cache = ObservableDeferred(to_resolve)
|
||||
run_in_background(do_something_else, to_resolve)
|
||||
# We'll wait until `do_something_else` is
|
||||
# done before raising a `CancelledError`.
|
||||
await make_deferred_yieldable(
|
||||
delay_cancellation(cache.observe())
|
||||
)
|
||||
else:
|
||||
await make_deferred_yieldable(cache.observe())
|
||||
|
||||
with LoggingContext("request-1"):
|
||||
await do_something()
|
||||
```
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="50%">
|
||||
|
||||
**OK**:
|
||||
```python
|
||||
cache: Optional[ObservableDeferred[None]] = None
|
||||
|
||||
async def do_something_else(
|
||||
to_resolve: Deferred[None]
|
||||
) -> None:
|
||||
await ...
|
||||
logger.info("done!")
|
||||
to_resolve.callback(None)
|
||||
|
||||
async def do_something() -> None:
|
||||
if not cache:
|
||||
to_resolve = Deferred()
|
||||
cache = ObservableDeferred(to_resolve)
|
||||
# `do_something_else` will get its own independent
|
||||
# logging context. `request-1` will not count any
|
||||
# metrics from `do_something_else`.
|
||||
run_as_background_process(
|
||||
"do_something_else",
|
||||
do_something_else,
|
||||
to_resolve,
|
||||
)
|
||||
|
||||
await make_deferred_yieldable(cache.observe())
|
||||
|
||||
with LoggingContext("request-1"):
|
||||
await do_something()
|
||||
```
|
||||
</td>
|
||||
<td width="50%">
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
Loading…
Reference in New Issue