Commit Graph

460 Commits (9890f23469092be88f5669e226e9f81d2d309cb2)

Author SHA1 Message Date
Patrick Cloke a5fb382a29
Separate HTTP preview code and URL previewer. (#15269)
Separates REST layer code from the actual URL previewing.
2023-03-20 14:32:26 -04:00
Patrick Cloke 4fc8875876
Refactor media modules. (#15146)
* Removes the `v1` directory from `test.rest.media.v1`.
* Moves the non-REST code from `synapse.rest.media.v1` to `synapse.media`.
* Flatten the `v1` directory from `synapse.rest.media`,  but leave compatiblity
  with 3rd party media repositories and spam checkers.
2023-02-27 08:26:05 -05:00
Patrick Cloke 682151a464
Do not fail completely if oEmbed autodiscovery fails. (#15092)
Previously if an autodiscovered oEmbed request failed (e.g. the
oEmbed endpoint is down or does not exist) then the entire URL
preview would fail. Instead we now return everything we can, even
if this additional request fails.
2023-02-23 16:08:53 -05:00
Patrick Cloke f8a584ed02
Stop parsing the unspecced type parameter on thumbnail requests. (#15137)
Ideally we would replace this with parsing of the Accept header
or something else, but for now just make Synapse spec compliant
by ignoring the unspecced parameter.

It does not seem that this is ever sent by a client, and even if it is
there's a reasonable fallback.
2023-02-23 16:07:46 -05:00
dependabot[bot] 9bb2eac719
Bump black from 22.12.0 to 23.1.0 (#15103) 2023-02-22 15:29:09 -05:00
David Robertson ffc2ee521d
Use mypy 1.0 (#15052)
* Update mypy and mypy-zope
* Remove unused ignores

These used to suppress

```
synapse/storage/engines/__init__.py:28: error: "__new__" must return a
class instance (got "NoReturn")  [misc]
```

and

```
synapse/http/matrixfederationclient.py:1270: error: "BaseException" has no attribute "reasons"  [attr-defined]
```

(note that we check `hasattr(e, "reasons")` above)

* Avoid empty body warnings, sometimes by marking methods as abstract

E.g.

```
tests/handlers/test_register.py:58: error: Missing return statement  [empty-body]
tests/handlers/test_register.py:108: error: Missing return statement  [empty-body]
```

* Suppress false positive about `JaegerConfig`

Complaint was

```
synapse/logging/opentracing.py:450: error: Function "Type[Config]" could always be true in boolean context  [truthy-function]
```

* Fix not calling `is_state()`

Oops!

```
tests/rest/client/test_third_party_rules.py:428: error: Function "Callable[[], bool]" could always be true in boolean context  [truthy-function]
```

* Suppress false positives from ParamSpecs

````
synapse/logging/opentracing.py:971: error: Argument 2 to "_custom_sync_async_decorator" has incompatible type "Callable[[Arg(Callable[P, R], 'func'), **P], _GeneratorContextManager[None]]"; expected "Callable[[Callable[P, R], **P], _GeneratorContextManager[None]]"  [arg-type]
synapse/logging/opentracing.py:1017: error: Argument 2 to "_custom_sync_async_decorator" has incompatible type "Callable[[Arg(Callable[P, R], 'func'), **P], _GeneratorContextManager[None]]"; expected "Callable[[Callable[P, R], **P], _GeneratorContextManager[None]]"  [arg-type]
````

* Drive-by improvement to `wrapping_logic` annotation

* Workaround false "unreachable" positives

See https://github.com/Shoobx/mypy-zope/issues/91

```
tests/http/test_proxyagent.py:626: error: Statement is unreachable  [unreachable]
tests/http/test_proxyagent.py:762: error: Statement is unreachable  [unreachable]
tests/http/test_proxyagent.py:826: error: Statement is unreachable  [unreachable]
tests/http/test_proxyagent.py:838: error: Statement is unreachable  [unreachable]
tests/http/test_proxyagent.py:845: error: Statement is unreachable  [unreachable]
tests/http/federation/test_matrix_federation_agent.py:151: error: Statement is unreachable  [unreachable]
tests/http/federation/test_matrix_federation_agent.py:452: error: Statement is unreachable  [unreachable]
tests/logging/test_remote_handler.py:60: error: Statement is unreachable  [unreachable]
tests/logging/test_remote_handler.py:93: error: Statement is unreachable  [unreachable]
tests/logging/test_remote_handler.py:127: error: Statement is unreachable  [unreachable]
tests/logging/test_remote_handler.py:152: error: Statement is unreachable  [unreachable]
```

* Changelog

* Tweak DBAPI2 Protocol to be accepted by mypy 1.0

Some extra context in:
- https://github.com/matrix-org/python-canonicaljson/pull/57
- https://github.com/python/mypy/issues/6002
- https://mypy.readthedocs.io/en/latest/common_issues.html#covariant-subtyping-of-mutable-protocol-members-is-rejected

* Pull in updated canonicaljson lib

so the protocol check just works

* Improve comments in opentracing

I tried to workaround the ignores but found it too much trouble.

I think the corresponding issue is
https://github.com/python/mypy/issues/12909. The mypy repo has a PR
claiming to fix this (https://github.com/python/mypy/pull/14677) which
might mean this gets resolved soon?

* Better annotation for INTERACTIVE_AUTH_CHECKERS

* Drive-by AUTH_TYPE annotation, to remove an ignore
2023-02-16 16:09:11 +00:00
David Robertson 2dff93099b
Typecheck tests.rest.media.v1.test_media_storage (#15008)
* Fix MediaStorage type hint

* Typecheck tests.rest.media.v1.test_media_storage

* Changelog

* Remove assert and make the comment succinct

* Fix syntax for olddeps
2023-02-07 15:24:44 +00:00
David Robertson 796a4b7482
Prefer `type(x) is int` to `isinstance(x, int)` (#14945)
* Perfer `type(x) is int` to `isinstance(x, int)`

This covered all additional instances I could see where `x` was
user-controlled.
The remaining cases are

```
$ rg -s 'isinstance.*[^_]int'
tests/replication/_base.py
576:        if isinstance(obj, int):

synapse/util/caches/stream_change_cache.py
136:        assert isinstance(stream_pos, int)
214:        assert isinstance(stream_pos, int)
246:        assert isinstance(stream_pos, int)
267:        assert isinstance(stream_pos, int)

synapse/replication/tcp/external_cache.py
133:        if isinstance(result, int):

synapse/metrics/__init__.py
100:        if isinstance(calls, (int, float)):

synapse/handlers/appservice.py
262:        assert isinstance(new_token, int)

synapse/config/_util.py
62:        if isinstance(p, int):
```

which cover metrics, logic related to `jsonschema`, and replication and
data streams. AFAICS these are all internal to Synapse

* Changelog
2023-01-31 10:33:07 +00:00
Jeyachandran Rathnam babeeb4e7a
Unescape HTML entities in oEmbed titles. (#14781)
It doesn't seem valid that HTML entities should appear in
the title field of oEmbed responses, but a popular WordPress
plug-in seems to do it.

There should not be harm in unescaping these.
2023-01-09 14:22:02 +00:00
Patrick Cloke 9d8a3234ba
Respond with proper error responses on unknown paths. (#14621)
Returns a proper 404 with an errcode of M_RECOGNIZED for
unknown endpoints per MSC3743.
2022-12-08 11:37:05 -05:00
Patrick Cloke d8cc86eff4
Remove redundant types from comments. (#14412)
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.

Also adds some missing type hints which were simple to fill in.
2022-11-16 15:25:24 +00:00
Patrick Cloke 00c93d2e7e
Be more lenient in the oEmbed response parsing. (#14089)
Attempt to parse any valid information from an oEmbed response
(instead of bailing at the first unexpected data). This should allow
for more partial oEmbed data to be returned, resulting in better /
more URL previews, even if those URL previews are only partial.
2022-10-07 09:29:43 -04:00
Andrew Morgan 918c74bfb5
Add a `MXCUri` class to make working with mxc uri's easier. (#13162) 2022-09-15 12:57:16 +00:00
Jacek Kuśnierz 84ddcd7bbf
Drop support for calling `/_matrix/client/v3/rooms/{roomId}/invite` without an `id_access_token` (#13241)
Fixes #13206

Signed-off-by: Jacek Kusnierz jacek.kusnierz@tum.de
2022-08-31 12:10:25 +00:00
Erik Johnston 1c26acd815
Fix bug where we wedge media plugins if clients disconnect early (#13660)
We incorrectly didn't use the returned `Responder` if the client had
disconnected, which meant that the resource used by the Responder
wasn't correctly released.

In particular, this exhausted the thread pools so that *all* requests
timed out.
2022-08-30 12:17:48 +01:00
Patrick Cloke 303b40b988
Do not wait for background updates to complete do expire URL cache. (#13657)
Media downloaded as part of a URL preview is normally deleted after two days.
However, while a background database migration is running, the process is
stopped. A long-running database migration can therefore cause the media
store to fill up with old preview files.

This logic was added in #2697 to make sure that we didn't try to run the expiry
without an index on `local_media_repository.created_ts`; the original logic that
needs that index was added in #2478 (in `get_url_cache_media_before`, as
amended by 93247a424a), and is still present.

Given that the background update was added before Synapse v1.0.0, just drop
this check and assume the index exists.
2022-08-30 07:15:54 -04:00
Eric Eastwood 7b67e93d49
Provide more info why we don't have any thumbnails to serve (#13038)
Fix https://github.com/matrix-org/synapse/issues/13016

## New error code and status

### Before

Previously, we returned a `404` for `/thumbnail` which isn't even in the spec.

```json
{
  "errcode": "M_NOT_FOUND",
  "error": "Not found [b'hs1', b'tefQeZhmVxoiBfuFQUKRzJxc']"
}
```

### After

What does the spec say?

> 400: The request does not make sense to the server, or the server cannot thumbnail the content. For example, the client requested non-integer dimensions or asked for negatively-sized images.
>
> *-- https://spec.matrix.org/v1.1/client-server-api/#get_matrixmediav3thumbnailservernamemediaid*

Now with this PR, we respond with a `400` when we don't have thumbnails to serve and we explain why we might not have any thumbnails.

```json
{
    "errcode": "M_UNKNOWN",
    "error": "Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)",
}
```

> Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)


---

We still respond with a 404 in many other places. But we can iterate on those later and maybe keep some in some specific places after spec updates/clarification: https://github.com/matrix-org/matrix-spec/issues/1122

We can also iterate on the bugs where Synapse doesn't thumbnail when it should in other issues/PRs.
2022-07-15 11:42:21 -05:00
Patrick Cloke 1381563988
Inline URL preview documentation. (#13261)
Inline URL preview documentation near the implementation.
2022-07-12 15:01:58 -04:00
David Teller 11f811470f
Uniformize spam-checker API, part 5: expand other spam-checker callbacks to return `Tuple[Codes, dict]` (#13044)
Signed-off-by: David Teller <davidt@element.io>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
2022-07-11 16:52:10 +00:00
Andrew Morgan bc9b0912cc fix linting error from the 1.61.1 main -> develop merge 2022-06-28 16:47:04 +01:00
Andrew Morgan 6cba6a51af Merge branch 'master' into develop 2022-06-28 15:19:48 +01:00
reivilibre fa13080618
Merge pull request from GHSA-22p3-qrh9-cx32
* Make _iterate_over_text easier to read by using simple data structures

* Prefer a set of tags to ignore

In my tests, it's 4x faster to check for containment in a set of this size

* Add a stack size limit to _iterate_over_text

* Continue accepting the case where there is no body element

* Use an early return instead for None

Co-authored-by: Richard van der Hoff <richard@matrix.org>
2022-06-28 14:29:08 +01:00
Robert Long 9b683ea80f
Add Cross-Origin-Resource-Policy header to thumbnail and download media endpoints (#12944) 2022-06-27 14:44:05 +01:00
Patrick Cloke 0fcc0ae37c
Improve URL previews for sites with only Twitter card information. (#13056)
Pull out `twitter:` meta tags when generating a preview and
use it to augment any `og:` meta tags.

Prefers Open Graph information over Twitter card information.
2022-06-16 07:41:57 -04:00
David Teller a164a46038
Uniformize spam-checker API, part 4: port other spam-checker callbacks to return `Union[Allow, Codes]`. (#12857)
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
2022-06-13 18:16:16 +00:00
Andrew Morgan a47636c570
Prevent local quarantined media from being claimed by media retention (#12972) 2022-06-07 10:53:47 +00:00
Patrick Cloke 148fe58a24
Do not break URL previews if an image is unreachable. (#12950)
Avoid breaking a URL preview completely if the chosen image 404s
or is unreachable for some other reason (e.g. DNS).
2022-06-06 07:46:04 -04:00
Patrick Cloke 01df5bacac
Improve URL previews for some pages (#12951)
* Skip `og` and `meta` tags where the value is empty.
* Fallback to the favicon if there are no other images.
* Ignore tags meant for navigation.
2022-06-03 12:09:12 -04:00
Erik Johnston 5949ab86f8
Fix potential thumbnail memory leaks. (#12932) 2022-06-01 10:57:49 +00:00
Andrew Morgan 2fc787c341
Add config options for media retention (#12732) 2022-05-31 16:35:29 +00:00
Sean Quah 053ca5f3ca Synapse 1.60.0rc2 (2022-05-27)
==============================
 
 This release of Synapse adds a unique index to the `state_group_edges` table, in
 order to prevent accidentally introducing duplicate information (for example,
 because a database backup was restored multiple times). If your Synapse database
 already has duplicate rows in this table, this could fail with an error and
 require manual remediation.
 
 Additionally, the signature of the `check_event_for_spam` module callback has changed.
 The previous signature has been deprecated and remains working for now. Module authors
 should update their modules to use the new signature where possible.
 
 See [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1600)
 for more details.
 
 Features
 --------
 
 - Add an option allowing users to use their password to reauthenticate for privileged actions even though password login is disabled. ([\#12883](https://github.com/matrix-org/synapse/issues/12883))
 
 Bugfixes
 --------
 
 - Explicitly close `ijson` coroutines once we are done with them, instead of leaving the garbage collector to close them. ([\#12875](https://github.com/matrix-org/synapse/issues/12875))
 
 Internal Changes
 ----------------
 
 - Improve URL previews by not including the content of media tags in the generated description. ([\#12887](https://github.com/matrix-org/synapse/issues/12887))
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEWMTnW8Z8khaaf90R+84KzgcyGG8FAmKQqMcACgkQ+84Kzgcy
 GG9Z2Av+N+b/fvaB3D56UkFqTW/xLmCEyri65njcXU8625bWiLSPM6hssmyJB1FA
 xlc2RBKr8QxlnHRS/v31wDtONC8YZ2O3fyzYPFfY1fF5Ul7Kg3XCzLeUH4/j1/Ar
 5bqriDqaN9FQ/6QJybShXlA4l7lY1Fs0C4P23jDBgqfKjnlToeVLqhVA70dDaFu/
 ir+vVprKCkQI1iqnYXwIxGRmgBzLWGoVqQFGbSI6hugGwXpGIyX7+2I+0v8tI6vA
 SZ99vLFWcvnd6DJTyBhIeV22Ff4qA7eQsyPvSrMETdsaZmrxGlG+t332HNCgplv8
 gv2gUpJL0br++5WTAX+nRc7HpfKo/74vKeTktqPmlvFP8kUOg+PbzmoJFUu21PhA
 rnq5TzgsPHK0dqBhM1RC2vtOiJ5v3ZBqzJJzSRXl6lsFpWxxFmwesEcIDAYS0Nmh
 QoJb7/L8cPCHksHvZM76bzB465tSH9NhuFYZQoLGHcpxa0kYekrdlYasP8U0FU7L
 nF3C0Pgw
 =D3F+
 -----END PGP SIGNATURE-----

Merge tag 'v1.60.0rc2' into develop

Synapse 1.60.0rc2 (2022-05-27)
==============================

This release of Synapse adds a unique index to the `state_group_edges` table, in
order to prevent accidentally introducing duplicate information (for example,
because a database backup was restored multiple times). If your Synapse database
already has duplicate rows in this table, this could fail with an error and
require manual remediation.

Additionally, the signature of the `check_event_for_spam` module callback has changed.
The previous signature has been deprecated and remains working for now. Module authors
should update their modules to use the new signature where possible.

See [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1600)
for more details.

Features
--------

- Add an option allowing users to use their password to reauthenticate for privileged actions even though password login is disabled. ([\#12883](https://github.com/matrix-org/synapse/issues/12883))

Bugfixes
--------

- Explicitly close `ijson` coroutines once we are done with them, instead of leaving the garbage collector to close them. ([\#12875](https://github.com/matrix-org/synapse/issues/12875))

Internal Changes
----------------

- Improve URL previews by not including the content of media tags in the generated description. ([\#12887](https://github.com/matrix-org/synapse/issues/12887))
2022-05-27 12:07:18 +01:00
reivilibre 317248d42c
Improve URL previews by not including the content of media tags in the generated description. (#12887) 2022-05-26 16:07:27 +01:00
David Robertson e7c77a8750
Correct annotation of `_iterate_over_text` (#12860) 2022-05-24 18:17:21 +00:00
Andrew Morgan 57f6c496d0
URL preview cache expiry logs: INFO -> DEBUG, text clarifications (#12720) 2022-05-12 18:16:32 +01:00
David Robertson 8de0facaae
Fix mypy against latest pillow stubs (#12671) 2022-05-09 10:48:14 +00:00
Sean Quah 800ba87cc8
Refactor and convert `Linearizer` to async (#12357)
Refactor and convert `Linearizer` to async. This makes a `Linearizer`
cancellation bug easier to fix.

Also refactor to use an async context manager, which eliminates an
unlikely footgun where code that doesn't immediately use the context
manager could forget to release the lock.

Signed-off-by: Sean Quah <seanq@element.io>
2022-04-05 15:43:52 +01:00
Brendan Abolivier f96b85eca8
Ensure the type of URL attributes is always str when matching against preview blacklist (#12333) 2022-03-31 11:49:49 +02:00
David Robertson a2b00a4486
Bump `black` and `click` versions (#12320) 2022-03-29 10:41:19 +00:00
Patrick Cloke 4587b35929
Clean-up logic for rebasing URLs during URL preview. (#12219)
By using urljoin from the standard library and reducing the number
of places URLs are rebased.
2022-03-16 07:21:36 -04:00
Patrick Cloke bc9dff1d95
Remove unnecessary pass statements. (#12206) 2022-03-11 07:06:21 -05:00
Sean Quah 5627182788
Use `ParamSpec` in type hints for `synapse.logging.context` (#12150)
Signed-off-by: Sean Quah <seanq@element.io>
2022-03-08 15:58:14 +00:00
Richard van der Hoff e24ff8ebe3
Remove `HomeServer.get_datastore()` (#12031)
The presence of this method was confusing, and mostly present for backwards
compatibility. Let's get rid of it.

Part of #11733
2022-02-23 11:04:02 +00:00
AndrewRyanChama 066171643b
Fetch images when previewing Twitter URLs. (#11985)
By including "bot" in the User-Agent, which some sites use
to decide whether to include additional Open Graph information.
2022-02-22 07:11:39 -05:00
Denis Kasak 337f38cac3
Implement a content type allow list for URL previews (#11936)
This implements an allow list for content types for which Synapse will attempt URL preview. If a URL resolves to a resource with a content type which isn't in the list, the download will terminate immediately.

This makes sense given that Synapse would never successfully generate a URL preview for such files in the first place, and helps prevent issues with streaming media servers, such as #8302.

Signed-off-by: Denis Kasak dkasak@termina.org.uk
2022-02-10 15:43:01 +00:00
Patrick Cloke 314ca4c86d
Pass the proper type when uploading files. (#11927)
The Content-Length header should be treated as an int, not
a string. This shouldn't have any user-facing change.
2022-02-07 10:06:52 -05:00
Patrick Cloke 807efd26ae
Support rendering previews with data: URLs in them (#11767)
Images which are data URLs will no longer break URL
previews and will properly be "downloaded" and
thumbnailed.
2022-01-24 08:58:18 -05:00
Philippe Daouadi 15ffc4143c
Fix preview of imgur and Tenor URLs. (#11669)
By scraping Open Graph information from the HTML even
when an autodiscovery endpoint is found. The results are
then combined to capture as much information as possible
from the page.
2022-01-18 13:20:24 -05:00
Patrick Cloke 10a88ba91c
Use auto_attribs/native type hints for attrs classes. (#11692) 2022-01-13 13:49:28 +00:00
Patrick Cloke cbd82d0b2d
Convert all namedtuples to attrs. (#11665)
To improve type hints throughout the code.
2021-12-30 18:47:12 +00:00
Patrick Cloke eb39da6782
Move HTML parsing to a separate file for URL previews. (#11566)
* Splits the logic for parsing HTML from the resource handling code.
* Fix a circular import in the oEmbed code (which uses the HTML parsing code).
* Renames some of the HTML parsing methods to:
  * Make it clear which methods are "internal" to the module.
  * Clarify what the methods do.
2021-12-13 17:55:07 +00:00