Commit Graph

444 Commits (5526f9fc4f53c9e0eac6aa610bf0a76906b772dc)

Author SHA1 Message Date
Eric Eastwood 7b67e93d49
Provide more info why we don't have any thumbnails to serve (#13038)
Fix https://github.com/matrix-org/synapse/issues/13016

## New error code and status

### Before

Previously, we returned a `404` for `/thumbnail` which isn't even in the spec.

```json
{
  "errcode": "M_NOT_FOUND",
  "error": "Not found [b'hs1', b'tefQeZhmVxoiBfuFQUKRzJxc']"
}
```

### After

What does the spec say?

> 400: The request does not make sense to the server, or the server cannot thumbnail the content. For example, the client requested non-integer dimensions or asked for negatively-sized images.
>
> *-- https://spec.matrix.org/v1.1/client-server-api/#get_matrixmediav3thumbnailservernamemediaid*

Now with this PR, we respond with a `400` when we don't have thumbnails to serve and we explain why we might not have any thumbnails.

```json
{
    "errcode": "M_UNKNOWN",
    "error": "Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)",
}
```

> Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)


---

We still respond with a 404 in many other places. But we can iterate on those later and maybe keep some in some specific places after spec updates/clarification: https://github.com/matrix-org/matrix-spec/issues/1122

We can also iterate on the bugs where Synapse doesn't thumbnail when it should in other issues/PRs.
2022-07-15 11:42:21 -05:00
Patrick Cloke 1381563988
Inline URL preview documentation. (#13261)
Inline URL preview documentation near the implementation.
2022-07-12 15:01:58 -04:00
David Teller 11f811470f
Uniformize spam-checker API, part 5: expand other spam-checker callbacks to return `Tuple[Codes, dict]` (#13044)
Signed-off-by: David Teller <davidt@element.io>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
2022-07-11 16:52:10 +00:00
Andrew Morgan bc9b0912cc fix linting error from the 1.61.1 main -> develop merge 2022-06-28 16:47:04 +01:00
Andrew Morgan 6cba6a51af Merge branch 'master' into develop 2022-06-28 15:19:48 +01:00
reivilibre fa13080618
Merge pull request from GHSA-22p3-qrh9-cx32
* Make _iterate_over_text easier to read by using simple data structures

* Prefer a set of tags to ignore

In my tests, it's 4x faster to check for containment in a set of this size

* Add a stack size limit to _iterate_over_text

* Continue accepting the case where there is no body element

* Use an early return instead for None

Co-authored-by: Richard van der Hoff <richard@matrix.org>
2022-06-28 14:29:08 +01:00
Robert Long 9b683ea80f
Add Cross-Origin-Resource-Policy header to thumbnail and download media endpoints (#12944) 2022-06-27 14:44:05 +01:00
Patrick Cloke 0fcc0ae37c
Improve URL previews for sites with only Twitter card information. (#13056)
Pull out `twitter:` meta tags when generating a preview and
use it to augment any `og:` meta tags.

Prefers Open Graph information over Twitter card information.
2022-06-16 07:41:57 -04:00
David Teller a164a46038
Uniformize spam-checker API, part 4: port other spam-checker callbacks to return `Union[Allow, Codes]`. (#12857)
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
2022-06-13 18:16:16 +00:00
Andrew Morgan a47636c570
Prevent local quarantined media from being claimed by media retention (#12972) 2022-06-07 10:53:47 +00:00
Patrick Cloke 148fe58a24
Do not break URL previews if an image is unreachable. (#12950)
Avoid breaking a URL preview completely if the chosen image 404s
or is unreachable for some other reason (e.g. DNS).
2022-06-06 07:46:04 -04:00
Patrick Cloke 01df5bacac
Improve URL previews for some pages (#12951)
* Skip `og` and `meta` tags where the value is empty.
* Fallback to the favicon if there are no other images.
* Ignore tags meant for navigation.
2022-06-03 12:09:12 -04:00
Erik Johnston 5949ab86f8
Fix potential thumbnail memory leaks. (#12932) 2022-06-01 10:57:49 +00:00
Andrew Morgan 2fc787c341
Add config options for media retention (#12732) 2022-05-31 16:35:29 +00:00
Sean Quah 053ca5f3ca Synapse 1.60.0rc2 (2022-05-27)
==============================
 
 This release of Synapse adds a unique index to the `state_group_edges` table, in
 order to prevent accidentally introducing duplicate information (for example,
 because a database backup was restored multiple times). If your Synapse database
 already has duplicate rows in this table, this could fail with an error and
 require manual remediation.
 
 Additionally, the signature of the `check_event_for_spam` module callback has changed.
 The previous signature has been deprecated and remains working for now. Module authors
 should update their modules to use the new signature where possible.
 
 See [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1600)
 for more details.
 
 Features
 --------
 
 - Add an option allowing users to use their password to reauthenticate for privileged actions even though password login is disabled. ([\#12883](https://github.com/matrix-org/synapse/issues/12883))
 
 Bugfixes
 --------
 
 - Explicitly close `ijson` coroutines once we are done with them, instead of leaving the garbage collector to close them. ([\#12875](https://github.com/matrix-org/synapse/issues/12875))
 
 Internal Changes
 ----------------
 
 - Improve URL previews by not including the content of media tags in the generated description. ([\#12887](https://github.com/matrix-org/synapse/issues/12887))
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEWMTnW8Z8khaaf90R+84KzgcyGG8FAmKQqMcACgkQ+84Kzgcy
 GG9Z2Av+N+b/fvaB3D56UkFqTW/xLmCEyri65njcXU8625bWiLSPM6hssmyJB1FA
 xlc2RBKr8QxlnHRS/v31wDtONC8YZ2O3fyzYPFfY1fF5Ul7Kg3XCzLeUH4/j1/Ar
 5bqriDqaN9FQ/6QJybShXlA4l7lY1Fs0C4P23jDBgqfKjnlToeVLqhVA70dDaFu/
 ir+vVprKCkQI1iqnYXwIxGRmgBzLWGoVqQFGbSI6hugGwXpGIyX7+2I+0v8tI6vA
 SZ99vLFWcvnd6DJTyBhIeV22Ff4qA7eQsyPvSrMETdsaZmrxGlG+t332HNCgplv8
 gv2gUpJL0br++5WTAX+nRc7HpfKo/74vKeTktqPmlvFP8kUOg+PbzmoJFUu21PhA
 rnq5TzgsPHK0dqBhM1RC2vtOiJ5v3ZBqzJJzSRXl6lsFpWxxFmwesEcIDAYS0Nmh
 QoJb7/L8cPCHksHvZM76bzB465tSH9NhuFYZQoLGHcpxa0kYekrdlYasP8U0FU7L
 nF3C0Pgw
 =D3F+
 -----END PGP SIGNATURE-----

Merge tag 'v1.60.0rc2' into develop

Synapse 1.60.0rc2 (2022-05-27)
==============================

This release of Synapse adds a unique index to the `state_group_edges` table, in
order to prevent accidentally introducing duplicate information (for example,
because a database backup was restored multiple times). If your Synapse database
already has duplicate rows in this table, this could fail with an error and
require manual remediation.

Additionally, the signature of the `check_event_for_spam` module callback has changed.
The previous signature has been deprecated and remains working for now. Module authors
should update their modules to use the new signature where possible.

See [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1600)
for more details.

Features
--------

- Add an option allowing users to use their password to reauthenticate for privileged actions even though password login is disabled. ([\#12883](https://github.com/matrix-org/synapse/issues/12883))

Bugfixes
--------

- Explicitly close `ijson` coroutines once we are done with them, instead of leaving the garbage collector to close them. ([\#12875](https://github.com/matrix-org/synapse/issues/12875))

Internal Changes
----------------

- Improve URL previews by not including the content of media tags in the generated description. ([\#12887](https://github.com/matrix-org/synapse/issues/12887))
2022-05-27 12:07:18 +01:00
reivilibre 317248d42c
Improve URL previews by not including the content of media tags in the generated description. (#12887) 2022-05-26 16:07:27 +01:00
David Robertson e7c77a8750
Correct annotation of `_iterate_over_text` (#12860) 2022-05-24 18:17:21 +00:00
Andrew Morgan 57f6c496d0
URL preview cache expiry logs: INFO -> DEBUG, text clarifications (#12720) 2022-05-12 18:16:32 +01:00
David Robertson 8de0facaae
Fix mypy against latest pillow stubs (#12671) 2022-05-09 10:48:14 +00:00
Sean Quah 800ba87cc8
Refactor and convert `Linearizer` to async (#12357)
Refactor and convert `Linearizer` to async. This makes a `Linearizer`
cancellation bug easier to fix.

Also refactor to use an async context manager, which eliminates an
unlikely footgun where code that doesn't immediately use the context
manager could forget to release the lock.

Signed-off-by: Sean Quah <seanq@element.io>
2022-04-05 15:43:52 +01:00
Brendan Abolivier f96b85eca8
Ensure the type of URL attributes is always str when matching against preview blacklist (#12333) 2022-03-31 11:49:49 +02:00
David Robertson a2b00a4486
Bump `black` and `click` versions (#12320) 2022-03-29 10:41:19 +00:00
Patrick Cloke 4587b35929
Clean-up logic for rebasing URLs during URL preview. (#12219)
By using urljoin from the standard library and reducing the number
of places URLs are rebased.
2022-03-16 07:21:36 -04:00
Patrick Cloke bc9dff1d95
Remove unnecessary pass statements. (#12206) 2022-03-11 07:06:21 -05:00
Sean Quah 5627182788
Use `ParamSpec` in type hints for `synapse.logging.context` (#12150)
Signed-off-by: Sean Quah <seanq@element.io>
2022-03-08 15:58:14 +00:00
Richard van der Hoff e24ff8ebe3
Remove `HomeServer.get_datastore()` (#12031)
The presence of this method was confusing, and mostly present for backwards
compatibility. Let's get rid of it.

Part of #11733
2022-02-23 11:04:02 +00:00
AndrewRyanChama 066171643b
Fetch images when previewing Twitter URLs. (#11985)
By including "bot" in the User-Agent, which some sites use
to decide whether to include additional Open Graph information.
2022-02-22 07:11:39 -05:00
Denis Kasak 337f38cac3
Implement a content type allow list for URL previews (#11936)
This implements an allow list for content types for which Synapse will attempt URL preview. If a URL resolves to a resource with a content type which isn't in the list, the download will terminate immediately.

This makes sense given that Synapse would never successfully generate a URL preview for such files in the first place, and helps prevent issues with streaming media servers, such as #8302.

Signed-off-by: Denis Kasak dkasak@termina.org.uk
2022-02-10 15:43:01 +00:00
Patrick Cloke 314ca4c86d
Pass the proper type when uploading files. (#11927)
The Content-Length header should be treated as an int, not
a string. This shouldn't have any user-facing change.
2022-02-07 10:06:52 -05:00
Patrick Cloke 807efd26ae
Support rendering previews with data: URLs in them (#11767)
Images which are data URLs will no longer break URL
previews and will properly be "downloaded" and
thumbnailed.
2022-01-24 08:58:18 -05:00
Philippe Daouadi 15ffc4143c
Fix preview of imgur and Tenor URLs. (#11669)
By scraping Open Graph information from the HTML even
when an autodiscovery endpoint is found. The results are
then combined to capture as much information as possible
from the page.
2022-01-18 13:20:24 -05:00
Patrick Cloke 10a88ba91c
Use auto_attribs/native type hints for attrs classes. (#11692) 2022-01-13 13:49:28 +00:00
Patrick Cloke cbd82d0b2d
Convert all namedtuples to attrs. (#11665)
To improve type hints throughout the code.
2021-12-30 18:47:12 +00:00
Patrick Cloke eb39da6782
Move HTML parsing to a separate file for URL previews. (#11566)
* Splits the logic for parsing HTML from the resource handling code.
* Fix a circular import in the oEmbed code (which uses the HTML parsing code).
* Renames some of the HTML parsing methods to:
  * Make it clear which methods are "internal" to the module.
  * Clarify what the methods do.
2021-12-13 17:55:07 +00:00
Sean Quah 858d80bf0f
Fix media repository failing when media store path contains symlinks (#11446) 2021-12-02 16:05:24 +00:00
Sean Quah 454c3d7694 Merge branch 'master' into develop 2021-11-23 13:06:56 +00:00
Sean Quah 91f2bd0907 Prevent the media store from writing outside of the configured directory
Also tighten validation of server names by forbidding invalid characters
in IPv6 addresses and empty domain labels.
2021-11-19 13:39:15 +00:00
Patrick Cloke 9b90b9454b
Add type hints to media repository storage module (#11311) 2021-11-12 11:05:26 -05:00
Neeeflix 6ce19b94e8
Fix error in thumbnail generation (#11288)
Signed-off-by: Jonas Zeunert <jonas@zeunert.org>
2021-11-10 20:49:43 +00:00
Erik Johnston 237f7eb87a Merge remote-tracking branch 'origin/master' into develop 2021-11-02 14:28:27 +00:00
Shay f5c6a80886
Handle missing Content-Type header when accessing remote media (#11200)
* add code to handle missing content-type header and a test to verify that it works

* add handling for missing content-type in the /upload endpoint as well

* slightly refactor test code to put private method in approriate place

* handle possible null value for content-type when pulling from the local db

* add changelog

* refactor test and add code to handle missing content-type in cached remote media

* requested changes

* Update changelog.d/11200.bugfix

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2021-11-01 10:26:02 -07:00
Patrick Cloke b3e843be88
Fix URL preview errors when previewing XML documents. (#11196) 2021-10-27 14:48:02 +00:00
Patrick Cloke efd0074ab7
Ensure each charset is attempted only once during media preview. (#11089)
There's no point in trying more than once since it is guaranteed to
continually fail.
2021-10-14 18:51:44 +00:00
Patrick Cloke e2f0b49b3f
Attempt different character encodings when previewing a URL. (#11077)
This follows similar logic to BeautifulSoup where we attempt different
character encodings until we find one which works.
2021-10-14 10:17:20 -04:00
Sean Quah b59f3281d5
Remove dead code from `MediaFilePaths` (#11056) 2021-10-13 13:41:24 +01:00
Patrick Cloke 732bbf6737
Be more lenient when parsing the version for oEmbed responses. (#11065) 2021-10-13 07:00:07 -04:00
Patrick Cloke 9abc5f2a05 Merge remote-tracking branch 'origin/release-v1.45' into develop 2021-10-12 14:21:05 -04:00
Sean Quah 8eaffe013c
Update `_wrap_in_base_path` type hints to preserve function arguments (#11055) 2021-10-12 18:19:21 +01:00
Patrick Cloke 1db9282dfa
Fix formatting string when oEmbed errors occur. (#11061) 2021-10-12 17:15:42 +00:00
Patrick Cloke 1b112840d2
Autodiscover oEmbed endpoint from returned HTML (#10822)
Searches the returned HTML for an oEmbed endpoint using the
autodiscovery mechanism (`<link rel=...>`), and will request it
to generate the preview.
2021-10-08 14:14:42 -04:00