Commit Graph

86 Commits (7b3a8f2b0cb19ec31f922829edf41a13e364d0e2)

Author SHA1 Message Date
Jeyachandran Rathnam babeeb4e7a
Unescape HTML entities in oEmbed titles. (#14781)
It doesn't seem valid that HTML entities should appear in
the title field of oEmbed responses, but a popular WordPress
plug-in seems to do it.

There should not be harm in unescaping these.
2023-01-09 14:22:02 +00:00
Patrick Cloke 00c93d2e7e
Be more lenient in the oEmbed response parsing. (#14089)
Attempt to parse any valid information from an oEmbed response
(instead of bailing at the first unexpected data). This should allow
for more partial oEmbed data to be returned, resulting in better /
more URL previews, even if those URL previews are only partial.
2022-10-07 09:29:43 -04:00
Eric Eastwood 7b67e93d49
Provide more info why we don't have any thumbnails to serve (#13038)
Fix https://github.com/matrix-org/synapse/issues/13016

## New error code and status

### Before

Previously, we returned a `404` for `/thumbnail` which isn't even in the spec.

```json
{
  "errcode": "M_NOT_FOUND",
  "error": "Not found [b'hs1', b'tefQeZhmVxoiBfuFQUKRzJxc']"
}
```

### After

What does the spec say?

> 400: The request does not make sense to the server, or the server cannot thumbnail the content. For example, the client requested non-integer dimensions or asked for negatively-sized images.
>
> *-- https://spec.matrix.org/v1.1/client-server-api/#get_matrixmediav3thumbnailservernamemediaid*

Now with this PR, we respond with a `400` when we don't have thumbnails to serve and we explain why we might not have any thumbnails.

```json
{
    "errcode": "M_UNKNOWN",
    "error": "Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)",
}
```

> Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)


---

We still respond with a 404 in many other places. But we can iterate on those later and maybe keep some in some specific places after spec updates/clarification: https://github.com/matrix-org/matrix-spec/issues/1122

We can also iterate on the bugs where Synapse doesn't thumbnail when it should in other issues/PRs.
2022-07-15 11:42:21 -05:00
David Teller 11f811470f
Uniformize spam-checker API, part 5: expand other spam-checker callbacks to return `Tuple[Codes, dict]` (#13044)
Signed-off-by: David Teller <davidt@element.io>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
2022-07-11 16:52:10 +00:00
Andrew Morgan 6cba6a51af Merge branch 'master' into develop 2022-06-28 15:19:48 +01:00
reivilibre fa13080618
Merge pull request from GHSA-22p3-qrh9-cx32
* Make _iterate_over_text easier to read by using simple data structures

* Prefer a set of tags to ignore

In my tests, it's 4x faster to check for containment in a set of this size

* Add a stack size limit to _iterate_over_text

* Continue accepting the case where there is no body element

* Use an early return instead for None

Co-authored-by: Richard van der Hoff <richard@matrix.org>
2022-06-28 14:29:08 +01:00
Robert Long 9b683ea80f
Add Cross-Origin-Resource-Policy header to thumbnail and download media endpoints (#12944) 2022-06-27 14:44:05 +01:00
Patrick Cloke 0fcc0ae37c
Improve URL previews for sites with only Twitter card information. (#13056)
Pull out `twitter:` meta tags when generating a preview and
use it to augment any `og:` meta tags.

Prefers Open Graph information over Twitter card information.
2022-06-16 07:41:57 -04:00
Patrick Cloke 148fe58a24
Do not break URL previews if an image is unreachable. (#12950)
Avoid breaking a URL preview completely if the chosen image 404s
or is unreachable for some other reason (e.g. DNS).
2022-06-06 07:46:04 -04:00
Patrick Cloke 01df5bacac
Improve URL previews for some pages (#12951)
* Skip `og` and `meta` tags where the value is empty.
* Fallback to the favicon if there are no other images.
* Ignore tags meant for navigation.
2022-06-03 12:09:12 -04:00
Brendan Abolivier f96b85eca8
Ensure the type of URL attributes is always str when matching against preview blacklist (#12333) 2022-03-31 11:49:49 +02:00
Patrick Cloke 4587b35929
Clean-up logic for rebasing URLs during URL preview. (#12219)
By using urljoin from the standard library and reducing the number
of places URLs are rebased.
2022-03-16 07:21:36 -04:00
Dirk Klimpel 32c828d0f7
Add type hints to `tests/rest`. (#12208)
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2022-03-11 12:42:22 +00:00
Dirk Klimpel 7e91107be1
Add type hints to `tests/rest` (#12146)
* Add type hints to `tests/rest`

* newsfile

* change import from `SigningKey`
2022-03-03 16:05:44 +00:00
Patrick Cloke 02d708568b
Replace assertEquals and friends with non-deprecated versions. (#12092) 2022-02-28 07:12:29 -05:00
Richard van der Hoff e24ff8ebe3
Remove `HomeServer.get_datastore()` (#12031)
The presence of this method was confusing, and mostly present for backwards
compatibility. Let's get rid of it.

Part of #11733
2022-02-23 11:04:02 +00:00
Denis Kasak 337f38cac3
Implement a content type allow list for URL previews (#11936)
This implements an allow list for content types for which Synapse will attempt URL preview. If a URL resolves to a resource with a content type which isn't in the list, the download will terminate immediately.

This makes sense given that Synapse would never successfully generate a URL preview for such files in the first place, and helps prevent issues with streaming media servers, such as #8302.

Signed-off-by: Denis Kasak dkasak@termina.org.uk
2022-02-10 15:43:01 +00:00
Patrick Cloke 807efd26ae
Support rendering previews with data: URLs in them (#11767)
Images which are data URLs will no longer break URL
previews and will properly be "downloaded" and
thumbnailed.
2022-01-24 08:58:18 -05:00
Patrick Cloke eb39da6782
Move HTML parsing to a separate file for URL previews. (#11566)
* Splits the logic for parsing HTML from the resource handling code.
* Fix a circular import in the oEmbed code (which uses the HTML parsing code).
* Renames some of the HTML parsing methods to:
  * Make it clear which methods are "internal" to the module.
  * Clarify what the methods do.
2021-12-13 17:55:07 +00:00
Sean Quah 858d80bf0f
Fix media repository failing when media store path contains symlinks (#11446) 2021-12-02 16:05:24 +00:00
Sean Quah 91f2bd0907 Prevent the media store from writing outside of the configured directory
Also tighten validation of server names by forbidding invalid characters
in IPv6 addresses and empty domain labels.
2021-11-19 13:39:15 +00:00
Shay f5c6a80886
Handle missing Content-Type header when accessing remote media (#11200)
* add code to handle missing content-type header and a test to verify that it works

* add handling for missing content-type in the /upload endpoint as well

* slightly refactor test code to put private method in approriate place

* handle possible null value for content-type when pulling from the local db

* add changelog

* refactor test and add code to handle missing content-type in cached remote media

* requested changes

* Update changelog.d/11200.bugfix

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2021-11-01 10:26:02 -07:00
Patrick Cloke 732bbf6737
Be more lenient when parsing the version for oEmbed responses. (#11065) 2021-10-13 07:00:07 -04:00
Sean Quah 84f5d83257
Add tests for `MediaFilePaths` (#11057) 2021-10-12 18:19:35 +01:00
Patrick Cloke 1b112840d2
Autodiscover oEmbed endpoint from returned HTML (#10822)
Searches the returned HTML for an oEmbed endpoint using the
autodiscovery mechanism (`<link rel=...>`), and will request it
to generate the preview.
2021-10-08 14:14:42 -04:00
Sean Quah 2be0fde3d6
Fix empty `url_cache_thumbnails/yyyy-mm-dd/` directories being left behind (#10924) 2021-09-29 10:24:37 +01:00
Sean Quah f7768f62cb
Avoid storing URL cache files in storage providers (#10911)
URL cache files are short-lived and it does not make sense to offload
them (eg. to the cloud) or back them up.
2021-09-27 12:55:27 +01:00
Patrick Cloke bb7fdd821b
Use direct references for configuration variables (part 5). (#10897) 2021-09-24 07:25:21 -04:00
Erik Johnston 50022cff96
Add reactor to `SynapseRequest` and fix up types. (#10868) 2021-09-24 11:01:25 +01:00
Patrick Cloke 6fc8be9a1b
Include more information in oEmbed previews. (#10819)
* Improved titles (fall back to the author name if there's not title) and include the site name.
* Handle photo/video payloads.
* Include the original URL in the Open Graph response.
* Fix the expiration time (by properly converting from seconds to milliseconds).
2021-09-22 09:45:20 -04:00
Patrick Cloke ba7a91aea5
Refactor oEmbed previews (#10814)
The major change is moving the decision of whether to use oEmbed
further up the call-stack. This reverts the _download_url method to
being a "dumb" functionwhich takes a single URL and downloads it
(as it was before #7920).

This also makes more minor refactorings:

* Renames internal variables for clarity.
* Factors out shared code between the HTML and rich oEmbed
  previews.
* Fixes tests to preview an oEmbed image.
2021-09-21 16:09:57 +00:00
Patrick Cloke bfb4b858a9
Create a constant for a small png image in tests. (#10834)
To avoid duplicating it between a few tests.
2021-09-16 12:01:14 -04:00
Patrick Cloke 580a15e039
Request JSON for oEmbed requests (and ignore XML only providers). (#10759)
This adds the format to the request arguments / URL to
ensure that JSON data is returned (which is all that
Synapse supports).

This also adds additional error checking / filtering to the
configuration file to ignore XML-only providers.
2021-09-08 07:17:52 -04:00
Patrick Cloke e2481dbe93
Allow configuration of the oEmbed URLs. (#10714)
This adds configuration options (under an `oembed` section) to
configure which URLs are matched to use oEmbed for URL
previews.
2021-08-31 18:37:07 -04:00
Sean 7367473f96
Fix error when selecting between thumbnails with the same quality (#10684)
Fixes #10318
2021-08-25 09:51:08 +00:00
reivilibre 642a42edde
Flatten the synapse.rest.client package (#10600) 2021-08-17 11:57:58 +00:00
Jonathan de Jong 89cfc3dd98
[pyupgrade] `tests/` (#10347) 2021-07-13 11:43:15 +01:00
Brendan Abolivier 1b3e398bea
Standardise the module interface (#10062)
This PR adds a common configuration section for all modules (see docs). These modules are then loaded at startup by the homeserver. Modules register their hooks and web resources using the new `register_[...]_callbacks` and `register_web_resource` methods of the module API.
2021-06-18 12:15:52 +01:00
Jonathan de Jong 4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Patrick Cloke 0b3112123d
Use mock from the stdlib. (#9772) 2021-04-09 13:44:38 -04:00
Patrick Cloke 075c16b410
Handle image transparency better when thumbnailing. (#9473)
Properly uses RGBA mode for 1- and 8-bit images with transparency
(instead of RBG mode).
2021-03-09 07:37:09 -05:00
Erik Johnston 3a2fe5054f Add test 2021-02-19 15:52:04 +00:00
Eric Eastwood 0a00b7ff14
Update black, and run auto formatting over the codebase (#9381)
- Update black version to the latest
 - Run black auto formatting over the codebase
    - Run autoformatting according to [`docs/code_style.md
`](80d6dc9783/docs/code_style.md)
 - Update `code_style.md` docs around installing black to use the correct version
2021-02-16 22:32:34 +00:00
Erik Johnston 7e8083eb48 Add check_media_file_for_spam spam checker hook 2021-02-04 17:01:30 +00:00
Patrick Cloke a7882f9887
Return a 404 if no valid thumbnail is found. (#9163)
If no thumbnail of the requested type exists, return a 404 instead
of erroring. This doesn't quite match the spec (which does not define
what happens if no thumbnail can be found), but is consistent with
what Synapse already does.
2021-01-21 14:53:58 -05:00
Richard van der Hoff 8d3d264052
Skip unit tests which require optional dependencies (#9031)
If we are lacking an optional dependency, skip the tests that rely on it.
2021-01-07 11:41:28 +00:00
Richard van der Hoff 394516ad1b Remove spurious "SynapseRequest" result from `make_request"
This was never used, so let's get rid of it.
2020-12-15 22:35:40 +00:00
Aaron Raimist cd9e72b185
Add X-Robots-Tag header to stop crawlers from indexing media (#8887)
Fixes / related to: https://github.com/matrix-org/synapse/issues/6533

This should do essentially the same thing as a robots.txt file telling robots to not index the media repo. https://developers.google.com/search/reference/robots_meta_tag

Signed-off-by: Aaron Raimist <aaron@raim.ist>
2020-12-08 22:51:03 +00:00
Richard van der Hoff f347f0cd58
remove unused FakeResponse (#8864) 2020-12-02 18:58:25 +00:00
Patrick Cloke 30fba62108
Apply an IP range blacklist to push and key revocation requests. (#8821)
Replaces the `federation_ip_range_blacklist` configuration setting with an
`ip_range_blacklist` setting with wider scope. It now applies to:

* Federation
* Identity servers
* Push notifications
* Checking key validitity for third-party invite events

The old `federation_ip_range_blacklist` setting is still honored if present, but
with reduced scope (it only applies to federation and identity servers).
2020-12-02 11:09:24 -05:00