Commit Graph

467 Commits (4d6b80038588f273a95d8d705aca885dc61768cc)

Author SHA1 Message Date
Denis Kasak 337f38cac3
Implement a content type allow list for URL previews (#11936)
This implements an allow list for content types for which Synapse will attempt URL preview. If a URL resolves to a resource with a content type which isn't in the list, the download will terminate immediately.

This makes sense given that Synapse would never successfully generate a URL preview for such files in the first place, and helps prevent issues with streaming media servers, such as #8302.

Signed-off-by: Denis Kasak dkasak@termina.org.uk
2022-02-10 15:43:01 +00:00
Patrick Cloke 314ca4c86d
Pass the proper type when uploading files. (#11927)
The Content-Length header should be treated as an int, not
a string. This shouldn't have any user-facing change.
2022-02-07 10:06:52 -05:00
Patrick Cloke 807efd26ae
Support rendering previews with data: URLs in them (#11767)
Images which are data URLs will no longer break URL
previews and will properly be "downloaded" and
thumbnailed.
2022-01-24 08:58:18 -05:00
Philippe Daouadi 15ffc4143c
Fix preview of imgur and Tenor URLs. (#11669)
By scraping Open Graph information from the HTML even
when an autodiscovery endpoint is found. The results are
then combined to capture as much information as possible
from the page.
2022-01-18 13:20:24 -05:00
Patrick Cloke 10a88ba91c
Use auto_attribs/native type hints for attrs classes. (#11692) 2022-01-13 13:49:28 +00:00
Patrick Cloke cbd82d0b2d
Convert all namedtuples to attrs. (#11665)
To improve type hints throughout the code.
2021-12-30 18:47:12 +00:00
Patrick Cloke eb39da6782
Move HTML parsing to a separate file for URL previews. (#11566)
* Splits the logic for parsing HTML from the resource handling code.
* Fix a circular import in the oEmbed code (which uses the HTML parsing code).
* Renames some of the HTML parsing methods to:
  * Make it clear which methods are "internal" to the module.
  * Clarify what the methods do.
2021-12-13 17:55:07 +00:00
Sean Quah 858d80bf0f
Fix media repository failing when media store path contains symlinks (#11446) 2021-12-02 16:05:24 +00:00
Sean Quah 454c3d7694 Merge branch 'master' into develop 2021-11-23 13:06:56 +00:00
Sean Quah 91f2bd0907 Prevent the media store from writing outside of the configured directory
Also tighten validation of server names by forbidding invalid characters
in IPv6 addresses and empty domain labels.
2021-11-19 13:39:15 +00:00
Patrick Cloke 9b90b9454b
Add type hints to media repository storage module (#11311) 2021-11-12 11:05:26 -05:00
Neeeflix 6ce19b94e8
Fix error in thumbnail generation (#11288)
Signed-off-by: Jonas Zeunert <jonas@zeunert.org>
2021-11-10 20:49:43 +00:00
Erik Johnston 237f7eb87a Merge remote-tracking branch 'origin/master' into develop 2021-11-02 14:28:27 +00:00
Shay f5c6a80886
Handle missing Content-Type header when accessing remote media (#11200)
* add code to handle missing content-type header and a test to verify that it works

* add handling for missing content-type in the /upload endpoint as well

* slightly refactor test code to put private method in approriate place

* handle possible null value for content-type when pulling from the local db

* add changelog

* refactor test and add code to handle missing content-type in cached remote media

* requested changes

* Update changelog.d/11200.bugfix

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2021-11-01 10:26:02 -07:00
Patrick Cloke b3e843be88
Fix URL preview errors when previewing XML documents. (#11196) 2021-10-27 14:48:02 +00:00
Patrick Cloke efd0074ab7
Ensure each charset is attempted only once during media preview. (#11089)
There's no point in trying more than once since it is guaranteed to
continually fail.
2021-10-14 18:51:44 +00:00
Patrick Cloke e2f0b49b3f
Attempt different character encodings when previewing a URL. (#11077)
This follows similar logic to BeautifulSoup where we attempt different
character encodings until we find one which works.
2021-10-14 10:17:20 -04:00
Sean Quah b59f3281d5
Remove dead code from `MediaFilePaths` (#11056) 2021-10-13 13:41:24 +01:00
Patrick Cloke 732bbf6737
Be more lenient when parsing the version for oEmbed responses. (#11065) 2021-10-13 07:00:07 -04:00
Patrick Cloke 9abc5f2a05 Merge remote-tracking branch 'origin/release-v1.45' into develop 2021-10-12 14:21:05 -04:00
Sean Quah 8eaffe013c
Update `_wrap_in_base_path` type hints to preserve function arguments (#11055) 2021-10-12 18:19:21 +01:00
Patrick Cloke 1db9282dfa
Fix formatting string when oEmbed errors occur. (#11061) 2021-10-12 17:15:42 +00:00
Patrick Cloke 1b112840d2
Autodiscover oEmbed endpoint from returned HTML (#10822)
Searches the returned HTML for an oEmbed endpoint using the
autodiscovery mechanism (`<link rel=...>`), and will request it
to generate the preview.
2021-10-08 14:14:42 -04:00
David Robertson 797ee7812d
Relax `ignore-missing-imports` for modules that have stubs now and update mypy (#11006)
Updating mypy past version 0.9 means that third-party stubs are no-longer distributed with typeshed. See http://mypy-lang.blogspot.com/2021/06/mypy-0900-released.html for details.
We therefore pull in stub packages in setup.py

Additionally, some modules that we were previously ignoring import failures for now have stubs. So let's use them.

The rest of this change consists of fixups to make the newer mypy + stubs pass CI.

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2021-10-08 14:49:41 +01:00
Sean Quah 2be0fde3d6
Fix empty `url_cache_thumbnails/yyyy-mm-dd/` directories being left behind (#10924) 2021-09-29 10:24:37 +01:00
Sean Quah f7768f62cb
Avoid storing URL cache files in storage providers (#10911)
URL cache files are short-lived and it does not make sense to offload
them (eg. to the cloud) or back them up.
2021-09-27 12:55:27 +01:00
Sean Quah 6c83c27107
Fix race conditions when creating media store and config directories (#10913) 2021-09-27 11:29:23 +01:00
Patrick Cloke bb7fdd821b
Use direct references for configuration variables (part 5). (#10897) 2021-09-24 07:25:21 -04:00
Erik Johnston 50022cff96
Add reactor to `SynapseRequest` and fix up types. (#10868) 2021-09-24 11:01:25 +01:00
Patrick Cloke 47854c71e9
Use direct references for configuration variables (part 4). (#10893) 2021-09-23 12:03:01 -04:00
Patrick Cloke 6fc8be9a1b
Include more information in oEmbed previews. (#10819)
* Improved titles (fall back to the author name if there's not title) and include the site name.
* Handle photo/video payloads.
* Include the original URL in the Open Graph response.
* Fix the expiration time (by properly converting from seconds to milliseconds).
2021-09-22 09:45:20 -04:00
Patrick Cloke ba7a91aea5
Refactor oEmbed previews (#10814)
The major change is moving the decision of whether to use oEmbed
further up the call-stack. This reverts the _download_url method to
being a "dumb" functionwhich takes a single URL and downloads it
(as it was before #7920).

This also makes more minor refactorings:

* Renames internal variables for clarity.
* Factors out shared code between the HTML and rich oEmbed
  previews.
* Fixes tests to preview an oEmbed image.
2021-09-21 16:09:57 +00:00
Patrick Cloke b93259082c
Add missing type hints to non-client REST servlets. (#10817)
Including admin, consent, key, synapse, and media. All REST servlets
(the synapse.rest module) now require typed method definitions.
2021-09-15 08:45:32 -04:00
Patrick Cloke b996782df5
Convert media repo's FileInfo to attrs. (#10785)
This is mostly an internal change, but improves type hints in the
media code.
2021-09-14 07:09:38 -04:00
Patrick Cloke 580a15e039
Request JSON for oEmbed requests (and ignore XML only providers). (#10759)
This adds the format to the request arguments / URL to
ensure that JSON data is returned (which is all that
Synapse supports).

This also adds additional error checking / filtering to the
configuration file to ignore XML-only providers.
2021-09-08 07:17:52 -04:00
Patrick Cloke 89ba834818
Use attrs internally for the URL preview code & add documentation. (#10753) 2021-09-07 13:10:34 +00:00
Patrick Cloke e2481dbe93
Allow configuration of the oEmbed URLs. (#10714)
This adds configuration options (under an `oembed` section) to
configure which URLs are matched to use oEmbed for URL
previews.
2021-08-31 18:37:07 -04:00
Sean 7367473f96
Fix error when selecting between thumbnails with the same quality (#10684)
Fixes #10318
2021-08-25 09:51:08 +00:00
Dirk Klimpel 915b37e5ef
Admin API to delete media for a specific user (#10558) 2021-08-11 19:29:59 +00:00
sri-vidyut 8e1febc6a1
Support underscores (in addition to hyphens) for charset detection. (#10410) 2021-07-27 17:29:42 +00:00
Denis Kasak 2476d5373c
Mitigate media repo XSSs on IE11. (#10468)
IE11 doesn't support Content-Security-Policy but it has support for
a non-standard X-Content-Security-Policy header, which only supports the
sandbox directive. This prevents script execution, so it at least offers
some protection against media repo-based attacks.

Signed-off-by: Denis Kasak <dkasak@termina.org.uk>
2021-07-27 13:45:10 +02:00
Patrick Cloke 5db118626b
Add a return type to parse_string. (#10438)
And set the required attribute in a few places which will error if
a parameter is not provided.
2021-07-21 09:47:56 -04:00
Jonathan de Jong 95e47b2e78
[pyupgrade] `synapse/` (#10348)
This PR is tantamount to running 
```
pyupgrade --py36-plus --keep-percent-format `find synapse/ -type f -name "*.py"`
```

Part of #9744
2021-07-19 15:28:05 +01:00
Jonathan de Jong 98aec1cc9d
Use inline type hints in `handlers/` and `rest/`. (#10382) 2021-07-16 18:22:36 +01:00
Patrick Cloke 9e4610cc27
Correct type hints for parse_string(s)_from_args. (#10137) 2021-06-08 08:30:48 -04:00
Michael Telatynski e8ac9ac8ca
Fix /upload 500'ing when presented a very large image (#10029)
* Fix /upload 500'ing when presented a very large image

Catch DecompressionBombError and re-raise as ThumbnailErrors

* Set PIL's MAX_IMAGE_PIXELS to match homeserver.yaml

to get it to bomb out quicker, to load less into memory
in the case of super large images

* Add changelog entry for 10029
2021-05-21 18:31:59 +02:00
Andrew Morgan fe604a022a
Remove various bits of compatibility code for Python <3.6 (#9879)
I went through and removed a bunch of cruft that was lying around for compatibility with old Python versions. This PR also will now prevent Synapse from starting unless you're running Python 3.6+.
2021-04-27 13:13:07 +01:00
Richard van der Hoff 3ff2251754
Improved validation for received requests (#9817)
* Simplify `start_listening` callpath

* Correctly check the size of uploaded files
2021-04-23 19:20:44 +01:00
Richard van der Hoff 5a153772c1
remove `HomeServer.get_config` (#9815)
Every single time I want to access the config object, I have to remember
whether or not we use `get_config`. Let's just get rid of it.
2021-04-14 19:09:08 +01:00
rkfg c9a2b5d402
More robust handling of the Content-Type header for thumbnail generation (#9788)
Signed-off-by: Sergey Shpikin <rkfg@rkfg.me>
2021-04-14 16:30:59 +01:00
Jonathan de Jong 4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Patrick Cloke 44bb881096
Add type hints to expiring cache. (#9730) 2021-04-06 08:58:18 -04:00
Erik Johnston b5efcb577e
Make it possible to use dmypy (#9692)
Running `dmypy run` will do a `mypy` check while spinning up a daemon
that makes rerunning `dmypy run` a lot faster.

`dmypy` doesn't support `follow_imports = silent` and has
`local_partial_types` enabled, so this PR enables those options and
fixes the issues that were newly raised. Note that `local_partial_types`
will be enabled by default in upcoming mypy releases.
2021-03-26 16:49:46 +00:00
Patrick Cloke b7748d3c00
Import HomeServer from the proper module. (#9665) 2021-03-23 07:12:48 -04:00
Patrick Cloke 55da8df078
Fix additional type hints from Twisted 21.2.0. (#9591) 2021-03-12 11:37:57 -05:00
Richard van der Hoff a7a3790066
Convert Requester to attrs (#9586)
... because namedtuples suck

Fix up a couple of other annotations to keep mypy happy.
2021-03-10 18:15:56 +00:00
Patrick Cloke 075c16b410
Handle image transparency better when thumbnailing. (#9473)
Properly uses RGBA mode for 1- and 8-bit images with transparency
(instead of RBG mode).
2021-03-09 07:37:09 -05:00
Patrick Cloke a0bc9d387e
Use the proper Request in type hints. (#9515)
This also pins the Twisted version in the mypy job for CI until
proper type hints are fixed throughout Synapse.
2021-03-01 12:23:46 -05:00
Tim Leung ddb240293a
Add support for no_proxy and case insensitive env variables (#9372)
### Changes proposed in this PR

- Add support for the `no_proxy` and `NO_PROXY` environment variables
  - Internally rely on urllib's [`proxy_bypass_environment`](bdb941be42/Lib/urllib/request.py (L2519))
- Extract env variables using urllib's `getproxies`/[`getproxies_environment`](bdb941be42/Lib/urllib/request.py (L2488)) which supports lowercase + uppercase, preferring lowercase, except for `HTTP_PROXY` in a CGI environment

This does contain behaviour changes for consumers so making sure these are called out:
- `no_proxy`/`NO_PROXY` is now respected
- lowercase `https_proxy` is now allowed and taken over `HTTPS_PROXY`

Related to #9306 which also uses `ProxyAgent`

Signed-off-by: Timothy Leung tim95@hotmail.co.uk
2021-02-26 17:37:57 +00:00
Erik Johnston 3d2acc930f Return a 404 if we don't have the original file 2021-02-19 10:46:18 +00:00
Erik Johnston b106080fb4 Regenerate exact thumbnails if missing 2021-02-18 17:05:32 +00:00
Eric Eastwood 0a00b7ff14
Update black, and run auto formatting over the codebase (#9381)
- Update black version to the latest
 - Run black auto formatting over the codebase
    - Run autoformatting according to [`docs/code_style.md
`](80d6dc9783/docs/code_style.md)
 - Update `code_style.md` docs around installing black to use the correct version
2021-02-16 22:32:34 +00:00
Patrick Cloke 7950aa8a27 Fix some typos. 2021-02-12 11:14:12 -05:00
Patrick Cloke 0963d39ea6
Handle additional errors when previewing URLs. (#9333)
* Handle the case of lxml not finding a document tree.
* Parse the document encoding from the XML tag.
2021-02-08 12:33:30 -05:00
Erik Johnston 7e8083eb48 Add check_media_file_for_spam spam checker hook 2021-02-04 17:01:30 +00:00
Patrick Cloke 4937fe3d6b
Try to recover from unknown encodings when previewing media. (#9164)
Treat unknown encodings (according to lxml) as UTF-8
when generating a preview for HTML documents. This
isn't fully accurate, but will hopefully give a reasonable
title and summary.
2021-01-26 07:32:17 -05:00
Patrick Cloke a7882f9887
Return a 404 if no valid thumbnail is found. (#9163)
If no thumbnail of the requested type exists, return a 404 instead
of erroring. This doesn't quite match the spec (which does not define
what happens if no thumbnail can be found), but is consistent with
what Synapse already does.
2021-01-21 14:53:58 -05:00
Patrick Cloke d34c6e1279
Add type hints to media rest resources. (#9093) 2021-01-15 10:57:37 -05:00
David Teller f14428b25c
Allow spam-checker modules to be provide async methods. (#8890)
Spam checker modules can now provide async methods. This is implemented
in a backwards-compatible manner.
2020-12-11 14:05:15 -05:00
Aaron Raimist cd9e72b185
Add X-Robots-Tag header to stop crawlers from indexing media (#8887)
Fixes / related to: https://github.com/matrix-org/synapse/issues/6533

This should do essentially the same thing as a robots.txt file telling robots to not index the media repo. https://developers.google.com/search/reference/robots_meta_tag

Signed-off-by: Aaron Raimist <aaron@raim.ist>
2020-12-08 22:51:03 +00:00
Patrick Cloke 1f3748f033
Do not raise a 500 exception when previewing empty media. (#8883) 2020-12-07 10:00:08 -05:00
Patrick Cloke df3e6a23a7
Do not 500 if the content-length is not provided when uploading media. (#8862)
Instead return the proper 400 error.
2020-12-04 10:26:09 -05:00
Patrick Cloke 30fba62108
Apply an IP range blacklist to push and key revocation requests. (#8821)
Replaces the `federation_ip_range_blacklist` configuration setting with an
`ip_range_blacklist` setting with wider scope. It now applies to:

* Federation
* Identity servers
* Push notifications
* Checking key validitity for third-party invite events

The old `federation_ip_range_blacklist` setting is still honored if present, but
with reduced scope (it only applies to federation and identity servers).
2020-12-02 11:09:24 -05:00
Erik Johnston 46f4be94b4
Fix race for concurrent downloads of remote media. (#8682)
Fixes #6755
2020-10-30 10:55:24 +00:00
Dirk Klimpel 49d72dea2a
Add an admin api to delete local media. (#8519)
Related to: #6459, #3479

Add `DELETE /_synapse/admin/v1/media/<server_name>/<media_id>` to delete
a single file from server.
2020-10-26 17:02:28 +00:00
Andrew Morgan 3e58ce72b4
Don't bother responding to client requests that have already disconnected (#8465)
This PR ports the quick fix from https://github.com/matrix-org/synapse/pull/2796 to further methods which handle media, URL preview and `/key/v2/server` requests. This prevents a harmless `ERROR` that comes up in the logs when we were unable to respond to a client request when the client had already disconnected. In this case we simply bail out if the client has already done so.

This is the 'simple fix' as suggested by https://github.com/matrix-org/synapse/issues/5304#issuecomment-574740003.

Fixes https://github.com/matrix-org/synapse/issues/6700
Fixes https://github.com/matrix-org/synapse/issues/5304
2020-10-06 10:03:39 +01:00
Richard van der Hoff 73d93039ff
Fix bug in remote thumbnail search (#8438)
#7124 changed the behaviour of remote thumbnails so that the thumbnailing method was included in the filename of the thumbnail. To support existing files, it included a fallback so that we would check the old filename if the new filename didn't exist.

Unfortunately, it didn't apply this logic to storage providers, so any thumbnails stored on such a storage provider was broken.
2020-10-02 12:29:29 +01:00
Richard van der Hoff b1f4e6e4fc
fix a logging error in thumbnailer (#8435)
Introduced in #8236
2020-10-01 13:34:24 +01:00
Will Hunt c2bdf040aa
Discard an empty upload_name before persisting an uploaded file (#7905) 2020-09-29 12:15:27 -04:00
Richard van der Hoff 11c9e17738
Add type annotations to SimpleHttpClient (#8372) 2020-09-24 15:47:20 +01:00
Patrick Cloke aec294ee0d
Use slots in attrs classes where possible (#8296)
slots use less memory (and attribute access is faster) while slightly
limiting the flexibility of the class attributes. This focuses on objects
which are instantiated "often" and for short periods of time.
2020-09-14 12:50:06 -04:00
Patrick Cloke d2a3eb04a4 Fix typos in comments. 2020-09-14 11:46:58 -04:00
Patrick Cloke b312769c0e
Do not error when thumbnailing invalid files (#8236)
If a file cannot be thumbnailed for some reason (e.g. the file is empty), then
catch the exception and convert it to a reasonable error message for the client.
2020-09-09 12:59:41 -04:00
DeepBlueV7.X 560f3b8609
Include method in thumbnail media name (#7124)
This fixes an issue where different methods (crop/scale) overwrite each other.

This first tries the new path. If that fails and we are looking for a
remote thumbnail, it tries the old path. If that still isn't found, it
continues as normal.

This should probably be removed in the future, after some of the newer
thumbnails were generated with the new path on most deployments. Then
the overhead should be minimal if the other thumbnails need to be
regenerated.

Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
2020-09-08 17:19:50 +01:00
Patrick Cloke c619253db8
Stop sub-classing object (#8249) 2020-09-04 06:54:56 -04:00
Patrick Cloke 4e874ed593
Remove unnecessary maybeDeferred calls (#8044) 2020-08-07 09:44:48 -04:00
David Vo 4dd27e6d11
Reduce unnecessary whitespace in JSON. (#7372) 2020-08-07 08:02:55 -04:00
Erik Johnston a7bdf98d01
Rename database classes to make some sense (#8033) 2020-08-05 21:38:57 +01:00
Patrick Cloke 8ff2deda72
Fix async/await calls for broken media providers. (#8027) 2020-08-04 09:44:25 -04:00
Patrick Cloke 68626ff8e9
Convert the remaining media repo code to async / await. (#7947) 2020-07-27 14:40:11 -04:00
Patrick Cloke 3fc8fdd150
Support oEmbed for media previews. (#7920)
Fixes previews of Twitter URLs by using their oEmbed endpoint to grab content.
2020-07-27 07:50:44 -04:00
Patrick Cloke 5ea29d7f85
Convert more of the media code to async/await (#7873) 2020-07-24 09:39:02 -04:00
Will Hunt 62b1ce8539
isort 5 compatibility (#7786)
The CI appears to use the latest version of isort, which is a problem when isort gets a major version bump. Rather than try to pin the version, I've done the necessary to make isort5 happy with synapse.
2020-07-05 16:32:02 +01:00
Erik Johnston 5cdca53aa0
Merge different Resource implementation classes (#7732) 2020-07-03 19:02:19 +01:00
Erik Johnston b44bdd7f7b
Support running multiple media repos. (#7706)
This requires a new config option to specify which media repo should be
responsible for running background jobs to e.g. clear out expired URL
preview caches.
2020-06-17 14:13:30 +01:00
Patrick Cloke 434716e1d3
Fetch from the r0 media path instead of the unspecced v1. (#7714) 2020-06-17 08:36:46 -04:00
Dagfinn Ilmari Mannsåker a3f11567d9
Replace all remaining six usage with native Python 3 equivalents (#7704) 2020-06-16 08:51:47 -04:00
Patrick Cloke bd6dc17221
Replace iteritems/itervalues/iterkeys with native versions. (#7692) 2020-06-15 07:03:36 -04:00
Richard van der Hoff d4676910c9 remove miscellaneous PY2 code 2020-05-15 19:37:41 +01:00
Michael Kaye 5308239d5d
Reduce logging verbosity of URL cache cleanup. (#7295) 2020-04-22 07:45:16 -04:00