Commit Graph

19 Commits (121b9e2475f4d7b3bca50d81732f07db80b2264f)

Author SHA1 Message Date
Patrick Cloke eb39da6782
Move HTML parsing to a separate file for URL previews. (#11566)
* Splits the logic for parsing HTML from the resource handling code.
* Fix a circular import in the oEmbed code (which uses the HTML parsing code).
* Renames some of the HTML parsing methods to:
  * Make it clear which methods are "internal" to the module.
  * Clarify what the methods do.
2021-12-13 17:55:07 +00:00
Patrick Cloke b3e843be88
Fix URL preview errors when previewing XML documents. (#11196) 2021-10-27 14:48:02 +00:00
Patrick Cloke efd0074ab7
Ensure each charset is attempted only once during media preview. (#11089)
There's no point in trying more than once since it is guaranteed to
continually fail.
2021-10-14 18:51:44 +00:00
Patrick Cloke e2f0b49b3f
Attempt different character encodings when previewing a URL. (#11077)
This follows similar logic to BeautifulSoup where we attempt different
character encodings until we find one which works.
2021-10-14 10:17:20 -04:00
Patrick Cloke 1b112840d2
Autodiscover oEmbed endpoint from returned HTML (#10822)
Searches the returned HTML for an oEmbed endpoint using the
autodiscovery mechanism (`<link rel=...>`), and will request it
to generate the preview.
2021-10-08 14:14:42 -04:00
sri-vidyut 8e1febc6a1
Support underscores (in addition to hyphens) for charset detection. (#10410) 2021-07-27 17:29:42 +00:00
Jonathan de Jong 4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Patrick Cloke 0963d39ea6
Handle additional errors when previewing URLs. (#9333)
* Handle the case of lxml not finding a document tree.
* Parse the document encoding from the XML tag.
2021-02-08 12:33:30 -05:00
Patrick Cloke 4937fe3d6b
Try to recover from unknown encodings when previewing media. (#9164)
Treat unknown encodings (according to lxml) as UTF-8
when generating a preview for HTML documents. This
isn't fully accurate, but will hopefully give a reasonable
title and summary.
2021-01-26 07:32:17 -05:00
Richard van der Hoff 8d3d264052
Skip unit tests which require optional dependencies (#9031)
If we are lacking an optional dependency, skip the tests that rely on it.
2021-01-07 11:41:28 +00:00
Patrick Cloke 1f3748f033
Do not raise a 500 exception when previewing empty media. (#8883) 2020-12-07 10:00:08 -05:00
Amber Brown 32e7c9e7f2
Run Black. (#5482) 2019-06-20 19:32:02 +10:00
black 8b3d9b6b19 Run black. 2018-08-10 23:54:09 +10:00
Amber Brown 49af402019 run isort 2018-07-09 16:09:20 +10:00
Marcin Bachry 24c16fc349 Fix crash in url preview when html tag has no text
Signed-off-by: Marcin Bachry <hegel666@gmail.com>
2016-12-14 22:38:18 +01:00
Johannes Löthberg 6c9a0ba415 test_preview: Fix incorrect wrapping
The old test expected an incorrect wrapping due to the preview function
not using unicode properly, so it got the wrong length.

Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
2016-12-05 16:33:57 +01:00
Johannes Löthberg 0697bb2247 Make test_preview use unicode strings
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
2016-12-05 16:33:57 +01:00
Erik Johnston 48b5829aea Fix up preview URL API. Add tests.
This includes:

- Splitting out methods of a class into stand alone functions, to make
  them easier to test.
- Adding unit tests to split out functions, testing HTML -> preview.
- Handle the fact that elements in lxml may have tail text.
2016-08-16 14:53:24 +01:00
Erik Johnston e97648c4e2 Test summarization 2016-08-04 16:09:09 +01:00