Erik Johnston
9181e2f4c7
Add store to PreviewUrlResource
2016-04-19 14:48:24 +01:00
Erik Johnston
fb76a81ff7
Reorder imports
2016-04-19 14:45:05 +01:00
Erik Johnston
43f0941e8f
Split out BaseMediaResource into MediaRepository
...
This is so that a single MediaRepository can be shared across all
resources, rather than having a "copy" per resource.
In particular this allows us to guard against both the thumbnail and
download resource triggering a download of remote content at the same
time.
2016-04-19 11:24:59 +01:00
Matthew Hodgson
aaabbd3e9e
explicitly pass in the charset from Content-Type to lxml to fix cyrillic woes better
2016-04-15 14:32:25 +01:00
Matthew Hodgson
84f9cac4d0
fix cyrillic URL previews by hardcoding all page decoding to UTF-8 for now, rather than relying on lxml's heuristics which seem to get it wrong
2016-04-15 13:20:08 +01:00
Matthew Hodgson
f78b479118
fix urlparse import thinko breaking tiny URLs
2016-04-14 15:23:55 +01:00
Erik Johnston
d0633e6dbe
Sanitize the optional dependencies for spider API
2016-04-13 13:38:09 +01:00
Erik Johnston
17515bae14
PEP8
2016-04-11 11:02:50 +01:00
Matthew Hodgson
5ffacc5e84
fix typos and needless try/except from PR review
2016-04-11 10:39:16 +01:00
Matthew Hodgson
83b2f83da0
actually throw meaningful errors
2016-04-08 21:36:59 +01:00
Mark Haines
b36270b5e1
Fix pep8 warning
2016-04-08 19:52:23 +01:00
Matthew Hodgson
1ccabe2965
more PR feedback
2016-04-08 18:58:08 +01:00
Matthew Hodgson
dafef5a688
Add url_preview_enabled config option to turn on/off preview_url endpoint. defaults to off.
...
Add url_preview_ip_range_blacklist to let admins specify internal IP ranges that must not be spidered.
Add url_preview_url_blacklist to let admins specify URL patterns that must not be spidered.
Implement a custom SpiderEndpoint and associated support classes to implement url_preview_ip_range_blacklist
Add commentary and generally address PR feedback
2016-04-08 18:37:15 +01:00
Matthew Hodgson
cf51c4120e
report image size (bytewise) in OG meta
2016-04-03 23:57:05 +01:00
Matthew Hodgson
0834b152fb
char encoding
2016-04-03 12:59:27 +01:00
Matthew Hodgson
8b98a7e8c3
pep8
2016-04-03 12:56:29 +01:00
Matthew Hodgson
eab4d462f8
fix etag typing error. fix timestamp typing error
2016-04-03 02:02:46 +01:00
Matthew Hodgson
c3916462f6
rebase all image URLs
2016-04-03 01:33:12 +01:00
Matthew Hodgson
110780b18b
remove stale todo
2016-04-03 00:48:31 +01:00
Matthew Hodgson
b09e29a03c
Ensure only one download for a given URL is active at a time
2016-04-03 00:47:40 +01:00
Matthew Hodgson
7426c86eb8
add a persistent cache of URL lookups, and fix up the in-memory one to work
2016-04-03 00:31:57 +01:00
Matthew Hodgson
d1b154a10f
support gzip compression, and don't pass through error msgs
2016-04-02 03:06:39 +01:00
Matthew Hodgson
5037ee0d37
handle missing dimensions without crashing
2016-04-02 02:29:57 +01:00
Matthew Hodgson
b26e8604f1
make meta comparisons case insensitive
2016-04-02 01:35:44 +01:00
Matthew Hodgson
5fd07da764
refactor calc_og; spider image URLs; fix xpath; add a (broken) expiringcache; loads of other fixes
2016-04-02 00:35:49 +01:00
Matthew Hodgson
c60b751694
fix assorted redirect, unicode and screenscraping bugs
2016-04-01 02:17:48 +01:00
Matthew Hodgson
683e564815
handle spidered relative images correctly
2016-03-31 23:52:58 +01:00
Matthew Hodgson
72550c3803
prevent choking on invalid utf-8, and handle image thumbnailing smarter
2016-03-31 15:14:14 +01:00
Matthew Hodgson
bb9a2ca87c
synthesise basig OG metadata from pages lacking it
2016-03-31 14:15:09 +01:00
Matthew Hodgson
a8a5dd3b44
handle requests with missing content-length headers (e.g. YouTube)
2016-03-31 01:55:21 +01:00
Matthew Hodgson
ae5831d303
fix bugs
2016-03-29 03:32:55 +01:00
Matthew Hodgson
19038582d3
debug
2016-03-29 03:14:16 +01:00
Matthew Hodgson
64b4aead15
make it work
2016-03-29 03:13:25 +01:00
Matthew Hodgson
dd4287ca5d
make it build
2016-03-29 02:07:57 +01:00
Matthew Hodgson
7dd0c1730a
initial WIP of a tentative preview_url endpoint - incomplete, untested, experimental, etc. just putting it here for safekeeping for now
2016-01-24 18:47:27 -05:00