Fix the SQL SELECT query in _paginate_room_events_txn

Doing a SELECT DISTINCT when paginating is quite expensive, because it requires the engine to sort the entire events table. However, we only need it when filtering on 2+ labels, so this PR changes the query to use DISTINCT only in that case.
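The following is a minimal sketch of why duplicates appear only with two or more labels in the filter. It uses Python's built-in sqlite3 module against a simplified, hypothetical version of the `events` and `event_labels` tables, keeping only the columns the query touches. It assumes the label filter is an OR of `label = ?` conditions and that each (event_id, label) pair is stored at most once. Neither detail is shown in this commit.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns the pagination query touches.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (
        event_id TEXT, room_id TEXT, topological_ordering INTEGER,
        stream_ordering INTEGER, outlier BOOLEAN
    );
    CREATE TABLE event_labels (
        event_id TEXT, label TEXT, room_id TEXT, topological_ordering INTEGER,
        PRIMARY KEY (event_id, label)  -- assumed: one row per (event, label) pair
    );
    INSERT INTO events VALUES ('$a', '!room', 1, 1, 0), ('$b', '!room', 2, 2, 0);
    -- $a carries two labels, $b carries one.
    INSERT INTO event_labels VALUES
        ('$a', '#fun', '!room', 1),
        ('$a', '#work', '!room', 1),
        ('$b', '#fun', '!room', 2);
    """
)


def event_ids(labels, distinct):
    """Run a pagination-style query filtered on `labels` and return the event IDs."""
    keyword = "SELECT DISTINCT" if distinct else "SELECT"
    # Assumed shape of the label filter: one "label = ?" per label, OR-ed together.
    label_clause = " OR ".join("label = ?" for _ in labels)
    sql = (
        f"{keyword} event_id, topological_ordering, stream_ordering"
        " FROM events"
        " LEFT JOIN event_labels USING (event_id, room_id, topological_ordering)"
        f" WHERE outlier = ? AND room_id = ? AND ({label_clause})"
        " ORDER BY topological_ordering ASC, stream_ordering ASC"
    )
    rows = conn.execute(sql, [False, "!room", *labels]).fetchall()
    return [row[0] for row in rows]


print(event_ids(["#fun"], distinct=False))          # ['$a', '$b']
print(event_ids(["#fun", "#work"], distinct=False))  # ['$a', '$a', '$b']
print(event_ids(["#fun", "#work"], distinct=True))   # ['$a', '$b']
```

Event `$a` matches two joined rows when both of its labels are in the filter, so a plain SELECT returns it twice. With a single label in the filter, each event matches at most one joined row.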
pull/6340/head
Brendan Abolivier 2019-11-07 11:49:37 +00:00
parent 807ec3bd99
commit 3f9b61ff95
GPG Key ID: 1E015C145F1916CD
1 changed file with 13 additions and 2 deletions

@@ -871,14 +871,25 @@ class StreamWorkerStore(EventsWorkerStore, SQLBaseStore):
         args.append(int(limit))
+        # Using DISTINCT in this SELECT query is quite expensive, because it requires the
+        # engine to sort on the entire (not limited) result set, i.e. the entire events
+        # table. We only need to use it when we're filtering on more than one label,
+        # because that's the only scenario in which we can possibly get the same event
+        # ID multiple times in the results.
+        if event_filter.labels and len(event_filter.labels) > 1:
+            select_keywords = "SELECT DISTINCT"
+        else:
+            select_keywords = "SELECT"
         sql = (
-            "SELECT DISTINCT event_id, topological_ordering, stream_ordering"
+            "%(select_keywords)s event_id, topological_ordering, stream_ordering"
             " FROM events"
             " LEFT JOIN event_labels USING (event_id, room_id, topological_ordering)"
             " WHERE outlier = ? AND room_id = ? AND %(bounds)s"
             " ORDER BY topological_ordering %(order)s,"
             " stream_ordering %(order)s LIMIT ?"
-        ) % {"bounds": bounds, "order": order}
+        ) % {"select_keywords": select_keywords, "bounds": bounds, "order": order}
         txn.execute(sql, args)
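The keyword selection in the patch can be isolated into a standalone function for testing. The sketch below assumes a hypothetical helper, `build_pagination_sql`, that takes the label list, the `bounds` clause and the sort `order` as plain arguments. In the real method these come from `event_filter` and the surrounding code, and the helper does not exist in Synapse.

```python
from typing import List, Optional


def build_pagination_sql(labels: Optional[List[str]], bounds: str, order: str) -> str:
    """Hypothetical helper reproducing the patched query construction."""
    # Same rule as the patch: DISTINCT only when filtering on two or more labels.
    # `labels` may be None or empty when no label filter is set.
    if labels and len(labels) > 1:
        select_keywords = "SELECT DISTINCT"
    else:
        select_keywords = "SELECT"

    return (
        "%(select_keywords)s event_id, topological_ordering, stream_ordering"
        " FROM events"
        " LEFT JOIN event_labels USING (event_id, room_id, topological_ordering)"
        " WHERE outlier = ? AND room_id = ? AND %(bounds)s"
        " ORDER BY topological_ordering %(order)s,"
        " stream_ordering %(order)s LIMIT ?"
    ) % {"select_keywords": select_keywords, "bounds": bounds, "order": order}


print(build_pagination_sql(["#fun", "#work"], "1 = 1", "DESC"))
```

Keeping the keyword as its own `%(select_keywords)s` placeholder lets the rest of the query text stay identical in both branches, so only the SELECT prefix differs.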