Commit Graph

29 Commits (1abfb15f07d4f8119afcf908f9e1903e7feef371)

Author SHA1 Message Date
David Robertson b83e822556
Stop user directory from failing if it encounters users not in the `users` table. (#11053)
The following scenarios would halt the user directory updater:

- user joins room
- user leaves room
- user present in room which switches from private to public, or vice versa.

for two classes of users:

- appservice senders
- users missing from the user table.

If this happened, the user directory would be stuck, unable to make forward progress.

Exclude both cases from the user directory, so that we ignore them.

Co-authored-by: Eric Eastwood <erice@element.io>
Co-authored-by: reivilibre <oliverw@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
2021-10-13 09:38:22 +00:00
David Robertson 4f00432ce1
Fix potential leak of per-room profiles when the user dir is rebuilt. (#10981)
There are two steps to rebuilding the user directory:

1. a scan over rooms, followed by
2. a scan over local users.

The former reads avatars and display names from the `room_memberships`
table and therefore contains potentially private avatars and
display names. The latter reads from the the `profiles` table which only
contains public data; moreover it will overwrite any private profiles
that the rooms scan may have written to the user directory. This means
that the rebuild could leak private user while the rebuild was in
progress, only to later cover up the leaks once the rebuild had completed.

This change skips over local users when writing user_directory rows
when scanning rooms. Doing so means that it'll take longer for a rebuild
to make local users searchable, which is unfortunate. I think a future
PR can improve this by swapping the order of the two steps above. (And
indeed there's more to do here, e.g. copying from `profiles` without
going via Python.)

Small tidy-ups while I'm here:

* Remove duplicated code from test_initial. This was meant to be pulled into `purge_and_rebuild_user_dir`.
* Move `is_public` before updating sharing tables. No functional change; it's still before the first read of `is_public`.
* Don't bother creating a set from dict keys. Slightly nicer and makes the code simpler.

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
2021-10-05 18:35:25 +01:00
David Robertson f7b034a24b
Consistently exclude from user_directory (#10960)
* Introduce `should_include_local_users_in_dir`

We exclude three kinds of local users from the user_directory tables. At
present we don't consistently exclude all three in the same places. This
commit introduces a new function to gather those exclusion conditions
together. Because we have to handle local and remote users in different
ways, I've made that function only consider the case of remote users.
It's the caller's responsibility to make the local versus remote
distinction clear and correct.

A test fixup is required. The test now hits a path which makes db
queries against the users table. The expected rows were missing, because
we were using a dummy user that hadn't actually been registered.

We also add new test cases to covert the exclusion logic.

----

By my reading this makes these changes:

* When an app service user registers or changes their profile, they will
  _not_ be added to the user directory. (Previously only support and
  deactivated users were excluded). This is consistent with the logic that
  rebuilds the user directory. See also [the discussion
  here](https://github.com/matrix-org/synapse/pull/10914#discussion_r716859548).
* When rebuilding the directory, exclude support and disabled users from
  room sharing tables. Previously only appservice users were excluded.
* Exclude all three categories of local users when rebuilding the
  directory. Previously `_populate_user_directory_process_users` didn't do
  any exclusion.

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
2021-10-04 11:45:51 +00:00
David Robertson 3aefc7b66d
Refactor user directory tests (#10935)
* Pull out GetUserDirectoryTables helper
* Don't rebuild the dir in tests that don't need it

In #10796 I changed registering a user to add directory entries under.
This means we don't have to force a directory regbuild in to tests of
the user directory search.

* Move test_initial to tests/storage
* Add type hints to both test_user_directory files

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
2021-09-30 11:04:40 +01:00
Patrick Cloke bb7fdd821b
Use direct references for configuration variables (part 5). (#10897) 2021-09-24 07:25:21 -04:00
David Robertson 7f3352743e
Improve typing in user_directory files (#10891)
* Improve typing in user_directory files

This makes the user_directory.py in storage pass most of mypy's
checks (including `no-untyped-defs`). Unfortunately that file is in the
tangled web of Store class inheritance so doesn't pass mypy at the moment.

The handlers directory has already been mypyed.

Co-authored-by: reivilibre <olivier@librepush.net>
2021-09-24 10:38:22 +01:00
David Robertson 60453315bd
Always add local users to the user directory (#10796)
It's a simplification, but one that'll help make the user directory logic easier
to follow with the other changes upcoming. It's not strictly required for those
changes, but this will help simplify the resulting logic that listens for
`m.room.member` events and generally make the logic easier to follow.

This means the config option `search_all_users` ends up controlling the
search query only, and not the data we store. The cost of doing so is an
extra row in the `user_directory` and `user_directory_search` tables for
each local user which

- belongs to no public rooms
- belongs to no private rooms of size ≥ 2

I think the cost of this will be marginal (since they'll already have entries
 in `users` and `profiles` anyway).

As a small upside, a homeserver whose directory was built with this
change can toggle `search_all_users` without having to rebuild their
directory.

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
2021-09-21 12:02:34 +00:00
Patrick Cloke 01c88a09cd
Use direct references for some configuration variables (#10798)
Instead of proxying through the magic getter of the RootConfig
object. This should be more performant (and is more explicit).
2021-09-13 13:07:12 -04:00
David Robertson 318162f5de
Easy refactors of the user directory (#10789)
No functional changes here. This came out as I was working to tackle #5677
2021-09-10 10:54:38 +01:00
Patrick Cloke bec01c0758
Convert room member storage tuples to attrs. (#10629)
Instead of using namedtuples. This helps with asserting type hints
and code completion.
2021-08-18 09:22:07 -04:00
Erik Johnston 38b346a504
Replace `or_ignore` in `simple_insert` with `simple_upsert` (#10442)
Now that we have `simple_upsert` that should be used in preference to
trying to insert and looking for an exception. The main benefit is that
we ERROR message don't get written to postgres logs.

We also have tidy up the return value on `simple_upsert`, rather than
having a tri-state of inserted/not-inserted/unknown.
2021-07-22 12:39:50 +01:00
Erik Johnston d0aee697ac
Use get_current_users_in_room from store and not StateHandler (#9910) 2021-05-05 16:49:34 +01:00
Jonathan de Jong 4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Andrew Morgan 0a363f9ca4
Remove cache for get_shared_rooms_for_users (#9416)
This PR remove the cache for the `get_shared_rooms_for_users` storage method (the db method driving the experimental "what rooms do I share with this user?" feature: [MSC2666](https://github.com/matrix-org/matrix-doc/pull/2666)). Currently subsequent requests to the endpoint will return the same result, even if your shared rooms with that user have changed.

The cache was added in https://github.com/matrix-org/synapse/pull/7785, but we forgot to ensure it was invalidated appropriately.

Upon attempting to invalidate it, I found that the cache had to be entirely invalidated whenever a user (remote or local) joined or left a room. This didn't make for a very useful cache, especially for a function that may or may not be called very often. Thus, I've opted to remove it instead of invalidating it.
2021-02-22 16:52:45 +00:00
Andrew Morgan 13e9029f44
Add a config option to prioritise local users in user directory search results (#9383)
This PR adds a homeserver config option, `user_directory.prefer_local_users`, that when enabled will show local users higher in user directory search results than remote users. This option is off by default.

Note that turning this on doesn't necessarily mean that remote users will always be put below local users, but they should be assuming all other ranking factors (search query match, profile information present etc) are identical.

This is useful for, say, University networks that are openly federating, but want to prioritise local students and staff in the user directory over other random users.
2021-02-19 11:02:03 +00:00
Patrick Cloke 43f1c82457
Add back the guard against the user directory stream position not existing. (#9428)
As the comment says, this guard was there for when the
initial user directory update has yet to happen.
2021-02-18 08:44:19 -05:00
Eric Eastwood 0a00b7ff14
Update black, and run auto formatting over the codebase (#9381)
- Update black version to the latest
 - Run black auto formatting over the codebase
    - Run autoformatting according to [`docs/code_style.md
`](80d6dc9783/docs/code_style.md)
 - Update `code_style.md` docs around installing black to use the correct version
2021-02-16 22:32:34 +00:00
Patrick Cloke 1baab20352
Add type hints to various handlers. (#9223)
With this change all handlers except the e2e_* ones have
type hints enabled.
2021-01-26 10:50:21 -05:00
Brendan Abolivier f2783fc201
Use the simple dictionary in full text search for the user directory (#8959)
* Use the simple dictionary in fts for the user directory

* Clarify naming
2020-12-17 14:42:30 +01:00
Patrick Cloke be2db93b3c
Do not assume that the contents dictionary includes history_visibility. (#8945) 2020-12-16 08:46:37 -05:00
Erik Johnston 19b15d63e8
Use autocommit mode for single statement DB functions. (#8542)
Autocommit means that we don't wrap the functions in transactions, and instead get executed directly. Introduced in #8456. This will help:

1. reduce the number of `could not serialize access due to concurrent delete` errors that we see (though there are a few functions that often cause serialization errors that we don't fix here);
2. improve the DB performance, as it no longer needs to deal with the overhead of `REPEATABLE READ` isolation levels; and
3. improve wall clock speed of these functions, as we no longer need to send `BEGIN` and `COMMIT` to the DB.

Some notes about the differences between autocommit mode and our default `REPEATABLE READ` transactions:

1. Currently `autocommit` only applies when using PostgreSQL, and is ignored when using SQLite (due to silliness with [Twisted DB classes](https://twistedmatrix.com/trac/ticket/9998)).
2. Autocommit functions may get retried on error, which means they can get applied *twice* (or more) to the DB (since they are not in a transaction the previous call would not get rolled back). This means that the functions need to be idempotent (or otherwise not care about being called multiple times). Read queries, simple deletes, and updates/upserts that replace rows (rather than generating new values from existing rows) are all idempotent.
3. Autocommit functions no longer get executed in [`REPEATABLE READ`](https://www.postgresql.org/docs/current/transaction-iso.html) isolation level, and so data can change queries, which is fine for single statement queries.
2020-10-14 15:50:59 +01:00
Patrick Cloke 8a4a4186de
Simplify super() calls to Python 3 syntax. (#8344)
This converts calls like super(Foo, self) -> super().

Generated with:

    sed -i "" -Ee 's/super\([^\(]+\)/super()/g' **/*.py
2020-09-18 09:56:44 -04:00
Will Hunt b257c788c0
Add /user/{user_id}/shared_rooms/ api (#7785)
* Add shared_rooms api

* Add changelog

* Add .

* Wrap response in {"rooms": }

* linting

* Add unstable_features key

* Remove options from isort that aren't part of 5.x

`-y` and `-rc` are now default behaviour and no longer exist.

`dont-skip` is no longer required

https://timothycrosley.github.io/isort/CHANGELOG/#500-penny-july-4-2020

* Update imports to make isort happy

* Add changelog

* Update tox.ini file with correct invocation

* fix linting again for isort

* Vendor prefix unstable API

* Fix to match spec

* import Codes

* import Codes

* Use FORBIDDEN

* Update changelog.d/7785.feature

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>

* Implement get_shared_rooms_for_users

* a comma

* trailing whitespace

* Handle the easy feedback

* Switch to using runInteraction

* Add tests

* Feedback

* Seperate unstable endpoint from v2

* Add upgrade node

* a line

* Fix style by adding a blank line at EOF.

* Update synapse/storage/databases/main/user_directory.py

Co-authored-by: Tulir Asokan <tulir@maunium.net>

* Update synapse/storage/databases/main/user_directory.py

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>

* Update UPGRADE.rst

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>

* Fix UPGRADE/CHANGELOG unstable paths

unstable unstable unstable

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Tulir Asokan <tulir@maunium.net>

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Tulir Asokan <tulir@maunium.net>
2020-09-02 13:18:40 +01:00
Patrick Cloke b939251c37
Fix errors when updating the user directory with invalid data (#8223) 2020-09-01 13:02:41 -04:00
Patrick Cloke 37db6252b7
Convert additional databases to async/await part 3 (#8201) 2020-09-01 11:04:17 -04:00
Patrick Cloke 4a739c73b4
Convert simple_update* and simple_select* to async (#8173) 2020-08-27 07:08:38 -04:00
Patrick Cloke 4c6c56dc58
Convert simple_select_one and simple_select_one_onecol to async (#8162) 2020-08-26 07:19:32 -04:00
Patrick Cloke f3fe6961b2
Convert additional database stores to async/await (#8045) 2020-08-07 12:17:17 -04:00
Erik Johnston a7bdf98d01
Rename database classes to make some sense (#8033) 2020-08-05 21:38:57 +01:00