Added initial draft for human-readable ID rules.
This document outlines the format for human-readable IDs within Matrix.

Overview
--------

UTF-8 is quickly becoming the standard character encoding on the web, and
Matrix therefore requires that all strings MUST be encoded as UTF-8. However,
using the full Unicode character set for human-readable IDs is troublesome.
Many distinct characters look identical (for example, Latin "a" U+0061 and
Cyrillic "а" U+0430), yet IDs using them would identify different users. In
addition, some characters are non-printable and cannot be rendered to the
end-user at all. This opens up a security vulnerability: phishing or spoofing
of IDs, commonly known as a homograph attack.
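
The homograph problem can be reproduced in a few lines of Python. The user ID below is a made-up example. The two IDs render identically in most fonts but are different strings, so they would name different users.

```python
import unicodedata

genuine = "@alice:example.org"
# Same-looking ID, but the first letter of the localpart is CYRILLIC SMALL LETTER A (U+0430)
spoofed = "@\u0430lice:example.org"

print(genuine == spoofed)            # False: the code points differ
print(unicodedata.name(genuine[1]))  # LATIN SMALL LETTER A
print(unicodedata.name(spoofed[1]))  # CYRILLIC SMALL LETTER A
```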

Web browsers encountered this problem when Internationalized Domain Names
(IDNs) were introduced, and a variety of checks were put in place to protect
users. If an address failed the check, the browser displayed the raw punycode
form of the address to disambiguate it. Home servers in Matrix perform similar
checks. However, Matrix does not use punycode, so there is no raw form to fall
back to when a check fails. Instead, home servers must reject these misleading
IDs outright.
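
The browser fallback described above can be sketched with Python's built-in ``idna`` codec (IDNA 2003), which produces the punycode form a browser shows instead of the Unicode name. The ``passes_check`` flag is a hypothetical input that stands in for whatever check the browser runs. Matrix home servers do not display this form. They reject failing IDs instead.

```python
def display_domain(domain: str, passes_check: bool) -> str:
    """Browser-style display rule (sketch): Unicode if the check passed, else punycode."""
    if passes_check:
        return domain
    # The "idna" codec nameprep-normalises and punycode-encodes each non-ASCII label,
    # adding the "xn--" prefix; ASCII labels pass through unchanged.
    return domain.encode("idna").decode("ascii")

print(display_domain("bücher.example", passes_check=True))   # bücher.example
print(display_domain("bücher.example", passes_check=False))  # xn--bcher-kva.example
```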

Types of human-readable IDs
---------------------------

There are two main human-readable IDs in question:

- Room aliases
- User IDs

Room aliases look like ``#localpart:domain``. These aliases point to opaque,
non-human-readable room IDs. Because these mappings can change, aliases already
carry the risk that the same alias points to a different room at a later date.

User IDs look like ``@localpart:domain``. These represent actual end-users, and
unlike room aliases, there is no layer of indirection. Homograph attacks are
therefore a much greater concern for user IDs.
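
Checks on these IDs operate on the localpart and domain separately, so the first step is splitting an ID into those segments. Below is a minimal sketch. It assumes the split happens at the first colon, which keeps a port such as ``:8448`` in the domain. The names ``ParsedId`` and ``parse_id`` are hypothetical.

```python
from typing import NamedTuple

class ParsedId(NamedTuple):
    kind: str       # "user" or "room_alias"
    localpart: str
    domain: str

# Sigils taken from the two forms above: @localpart:domain and #localpart:domain
SIGILS = {"@": "user", "#": "room_alias"}

def parse_id(raw: str) -> ParsedId:
    kind = SIGILS.get(raw[:1])
    if kind is None:
        raise ValueError(f"unknown sigil in {raw!r}")
    # Split at the first colon only, so "example.org:8448" stays one domain segment.
    localpart, sep, domain = raw[1:].partition(":")
    if not sep or not localpart or not domain:
        raise ValueError(f"expected <sigil>localpart:domain, got {raw!r}")
    return ParsedId(kind, localpart, domain)

print(parse_id("#lobby:example.org:8448"))
```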

Checks
------

- The checks are modelled on those that web browsers apply to IDNs.
- Blacklisted characters (for example, non-printable characters) are not
  allowed.
- Mixing language sets, relative to a 'preferred' language, is not allowed.
- Language sets are taken from the Unicode CLDR (Common Locale Data
  Repository) dataset.
- IDs are checked in segments (localpart and domain) rather than as a single
  string.
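
A per-segment check along these lines might look like the sketch below. Python's standard library does not ship CLDR language-set data. The sketch therefore approximates each letter's script from the first word of its Unicode character name (``LATIN``, ``CYRILLIC``, ``GREEK``, ...). It also simplifies "no mixing" to "at most one script per segment". That heuristic and the names ``segment_passes`` and ``id_passes`` are assumptions, not part of the draft.

```python
import unicodedata

def letter_script(ch: str) -> str:
    # Crude stand-in for CLDR data (assumption): "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def segment_passes(segment: str) -> bool:
    """Check one segment (localpart or domain) of a human-readable ID."""
    if not segment:
        return False
    scripts = set()
    for ch in segment:
        # Blacklist: str.isprintable() is False for control, format (e.g. zero-width)
        # and separator characters, except the ASCII space.
        if not ch.isprintable():
            return False
        # Only letters count towards the script mix; digits and punctuation are neutral.
        if unicodedata.category(ch).startswith("L"):
            scripts.add(letter_script(ch))
    return len(scripts) <= 1

def id_passes(raw: str) -> bool:
    """Check @localpart:domain or #localpart:domain segment by segment."""
    if raw[:1] not in ("@", "#"):
        return False
    localpart, sep, domain = raw[1:].partition(":")
    return bool(sep) and segment_passes(localpart) and segment_passes(domain)
```

Because segments are checked independently, an all-Cyrillic localpart on a Latin domain passes. A whole-script lookalike also passes: a segment written entirely in Cyrillic letters that resemble Latin ones. Catching that would need extra rules, such as the confusable-character data in Unicode UTS #39.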

Rejecting
---------

- Home servers MUST reject room aliases which do not pass the check, on both
  GET and PUT requests (that is, when an alias is looked up and when it is
  created).
- Home servers MUST reject user ID localparts which do not pass the check, both
  when the user is created and when the ID appears in events.
- Any home server whose domain does not pass this check MUST use its punycode
  domain name instead of the IDN; otherwise other home servers will reject its
  IDs.
- The error code is ``M_FAILED_HOMOGRAPH_CHECK``.
- The error message MAY give further information about which characters were
  rejected and why.
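
A rejection could be returned in Matrix's usual JSON error shape: an ``errcode`` plus a human-readable ``error``. The sketch below assumes HTTP status 400, which this draft does not specify, and uses a hypothetical helper name. The per-character detail is the optional information the last bullet allows.

```python
import json
import unicodedata

def homograph_rejection(rejected: dict[str, str]) -> tuple[int, str]:
    """Build an error response; `rejected` maps each offending character to a reason."""
    message = "ID failed homograph check"
    if rejected:
        # Optional detail: code point, Unicode name and reason for each character.
        details = "; ".join(
            f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}: {reason}"
            for ch, reason in rejected.items()
        )
        message += f" ({details})"
    body = {"errcode": "M_FAILED_HOMOGRAPH_CHECK", "error": message}
    return 400, json.dumps(body)  # the 400 status is an assumption

status, body = homograph_rejection({"\u0430": "mixes Cyrillic with Latin"})
print(status, body)
```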

Other considerations
--------------------

- Basic security: the HS attaches an informational key to the event saying
  "unsafe ID". Problem: clients can simply ignore it, and since it will appear
  only very rarely, it is easy to forget when implementing clients.
- Moderate security: requires a client handshake. This forces clients to
  implement a check, or else they cannot communicate with the misleading ID.
  However, it adds overhead to client implementations and extra round-trips.
- High security: outright rejection of the ID at the point of creation or when
  an event containing it is received. Rejection at the point of creation is
  preferable, because it stops the ID entering the system in the first place.
  However, a malicious HS can simply allow the ID, so other home servers must
  also reject it when they see it in events. Provided the HS is correctly
  implemented, the client never sees the problem ID.
- Decision: high security. The client doesn't need to worry about it, and there
  is no additional protocol complexity aside from rejecting an event.
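
Under the high-security option, a receiving home server re-checks the IDs carried by incoming events rather than trusting the sending server. A minimal sketch follows. It checks ``sender`` on every event, plus ``state_key`` on ``m.room.member`` events, where it names the target user. The function names are hypothetical. ``placeholder_check`` is a deliberately crude ASCII-only stand-in for the real homograph check.

```python
from typing import Callable, Iterable

def split_received(
    events: Iterable[dict], passes: Callable[[str], bool]
) -> tuple[list[dict], list[dict]]:
    """Partition received events into (accepted, rejected) by checking the IDs they carry."""
    accepted, rejected = [], []
    for event in events:
        ids = [event.get("sender", "")]
        # Membership events also name a target user in state_key.
        if event.get("type") == "m.room.member":
            ids.append(event.get("state_key", ""))
        (accepted if all(passes(i) for i in ids) else rejected).append(event)
    return accepted, rejected

def placeholder_check(raw: str) -> bool:
    # Hypothetical stand-in only: a real server would run the homograph check here.
    return raw.isascii()

events = [
    {"type": "m.room.message", "sender": "@alice:example.org"},
    {"type": "m.room.member", "sender": "@bob:example.org",
     "state_key": "@\u0430lice:example.org"},
]
ok, bad = split_received(events, placeholder_check)
```

The client only ever receives events from ``ok``. This matches the draft's point that a correctly implemented HS keeps the problem ID away from clients.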