Operations related to UTF-8 validation.
Constants
- CONT_MASK 🔒 - Mask of the value bits of a continuation byte.
- NONASCII_MASK 🔒
- UTF8_CHAR_WIDTH 🔒
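
As a rough illustration of what "value bits of a continuation byte" means: a UTF-8 continuation byte has the bit layout 10xxxxxx, so its payload is the low six bits. The sketch below is an assumption-laden stand-in, not the library's source. The constant value is inferred from that layout, and the helper name `cont_payload` is hypothetical.

```rust
// Hypothetical stand-in for the private constant. The value is inferred from
// the UTF-8 continuation-byte layout 10xxxxxx, not copied from the library.
const CONT_MASK: u8 = 0b0011_1111;

/// Extracts the six payload bits from a continuation byte.
/// (Illustrative helper; not part of the documented module.)
fn cont_payload(byte: u8) -> u8 {
    byte & CONT_MASK
}
```

For instance, U+00E9 is encoded as the two bytes 0xC3 0xA9. Masking the continuation byte 0xA9 (1010_1001) leaves 0x29 (10_1001), the low six bits of the code point.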
Functions
- contains_nonascii 🔒 - Returns true if any byte in the word x is nonascii (>= 128).
- next_code_point_reverse 🔒 ⚠ - Reads the last code point out of a byte iterator (assuming a UTF-8-like encoding).
- run_utf8_validation 🔒 - Walks through v checking that it's a valid UTF-8 sequence, returning Ok(()) in that case, or, if it is invalid, Err(err).
- utf8_acc_cont_byte 🔒 - Returns the value of ch updated with continuation byte byte.
- utf8_first_byte 🔒 - Returns the initial codepoint accumulator for the first byte. The first byte is special: only the bottom 5 bits are kept for width 2, 4 bits for width 3, and 3 bits for width 4.
- utf8_is_cont_byte 🔒 - Checks whether the byte is a UTF-8 continuation byte (i.e., starts with the bits 10).
- next_code_point ⚠ Experimental - Reads the next code point out of a byte iterator (assuming a UTF-8-like encoding).
- utf8_char_width Experimental - Given a first byte, determines how many bytes are in this UTF-8 character.
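
To make the descriptions above concrete, here is a minimal sketch of how such helpers could fit together to decode one code point. It is written from the UTF-8 bit layout, not copied from the library: every `*_sketch` name is hypothetical, the constant values are assumptions, the width table is simplified, and `validate_sketch` simply defers to the public `std::str::from_utf8` to show the Ok(())/Err(err) shape.

```rust
// Sketch only: hypothetical helpers mirroring the descriptions in the list.
const CONT_MASK: u8 = 0b0011_1111; // assumed: low six bits of 10xxxxxx
const NONASCII_MASK: usize =
    usize::from_ne_bytes([0x80; std::mem::size_of::<usize>()]); // assumed: 0x80 in every byte

/// True if any byte of the word `x` has its high bit set (>= 128).
fn contains_nonascii_sketch(x: usize) -> bool {
    (x & NONASCII_MASK) != 0
}

/// Keeps the payload bits of a lead byte: 5 for width 2, 4 for width 3, 3 for width 4.
fn first_byte_sketch(byte: u8, width: u32) -> u32 {
    (byte & (0x7F >> width)) as u32
}

/// Shifts the accumulator left by six bits and appends the continuation payload.
fn acc_cont_byte_sketch(ch: u32, byte: u8) -> u32 {
    (ch << 6) | (byte & CONT_MASK) as u32
}

/// True if the byte starts with the bits 10.
fn is_cont_byte_sketch(byte: u8) -> bool {
    (byte & 0b1100_0000) == 0b1000_0000
}

/// Width of the character introduced by `first`; 0 means "not a valid lead byte".
fn char_width_sketch(first: u8) -> usize {
    match first {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => 0,
    }
}

/// Decodes the next code point from a byte iterator.
/// Unlike a real validator, this does not check the continuation bytes.
fn next_code_point_sketch<'a, I: Iterator<Item = &'a u8>>(bytes: &mut I) -> Option<u32> {
    let first = *bytes.next()?;
    match char_width_sketch(first) {
        0 => None,
        1 => Some(first as u32),
        width => {
            let mut ch = first_byte_sketch(first, width as u32);
            for _ in 1..width {
                ch = acc_cont_byte_sketch(ch, *bytes.next()?);
            }
            Some(ch)
        }
    }
}

/// Ok(()) for valid UTF-8, Err(err) otherwise, via the public std API.
fn validate_sketch(v: &[u8]) -> Result<(), std::str::Utf8Error> {
    std::str::from_utf8(v).map(|_| ())
}
```

Feeding the bytes of "\u{e9}\u{20ac}" through `next_code_point_sketch` should yield 0xE9 and then 0x20AC, while `validate_sketch(b"\xC3\x28")` reports an error because 0x28 is not a continuation byte. The sketch trades safety checks for brevity, so it is only meaningful on input that is already known to be well formed.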