Module validations

Operations related to UTF-8 validation.

Constants

CONT_MASK 🔒
Mask of the value bits of a continuation byte.
NONASCII_MASK 🔒
UTF8_CHAR_WIDTH 🔒

Functions

contains_nonascii 🔒
Returns true if any byte in the word x is non-ASCII (>= 128).
next_code_point_reverse 🔒 ⚠
Reads the last code point out of a byte iterator (assuming a UTF-8-like encoding).
run_utf8_validation 🔒
Walks through v checking that it's a valid UTF-8 sequence, returning Ok(()) if it is valid, or Err(err) if it is not.
utf8_acc_cont_byte 🔒
Returns the value of ch updated with continuation byte byte.
utf8_first_byte 🔒
Returns the initial code point accumulator for the first byte. The first byte is special: only the bottom 5 bits are used for width 2, 4 bits for width 3, and 3 bits for width 4.
utf8_is_cont_byte 🔒
Checks whether the byte is a UTF-8 continuation byte (i.e., starts with the bits 10).
next_code_point ⚠ Experimental
Reads the next code point out of a byte iterator (assuming a UTF-8-like encoding).
utf8_char_width Experimental
Given a first byte, determines how many bytes are in this UTF-8 character.
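
Most of the items above are private or unstable, so they cannot be called from ordinary code. The following is a minimal sketch of how the described helpers might fit together. It assumes plain re-implementations written only from the one-line descriptions on this page. The names `char_width`, `is_cont_byte`, `first_byte`, `acc_cont_byte`, `contains_nonascii`, and `decode_first` are hypothetical stand-ins, not the library's own functions, and `decode_first` does not reject overlong or surrogate encodings the way full validation would. Only `std::str::from_utf8` is the stable public API.

```rust
use std::mem::size_of;

// Hypothetical stand-ins for the private constants listed above.
const CONT_MASK: u8 = 0b0011_1111; // value bits of a continuation byte
const NONASCII_MASK: usize = usize::from_ne_bytes([0x80; size_of::<usize>()]);

/// True if any byte of the machine word `x` has its high bit set (>= 128).
fn contains_nonascii(x: usize) -> bool {
    (x & NONASCII_MASK) != 0
}

/// Sequence length implied by a first byte; 0 means "cannot start a sequence".
fn char_width(first: u8) -> usize {
    match first {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => 0,
    }
}

/// A continuation byte starts with the bits 10.
fn is_cont_byte(byte: u8) -> bool {
    byte & 0b1100_0000 == 0b1000_0000
}

/// Keeps the bottom 5, 4, or 3 bits of the first byte for widths 2, 3, or 4.
fn first_byte(byte: u8, width: u32) -> u32 {
    (byte & (0x7F >> width)) as u32
}

/// Shifts the accumulator left by 6 and appends the continuation byte's value bits.
fn acc_cont_byte(ch: u32, byte: u8) -> u32 {
    (ch << 6) | (byte & CONT_MASK) as u32
}

/// Decodes the first code point of `bytes`, returning it with its width.
/// Sketch only: overlong and surrogate encodings are not rejected here.
fn decode_first(bytes: &[u8]) -> Option<(u32, usize)> {
    let &first = bytes.first()?;
    let width = char_width(first);
    if width == 0 || bytes.len() < width {
        return None;
    }
    if width == 1 {
        return Some((first as u32, 1));
    }
    let mut ch = first_byte(first, width as u32);
    for &b in &bytes[1..width] {
        if !is_cont_byte(b) {
            return None;
        }
        ch = acc_cont_byte(ch, b);
    }
    Some((ch, width))
}

fn main() {
    // "é" is U+00E9, encoded as the two bytes C3 A9.
    assert_eq!(decode_first("é".as_bytes()), Some((0xE9, 2)));
    // "€" is U+20AC, encoded as the three bytes E2 82 AC.
    assert_eq!(decode_first("€".as_bytes()), Some((0x20AC, 3)));

    // The stable entry point for validation is `std::str::from_utf8`.
    assert!(std::str::from_utf8(b"abc").is_ok());
    let err = std::str::from_utf8(b"ab\xFF").unwrap_err();
    assert_eq!(err.valid_up_to(), 2);
}
```

In application code, `std::str::from_utf8` and `str::chars` are the supported ways to validate and decode. The sketch is only meant to make the bit manipulation in the descriptions concrete.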