Trait std::char::CharExt (Stable)
pub trait CharExt {
    fn is_digit(self, radix: u32) -> bool;
    fn to_digit(self, radix: u32) -> Option<u32>;
    fn escape_unicode(self) -> EscapeUnicode;
    fn escape_default(self) -> EscapeDefault;
    fn len_utf8(self) -> usize;
    fn len_utf16(self) -> usize;
    fn encode_utf8(self, dst: &mut [u8]) -> Option<usize>;
    fn encode_utf16(self, dst: &mut [u16]) -> Option<usize>;
    fn is_alphabetic(self) -> bool;
    fn is_xid_start(self) -> bool;
    fn is_xid_continue(self) -> bool;
    fn is_lowercase(self) -> bool;
    fn is_uppercase(self) -> bool;
    fn is_whitespace(self) -> bool;
    fn is_alphanumeric(self) -> bool;
    fn is_control(self) -> bool;
    fn is_numeric(self) -> bool;
    fn to_lowercase(self) -> char;
    fn to_uppercase(self) -> char;
    fn width(self, is_cjk: bool) -> Option<usize>;
}
Functionality for manipulating char.
Required Methods
fn is_digit(self, radix: u32) -> bool
Checks if a char parses as a numeric digit in the given radix.
Compared to is_numeric(), this function only recognizes the characters 0-9, a-z and A-Z.
Return value
Returns true if c is a valid digit under radix, and false otherwise.
Panics
Panics if given a radix > 36.
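The contrast with is_numeric() can be sketched as follows. This is a minimal illustration, assuming a toolchain where both methods can be called directly on char; the helper names `is_hex_digit` and `numeric_but_not_decimal_digit` are hypothetical and not part of this trait.

```rust
/// Hypothetical helper: true if `c` is a valid hexadecimal digit.
fn is_hex_digit(c: char) -> bool {
    // Radix 16 accepts 0-9, a-f and A-F.
    c.is_digit(16)
}

/// Hypothetical helper: true if `c` is numeric according to Unicode
/// but is not one of the characters `is_digit` recognizes in radix 10.
fn numeric_but_not_decimal_digit(c: char) -> bool {
    c.is_numeric() && !c.is_digit(10)
}
```

For instance, `numeric_but_not_decimal_digit('\u{0663}')` (ARABIC-INDIC DIGIT THREE) should hold, while it does not hold for '7'.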
fn to_digit(self, radix: u32) -> Option<u32>
Converts a character to the corresponding digit.
Return value
If c is between '0' and '9', the corresponding value between 0 and 9. If c is 'a' or 'A', 10. If c is 'b' or 'B', 11, etc. Returns None if the character does not refer to a digit in the given radix.
Panics
Panics if given a radix greater than 36.
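As a sketch of how to_digit composes into a small parser, the hypothetical helper `parse_radix` below (not part of this trait) folds the digits of a string into a u32. It assumes the caller passes a radix of at most 36, since to_digit panics otherwise.

```rust
/// Hypothetical helper: parses `s` as an unsigned number in `radix`.
/// Returns None for an empty string, for any character that is not a
/// digit in `radix`, or if the value does not fit in a u32.
/// Assumption: `radix` is at most 36 (larger values make to_digit panic).
fn parse_radix(s: &str, radix: u32) -> Option<u32> {
    if s.is_empty() {
        return None;
    }
    let mut value: u32 = 0;
    for c in s.chars() {
        // to_digit yields None for non-digits, which we propagate with `?`.
        let digit = c.to_digit(radix)?;
        value = value.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(value)
}
```

Upper- and lowercase letters are treated alike, so "ff" and "FF" parse to the same value in radix 16.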
fn escape_unicode(self) -> EscapeUnicode
Returns an iterator that yields the hexadecimal Unicode escape of a character, as chars.
All characters are escaped with Rust syntax of the form \u{NNNN} where NNNN is the shortest hexadecimal representation of the code point.
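A minimal sketch of collecting the escape into a String; the helper name `unicode_escape` is hypothetical.

```rust
/// Hypothetical helper: collects the Unicode escape of `c` into a String.
fn unicode_escape(c: char) -> String {
    // The iterator yields the escape one char at a time:
    // a backslash, 'u', '{', the hex digits, and '}'.
    c.escape_unicode().collect()
}
```

Because the shortest hexadecimal form is used, 'a' (U+0061) is escaped with two hex digits rather than four.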
fn escape_default(self) -> EscapeDefault
Returns an iterator that yields the 'default' ASCII and C++11-like literal escape of a character, as chars.
The default is chosen with a bias toward producing literals that are legal in a variety of languages, including C++11 and similar C-family languages. The exact rules are:
- Tab, CR and LF are escaped as '\t', '\r' and '\n' respectively.
- Single-quote, double-quote and backslash chars are backslash-escaped.
- Any other chars in the range [0x20,0x7e] are not escaped.
- Any other chars are given hex Unicode escapes; see escape_unicode.
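The rules above can be sketched by escaping a whole string one character at a time. The helper name `escape_str` is hypothetical and not part of this trait.

```rust
/// Hypothetical helper: applies the default escape to every char of `s`.
fn escape_str(s: &str) -> String {
    // Each char expands to one or more chars, so flatten the iterators.
    s.chars().flat_map(|c| c.escape_default()).collect()
}
```

Printable ASCII other than quotes and backslash passes through unchanged, a tab becomes the two characters `\t`, and a non-ASCII character such as U+00E9 becomes `\u{e9}`.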
fn len_utf8(self) -> usize
Returns the number of bytes this character would need if encoded in UTF-8.
fn len_utf16(self) -> usize
Returns the number of 16-bit code units this character would need if encoded in UTF-16.
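A small sketch comparing the two lengths for the same character; the helper name `encoded_lengths` is hypothetical. Note that the UTF-16 figure counts u16 units, so a character outside the Basic Multilingual Plane reports 2 (a surrogate pair), not 4.

```rust
/// Hypothetical helper: returns (UTF-8 length in bytes,
/// UTF-16 length in u16 units) for `c`.
fn encoded_lengths(c: char) -> (usize, usize) {
    (c.len_utf8(), c.len_utf16())
}
```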
fn encode_utf8(self, dst: &mut [u8]) -> Option<usize>
Encodes this character as UTF-8 into the provided byte buffer, and then returns the number of bytes written.
If the buffer is not large enough, nothing will be written into it and None will be returned.
fn encode_utf16(self, dst: &mut [u16]) -> Option<usize>
Encodes this character as UTF-16 into the provided u16 buffer, and then returns the number of u16s written.
If the buffer is not large enough, nothing will be written into it and None will be returned.
fn is_alphabetic(self) -> bool
Returns whether the specified character is considered a Unicode alphabetic code point.
fn is_xid_start(self) -> bool
Returns whether the specified character satisfies the 'XID_Start' Unicode property.
'XID_Start' is a Unicode Derived Property specified in UAX #31, mostly similar to 'ID_Start' but modified for closure under NFKx.
fn is_xid_continue(self) -> bool
Returns whether the specified char satisfies the 'XID_Continue' Unicode property.
'XID_Continue' is a Unicode Derived Property specified in UAX #31, mostly similar to 'ID_Continue' but modified for closure under NFKx.
fn is_lowercase(self) -> bool
Indicates whether a character is in lowercase.
This is defined according to the terms of the Unicode Derived Core Property Lowercase.
fn is_uppercase(self) -> bool
Indicates whether a character is in uppercase.
This is defined according to the terms of the Unicode Derived Core Property Uppercase.
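As a brief illustration of the two case predicates, the hypothetical helper `count_case` below tallies lowercase and uppercase characters in a string. Characters with no case, such as digits and punctuation, count toward neither.

```rust
/// Hypothetical helper: returns (lowercase count, uppercase count) for `s`.
fn count_case(s: &str) -> (usize, usize) {
    let lower = s.chars().filter(|c| c.is_lowercase()).count();
    let upper = s.chars().filter(|c| c.is_uppercase()).count();
    (lower, upper)
}
```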
fn is_whitespace(self) -> bool
Indicates whether a character is whitespace.
Whitespace is defined in terms of the Unicode Property White_Space.
fn is_alphanumeric(self) -> bool
Indicates whether a character is alphanumeric.
Alphanumericness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No' and the Derived Core Property 'Alphabetic'.
fn is_control(self) -> bool
Indicates whether a character is a control code point.
Control code points are defined in terms of the Unicode General Category Cc.
fn is_numeric(self) -> bool
Indicates whether the character is numeric (Nd, Nl, or No).
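The classification predicates can be combined into a rough categorizer. This is only a sketch: the function `classify` and its labels are hypothetical, and the order of the checks is a design choice, since some characters satisfy more than one predicate (a line feed, for example, is both whitespace and a control code point).

```rust
/// Hypothetical helper: assigns a coarse label to `c`.
/// Whitespace is tested before control, so '\n' is labelled "whitespace".
fn classify(c: char) -> &'static str {
    if c.is_whitespace() {
        "whitespace"
    } else if c.is_control() {
        "control"
    } else if c.is_alphabetic() {
        "alphabetic"
    } else if c.is_numeric() {
        "numeric"
    } else {
        "other"
    }
}

/// Hypothetical helper: alphanumeric expressed through its two parts,
/// following the definition given for is_alphanumeric.
fn alphanumeric_by_parts(c: char) -> bool {
    c.is_alphabetic() || c.is_numeric()
}
```

Under these assumptions, '\u{BE}' (VULGAR FRACTION THREE QUARTERS, category No) is labelled "numeric" even though it is not a digit in any radix.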
fn to_lowercase(self) -> char
Converts a character to its lowercase equivalent.
The case-folding performed is the common or simple mapping. See to_uppercase() for references and more information.
Return value
Returns the lowercase equivalent of the character, or the character itself if no conversion is possible.
fn to_uppercase(self) -> char
Converts a character to its uppercase equivalent.
The case-folding performed is the common or simple mapping: it maps one Unicode codepoint (one character in Rust) to its uppercase equivalent according to the Unicode database [1]. The additional SpecialCasing.txt is not considered here, as it expands to multiple codepoints in some cases.
A full reference can be found here [2].
Return value
Returns the uppercase equivalent of the character, or the character itself if no conversion was made.
fn width(self, is_cjk: bool) -> Option<usize>
Returns this character's displayed width in columns, or None if it is a control character other than '\x00'.
is_cjk determines behavior for characters in the Ambiguous category: if is_cjk is true, these are 2 columns wide; otherwise, they are 1. In CJK contexts, is_cjk should be true, else it should be false. Unicode Standard Annex #11 recommends that these characters be treated as 1 column (i.e., is_cjk = false) if the context cannot be reliably determined.