Trait std::char::CharExt (Stable)

pub trait CharExt {
    fn is_digit(self, radix: u32) -> bool;
    fn to_digit(self, radix: u32) -> Option<u32>;
    fn escape_unicode(self) -> EscapeUnicode;
    fn escape_default(self) -> EscapeDefault;
    fn len_utf8(self) -> usize;
    fn len_utf16(self) -> usize;
    fn encode_utf8(self, dst: &mut [u8]) -> Option<usize>;
    fn encode_utf16(self, dst: &mut [u16]) -> Option<usize>;
    fn is_alphabetic(self) -> bool;
    fn is_xid_start(self) -> bool;
    fn is_xid_continue(self) -> bool;
    fn is_lowercase(self) -> bool;
    fn is_uppercase(self) -> bool;
    fn is_whitespace(self) -> bool;
    fn is_alphanumeric(self) -> bool;
    fn is_control(self) -> bool;
    fn is_numeric(self) -> bool;
    fn to_lowercase(self) -> char;
    fn to_uppercase(self) -> char;
    fn width(self, is_cjk: bool) -> Option<usize>;
}

Functionality for manipulating char.

Required Methods

fn is_digit(self, radix: u32) -> bool

Checks if a char parses as a numeric digit in the given radix.

Compared to is_numeric(), this function only recognizes the characters 0-9, a-z and A-Z.

Return value

Returns true if the character is a valid digit in the given radix, and false otherwise.

Panics

Panics if given a radix > 36.
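Here is a minimal sketch that checks a few characters and contrasts is_digit with is_numeric. It assumes a current stable Rust toolchain, where is_digit is an inherent method on char with the same radix parameter. The helper all_digits is hypothetical and only for illustration.

```rust
/// Hypothetical helper: true if `s` is non-empty and every char is a digit in `radix`.
fn all_digits(s: &str, radix: u32) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_digit(radix))
}

fn main() {
    assert!('7'.is_digit(10));
    assert!('F'.is_digit(16)); // letters count once the radix is large enough
    assert!(!'g'.is_digit(16));

    // U+0663 ARABIC-INDIC DIGIT THREE is numeric (category Nd),
    // but is_digit only recognizes 0-9, a-z and A-Z.
    assert!('\u{663}'.is_numeric());
    assert!(!'\u{663}'.is_digit(10));

    println!("{}", all_digits("ff00", 16)); // true
}
```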

fn to_digit(self, radix: u32) -> Option<u32>

Converts a character to the corresponding digit.

Return value

If the character is between '0' and '9', returns Some of the corresponding value between 0 and 9. If it is 'a' or 'A', the value is 10; if it is 'b' or 'B', 11; and so on. Returns None if the character is not a digit in the given radix.

Panics

Panics if given a radix > 36.
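Because to_digit returns None for any non-digit, it composes naturally with the ? operator. The sketch below, assuming a current stable Rust toolchain, parses an unsigned integer in an arbitrary radix one character at a time. The function parse_radix is hypothetical; in real code the standard u64::from_str_radix already does this.

```rust
/// Hypothetical helper: parse `s` as an unsigned integer in `radix` with `char::to_digit`.
/// Returns None for an empty string, a non-digit character, or on overflow.
fn parse_radix(s: &str, radix: u32) -> Option<u64> {
    if s.is_empty() {
        return None;
    }
    let mut value: u64 = 0;
    for c in s.chars() {
        // None if `c` is not a digit in `radix`
        let digit = u64::from(c.to_digit(radix)?);
        // Shift the accumulated value one place left and add the new digit, checking for overflow.
        value = value.checked_mul(u64::from(radix))?.checked_add(digit)?;
    }
    Some(value)
}

fn main() {
    println!("{:?}", parse_radix("ff", 16)); // Some(255)
    println!("{:?}", parse_radix("102", 2)); // None: '2' is not a binary digit
}
```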

fn escape_unicode(self) -> EscapeUnicode

Returns an iterator that yields the hexadecimal Unicode escape of a character, as chars.

All characters are escaped with Rust syntax of the form \u{NNNN}, where NNNN is the shortest hexadecimal representation of the code point.
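A short sketch, assuming a current stable Rust toolchain where escape_unicode still yields the \u{NNNN} form as an iterator of chars. The helper escape_all_unicode is hypothetical; it just flattens the per-character iterators into one String.

```rust
/// Hypothetical helper: escape every character of `s` with char::escape_unicode.
fn escape_all_unicode(s: &str) -> String {
    s.chars().flat_map(|c| c.escape_unicode()).collect()
}

fn main() {
    // 'a' is U+0061, so the shortest hexadecimal form is "61".
    let escaped: String = 'a'.escape_unicode().collect();
    assert_eq!(escaped, "\\u{61}");
    println!("{}", escape_all_unicode("a❤")); // \u{61}\u{2764}
}
```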

fn escape_default(self) -> EscapeDefault

Returns an iterator that yields the 'default' ASCII and C++11-like literal escape of a character, as chars.

The default is chosen with a bias toward producing literals that are legal in a variety of languages, including C++11 and similar C-family languages. The exact rules are:

  • Tab, CR and LF are escaped as '\t', '\r' and '\n' respectively.
  • Single-quote, double-quote and backslash chars are backslash-escaped.
  • Any other chars in the range [0x20,0x7e] are not escaped.
  • Any other chars are given hex Unicode escapes; see escape_unicode.
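The rules above can be exercised with a small sketch. It assumes a current stable Rust toolchain, where escape_default follows the same rules and yields chars. The helper escape_c_like is hypothetical and escapes a whole string by escaping each character in turn.

```rust
/// Hypothetical helper: apply char::escape_default to every character of `s`.
fn escape_c_like(s: &str) -> String {
    s.chars().flat_map(|c| c.escape_default()).collect()
}

fn main() {
    // Tab and LF become \t and \n, quotes gain a backslash,
    // and printable ASCII such as letters and spaces is left alone.
    let src = "tab\there, \"quoted\" & 'single'\n";
    println!("{}", escape_c_like(src));
}
```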

fn len_utf8(self) -> usize

Returns the number of bytes this character would need if encoded in UTF-8.

fn len_utf16(self) -> usize

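Returns the number of 16-bit code units (not bytes) this character would need if encoded in UTF-16: 1 for characters in the Basic Multilingual Plane, 2 (a surrogate pair) otherwise.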
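The sketch below prints both lengths for characters of increasing size. It assumes a current stable Rust toolchain, where len_utf8 and len_utf16 are inherent methods on char. The helper utf16_len is hypothetical; summing len_utf16 over a string gives the same count as the standard str::encode_utf16 iterator.

```rust
/// Hypothetical helper: number of UTF-16 code units needed for `s`.
fn utf16_len(s: &str) -> usize {
    s.chars().map(|c| c.len_utf16()).sum()
}

fn main() {
    // 1-, 2-, 3- and 4-byte UTF-8 characters; only the last needs a surrogate pair.
    for c in ['a', 'é', '€', '😀'] {
        println!("{:?}: {} UTF-8 bytes, {} UTF-16 units", c, c.len_utf8(), c.len_utf16());
    }
}
```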

fn encode_utf8(self, dst: &mut [u8]) -> Option<usize>

Encodes this character as UTF-8 into the provided byte buffer and returns the number of bytes written, wrapped in Some.

If the buffer is not large enough, nothing is written into it and None is returned.
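In current stable Rust the signature differs: char::encode_utf8 returns the written part of the buffer as a &mut str and panics if the buffer is too small. The sketch below reproduces the Option-returning contract described here on top of that API. The wrapper try_encode_utf8 is hypothetical.

```rust
/// Hypothetical wrapper reproducing the documented contract on modern Rust:
/// if `dst` is too small, write nothing and return None.
fn try_encode_utf8(c: char, dst: &mut [u8]) -> Option<usize> {
    if dst.len() < c.len_utf8() {
        return None; // too small: leave `dst` untouched
    }
    // Modern encode_utf8 returns the written prefix of `dst` as a &mut str.
    Some(c.encode_utf8(dst).len())
}

fn main() {
    let mut buf = [0u8; 4];
    let n = try_encode_utf8('€', &mut buf).unwrap();
    println!("{:x?}", &buf[..n]); // [e2, 82, ac]
}
```

Checking len_utf8 up front is what guarantees "nothing will be written" on failure, since the modern method would otherwise panic.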

fn encode_utf16(self, dst: &mut [u16]) -> Option<usize>

Encodes this character as UTF-16 into the provided u16 buffer and returns the number of u16s written, wrapped in Some.

If the buffer is not large enough, nothing is written into it and None is returned.
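The same pattern applies to UTF-16. Assuming a current stable Rust toolchain, where char::encode_utf16 returns the written &mut [u16] and panics on a short buffer, a hypothetical wrapper try_encode_utf16 recovers the documented behavior. A character outside the Basic Multilingual Plane is written as a surrogate pair.

```rust
/// Hypothetical wrapper: write nothing and return None if `dst` is too small.
fn try_encode_utf16(c: char, dst: &mut [u16]) -> Option<usize> {
    if dst.len() < c.len_utf16() {
        return None; // too small: leave `dst` untouched
    }
    // Modern encode_utf16 returns the written prefix of `dst`.
    Some(c.encode_utf16(dst).len())
}

fn main() {
    let mut buf = [0u16; 2];
    let n = try_encode_utf16('😀', &mut buf).unwrap();
    println!("{:x?}", &buf[..n]); // [d83d, de00]: a surrogate pair
}
```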

fn is_alphabetic(self) -> bool

Returns whether the specified character is considered a Unicode alphabetic code point.

fn is_xid_start(self) -> bool

Returns whether the specified character satisfies the 'XID_Start' Unicode property.

'XID_Start' is a Unicode Derived Property specified in UAX #31, mostly similar to ID_Start but modified for closure under NFKx.

fn is_xid_continue(self) -> bool

Returns whether the specified char satisfies the 'XID_Continue' Unicode property.

'XID_Continue' is a Unicode Derived Property specified in UAX #31, mostly similar to 'ID_Continue' but modified for closure under NFKx.

fn is_lowercase(self) -> bool

Indicates whether a character is in lowercase.

This is defined according to the terms of the Unicode Derived Core Property Lowercase.

fn is_uppercase(self) -> bool

Indicates whether a character is in uppercase.

This is defined according to the terms of the Unicode Derived Core Property Uppercase.
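Because both predicates follow the Unicode properties, they work beyond ASCII, and characters such as digits are neither lowercase nor uppercase. The sketch below assumes a current stable Rust toolchain. The helper is_all_caps is hypothetical.

```rust
/// Hypothetical helper: true if `s` has at least one uppercase char and no lowercase ones.
/// Chars that are neither (digits, punctuation, spaces) are ignored.
fn is_all_caps(s: &str) -> bool {
    s.chars().any(char::is_uppercase) && !s.chars().any(char::is_lowercase)
}

fn main() {
    println!("{}", is_all_caps("HELLO, WORLD 42")); // true
    println!("{}", is_all_caps("Hello"));           // false
}
```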

fn is_whitespace(self) -> bool

Indicates whether a character is whitespace.

Whitespace is defined in terms of the Unicode Property White_Space.
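Since White_Space covers more than ASCII space, tab and newline, splitting on is_whitespace also handles characters such as U+3000 IDEOGRAPHIC SPACE and U+00A0 NO-BREAK SPACE. A minimal sketch, assuming a current stable Rust toolchain, with a hypothetical words helper:

```rust
/// Hypothetical helper: split `s` on Unicode White_Space, dropping empty pieces.
fn words(s: &str) -> Vec<&str> {
    s.split(char::is_whitespace).filter(|w| !w.is_empty()).collect()
}

fn main() {
    // U+3000 and U+00A0 are both White_Space, so they separate words too.
    println!("{:?}", words("a\u{3000}b\u{a0}c  d")); // ["a", "b", "c", "d"]
}
```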

fn is_alphanumeric(self) -> bool

Indicates whether a character is alphanumeric.

Alphanumericness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No' and the Derived Core Property 'Alphabetic'.

fn is_control(self) -> bool

Indicates whether a character is a control code point.

Control code points are defined in terms of the Unicode General Category Cc.

fn is_numeric(self) -> bool

Indicates whether the character is numeric, that is, in one of the Unicode General Categories Nd, Nl, or No.
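The following sketch compares the numeric, alphanumeric and digit predicates on a few characters and filters out control characters. It assumes a current stable Rust toolchain. The helper strip_control is hypothetical.

```rust
/// Hypothetical helper: drop control characters (General Category Cc).
fn strip_control(s: &str) -> String {
    s.chars().filter(|c| !c.is_control()).collect()
}

fn main() {
    // '½' (U+00BD, category No) and 'Ⅻ' (U+216B, category Nl) are numeric,
    // and therefore also alphanumeric, but neither is a digit in radix 10.
    for c in ['7', '½', 'Ⅻ', 'x'] {
        println!(
            "{:?}: numeric={} alphanumeric={} digit={}",
            c,
            c.is_numeric(),
            c.is_alphanumeric(),
            c.is_digit(10)
        );
    }
    // BEL, CR and LF are all in category Cc.
    println!("{:?}", strip_control("bell\u{7} and\r\nnewline"));
}
```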

fn to_lowercase(self) -> char

Converts a character to its lowercase equivalent.

The case-folding performed is the common or simple mapping. See to_uppercase() for references and more information.

Return value

Returns the lowercase equivalent of the character, or the character itself if no conversion is possible.
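Note that in current stable Rust, char::to_lowercase no longer returns a single char. It returns an iterator, because the full Unicode mapping can produce more than one character, although most characters still yield exactly one. The sketch below assumes that modern API. The helper lowercase_chars is hypothetical; the standard str::to_lowercase additionally handles context-sensitive cases such as a final sigma.

```rust
/// Hypothetical helper: lowercase `s` one char at a time.
fn lowercase_chars(s: &str) -> String {
    s.chars().flat_map(|c| c.to_lowercase()).collect()
}

fn main() {
    println!("{}", lowercase_chars("HeLLo, WORLD")); // hello, world
    // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE lowercases to two chars.
    println!("{}", 'İ'.to_lowercase().count()); // 2
}
```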

fn to_uppercase(self) -> char

Converts a character to its uppercase equivalent.

The case-folding performed is the common or simple mapping: it maps one Unicode code point (one char in Rust) to its uppercase equivalent according to the Unicode Character Database (UnicodeData.txt) [1]. The additional mappings in SpecialCasing.txt are not considered here, as some of them expand to multiple code points.

A full reference can be found here [2].

Return value

Returns the uppercase equivalent of the character, or the character itself if no conversion was made.
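The one-to-one behavior described here differs from current stable Rust, where char::to_uppercase returns an iterator and does apply SpecialCasing.txt, so 'ß' becomes "SS". The sketch below shows that difference and a hypothetical helper, upper_one_to_one, that keeps the character unchanged whenever the full mapping would expand. This is only an approximation and is not guaranteed to match the Unicode simple mapping for every character.

```rust
/// Hypothetical approximation of a one-to-one uppercase mapping on modern Rust:
/// use the full mapping when it yields exactly one char, otherwise keep `c`.
fn upper_one_to_one(c: char) -> char {
    let mut it = c.to_uppercase();
    match (it.next(), it.next()) {
        (Some(u), None) => u, // exactly one char: accept it
        _ => c,               // expansion (or nothing): keep the original
    }
}

fn main() {
    // Modern to_uppercase applies SpecialCasing.txt: 'ß' expands to "SS".
    let full: String = 'ß'.to_uppercase().collect();
    println!("{}", full);                   // SS
    println!("{}", upper_one_to_one('ß')); // ß
    println!("{}", upper_one_to_one('a')); // A
}
```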

fn width(self, is_cjk: bool) -> Option<usize>

Returns this character's displayed width in columns, or None if it is a control character other than '\x00'.

is_cjk determines the behavior for characters whose East Asian Width is Ambiguous: if is_cjk is true, these are 2 columns wide; otherwise, they are 1. Pass true in CJK contexts and false elsewhere. Unicode Standard Annex #11 recommends treating these characters as 1 column wide (i.e., is_cjk = false) when the context cannot be reliably determined.

Implementors