UTF-8 and Chars
Unicode handling.
Rust Strings Are UTF-8
Every Rust String and &str is guaranteed to hold valid UTF-8. This means text can include any Unicode character, not just ASCII.
fn main() {
    let s = String::from("caf\u{e9}");
    println!("{}", s);
}
What Is UTF-8?
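UTF-8 encodes each Unicode character (code point) using one to four bytes. ASCII characters use a single byte; other characters use more, for example two bytes for the accented letter U+00E9 (e-acute) and four bytes for most emoji code points.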
fn main() {
    let ascii = "A";        // 1 byte
    let accent = "\u{e9}";  // 2 bytes (e-acute)
    println!("{} bytes vs {} bytes", ascii.len(), accent.len());
}
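Because a character can take more than one byte, the byte length of a string and its number of chars can differ. The following is a minimal sketch of that difference, not a definitive recipe: len() counts bytes, while chars() iterates over Unicode scalar values. The helper name byte_and_char_counts and the sample characters are assumptions chosen only for this illustration.

```rust
/// Returns (byte length, char count) for a string slice.
/// Hypothetical helper, introduced only for this sketch.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // len() is the number of UTF-8 bytes; chars().count() walks the
    // string and counts Unicode scalar values.
    (s.len(), s.chars().count())
}

fn main() {
    // One sample character for each UTF-8 width:
    // 'A' (1 byte), U+00E9 (2), U+20AC euro sign (3), U+1F600 emoji (4).
    for c in "A\u{e9}\u{20ac}\u{1f600}".chars() {
        println!("U+{:04X} takes {} byte(s)", c as u32, c.len_utf8());
    }

    let (bytes, chars) = byte_and_char_counts("caf\u{e9}");
    println!("{} bytes, {} chars", bytes, chars); // prints "5 bytes, 4 chars"
}
```

Note that chars().count() has to scan the whole string, whereas len() is a constant-time lookup, so prefer len() when the byte size is what you actually need.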