Unicode and utf8
Handle multibyte text.
Why Unicode Matters
Modern text contains accents, emoji, and non-Latin scripts. Unicode gives every character a number called a code point.
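Go string literals are encoded as UTF-8, a variable-width encoding of those code points. A string value itself is a sequence of bytes, so one character may span several of them.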
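To make the relationship between code points and bytes visible, here is a minimal sketch that ranges over a string. The helper name `codePoints` is hypothetical and exists only for this illustration. It relies on the fact that a `for ... range` loop over a Go string decodes one code point (a `rune`) per iteration.

```go
package main

import "fmt"

// codePoints is a hypothetical helper for this sketch. It returns the
// Unicode code points (runes) of s in order. Ranging over a string
// decodes its UTF-8 bytes one code point at a time.
func codePoints(s string) []rune {
	var out []rune
	for _, r := range s {
		out = append(out, r)
	}
	return out
}

func main() {
	s := "héllo 世界"
	for i, r := range s {
		// i is the byte offset where the code point starts,
		// not a character index, so it can jump by more than 1.
		fmt.Printf("byte offset %2d: %c  U+%04X\n", i, r, r)
	}
	fmt.Println("bytes:", len(s), "code points:", len(codePoints(s)))
}
```

The byte offsets printed by the loop skip ahead wherever a character needs more than one byte, which is why the byte count and the code point count differ.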
package main

import "fmt"

func main() {
	s := "héllo 世界"
	fmt.Println(s)
}
Bytes per Character Vary
In UTF-8, each code point is encoded in 1 to 4 bytes:
- ASCII letters: 1 byte.
- Accented Latin: usually 2 bytes.
- CJK characters: usually 3 bytes.
- Emoji and other characters beyond U+FFFF: 4 bytes.
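The widths listed above can be checked per code point with the standard `unicode/utf8` package. The following is a sketch, not a definitive implementation: `byteWidths` is a hypothetical helper name, and it assumes its input is valid UTF-8. The sample string uses escape sequences so that each character's code point is explicit.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteWidths is a hypothetical helper for this sketch. It returns the
// UTF-8 encoded length of each code point in s. It assumes s is valid
// UTF-8: invalid bytes decode to U+FFFD and would be reported as 3.
func byteWidths(s string) []int {
	var widths []int
	for _, r := range s {
		widths = append(widths, utf8.RuneLen(r))
	}
	return widths
}

func main() {
	s := "a\u00e9\u4e16\U0001F600" // a, é, 世, 😀

	fmt.Println(byteWidths(s))             // [1 2 3 4]
	fmt.Println(len(s))                    // 10 (bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 4 (code points)
}
```

`len` counts bytes, while `utf8.RuneCountInString` counts code points, so the two agree only for pure ASCII text.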
package main

import "fmt"

func main() {
	// len reports the number of bytes, not the number of characters.
	fmt.Println(len("a")) // 1
	fmt.Println(len("é")) // 2
	fmt.Println(len("世")) // 3
}