Go Academy · Lesson

Unicode and utf8

Handle multibyte text.

Why Unicode Matters

Modern text contains accents, emoji, and non-Latin scripts. Unicode gives every character a number called a code point.

Go stores strings as UTF-8, a variable-width encoding of those code points.

package main

import "fmt"

func main() {
    // Go source files are UTF-8, so a string literal can hold any Unicode text.
    s := "héllo 世界"
    fmt.Println(s)
}

Bytes per Character Vary

In UTF-8, each code point is encoded in 1 to 4 bytes:

  • ASCII letters: 1 byte.
  • Accented Latin: usually 2 bytes.
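  • CJK characters: usually 3 bytes.
  • Emoji and other characters outside the Basic Multilingual Plane: 4 bytes.

package main

import "fmt"

func main() {
    // len on a string returns its length in bytes, not in characters.
    fmt.Println(len("a"))  // 1
    fmt.Println(len("é"))  // 2
    fmt.Println(len("世")) // 3
    fmt.Println(len("😀")) // 4
}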
