HTML5 Character Encoding

Character encoding matters because a document can draw on a huge number of characters. HTML5 documents can contain Latin letters, Arabic numerals, mathematical symbols, foreign alphabets, and other special characters. HTML documents can use different character encodings, and each one represents these characters differently.

Incorrectly encoded text can lead to many kinds of problems:

  • Users may be unable to read the text correctly.
  • Search engines may be unable to find the data.
  • Machines may be unable to process the information.
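
To see how a wrong encoding makes text unreadable, here is a minimal Python sketch. It assumes a page saved as UTF-8 is mistakenly read as Latin-1 (ISO-8859-1); the sample word and the variable names are only illustrative.

```python
# Illustrative sample text; any non-ASCII text would behave similarly.
text = "café"

# Save the text as UTF-8 bytes, as a web server might send it.
utf8_bytes = text.encode("utf-8")

# Reading those bytes with the wrong encoding (Latin-1) garbles the "é".
garbled = utf8_bytes.decode("latin-1")

# Reading them with the right encoding recovers the original text.
restored = utf8_bytes.decode("utf-8")

print(garbled)   # cafÃ©
print(restored)  # café
```

The bytes never change; only the encoding used to interpret them does, which is why the reader and the writer have to agree on it.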

A charset (character set) is a collection of available characters grouped together.

ASCII – HTML5 Character Encoding

ASCII is the basic charset of character encoding. ASCII stands for American Standard Code for Information Interchange. It contains 128 characters, 95 of which are printable, including:

  • Latin letters in lower case
  • Latin letters in upper case
  • Punctuation symbols
  • The digits 0 to 9

Of the 128 characters, 95 are printable; the remaining 33 are control characters, which have no visible symbol. ASCII became very popular on the internet, but it supports only the basic Latin alphabet, so it cannot represent accented letters or other scripts.
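
These counts can be checked with a short Python sketch. It assumes the usual ASCII layout, where codes 32 to 126 are printable and codes 0 to 31 plus 127 are control characters; the variable names are illustrative.

```python
# All 128 ASCII code points are numbered 0 to 127.
ascii_codes = range(128)

# Codes 32 (space) to 126 (~) are the printable characters.
printable = [chr(code) for code in ascii_codes if 32 <= code <= 126]

# Codes 0 to 31 and code 127 (DEL) are control characters.
control = [code for code in ascii_codes if code < 32 or code == 127]

print(len(printable), len(control))  # 95 33

# ASCII cannot represent characters outside this range, such as "é".
try:
    "é".encode("ascii")
    ascii_supports_accent = True
except UnicodeEncodeError:
    ascii_supports_accent = False

print(ascii_supports_accent)  # False
```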

UTF-8 – HTML5 Character Encoding

Unicode is the industry standard for character encoding. It was first published in the early 1990s and defines several encodings, including UTF-8, UTF-16, and UTF-32.
UTF-8 (Unicode Transformation Format, 8-bit) has been the most popular character encoding on the web since around 2008. As of 2019, most websites use UTF-8, and the W3C recommends it as the default HTML character encoding.

Why do we use UTF-8?

  • UTF-8 supports multiple languages
  • UTF-8 is backward compatible with ASCII
  • XML uses UTF-8 by default
  • For text that is mostly ASCII, it occupies less space than other Unicode encodings such as UTF-16 and UTF-32.
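
The ASCII compatibility and size points can be illustrated with a small Python sketch. The helper name encoded_sizes and the sample characters are assumptions chosen for this example. UTF-8 is smallest for ASCII text, while some characters, such as the euro sign here, take fewer bytes in UTF-16.

```python
def encoded_sizes(char):
    """Return the byte length of char in UTF-8, UTF-16, and UTF-32."""
    # The "-le" variants leave out the byte order mark, so only the
    # character itself is counted.
    return (
        len(char.encode("utf-8")),
        len(char.encode("utf-16-le")),
        len(char.encode("utf-32-le")),
    )

# ASCII text produces identical bytes in ASCII and UTF-8.
same_bytes = "Hello".encode("ascii") == "Hello".encode("utf-8")
print(same_bytes)  # True

for char in ["A", "é", "€"]:
    print(char, encoded_sizes(char))
# A (1, 2, 4)
# é (2, 2, 4)
# € (3, 2, 4)
```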

In an HTML document, to use UTF-8, add a <meta> tag with the charset attribute set to UTF-8, for example <meta charset="UTF-8">, inside the <head> element.
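
Below is a minimal sketch of such a document, written from Python so that the file is really saved as UTF-8. The file name demo.html and the page content are made up for illustration. The declaration only helps when the file's bytes are actually UTF-8, which is why the sketch encodes the text explicitly.

```python
import os
import tempfile

# A small HTML5 page that declares UTF-8 in its <head>.
html = """<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Encoding demo</title>
</head>
<body>
  <p>café, €, 日本語</p>
</body>
</html>
"""

# Hypothetical output location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.html")

# Write the page as UTF-8 bytes so the file matches its declaration.
with open(path, "wb") as f:
    f.write(html.encode("utf-8"))

# Read the raw bytes back and decode them with the declared encoding.
with open(path, "rb") as f:
    raw = f.read()

round_trip = raw.decode("utf-8")
print(round_trip == html)  # True
```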