Converting Gurmukhi to Unicode

From SikhiWiki
Revision as of 00:28, 5 September 2020 by Hari singh (talk | contribs) (→‎Unicode)

Before Unicode came into general use, Gurmukhi characters were commonly represented by reusing part of the existing ASCII character set in custom fonts. Examples of these ASCII Punjabi fonts are:

  • Gurbani Akhar
  • Anmol Lipi
  • Amar Lipi
  • Bulara
  • Prabhki
  • Raaj
  • Gurbani Web Thick
  • Web Akhar Slim
  • Web Lipi Heavy
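
Converting text typed in such a font to Unicode is largely a table lookup, with one wrinkle: fonts of this kind typically store the sihari vowel sign in visual order (before its consonant), whereas Unicode stores it after the consonant. The Python sketch below illustrates the idea. The four-entry `FONT_TO_UNICODE` table and the function name are hypothetical, chosen only for illustration; they are not a verified mapping for any of the fonts listed above, each of which would need its own complete table.

```python
# Hypothetical mapping from ASCII key codes to Gurmukhi code points.
# A real converter needs a complete table for the specific font.
FONT_TO_UNICODE = {
    "s": "\u0A38",  # GURMUKHI LETTER SA
    "K": "\u0A16",  # GURMUKHI LETTER KHA
    "i": "\u0A3F",  # GURMUKHI VOWEL SIGN I (sihari)
    "~": "\u0A71",  # GURMUKHI ADDAK
}

SIHARI = "\u0A3F"


def font_text_to_unicode(text: str) -> str:
    """Map characters through the table, moving sihari after the next character."""
    out = []
    pending_sihari = False
    for ch in text:
        mapped = FONT_TO_UNICODE.get(ch, ch)  # unknown characters pass through
        if mapped == SIHARI:
            pending_sihari = True  # hold it until the following character arrives
            continue
        out.append(mapped)
        if pending_sihari:
            out.append(SIHARI)
            pending_sihari = False
    if pending_sihari:
        out.append(SIHARI)  # stray sihari at the end of the input
    return "".join(out)


# "is~K" in visual order becomes SA, SIHARI, ADDAK, KHA in Unicode order.
print(font_text_to_unicode("is~K"))
```

This sketch moves sihari after whatever character follows it, which is a simplification; a production converter would also handle conjuncts, nukta forms, and font-specific glyph variants.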

Unicode Gurmukhi

Gurmukhi is a Unicode block containing characters for the Punjabi language, as it is written in India. In its original incarnation, the code points U+0A02..U+0A4C were a direct copy of the Gurmukhi characters A2-EC from the 1988 Indian Script Code for Information Interchange ("ISCII") standard. The Devanagari (Hindi, Marathi, Sanskrit, Konkani), Bengali, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.
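
Because the range was copied position for position, the relationship described above amounts to adding a constant offset. The following Python sketch relies only on that statement; the function name is an assumption, and it does not account for unassigned positions or for later ISCII revisions.

```python
# Range of the 1988 ISCII Gurmukhi characters, as stated in the text.
ISCII_FIRST, ISCII_LAST = 0xA2, 0xEC

# Constant distance between an ISCII byte and its Unicode code point.
OFFSET = 0x0A02 - ISCII_FIRST  # 0x0960


def iscii_byte_to_code_point(byte: int) -> int:
    """Return the Unicode code point for an ISCII byte in the copied range."""
    if not ISCII_FIRST <= byte <= ISCII_LAST:
        raise ValueError(f"byte 0x{byte:02X} is outside the range A2-EC")
    return byte + OFFSET


print(f"U+{iscii_byte_to_code_point(0xA2):04X}")  # first position of the range
print(f"U+{iscii_byte_to_code_point(0xEC):04X}")  # last position of the range
```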

Unicode

Unicode is an information technology (IT) standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard is maintained by the Unicode Consortium. As of March 2020, Unicode 13.0 has a repertoire of 143,859 characters (143,696 graphic characters and 163 format characters) covering 154 modern and historic scripts, as well as multiple symbol sets and emoji. The character repertoire of the Unicode Standard is synchronized with ISO/IEC 10646, and the two are code-for-code identical.

The Unicode Standard consists of a set of code charts for visual reference, an encoding method and set of standard character encodings, a set of reference data files, and a number of related items, such as character properties, rules for normalization, decomposition, collation, rendering, and bidirectional text display order (for the correct display of text containing both right-to-left scripts, such as Arabic and Hebrew, and left-to-right scripts).[1]

Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including modern operating systems, XML, Java (and other programming languages), and the .NET Framework.

Unicode can be implemented by different character encodings. The Unicode standard defines UTF-8, UTF-16, and UTF-32, and several other encodings are in use. The most commonly used encodings are UTF-8, UTF-16, and UCS-2 (a precursor of UTF-16 without full support for Unicode). GB18030 is standardized in China and implements Unicode fully, although it is not an official Unicode standard.
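
To make the difference between the encodings concrete, the short Python sketch below encodes a single Gurmukhi letter (U+0A38, GURMUKHI LETTER SA) in UTF-8, UTF-16, and UTF-32. The helper name `encoded_forms` is an assumption made for this example; the big-endian variants are used so that no byte order mark is added.

```python
def encoded_forms(ch: str) -> dict:
    """Return the bytes of `ch` in three Unicode encodings, as hex strings."""
    return {
        name: ch.encode(name).hex(" ")
        for name in ("utf-8", "utf-16-be", "utf-32-be")
    }


for name, hex_bytes in encoded_forms("\u0A38").items():
    print(f"{name:10} {hex_bytes}")
# utf-8      e0 a8 b8
# utf-16-be  0a 38
# utf-32-be  00 00 0a 38
```

The same code point occupies three bytes in UTF-8, two in UTF-16, and four in UTF-32.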


Unicode Block

A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.

Each block is generally, but not always, meant to include all the glyphs used by one or more specific languages, or in some general application area such as mathematics, surveying, decorative typesetting, social forums, etc.
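
Since a block is a contiguous range of code points, testing whether a character belongs to it is a simple range check. The Python sketch below does this for the Gurmukhi block, U+0A00 to U+0A7F; the constant and function names are assumptions made for this example.

```python
# The Gurmukhi block spans U+0A00..U+0A7F (range() excludes the end value).
GURMUKHI_BLOCK = range(0x0A00, 0x0A80)


def in_gurmukhi_block(ch: str) -> bool:
    """Return True if the single character `ch` lies in the Gurmukhi block."""
    return ord(ch) in GURMUKHI_BLOCK


print(in_gurmukhi_block("\u0A3F"))  # Gurmukhi vowel sign I
print(in_gurmukhi_block("A"))       # Latin capital letter A
```

Membership in the block does not mean the code point is assigned; blocks can contain reserved positions.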

Code pages for ISCII conversion

To convert Unicode text (for example, UTF-8) to ISCII, the following Windows code pages may be used, one per script:

  • 57002: Devanagari (Hindi, Marathi, Sanskrit, Konkani)
  • 57003: Bengali
  • 57004: Tamil
  • 57005: Telugu
  • 57006: Assamese
  • 57007: Odia
  • 57008: Kannada
  • 57009: Malayalam
  • 57010: Gujarati
  • 57011: Punjabi (Gurmukhi)

Block

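Gurmukhi (see notes 1 and 2)
Unicode.org chart (PDF)
        0 1 2 3 4 5 6 7 8 9 A B C D E F
U+0A0x  - ਁ ਂ ਃ - ਅ ਆ ਇ ਈ ਉ ਊ - - - - ਏ
U+0A1x  ਐ - - ਓ ਔ ਕ ਖ ਗ ਘ ਙ ਚ ਛ ਜ ਝ ਞ ਟ
U+0A2x  ਠ ਡ ਢ ਣ ਤ ਥ ਦ ਧ ਨ - ਪ ਫ ਬ ਭ ਮ ਯ
U+0A3x  ਰ - ਲ ਲ਼ - ਵ ਸ਼ - ਸ ਹ - - ਼ - ਾ ਿ
U+0A4x  ੀ ੁ ੂ - - - - ੇ ੈ - - ੋ ੌ ੍ - -
U+0A5x  - ੑ - - - - - - - ਖ਼ ਗ਼ ਜ਼ ੜ - ਫ਼ -
U+0A6x  - - - - - - ੦ ੧ ੨ ੩ ੪ ੫ ੬ ੭ ੮ ੯
U+0A7x  ੰ ੱ ੲ ੳ ੴ ੵ - - - - - - - - - -
Notes
1. As of Unicode version 6.0
2. A dash (-) marks an unassigned code point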
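
The assigned code points of the block can also be listed programmatically. The Python sketch below uses the standard `unicodedata` module to print each assigned code point in U+0A00..U+0A7F with its character name; the function name is an assumption made for this example. The result depends on the Unicode version bundled with the Python interpreter, which may be newer than 6.0 and so may include later additions.

```python
import unicodedata


def assigned_in_block(start: int = 0x0A00, end: int = 0x0A7F) -> list:
    """Return (code point, name) pairs for named characters in the range."""
    rows = []
    for cp in range(start, end + 1):
        # name() returns the default (None) for unassigned code points
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            rows.append((cp, name))
    return rows


for cp, name in assigned_in_block():
    print(f"U+{cp:04X}  {name}")
```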


External Links