The History of Digital Ogam: How it came to be, and the challenges it faced along the way; Guest blog by Adrian Doyle

We are grateful to Adrian Doyle for contributing a guest blog this month. He is the creator of the Würzburg Irish Glosses website (wurzburg.ie) and is currently completing a PhD researching Natural Language Processing techniques for Old Irish at the University of Galway. He is a research associate at the Insight Centre for Data Analytics at the University of Galway. Adrian writes:

The introduction of ogam characters to Unicode was mentioned in an earlier blog post on this site. The aim of that blog was to identify certain implications which arise as a result of the way ogam was implemented in digital format, some of which may impede attempts to represent historical ogams particularly accurately. Nevertheless, it concluded that as a result of its digital implementation, ogam is now more widely accessible than it has ever been. What was not mentioned in that blog was that the introduction of ogam to Unicode initially faced strong opposition from Joseph Becker, one of the original creators of the Unicode standard. This blog will discuss why Becker initially opposed its introduction, and how it eventually came to take its place among other digital scripts. To do so, however, it is first necessary to discuss how computers “comprehend” and interact with digital text.

A major commonality shared between modern computers and their predecessors, telegraph machines and early punch card computers, is that at a fundamental level they process information in binary format. In order for a machine to interact with text it must be represented as a string of ones and zeros. Every text character, every numeral, every punctuation mark fed into a computer by a user typing on a keyboard is mapped to such a binary string, and is interpreted by the machine not as text but numerically.
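To make this concrete, the short Python sketch below (purely illustrative) prints the number, and the 8-bit binary pattern, to which each letter of the word “ogam” is mapped:

```python
# Purely illustrative: each character typed by a user is mapped to a
# number, which the machine stores as a string of binary digits.
for ch in "ogam":
    print(ch, ord(ch), format(ord(ch), "08b"))
# o 111 01101111
# g 103 01100111
# a 97 01100001
# m 109 01101101
```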

This mapping of characters to binary strings is referred to as encoding, and the practice has been in wide use for at least two centuries, serving a variety of purposes. The braille alphabet, for example, developed in 1824, initially encoded the letters of the French alphabet in order to make them accessible to blind readers. As each braille letter is composed of between one and six raised dots, arranged in a 2×3 matrix, the alphabet represents an early example of binary character encoding. Another notable early example of binary encoding is Morse code, in which characters are represented as a series of dots and dashes. These early means of encoding characters made it possible to transmit information by mechanical means, whether by the movement of a finger over a page, or the electrical pulses produced by a telegraph machine. In this respect, they are comparable to modern standards for encoding text, such as Unicode.

A table showing how different decimal numbers are represented in different forms of binary.
Table 1: Mapping of Decimal Numbers to Binary Using 2-bit, 4-bit, 7-bit and 8-bit Bytes.

The strings of ones and zeros to which text characters are mapped are known as bytes. The length of each byte is measured in bits, with each “bit” of information consisting of a single binary digit (one or zero). While bits are the basic units used in computing, the number of bits which make up a byte is also very important. It is only when two or more bits of information are combined to form a byte that this combination can represent something more substantial than the simple values 1 and 0. In order to represent letters, for example, each letter character needs to be mapped to a unique byte (a = 01100001, b = 01100010, c = 01100011, and so on). The number of unique bytes available to be mapped to text characters is dependent on the number of bits which make up each byte. If the length of a byte is too small, there may not be enough unique variants available to map all required characters. This is exemplified in Table 1, which demonstrates mappings of decimal numerals to bytes of various lengths. As can be seen in the table, 2-bit bytes run out of unique variants after mapping only four (2²) numeral characters, while 4-bit bytes can represent up to sixteen (2⁴) distinct characters.

To represent an alphabet containing 26 letters, therefore, would require bytes to be at least five bits in length, yielding thirty-two (2⁵) potential mappings. To encode numerals, punctuation marks, and both upper and lower-case letters would require bytes to be longer again. The larger bytes are, however, the more difficult they are for a machine to process, and this posed a particular problem for underpowered early computers. A difficult balance had to be struck between the desire to encode many discrete text characters, and the requirement to keep bytes as small as possible.
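For readers who would like to verify this arithmetic, the short Python sketch below (purely illustrative, and not tied to any particular encoding standard) computes the capacity of bytes of various lengths and the minimum number of bits needed for a 26-letter alphabet:

```python
from math import ceil, log2

# An n-bit byte can represent 2 ** n distinct characters.
for bits in (2, 4, 5, 7, 8, 16):
    print(f"{bits}-bit bytes -> {2 ** bits} possible characters")

# Minimum byte length needed to give each of 26 letters a unique mapping.
print("bits needed for 26 letters:", ceil(log2(26)))  # 5
```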

Another table showing how characters are mapped to 7-bit Bytes in ASCII.
Table 2: ASCII Character Mapping to 7-bit Bytes (control characters in italics).
Columns = first three bits in byte, Rows = last four bits in byte
(e.g. a = 110 0001, b = 110 0010, c = 110 0011, etc.).

The American Standard Code for Information Interchange (ASCII),[1] first released in 1963, encoded the twenty-six letters of the English alphabet (majuscule and minuscule) along with numerals, some punctuation marks, and a selection of control characters using 7-bit mappings. These mappings, which can be seen in Table 2, were reasonably well suited to the processing capabilities of the time, and ASCII became widely adopted as a standard in telegraphy. It was not well suited to representing the character-sets of languages other than English, however, even for languages which also make use of the Latin alphabet. In an attempt to address this issue, the European Computer Manufacturers Association agreed upon the ECMA-6 standard[2] as early as 1965. Among other benefits, ECMA-6 made it possible to represent a variety of diacritic marks which are common in European languages other than English by combining the letters of the basic Latin alphabet with multi-use punctuation characters.[3] For example, the combination i + BACKSPACE + ' on a typewriter would overwrite the letter i with an apostrophe, producing an acute accent above the letter.[4] While this creative solution enabled the use of more diacritics than could normally be encoded using only 7-bit bytes, the overall potential for character variety in this standard was still limited. The encoding of characters from more than one alphabet at a time could not be accomplished with bytes of this size.
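The 7-bit patterns in Table 2 can be reproduced with a few lines of Python; the overstrike sequence at the end is included purely as an illustration of the ECMA-6-era trick described above, with the BACKSPACE control character written in its escaped form (\x08):

```python
# 7-bit ASCII bit patterns for a short string (compare with Table 2).
print([format(b, "07b") for b in "abc".encode("ascii")])
# ['1100001', '1100010', '1100011']

# The ECMA-6-era overstrike trick: letter, BACKSPACE (a control
# character), then an apostrophe, printed on top of one another.
overstrike = "i\x08'"
print([format(b, "07b") for b in overstrike.encode("ascii")])
# ['1101001', '0001000', '0100111']
```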

By the time that Joseph Becker first described Unicode in 1988, computing power had far surpassed the limitations which had restricted character encodings to 7 bits in the 1960s. Even computers intended for home use, like the IBM PC-XT 286 and the Apple Macintosh, contained 16/32-bit microprocessors. Still, the 7-bit and 8-bit character encodings of two decades past remained the most commonly utilised, limiting the number of characters which could be represented to, at most, 256. Becker intended Unicode to enable the consistent encoding of the text characters used by most of the world’s writing systems, all within the one standard. This would necessitate using a significantly larger byte size, but Becker described this increase as long overdue:

The idea of expanding the basis for character encoding from 8 to 16 bits is so sensible, indeed so obvious, that the mind initially recoils from it. There must be a catch to it, otherwise why didn’t we think of this long ago?[5]

The move to using 16-bit encodings would offer significant improvements over earlier standards like ASCII and ECMA-6, increasing the number of potential characters from 128 (2⁷) to 65,536 (2¹⁶). Becker suggested that, if a reasonable definition of a “character” was used, this would be “sufficient to encode all characters of all the world’s scripts”.[6] He was clear, however, that this was only true of modern writing systems. Therefore, Unicode was not initially intended to support historical writing systems like ogam:

Unicode gives higher priority to ensuring the utility for the future than to preserving past antiquities. Unicode aims in the first instance at the characters published in modern text (e.g. in the union of all newspapers and magazines printed in the world in 1988) whose number is undoubtedly far below 2¹⁴ = 16,384. Beyond those modern-use characters, all others may be defined to be obsolete or rare; these are better candidates for private use registration than for congesting the public list of generally-useful Unicodes.

… one can decide up-front that preserving a pure 16-bit architecture has a higher design priority than publicly encoding every extinct or obscure character form. Then the sufficiency of 16 bits for the writing technology of the future becomes a matter of our active intention, rather than passive victimization by writing systems of the past.[7]

Linguists may disagree with the characterisation of one script or another as “extinct”, “obscure”, or “congesting the public list of generally-useful Unicodes”; however, Becker’s exclusion of rare or historical scripts was arguably justified at the time. The inclusion of a wide range of historical characters and symbols would likely have exceeded even the 16-bit character limit, and more powerful 32-bit computers would not become dominant in the market until the mid-90s. Nevertheless, it was during this period that the earliest attempts were made to have ogam included in the Unicode standard.

When the inclusion of ogam in Unicode was proposed by Michael Everson in 1994, perhaps in an attempt to circumvent the ban on historical writing systems, he made the case that “Ogham enjoys marginal but continued use in Ireland, the Isle of Man, and Scotland”.[8] He argued that, while “Use of Ogham is relatively rare in terms of numbers… it is found in many specialized contexts. For instance, one can buy T-shirts, jewellery, and cards which make use of the Ogham script.” Everson suggested that ogam “is used by a relatively small community of scholars studying early Irish inscriptions”, though this would hardly have satisfied Becker’s stated requirements for the script’s inclusion in Unicode. Possibly for this very reason Everson continued, “individuals at many levels have learned Ogham and used it for one purpose or another in recent times. (It might be said that schoolchildren writing notes to one another in Ogham script might be producing texts more interesting than the monumental texts in stone which form the corpus of Ogham texts studied by scholars!)”

Even as this proposal was being considered, computer technology progressed quickly throughout the 90s, and the release of Unicode version 2.0 in 1996[9] saw the implementation of a “surrogate character mechanism” which increased the Unicode codespace to over a million code points. This alleviated earlier concerns that the available codespace might become congested if historical scripts were included, and in the following years many such scripts were finally adopted into the standard. The most recent version of Unicode as of this writing gives the impression that the exclusion of historical scripts is well and truly a thing of the past:

The Unicode Standard is designed to meet the needs of diverse user communities within each language, … covering the needs of both modern and historical texts.[10]
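For the curious, the effect of the surrogate character mechanism mentioned above can be demonstrated with a short Python sketch (the particular character used here is simply an arbitrary example of a code point that lies beyond the 16-bit range):

```python
# A code point beyond U+FFFF is stored in UTF-16 as a pair of 16-bit
# "surrogate" values, extending the codespace to 1,114,112 code points.
linear_b = "\U00010000"                    # LINEAR B SYLLABLE B008 A
print(linear_b.encode("utf-16-be").hex())  # 'd800dc00' -- a surrogate pair
print(0x10FFFF + 1)                        # 1114112
```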

Ogam characters were eventually included in Unicode version 3.0 in 1999.[11] This year will mark its twenty-fifth year as a part of the Unicode standard, and three decades since its inclusion was first proposed by Everson. In that time, relatively little about the digital script has changed. Everson’s original proposal suggested that twenty-seven ogam characters should be introduced to Unicode. Some of the Middle Irish spellings originally provided for the character names (FERN, NUIN, HUATH, CERT, NGETAL, EDAD, IDAD, EBAD, UILEN, EMANCHOLL) were replaced with standardised modern Irish forms, albeit lacking the síneadh fada (FEARN, NION, UATH, CEIRT, NGEADAL, EADHADH, IODHADH, EABHADH, UILLEANN, EAMHANCHOLL), and two English descriptions (OGHAM SPACE MARK and OGHAM FEATHER MARK) have replaced the Irish names which were initially suggested (OGHAM SIGN BEARNA and OGHAM SIGN SAIGHEAD). Two further characters have also been added to the twenty-seven originally proposed (PEITH and OGHAM REVERSED FEATHER MARK).
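Anyone wishing to inspect the result can list the ogam block directly from Python’s bundled Unicode database (a small illustrative sketch; the exact output depends on the Unicode version shipped with the interpreter):

```python
import unicodedata

# The Ogham block runs from U+1680 (OGHAM SPACE MARK) to U+169C
# (OGHAM REVERSED FEATHER MARK); the names include the modern Irish
# forms discussed above, e.g. OGHAM LETTER FEARN, OGHAM LETTER PEITH.
for cp in range(0x1680, 0x169D):
    print(f"U+{cp:04X}  {chr(cp)}  {unicodedata.name(chr(cp))}")
```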

Several statements regarding the script, alongside recommendations for its usage, which can be found in the most recent version of the Unicode standard,[12] can be traced all the way back to Everson’s 1994 proposal. The requirement that “Ogham should … be rendered on computers from left to right or from bottom to top (never starting from top to bottom)”[13] has been copied word-for-word into version 15.0 from Everson’s initial proposal, and his note proposing that the character, OGHAM SIGN SAIGHEAD, “marks the direction in which a text is to be read (in later Ogham)”[14] is still reflected in version 15.0’s more tentative claim that “In some cases, only the Ogham feather mark is used, which can indicate the direction of the text”.[15]
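By way of illustration only (the string below is an arbitrary sample rather than a transcription of any real inscription), the feather mark and reversed feather mark can be used to frame a run of ogam text:

```python
# U+169B OGHAM FEATHER MARK opens the text and U+169C OGHAM REVERSED
# FEATHER MARK closes it; rendering should run left to right or bottom to top.
sample = "\u1688\u1691\u168B"           # TINNE, ONN, MUIN -- an arbitrary sample
print("\u169B" + sample + "\u169C")     # ᚛ᚈᚑᚋ᚜
```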

The handful of changes mentioned above had already been introduced by the time ogam was incorporated into Unicode in 1999. The script has remained unchanged ever since. A suggestion was made in 2007 by Mark Davis that the OGHAM SPACE MARK character should not have the properties of a whitespace character because “Users of the UCD expect that whitespace characters are, well, white space — that is, that they do not have visible glyph in normal usage”.[16] This claim was strongly rebutted by Michael Everson, however, who wrote, “Mark Davis’ suggestion … seems to be based on a misunderstanding of the use of the character. His assertion … is, in our view, incorrect. Users of the Ogham script do, in fact, expect this character to act as a white space”.[17] He concluded, “The Irish National Body requests that the Unicode Technical Committee to make NO CHANGE to the properties of the OGHAM SPACE MARK. It is correctly specified at present.” Everson’s argument seems to have convinced the Unicode Technical Committee, and no change was made to the OGHAM SPACE MARK’s whitespace status as a result of this interaction (see Tom Scott’s excellent account of these events).[18] A further, minor disagreement emerged as to whether, given its recently affirmed status as a whitespace character, the Unicode glyph for the OGHAM SPACE MARK should be bounded by a dashed box to ensure consistency with other whitespace characters in the standard;[19] however, no change seems to have occurred as a result of this suggestion either.[20]
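The practical consequence of that decision can still be observed today; the following Python sketch simply checks the whitespace behaviour of the character as it stands:

```python
# U+1680 OGHAM SPACE MARK retains the White_Space property, so ordinary
# text processing still treats it as a word separator despite its visible glyph.
space_mark = "\u1680"
print(space_mark.isspace())                       # True
print("\u1681\u1682\u1680\u1683\u1684".split())   # two ogam "words"
```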

A forum post where a user named revolvingocelot laments the inclusion of ancient scripts in Unicode but the lack of Klingon characters.
Figure 1a: Criticism of Ogam’s Inclusion in the Unicode Standard.
A response to the forum post in which user WorkLobster corrects revolvingocelot, indicating that ogham is in fact in popular use and that its Unicode characters are being used. The original lamenter relents and celebrates that ogham is receiving more attention due to its inclusion in Unicode, and turns their attention to other matters.
Figure 1b: Response to Criticism of Ogam’s Inclusion in the Unicode Standard.

From the beginning, ogam faced an uphill battle to gain and maintain appropriate representation in digital format. Despite progress in Unicode’s acceptance of historical scripts in the 90s, and its acceptance of the unconventional whitespace of ogam in the 2000s, the script’s inclusion in the standard can still be a contentious subject. As recently as 2021 it drew the ire of constructed language (conlang) advocates after a user on the Hacker News forum complained that ogam is “unlikely to be used by the vast majority of people, now or in future, simply as a consequence of its antiquity and obsolescence”[21] (see Figure 1a). On this basis the argument was made that it would be more reasonable for Unicode to accommodate the writing system of the Star Trek conlang, Klingon. The user seemed happy to relent, however, after being informed that the inclusion of ogam in Unicode has inspired people to interact with it, providing “a nice little way for people to connect and engage with an old part of our culture” (see Figure 1b). While this may provide little in the way of consolation to Star Trek fans and conlang enthusiasts, it is worth mentioning that Unicode does set aside a “private use area” of code points whose meaning is deliberately left undefined, which can be used by anyone to support scripts of their own design. As happenstance would have it, in 1997, two years before the inclusion of ogam in Unicode, a standard for Klingon was created using this private area by none other than Michael Everson.[22]


  1. American Standard Code for Information Interchange (Standard). (1963). American Standards Association. New York. Retrieved from https://www.sensitiveresearch.com/Archive/CharCodeHist/X3.4-1963/index.html on 21/12/2023. ↩︎
  2. ECMA-6: 7-bit Coded Character Set (5th ed., Standard). (1985). European Computer Manufacturers Association. Geneva. p. 12. Retrieved from https://web.archive.org/web/20160529230908/http://www.ecma-international.org/publications/files/ECMA-ST-WITHDRAWN/ECMA-6%2C%205th%20Edition%2C%20March%201985.pdf on 21/12/2023. ↩︎
  3. ibid., p. 12. ↩︎
  4. Irish readers, who may have had the aggravating experience of being told their name “cannot contain special characters” when trying to apply for various online services, will note that this means it has, in fact, been possible to represent the síneadh fada in digital text since before the first Moon landing or Woodstock. This diacritic has been supported by ECMA-6, and by following standards, for almost six decades as of the time of this writing. ↩︎
  5. Becker, J. D. (1988). Unicode 88 (Standard). Unicode Consortium. Palo Alto. p. 4. ↩︎
  6. ibid. ↩︎
  7. ibid., p. 5. ↩︎
  8. Everson, M. (1994). Proposal for encoding the Ogham script in ISO 10646. Retrieved from https://www.evertype.com/standards/og/ogham.html on 21/12/2023. ↩︎
  9. (1996). Components of The Unicode Standard Version 2.0.0. Unicode Consortium. Retrieved from https://www.unicode.org/versions/components-2.0.0.html on 21/12/2023. ↩︎
  10. (2022). The Unicode Standard Version 15.0 – Core Specification (Standard). Unicode Consortium. Mountain View. p. 14. Retrieved from https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf on 27/01/2024. ↩︎
  11. (1999). Components of The Unicode Standard Version 3.0.0. Unicode Consortium. Retrieved from https://www.unicode.org/versions/components-3.0.0.html on 21/12/2023. ↩︎
  12. (2022). The Unicode Standard Version 15.0 – Core Specification (Standard). p.362. ↩︎
  13. ibid. ↩︎
  14. Everson, M. (1994). ↩︎
  15. (2022). The Unicode Standard Version 15.0 – Core Specification (Standard). p.362. ↩︎
  16. Davis, M. (2007). OGHAM SPACE MARK shouldn’t be whitespace (tech. rep. No. L2/07-340). Unicode Technical Committee Document Registry. Retrieved from https://www.unicode.org/L2/L2007/07340-ogham-space.txt on 27/01/2024. ↩︎
  17. Everson, M. (2007). Irish comments on L2/07-340 “OGHAM SPACE MARK shouldn’t be whitespace” (tech. rep. No. L2/07-392). Unicode Technical Committee Document Registry. Retrieved from https://www.unicode.org/L2/L2007/07392-ogham.txt on 27/01/2024. ↩︎
  18. Scott, T. (2018). ᚛ᚈᚑᚋ ᚄᚉᚑᚈᚈ᚜ and ᚛ᚑᚌᚐᚋ᚜. Retrieved from https://www.youtube.com/watch?v=2yWWFLI5kFU ↩︎
  19. (2008). Representation of Ogham Space, Tamil named sequences (Input to ISO/IEC 10646). WG2. Retrieved from https://www.unicode.org/wg2/docs/n3407.pdf on 27/01/2024. ↩︎
  20. List of Unicode Characters of Category “Space Separator”. Compart. Retrieved from https://www.compart.com/en/unicode/category/Zs on 27/01/2024. ↩︎
  21. revolvingocelot (2021). The Prince Symbol has been Salvaged from a 1993 Floppy Disk. Hacker News. Retrieved from https://news.ycombinator.com/item?id=29393534 on 21/12/2023. ↩︎
  22. (2022). Klingon in Unicode. Klingon Wiki. Retrieved from https://klingon.wiki/En/Unicode on 27/01/2024. ↩︎
