System for Ethiopic Representation in ASCII (SERA)
Dedicated to the work and memory of Abraham Demoz (1925 - 1994)
In the time since the original publication of our paper
in "The Journal of EthioSciences" Volume 3 Number 1 on the
topic of representation of Fidel in 7-bit ASCII, the need
became apparent to extend the system to encompass
representation for Ethiopic numerals, punctuation, and mixed
script notations. In the same period more was learned about
the treatment of certain characters outside of Amharic that
allowed for simplification of the ASCII representation. The
following is a recapitulation of the original publication and
an assessment of some of the more recent developments. A
complete discussion of many of these changes is available at
the Rensselaer Polytechnic Institute ftp archive under the file
name SERA-94.
As we have indicated before, this system, though well
developed, is still not in its final form. Further refinements
will only come after many have had the chance to use it and
test its strengths and weaknesses on their own. As Abraham
Demoz, to whom we have dedicated this work, noted [1]:
"...script reform calls not only for a competent
professional assessment of the technical aspects of
the script but also for a careful weighing of these
against the psychological and socio-political factors
that have a bearing on the written word and all that
it stands for"
(Demoz, "Amharic Script Reform Efforts". ETHIOPIAN
STUDIES. S. Segert and A.J.E. Bodrogligeti, Eds. 1983).
Any and all feedback will be greatly appreciated.
dan'El yaqob (Daniel Mulholland)
yTna frdyweq (Yitna Firdyiwek)
=============================================================================
Contents
1. The SERA Table
2. Considerations We Took in the Development of SERA
3. Development of the System
4. Some Commonly Asked Questions
5. A Full Sample Text with Statistics
6. Notes, Appendices and References
1. The System for Ethiopic Representation in ASCII (SERA) Table
Although some questions still remain to be answered regarding
the number of "forms" to use for the ASCII/ETHIOPIC table, we
have retained the original arrangement of twelve (12) for SERA
pending decisions relating to the Unicode/ISO standards
currently under discussion. We do not believe a change in the
matrix of the table will affect the work discussed in this paper.
The Ethiopic Script in ASCII
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1     2     3     4     5     6     7     8     9     10    11    12
    g`Iz  ka`Ib sals  rab`I hams  sads  sab`I diqala -->
1   he    hu    hi    ha    hE    h     ho
2   le    lu    li    la    lE    l     lo                      lWa
3   He    Hu    Hi    Ha    HE    H     Ho
4   me    mu    mi    ma    mE    m     mo                      mWa
5   `se   `su   `si   `sa   `sE   `s    `so
6   re    ru    ri    ra    rE    r     ro                      rWa
7   se    su    si    sa    sE    s     so                      sWa
8   xe    xu    xi    xa    xE    x     xo                      xWa
9   qe    qu    qi    qa    qE    q     qo    qWe   qWu   qWi   qWa   qWE
10  Qe    Qu    Qi    Qa    QE    Q     Qo    QWe   QWu   QWi   QWa   QWE
11  be    bu    bi    ba    bE    b     bo                      bWa  (Q is Tigrignia)
12  ve    vu    vi    va    vE    v     vo                      vWa
13  te    tu    ti    ta    tE    t     to                      tWa
14  ce    cu    ci    ca    cE    c     co                      cWa
15  `he   `hu   `hi   `ha   `hE   `h    `ho   hWe   hWu   hWi   hWa   hWE
16  ne    nu    ni    na    nE    n     no                      nWa
17  Ne    Nu    Ni    Na    NE    N     No                      NWa
18  e\a   u\U   i     A     E     I     o\O   e3  (e3 as in "e3re!")
19  ke    ku    ki    ka    kE    k     ko    kWe   kWu   kWi   kWa   kWE
20  `ke   `ku   `ki   `ka   `kE   `k    `ko  (`k is Chaha)
21  Ke    Ku    Ki    Ka    KE    K     Ko    KWe   KW    KWi   KWa   KWE
22  Xe    Xu    Xi    Xa    XE    X     Xo  (X is Chaha)
23  we    wu    wi    wa    wE    w     wo
24  `e    `u    `i    `a    `E    `I    `o
25  ze    zu    zi    za    zE    z     zo                      zWa
26  Ze    Zu    Zi    Za    ZE    Z     Zo                      ZWa
27  ye    yu    yi    ya    yE    y     yo
28  de    du    di    da    dE    d     do                      dWa
29  De    Du    Di    Da    DE    D     Do  (D is Oromiffa)
30  je    ju    ji    ja    jE    j     jo
31  ge    gu    gi    ga    gE    g     go    gWe   gWu   gWi   gWa   gWE
32  Ge    Gu    Gi    Ga    GE    G     Go  (G is Chaha)
33  Te    Tu    Ti    Ta    TE    T     To                      TWa
34  Ce    Cu    Ci    Ca    CE    C     Co                      CWa
35  Pe    Pu    Pi    Pa    PE    P     Po
36  Se    Su    Si    Sa    SE    S     So                      SWa
37  `Se   `Su   `Si   `Sa   `SE   `S    `So
38  fe    fu    fi    fa    fE    f     fo                      fWa
39  pe    pu    pi    pa    pE    p     po
(Letters will be referred to both by their ASCII spelling and by their
position in the above number matrix, e.g. "he" or 1/1. The columns are also
known as "forms" (e.g., first form, second form, etc.) or by their
Ethiopic names, e.g. g`Iz, ka`Ib, sals, etc.)
2. Considerations We Took in the Development of SERA
We took the following two considerations into account in coming
up with our proposed standard:
a) The system must be easy to type on a 101 keyboard. This entails:
-- finding the closest match between the Latin and Ethiopic
phonetic system (while being as systematic as possible with
the inevitable exceptions),
-- limiting the number of keystrokes necessary for each Ethiopic
character to a minimum, and
-- placing the most frequently used keys as close as possible to
the "home keys" row of the 101 keyboard.
b) The system must also be easy for machine translation. In this
case, the systematicity of the mapping of Ethiopic to ASCII is
exploited to make the machine translation between ASCII and
Ethiopic script (in word processors, for example) as fast as
possible.
3. Development of the System
When attempting to write Ethiopic script with Latin letters, it may first
occur to one to represent the 7 forms with numbers, like so:
Consonants:
h1 h2 h3 h4 h5 h6 h7
Independent Vowels:
a1 a2 a3 a4 a5 a6 a7
It is soon found in practice, however, that while this is a very simple
system for representing the Ethiopic characters, it is not so simple to read
or write in it (e.g., "T5n1y6s6T6l6N6", "a1d5s6 a1b1b4"). This is true
largely because our minds are not trained to associate the Latin script with
Arabic numbers to form words. One will soon wonder why not use the Latin
vowel letters to denote the 7 forms of the Ethiopic characters. This is
where the trouble begins: How do you represent the standard 7 Ethiopic forms
(plus the "W" forms) with only 5 Latin vowels?
The first step we took was to assign a punctuation mark (the apostrophe ')
and "I" for the two extra Ethiopic vowels (plus "W" for forms 8-12). So,
following phonetic guidelines we came up with the following system:
Consonants:
h' hu hi ha he hI ho
Independent Vowels:
a' au ai aa ae aI ao
Again, after some trial use (e.g., "Ten'yIsITIlINI", "a'disI a'b'ba") we
found that the writing can be made more readable if we used only one
character for the pure vowel form. Then the system reduces to:
Consonants:
l' lu li la le lI lo
Independent Vowels:
' u i a e I o
and our sample text would look like: "TenayIsITIlINI", "'disI 'b'ba"
which becomes a little easier to read and to type.
After a short time a reader is likely to find that trying to "read a sound"
from punctuation proves too difficult. Our minds have been conditioned for
too long to skip over apostrophes when reading possessive and contracted
words. We therefore introduce the principle that, whenever possible,
punctuation should not be used to represent spoken sounds, and we seek an
alphabetic character to replace the apostrophe.
We find a suitable substitute in "E" but recognize right away the drawback
of the extra "shift" required to type it. It takes only a little intuition
to realize that the 5th form letters are used less often in writing than the
1st form letters. Hence a swap between the two forms makes the use of "E" a
little easier and gives us the new table:
Consonants:
le lu li la lE lI lo
Independent Vowels:
e u i a E I o
and our sample text appears a little more naturally as:
"TEnayIsITIlINI", "edisI ebeba"
It is at this point that we began to notice two problems:
1) the 6th (or "sadis") form of the Ethiopic characters occurs more
often than any other form (about a third more often), and
2) the use of "e" for the first vowel makes the "look" of some familiar
Amharic words peculiar, and the sound association is poor.
The quick solution:
1) stop using "I" for the sadis (sixth form) consonants, letting the
consonants stand by themselves, and
2) allow the use of "a" for the first form independent vowel with "e",
and introduce "A" for the 4th form independent vowel.
Consonants:
le lu li la lE l lo
Independent Vowels:
e\a u i A E I o
Examples:
TEna ysTlN
adis abeba
Indemn kermachWal
zarE Tewat suq hEjE neber
manew smh? manew smx?
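The vowel assignment reached at this point can be summarized as a lookup from
form number to letter. The following Python sketch is illustrative only: the
names CONSONANT_SUFFIX, VOWEL_FORMS, syllable and independent_vowel are our
own labels for this example and are not part of SERA.

```python
# Suffix added to a consonant letter for forms 1-7.
# The 6th (sadis) form is the bare consonant, so its suffix is empty.
CONSONANT_SUFFIX = ["e", "u", "i", "a", "E", "", "o"]

# Independent vowels for forms 1-7. Form 1 may also be typed as "a".
VOWEL_FORMS = ["e", "u", "i", "A", "E", "I", "o"]


def syllable(consonant, form):
    """Return the ASCII spelling of a consonant in the given form (1-7)."""
    if not 1 <= form <= 7:
        raise ValueError("form must be between 1 and 7")
    return consonant + CONSONANT_SUFFIX[form - 1]


def independent_vowel(form):
    """Return the ASCII spelling of the independent vowel in the given form (1-7)."""
    if not 1 <= form <= 7:
        raise ValueError("form must be between 1 and 7")
    return VOWEL_FORMS[form - 1]


if __name__ == "__main__":
    # Prints: le lu li la lE l lo
    print(" ".join(syllable("l", f) for f in range(1, 8)))
```

The diqala forms (8-12) are left out of this sketch because they are not
present for every consonant.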
Ambiguity Problem with The Independent Vowel
This system is easier to read and type, but there is still a problem. If
you have never before seen the word "TEna" how will you know if you
are reading 2 Ethiopic characters or 4 -- "TE-na" or "T-E-n-a"? This problem
of ambiguity usually occurs because it is not clear whether a consonant
letter is a sadis (6th) form followed by an independent vowel form, or a
syllable made up of the consonant and following vowel form. Of course, this
is a problem only if the reader does not know the language. An Amharic
speaker would not make such a mistake.
In another scenario, the name "Gabriel" can be read "ge-b-r-E-l" (correctly),
or "ge-b-rE-l" (not quite correct, but okay when speaking fast). Though the
ambiguity is there, whether you interpret the Latin as showing 5 (ge-b-r-E-l)
characters or 4 (ge-b-rE-l) makes almost no difference.
These conditions may not always be true, however, and the difference does
become a big problem for word processors and computer software for
translation. It is better then to ensure that the characters are
unmistakably represented. To accomplish this, our decision was to recycle
the apostrophe ' as a separator for independent vowels that appear after a
sadis (sixth form) consonant. Thus, we can rewrite Gabriel as "gebr'El" and
modify our system, which now includes a third category, accordingly:
Consonants:
le lu li la lE l lo
Independent Vowels:
e\a u i A E I o
Independent Vowels Following a 6th Form Consonant:
l'a l'u l'i l'A l'E l'I l'o
l'e lU lO <--also
If we now consider an application for the remaining uppercase vowels, "U"
and "O", we find that in some instances, as shown in the 2nd row of the
third category, the use of the apostrophe may be omitted without confusion.
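To illustrate how the separator removes the ambiguity for software, here is a
minimal Python sketch that splits a SERA word into Fidel units. It is not the
authors' parser. The function name split_sera and the three letter sets are
assumptions made for this example, built from the table in Section 1, and the
sketch handles neither punctuation nor numerals.

```python
# Consonant letters taken from the SERA table (case is significant).
CONSONANTS = set("hlHmrsxqQbvtcnNkKXwzZydDjgGTCPSfp")
# Vowel letters that attach to a preceding consonant (forms 1-5 and 7).
ATTACHING_VOWELS = set("euiaEo")
# Letters that can stand alone as independent vowels.
INDEPENDENT_VOWELS = set("eauUiAEIoO")


def split_sera(word):
    """Split one SERA word into a list of Fidel units (illustrative only)."""
    units = []
    i = 0
    while i < len(word):
        if word[i] == "'":
            # Separator: the unit before it is closed, the next vowel stands alone.
            i += 1
            continue
        start = i
        if word[i] == "`" and i + 1 < len(word):
            i += 1  # backquote prefix, as in `s or `h
        ch = word[i]
        i += 1
        if ch in CONSONANTS:
            if i < len(word) and word[i] == "W":
                i += 1  # diqala marker
            if i < len(word) and word[i] in ATTACHING_VOWELS:
                i += 1  # the vowel modifies the consonant
        elif ch in INDEPENDENT_VOWELS:
            if ch == "e" and i < len(word) and word[i] == "3":
                i += 1  # the 8th vowel "e3"
        else:
            raise ValueError(f"unexpected character {ch!r} in {word!r}")
        units.append(word[start:i])
    return units


if __name__ == "__main__":
    print(split_sera("gebrEl"))   # ['ge', 'b', 'rE', 'l']
    print(split_sera("gebr'El"))  # ['ge', 'b', 'r', 'E', 'l']
```

Without the apostrophe the sketch attaches "E" to the preceding "r". With it,
"r" stays in the 6th form and "E" becomes a separate character.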
4. Some Commonly Asked Questions
1. Why not use "sh" for "x" and "ie" or "y" for "E"?
These would be logical choices for readers familiar with the rules of
English but may not make sense in non-English speaking nations where
a form of the Latin script is used. It is also desirable to keep
keystrokes to a minimum for humans, parsing requirements as simple
as possible for computers, and media and transfer sizes small, by
avoiding multiple character representations when possible.
Further, the reader is left to infer whether "sh" means one or two
Fidel characters. The separator ' presents a solution here but again
complicates parsing and introduces special case rules in place of
general ones. Exceptions to the general rules also tend to produce
more spelling errors.
"ie" may be an easier keystroke than "E" but again introduces
inference and parsing complexity. The choice is not always logical
as a phonetic model for the "ay" sound with Latin letters when
considering such examples as "die", "vie", "pie", "lie", and "tie".
"y" occurs more commonly in speech and written text as a consonant
than as the 5th syllabic form. Hence the lowercase Latin character
is better reserved for the consonant to save on keystrokes.
When an Ethiopic interface is available, these kinds of questions
become input method issues rather than issues of file I/O and
transfer, for which SERA was primarily designed.
2. What if I wish to show more sound for a sadis consonant?
It is not always accurate to say that the vowel component of the
sadis consonant is not spoken. For many words the vowel in the 6th
form consonant is clearly enunciated. If you wish to write in a more
phonetic manner without loss of clarity, this may be accomplished
by writing the 2 character representation of the sadis consonant
when it is needed. For this purpose the 2 character form of the 6th
form consonant is defined as "l|". We can mix the two character and
one character forms together in the same word to show when the vowel
portion is voiced:
ysTlN = y|sT|l|N
tgrNa = tg|r|Na
alfelgm = alfel|g|m
TrE = T|rE
Writing with both the one and two character representations of the
6th form consonant together may be more laborious to the typist but
has the advantage of giving the reader a better demonstration of
the word's sound when spoken. The mixed representation is not
ambiguous and does not pose any problem for machine translation
when going from Latin to Ethiopic. Should it become common
practice to mix the two systems, we may wish to try alternate
characters in place of the pipe ( "|" ).
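Because the pipe only marks a voiced 6th form vowel, the mixed spelling can
be reduced to the plain spelling by removing the pipes. The Python sketch
below shows this under the assumption that a pipe is valid only directly
after a consonant letter. The name drop_pipes and the consonant set are
hypothetical helpers for this example.

```python
# Consonant letters taken from the SERA table (case is significant).
CONSONANTS = set("hlHmrsxqQbvtcnNkKXwzZydDjgGTCPSfp")


def drop_pipes(word):
    """Return the one character spelling of every 6th form consonant."""
    for i, ch in enumerate(word):
        # Assumption: a pipe is meaningful only right after a consonant letter.
        if ch == "|" and (i == 0 or word[i - 1] not in CONSONANTS):
            raise ValueError(f"misplaced pipe at position {i} in {word!r}")
    return word.replace("|", "")


if __name__ == "__main__":
    for mixed in ["y|sT|l|N", "tg|r|Na", "alfel|g|m", "T|rE"]:
        print(mixed, "=", drop_pipes(mixed))
```

Run on the four examples above, the sketch returns "ysTlN", "tgrNa",
"alfelgm" and "TrE".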
3. I see the ' used in other ways, what are the complete rules?
The apostrophe was introduced as a separator to indicate that
a vowel after a 6th form consonant does not modify the form of
the consonant, i.e. "nE" is one Fidel letter and "n'E" is two. The
principle of the separator may be applied elsewhere when it enhances
clarity. For instance between vowels as in "beadis" vs "be'adis"
or "keityoPya" vs "ke'ityoPya". Here, the ' helps prevent the reader
from slipping back into rules of English where the vowels would be
combined into a single sound. Also ' following a consonant as in
"t'" may be treated as another definition for the 6th form
representation when convenient.
4. Why Are Numbers Used With Letters?
A problem that occurs when trying to represent Ethiopic script
phonetically in Latin is the presence of Ethiopic letters that are
phonetic equivalents. These cases are encountered with the two
Ethiopic characters for "s" and "S" and the 4 characters for "h".
Representing one of the 2nd forms with an unused Latin character,
say F, R, or V, would be a digression from phonetic norms and would
add a level of complication to the reading. In the case of what would
be h4 the uppercase "K" is chosen for representation. This choice
models the husky "kh" sound that the character has in Tigrignia and
other languages.
For the more common type of email exchanges omitting the number 2
or 3 does not result in a loss of interpretation. The use of the
ordinals becomes more important later if the text is to be read and
translated into Ethiopic script by computer.
5. Why Does "s2" Come Before "s" ?
The "2" is only needed to distinguish the difference between the
two "s"s in Ethiopic script. In modern writing it is the 2nd "s"
appearing in the fidel that finds the most frequent use in the
spelling of words. The first "s" is represented as "s2" because it
occurs less frequently in writing. Were the 2nd "s" labeled as "s2"
it would give the typist considerably more finger work to perform.
6. How was "e3" arrived at for the 8th vowel?
The choice of "e3" is thought to be the best model for the sound of
the character. The sound of the character is in Amharic the same as
that of "e" (the first vowel) in Tigrigna. The choice of a numeral
to follow "e" will detract from the reading quality of the character,
which should come at a small cost when its infrequent use is
considered.
7. Why is The Capital "W" Used For Diqala Forms?
The uppercase "W" is used to remain phonetically consistent with
the sound of the diqala forms (forms 8 - 12). The lower case "w" is
reserved exclusively for consonant 23 with the "w" sound. Thus
confusion and ambiguity are avoided with use of the uppercase "W".
8. Why is "Wu" Used For the Letter I learned was "W"?
Actually both are valid under SERA. In different geographic regions,
and at different times within the same region, people have been
taught two different sounds for the 2nd form labiovelar (which one
may have learned as a 6th form). Phonetic representations such as
"kWu", "kW" and "kW'", for example, are permitted for both ways a person may
have been taught. Each form is no more right or wrong than the
other.
9. Why is "hWa" used in place of "`hWa" or "h2Wa"?
This is a break in consistency from how forms 1 through 7 of "h2"
were represented. However, as "h" does not have forms after
the sab`I (the 7th form) there is no opportunity for confusion to
arise from the omitted "2" of "h2W". Hence "hW" will be uniquely
identifiable as representing diqala forms of the h2 consonant. The
advantage of dropping the "2" in the diqalawoc range is the
keystroke saved for typists.
10. What is done with the left-over Latin letters?
The "left over" Latin uppercase consonants B, F, J, L, M, R, V, and
Y are now recognized as equivalent to their lowercase counterparts.
That is, "Y" in transliteration would be interpreted identically to
"y", etc. These same Latin characters are considered to be on a
"reserve" status to model some overlooked sound in an Eritrean or
Ethiopian language.
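The equivalence rule for the left-over uppercase letters can be sketched as a
simple folding step applied before any further processing. This Python
example is only an illustration. The names RESERVED_UPPERCASE and
fold_reserved are assumptions made here and are not part of SERA.

```python
# The uppercase consonants that SERA does not assign a sound of their own.
RESERVED_UPPERCASE = "BFJLMRVY"

# Translation table mapping each reserved letter to its lowercase counterpart.
FOLD_TABLE = str.maketrans(RESERVED_UPPERCASE, RESERVED_UPPERCASE.lower())


def fold_reserved(text):
    """Lowercase only the reserved letters; all other letters keep their case."""
    return text.translate(FOLD_TABLE)


if __name__ == "__main__":
    print(fold_reserved("YsTlN"))  # ysTlN
    print(fold_reserved("TEna"))   # TEna (unchanged)
```

Only the eight reserved letters are folded, because for every other letter
the uppercase form denotes a different Fidel character.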
5. A Full Sample Text with Statistics
WORD COUNT : 170
CONSONANT COUNT
Form1 : 161 Form2 : 21 Form3 : 35 Form4 : 106
Form5 : 14 Form6 : 216 Form7 : 25 Form8 : 3
VOWEL COUNT
Form1 : 25 Form2 : 0 Form3 : 5 Form4 : 2
Form5 : 1 Form6 : 13 Form7 : 1
From the Ethiopian Examiner January 1994
yeselamna yeIrqu konferans gizEyawiw mengst keslTan Indiwerd Teyeke
bekefateNa gugut siTebeq yeneberew yeselamna ye`Irq gubae, ketahsas
9-13 1986 `a.m. beadis abeba ketema baderegew yeamst qen sbseba,
beih`adEg yemimeraw gizEyawi mengst slTanun Indiyasrekb Teyeqe.
qedem blo paris lay sbsebaw Indidereg keTeyequt sebat teqawami
budnoc wsT, yesostun abalat wede ageracew sigebu awroplan Tabiya lay
bepolis bemasyazna bmaser mengst bzihu sbseba lay Indaysatefu adrgWacewal.
yetasrut abalat, ato abera yemaneab, we/rit genet grma, ato mesfn
tefera, ato alemayehu dErEsana ato genenew asefa (keidE`haq): ato seyum
zenebe (kemed`hn) Ina ato ibsa gutema (keoneg) nacew.
mengst Inezihu sewoc lay yewesedew yeIsrat Irmja sewocn beselamawi
menged beageracew yepoletika hidet wesT Indaysatefa slemiyaderg bzu sewocn
aseqoTtWal. beadis abeba yemigeNu diplomatocm yKEw yemengst Irmja
yesbsebawn tesatafiwoc farhat lay bmeTal sbsebaw mnm bego wTEt IndayameTa
yaderg yhonal bemalet hesabacewn gel`Sewal.
yeityoPya gizEyawi mengst (iH'adEg) besbsebaw lay saysatef qertWal.
lezihum begizEyawi prEzidEntu beato meles zEnawi yeteseTew mknyat sbsebaw
lepropaganda `alama bca yemidereg kentu sbseba new yemil new.
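The counts reported at the top of this section can be used to check the
earlier observation that the 6th form occurs most often. The Python sketch
below simply recomputes shares from the published figures. The function and
variable names are our own, and the code does not recount the sample text
itself.

```python
# Consonant and vowel counts per form, copied from the statistics above.
CONSONANT_COUNTS = {1: 161, 2: 21, 3: 35, 4: 106, 5: 14, 6: 216, 7: 25, 8: 3}
VOWEL_COUNTS = {1: 25, 2: 0, 3: 5, 4: 2, 5: 1, 6: 13, 7: 1}


def share(counts, form):
    """Fraction of all characters in `counts` that are in the given form."""
    return counts[form] / sum(counts.values())


def ratio(counts, form_a, form_b):
    """How many times more often form_a occurs than form_b."""
    return counts[form_a] / counts[form_b]


if __name__ == "__main__":
    print("consonants counted:", sum(CONSONANT_COUNTS.values()))    # 581
    print("vowels counted:", sum(VOWEL_COUNTS.values()))            # 47
    print(f"6th form share: {share(CONSONANT_COUNTS, 6):.1%}")      # 37.2%
    print(f"6th vs 1st form: {ratio(CONSONANT_COUNTS, 6, 1):.2f}")  # 1.34
```

For this sample the 6th form consonants outnumber the 1st form consonants by
a factor of about 1.34, in line with the "about a third more often" estimate
given in Section 3.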
6. Notes, Appendices and References
As noted at the beginning of this discussion, complete and up to date copies
of all of these texts can be found at the Rensselaer Polytechnic Institute ftp
archive under the file name SERA-94.