
The Roman Code: the Why of our Letters

I’ve been studying linguistics since 1991, so it’s not often that I stumble on research that catches me completely off guard. But a couple of months ago, when I read Primus’s “A Featural Analysis of the Modern Roman Alphabet”, published almost 20 years ago, I was blown away.

Most linguists, myself included, assumed that our alphabet was not really a good area for linguistic research, per se. After all, language is embedded deeply into the human psyche; children learn it quickly, and if there are no adults around to provide a language, they will create their own. The patterns in language are rigorous and intricate, but mostly subconscious (e.g. all English speakers know how to use the word “the”, but few can actually tell you exactly what it means).

Letters, by contrast, are learned, often painfully, through instruction and long practice, and sometimes not at all. Children will not create them spontaneously — in fact, evidence suggests that the alphabet was invented exactly once in human history, and all modern alphabets are either descended from it or developed based on its model. There don’t seem to be many patterns in the shapes of letters, and their well-documented development through time seems to have been part random, part intentional, and part shaped by forces completely foreign to spoken language (like whether they were carved or inked).

But Primus (building on previous work I hadn’t been aware of) suggests that there are, in fact, good reasons to think that letterforms are subject to many of the same cultural and communicative pressures that languages are; and linguists can use our tools of analysis to answer questions such as:

  • Why exactly do we have these letters, and not others? E.g. why is the letter “a” a vowel and the letter “d” a consonant?
  • What are the relationships between the forms of the letters? E.g. is it significant that “s” and “z” are basically mirror images of each other?
  • Why do most of our letters face to the right, but most of our numbers face to the left? (And why do we have this intuition about “facing”?)
  • Why have the letters developed the way they have? And how might they develop in the future?

So let’s dive in!

Random Paths

The Latin letters we know were adapted by the Romans from the Etruscans, who, in turn, borrowed them from the Greeks. But the Greeks stole them from the Phoenicians, who first invented the idea of an alphabet. To be clear, the Phoenicians didn’t invent writing; in fact they took inspiration from Egyptian hieroglyphs. But they used the symbols to represent individual sounds (known linguistically as phonemes) rather than whole words or syllables.

Each time the alphabet was passed from language to language, it was altered. Greek added symbols for vowels. Etruscan threw away symbols for voiced consonants like b, d, g, because Etruscan didn’t have those sounds. The Romans had to recreate them (which is why “G” is just a “C” with an extra curl; the Romans were not known for their creativity). At first glance, the development of the Roman alphabet appears to be a mass of random changes and complexities. The most you can say is that, over the first thousand years or so, the letters became somewhat simpler, more standardized, and a bit more distinct from one another. These are all things you would expect of something that had to be written and read.

But then, in the Middle Ages, all progress toward simplification, standardization and readability seems to have been shunted aside in favor of tiny elegant curlicues. Over the following centuries, Europeans developed lower case letters. Each letter now had at least two different forms, and the lower case forms were written in a smaller, more cramped style, with lots of variation between languages. Lower case letters are demonstrably harder to read, especially in cursive. (This is why upper case letters are usually taught to children first, even though lower case letters are of course much more frequent in text.) So what gives?

Hidden Patterns

But behind the apparent chaos, there are unexpected patterns. For example, punctuation tends to consist of points rather than lines, and letters typically face to the right, while numbers face to the left.

It’s not clear how much of this pattern was consciously put in place. It makes sense that punctuation (which was developed alongside lower case letters in the Middle Ages) should be made to look distinct from letters. And it would make sense for letters and numbers to face opposite directions, but that doesn’t seem to have been on purpose. The Greek and Etruscan versions had letters facing every which way, but the Romans turned many letters to face to the right, perhaps so that they would face the same way as the direction of text. But the Romans used letters for their numbers as well (MCMLXXIII). When the “Arabic” numerals (which were actually invented in India) were borrowed in the Middle Ages, some faced left, and some faced right; but as Europeans began using them regularly, they standardized to all face left (except 5 and 6). If that was a conscious decision, it’s never mentioned.

But the patterns go even deeper. There are variations and exceptions, but intriguing connections have been observed between the forms of letters and their sound properties. Stops like p, t, k, b, d, g often have ascenders or descenders (lines that go above or below the center of the letter), while non-stops like vowels (a, e, i, o, u), sonorants (m, n, w, r), and fricatives (s, v, z) tend to avoid them. This pattern holds for 78% of letters. Furthermore, letters with more overall curvature are associated with an open mouth and lip rounding; this holds for 76% of letters. These patterns seem to have emerged without conscious intent — very reminiscent of the intricate subconscious patterns found in spoken language.

In fact, the existence of exceptions is further evidence that these patterns are linguistic. Language is full of exceptions that appear to have no rationale (e.g. the English plural is sometimes pronounced s, as in “cats”, and sometimes z, as in “dogs”), though often the exceptions hide a deeper pattern (the actual rule is that the plural is pronounced z unless it follows a voiceless consonant like p, t, or k). We’ll see “exceptions” hiding deeper patterns in just this way below.

Markedness and Iconicity

Markedness refers to features of sound that are non-default or cross-linguistically rare. In English, voiceless sounds like s are unmarked, and voiced sounds like z are marked. Cross-linguistically, dental stops like t are unmarked, and interdental fricatives like th are marked.

In the Roman alphabet, unmarked sounds generally face right, are written with one character, and follow the rules of ascenders, descenders, and curvature mentioned above. But marked sounds often face left, are represented by digraphs (pairs of letters), or possess unexpected ascenders or descenders.

Reversing characters to show markedness is an example of iconicity: a non-arbitrary relationship between a symbol’s form and what it represents. One of the core principles of linguistics since the 19th century has been the arbitrariness of the sign (non-iconicity), but this has been seriously challenged in the last few decades. It now appears that linguistic signs are often iconic, though not always systematically or obviously so.

sz, td, iu

Let’s consider some examples. S is an unmarked sound, a common alveolar fricative found in numerous languages, and as such, it faces right. It is not a stop, so it has no ascender or descender, and it shows curvature. Z, which represents the same sound but is marked with voicing, is reversed. Breaking the “face right” rule iconically shows that z is a marked sound.

Now consider t and d. T is an even less marked sound in almost every way: a dental stop, very common across languages and frequent in English. As a stop, we expect an ascender or descender, and we get an ascender. When t is pronounced, the mouth closes completely, so we expect little curvature, and we find none. And of course we expect it to face right. It also has a “coda”, i.e. an additional mark attached to the main stem, near the top. (This is neither predicted nor ruled out.)

d is the same sound, but marked with voicing. So it faces left, like z. To make it clear that it’s facing left, its coda is made into a loop. And the loop is in the opposite place from t’s: instead of at the top, it’s at the bottom. So d is essentially t upside down and reversed.

Finally, let’s look at a pair of vowels. i (as in fiesta and radio) is an unmarked vowel (found in almost every language on Earth and very frequent in English). As a non-stop, it has no ascenders or descenders (the dot seems to be there to help distinguish it from “l”). The mouth is more closed (relative to other vowels), so it has no curvature. Meanwhile u (as in brunette) is almost the same vowel, but it is marked because it is a back vowel, so it faces left. u is also pronounced with lip rounding, so it has curved lines.

Uppercase

As mentioned above, lower case letters did not exist in Roman times. The transition from the all-uppercase Roman writing style to our modern lowercase-with-caps-reserved-for-special-cases took centuries but followed a consistent trajectory across Europe. Uppercase letters now often indicate marked words, like acronyms or names. Their greater height may be why they are used at sentence boundaries, perhaps iconically representing the edge of the sentence as a container.

Letters as a Linguistic System

This research is still in its infancy, and has yet to be applied to many other writing systems. It’s an open question whether similar patterns will be found elsewhere. But it seems as though perhaps, over four thousand years, the chaos of the original Phoenician letters has developed into a linguistic system.

If that’s the case, then we would expect younger writing systems to lack these features unless they were explicitly designed to have them. For example, the Cherokee Syllabary was inspired by the appearance of the Roman letters, but its designer (Sequoyah, undoubtedly the greatest American linguist in history) did not know how they worked, and so independently invented a syllabary for his language that borrowed their shapes but not their sound values. The Cherokee Syllabary is beautiful and works extremely well for Cherokee, but it does not follow the same feature system as the Roman letters. Conversely, Hangul, the Korean writing system, was designed specifically with linguistic features in mind, and you can practically tell how to pronounce the letters just by looking at them.

So much, then, for the past. Can linguistics tell us anything about how our letters might evolve in the future? It’s impossible to say for sure, but we can make some good guesses. We may expect the alphabet to continue to evolve over the next couple of thousand years. It will have to, since our languages will continue to change and develop. The marked digraphs (th, sh, ch) are the most likely to disappear or change as the marked sounds they represent also change. (Many English dialects no longer pronounce th as an interdental fricative, having replaced it with sounds like t, d, f, or v. Perhaps one day the writing will reflect that.)

Other letters may also be refined to reflect these hidden patterns more closely. Keep your eye particularly on f, c, and y. f should not have an ascender, but it should keep its curve. c is a redundant letter, so it could be dropped or repurposed as a vowel. And y should not have a descender; plus, many other languages, such as German, use j for that sound (j being a modification of i, a very similar vowel), so y might get dropped too.

If this is correct, then the Roman alphabet is a powerful tool that mediates between language and culture. Instead of being a chaotic mess detached from language, it is brimming with patterns that delicately reflect linguistic structure. It’s given me an even deeper appreciation for the complex interplay between speech, cultural evolution, and human cognition. The European letters were created by accident, worn and shaped through long handling, and are ubiquitous to the point of invisibility; but still they are a treasure that continues to surprise.
