How to create a language in one day

About a year ago I worked on a very interesting project which involved creating a unique world with all its history, people, physics, metaphysics and so forth. I like fictional worlds that are thoroughly worked out, and I have always marveled at people like Tolkien or Richard Garriott who go to such great lengths that they even create languages for their worlds. For many years I have felt that it would be awesome to create my own language, and I'm probably not alone in feeling that.

When I started studying linguistics and computational linguistics many years ago, I learned a lot about how language behaves. As I became better acquainted with the world of languages, the task of creating my own seemed achievable. I knew what I needed to cover and roughly where I should start, but I also realized the scope of such a project. It was just too daunting.

However, a year ago I was thinking about the world we were creating and I briefly returned to the idea of creating a language. I thought about it and wondered if I couldn't be much more efficient. I mean, I wouldn't want to spend a couple of months on a language that would be just one part of our fictional world. It would add depth to our world, but few would probably appreciate it, and besides, the project had not been greenlit.

But one evening I began to do some basic research. This led to some quick tests. And after spending another evening I was done with my language. I had created a fictional language in (less than) one day.

Linear B

First, I wanted a language that felt real. It should reek of history. In the end I turned to Linear B and figured I could use it. (Of course I could have drawn my own set of symbols and worked out their pronunciation, but this time I decided to go with Linear B as it is.)

The syllabic signs are not the whole Linear B writing system. There is a set of logograms and special characters in the system as well, but I decided to ignore them and just go with the syllabic signs.

One interesting aspect of this part of the Linear B system is that each symbol corresponds to a syllable. This is quite different from our Latin alphabet. Whereas Linear B uses one symbol to denote the syllable "wo", we would in English write it with two symbols: 'w' and 'o'.

Translating syllables

Now, what would happen if I could just somehow translate English syllables into Linear B ones? After some more digging I found a list of the few hundred most common digraphic (two-character) syllables in English. The 10 most common are:

Syllable	Frequency
TH	3.99%
HE	3.65%
AN	2.17%
ER	2.11%
IN	2.10%
RE	1.64%
ND	1.62%
OU	1.41%
EN	1.37%
ON	1.36%
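
A list like this can also be produced from any text you have lying around. The following is only a minimal sketch of how such a count could be done, not how the list above was compiled: the function name digraph_frequencies and the sample sentence are made up for illustration, and it assumes that letter pairs are counted inside words only, never across a space.

```python
from collections import Counter

def digraph_frequencies(text, top=10):
    """Return the `top` most common letter pairs in `text` with their share in percent."""
    counts = Counter()
    for word in text.lower().split():
        # Keep letters only, so punctuation never becomes part of a pair
        letters = ''.join(ch for ch in word if ch.isalpha())
        # Slide a two-letter window over the word (assumption: pairs never span a space)
        for i in range(len(letters) - 1):
            counts[letters[i:i + 2]] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(pair.upper(), 100.0 * n / total) for pair, n in counts.most_common(top)]

# Hypothetical sample text; a real count needs a much larger corpus
sample = "the other brother then gathered them there"
for pair, pct in digraph_frequencies(sample, top=3):
    print(f"{pair}\t{pct:.2f}%")
```

The numbers you get depend heavily on the corpus, so they will not match the table above exactly.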

That's well and good. Now, if I could set up a table matching the 60 most common digraphs in English against the 60 Linear B symbols I might get somewhere. Piece of cake! Python (or Ruby or Perl for that matter) to the rescue! These are excellent languages for these kinds of tasks. Here comes the translation table:

translation_table = [
    ('en','a'),  # Digraphs
    ('er','e'),
    ('nt','i'),
    ('th','o'),
    ('on','u'),
    ('in','da'),
    ('te','de'),
    ('an','di'),
    ('or','do'),
    ('st','du'),
    # ... more pairs like these ...
    ('ll','za'),
    ('ng','ze'),
    ('me','zo')]

I can pretty much pair these as I want since Linear B syllables always have a vowel in them. So I won't end up with long strings of consonants ("jfdksjfdf") however hard I try.
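
Since the pairing is arbitrary, the table does not have to be typed by hand. Here is a small sketch that simply pairs the ten digraphs from the frequency table with the first ten syllables in order of frequency. Note that this is a different pairing than the one in my table above, and build_table and has_vowel are names invented for this example.

```python
# Ten most common digraphs from the frequency table, most frequent first
digraphs = ['th', 'he', 'an', 'er', 'in', 're', 'nd', 'ou', 'en', 'on']

# The first ten Linear B syllables, in the order used in this post
syllables = ['a', 'e', 'i', 'o', 'u', 'da', 'de', 'di', 'do', 'du']

def has_vowel(syllable):
    """True if the romanized syllable contains at least one vowel."""
    return any(v in syllable for v in 'aeiou')

def build_table(digraphs, syllables):
    """Pair each digraph with one syllable, position by position."""
    if len(digraphs) != len(syllables):
        raise ValueError("need exactly one syllable per digraph")
    return list(zip(digraphs, syllables))

table = build_table(digraphs, syllables)

# Every target syllable carries a vowel, so the output stays pronounceable
print(all(has_vowel(lin) for _, lin in table))
print(table[:3])
```

Shuffling the syllables list before pairing gives a different-sounding language from the same code.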

Ok, we also need translation functions. translateWord() translates single words syllable for syllable and translate() iterates over a whole string (sentence) and translates it word by word:

punctuation = (',', '.', ':', ';', '!', '?')

def translateWord(word):
    def trans(s):
        # Try the pairs in table order; the first matching prefix wins
        for (ep, lp) in translation_table:
            if s.startswith(ep):
                return (lp, s[len(ep):])
        # Didn't find a syllable: keep punctuation, drop any other character
        if s[0] in punctuation:
            return (s[0], s[1:])
        else:
            return ('', s[1:])
    tword = ''
    word = word.lower()
    while word != '':
        (syl, word) = trans(word)
        tword = tword + syl
    return tword

def translate(text):
    return " ".join([translateWord(w) for w in text.split(' ')])

Now we can try to translate sentences:

This is my new language

translates into

oqe qe je teze

This looks promising, but we need to fix one thing. Since there is no syllable corresponding to "my", the whole word "my" gets consumed. Adding the single vowels ('a', 'o', 'u' etc.) to translation_table and having them correspond to Linear B syllables does the trick.
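
To make the problem and the fix concrete, here is a stripped-down, self-contained sketch of the same first-match lookup. The two mini-tables are hypothetical (the fallback pairings in particular are made up), and translate_word is a compact stand-in for translateWord() that ignores punctuation.

```python
# Hypothetical mini-tables: a few digraphs plus single-letter fallbacks
DIGRAPHS = [('th', 'o'), ('in', 'da'), ('an', 'di')]
FALLBACKS = [('a', 'ka'), ('i', 'ke'), ('y', 'ki')]  # made-up pairings

def translate_word(word, table):
    """Translate one word; the first table entry that matches a prefix wins."""
    out = []
    word = word.lower()
    while word:
        for eng, lin in table:
            if word.startswith(eng):
                out.append(lin)
                word = word[len(eng):]
                break
        else:
            # No entry matched: drop one character and move on
            word = word[1:]
    return ''.join(out)

print(repr(translate_word('my', DIGRAPHS)))              # whole word is consumed
print(repr(translate_word('my', DIGRAPHS + FALLBACKS)))  # the fallback rescues 'y'
print(translate_word('thin', DIGRAPHS + FALLBACKS))      # digraphs are tried first
print(translate_word('thin', FALLBACKS + DIGRAPHS))      # 'i' now shadows 'in'
```

The last two calls show why the single letters should go at the end of the table: because the first match wins, a single vowel placed before a digraph that starts with the same letter will always shadow it.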

Why is this your new language?

now becomes

o qe oqe opi je tezeanesi?

Giving the language more flavor

It's a good start, but we can get a bit further. First of all, the translation table could be expanded a bit with entries for semivowels and liquids ('w', 'j', 'l') and some consonants. But there are also things we can do with the language structurally. There is a linguistic term called "agglutination", which means that a word carrying some syntactic meaning is not kept separate but is instead tacked onto another word as a prefix or a suffix. English does this with the plural marker '-s', for instance, while pronouns like "your" and "us" are separate words.

Some languages are heavily agglutinating, like Finnish, where "talossanikin" means "in my house, too", whereas a language like Mandarin isolates everything (such languages are also called analytic languages).

For the sake of making my language more exotic than English I decided to have it use suffixes where English uses separate words in a number of cases. Another table does the trick:

switch_table = [
    'a', 'an', 'the', 'my', 'your', 'his', 'her', 'its', 'their', 'our',
    'i', 'we', 'you', 'he', 'she', 'it',
    'one', 'two', 'three', 'many', 'some',
    'not']

(My final table is a little bigger than this, but this illustrates the point.)

If any of the words in the table is encountered, it switches places with the next word and joins it as a suffix. The function intermediate() handles that and creates the "intermediate" English form:

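def intermediate(text):
    i = 0
    s = text.lower().split(' ')
    s2 = []
    while i < len(s) - 1:
        if s[i] in switch_table:
            # Make suffix: move the switch word behind the next word,
            # keeping any trailing punctuation at the very end
            n = s[i+1]
            nsuffix = ''
            if n.endswith(punctuation):
                nsuffix = n[-1]
                n = n[0:-1]
            s2.append(n+s[i]+nsuffix)
            i = i + 1  # skip the word we just consumed
        else:
            s2.append(s[i])
        i = i + 1
    # The last word has nothing to attach to, so copy it unchanged
    if i < len(s):
        s2.append(s[i])
    return ' '.join(s2)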

So if I run the string Why is this your new language? through intermediate() I get:

why is this newyour language?

And feeding that through translate() yields:

o qe oqe jeopi tezeanesi?

Writing it out

Now we only have to get it written into the nice Linear B symbols. Fortunately, Unicode covers Linear B, so as long as we have a font that includes its symbols (one such font is called "Aegean"), any web browser will be able to display the text. First, we just add the Unicode codes for each entry in the translation table:

translation_table = [
    ('en','a', '&#x00010000;'),  # Digraphs
    ('er','e', '&#x00010001;'),
    ('nt','i', '&#x00010002;'),
    ('th','o', '&#x00010003;'),
    # ...and so on...
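
The codes do not have to be looked up by hand either. As a sketch, and assuming your Python's Unicode database knows the Linear B Syllabary block (U+10000 to U+1007F), the standard unicodedata module can produce them from the character names. The function name linear_b_entities is made up for this example.

```python
import unicodedata

def linear_b_entities():
    """Map romanized syllable (e.g. 'da') to an HTML numeric entity."""
    entities = {}
    # U+10000..U+1007F is the Linear B Syllabary block
    for cp in range(0x10000, 0x10080):
        # Unassigned code points have no name; use '' for those
        name = unicodedata.name(chr(cp), '')
        if name.startswith('LINEAR B SYLLABLE'):
            # The romanized syllable is the last word of the character name
            syllable = name.split()[-1].lower()
            entities[syllable] = '&#x%08X;' % cp
    return entities

entities = linear_b_entities()
for syllable in ('a', 'e', 'i', 'o'):
    print(syllable, entities[syllable])
```

Besides the basic syllables, the block also holds a few extra signs with names ending in a digit (such as 'a2'), which you can filter out if you only want the basic set.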

We also need to modify the translateWord() function to return tuples of ASCII and Unicode (exercise left to the reader). Then we can easily dig out either the written or "spoken" version of the text and put it all in an HTML page (another exercise for the reader) for your favorite web browser to render.
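
For the impatient, here is one possible take on both exercises. It is a sketch and not my actual code: the names translate_word_pair and to_html are invented, the table holds only the four entries shown above, and the CSS font name "Aegean" assumes the font is installed under that name.

```python
import html

# Hypothetical three-column table: (English, romanized Linear B, HTML entity)
table = [
    ('en', 'a', '&#x00010000;'),
    ('er', 'e', '&#x00010001;'),
    ('nt', 'i', '&#x00010002;'),
    ('th', 'o', '&#x00010003;'),
]
punctuation = (',', '.', ':', ';', '!', '?')

def translate_word_pair(word):
    """Return (spoken, written) for one word: romanized text and entity string."""
    spoken, written = [], []
    word = word.lower()
    while word:
        for eng, lin, entity in table:
            if word.startswith(eng):
                spoken.append(lin)
                written.append(entity)
                word = word[len(eng):]
                break
        else:
            # No syllable matched: keep punctuation, drop anything else
            if word[0] in punctuation:
                spoken.append(word[0])
                written.append(html.escape(word[0]))
            word = word[1:]
    return ''.join(spoken), ''.join(written)

def to_html(text):
    """Build a minimal HTML page showing the written and the spoken version."""
    pairs = [translate_word_pair(w) for w in text.split(' ')]
    spoken = ' '.join(p[0] for p in pairs)
    written = ' '.join(p[1] for p in pairs)
    return ('<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head><body>\n'
            '<p style="font-family: Aegean">%s</p>\n<p>%s</p>\n'
            '</body></html>' % (written, html.escape(spoken)))

print(to_html('then enter!'))
```

Keeping the spoken and written forms side by side in one pass means the two versions can never drift apart, which is handy when you keep tweaking the table.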

Let's try it...

Source : http://www.sicher.org/2011/10/18/how-to-create-a-language-in-one-day/
