Creating new Japanese-like words
My personal interest in the Japanese language began long ago, when I was a freshman in high school. By chance my public school had participated in a teacher exchange program, and I had the opportunity to study Japanese with an all-too-kind Mr. Ishii. Though totally unprepared for the misconduct of inner-city American youth, he persevered, and I had a good introduction to Japanese. What I realized later is that, while I had very little knowledge of Japanese culture at the time, it wasn’t the culture that interested me then (that would come later), or even the supposed access to it that mastery of the language might grant, but rather a specific attraction to the qualities of the language itself: its unique rhythm and textural sensation on a purely formal and visceral level. I loved the way it felt to speak it, and over time I found that I had something of a proclivity for parsing its phonetic units entirely independently of the meaning they expressed.
I have for some time considered a process for making “new” words in Japanese at both the phonetic and the graphemic level. On occasion I have done so freehand, when certain combinations seemed to make sense.
Some modest attempts:
ごみ技師 – gomi-gishi (trash specialist)
真実き – shinjitsuki (truthful person) as opposed to the real word:
嘘つき – usotsuki (liar)
家無人 – uchinaijin (homeless person)
Also of interest has been the rapid transformation and recombination of Japanese slang, particularly of the genre coined by the Tokyo “Gyaru”.
A common but outdated example is the phrase:
空気読めない (kuuki yomenai), literally “You can’t read the air”, or “You are clueless”,
which is shortened to just “KY” and then later modified to the English word “Sky”, signifying スーパー空気読めない, or “Suupaa (super) Kuuki Yomenai”: “You are totally clueless”.
The above example begins to show the open potential for wordplay, where multiple methods can be combined: different languages, acronyms, etc.
It seems possible to generate, through computation, something that would at least meet some formal criteria.
Proposed methods:
Forming comprehensible phonetic structures in Romaji (romanized, using the ASCII character set), which will simplify language processing and allow for possible further variation later. The workable structures might take the form of the set:
a, i, u, e, o, n, ka, ki, ku, ke, ko, sa, shi, su, se, so, ta, chi, tsu, te, to, na, ni, nu, ne, no, ha, hi, fu, he, ho, ma, mi, mu, me, mo, ya, yu, yo, ra, ri, ru, re, ro, wa
These units could then be combined into possibly unique or “new” words, either through rule sets and randomness alone or by using n-grams and Markov chains.
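Both approaches can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the rule about "n" is one example rule I made up, and SEED_WORDS is a tiny hand-segmented list of real words (sakura, katana, kimono, sushi, karate, tatami) standing in for a proper training corpus. Words are kept as lists of units, since joining them can be ambiguous ("n" + "a" reads the same as "na").

```python
import random
from collections import defaultdict

# The romanized units from the set above (basic kana only).
UNITS = ("a i u e o n ka ki ku ke ko sa shi su se so ta chi tsu te to "
         "na ni nu ne no ha hi fu he ho ma mi mu me mo ya yu yo "
         "ra ri ru re ro wa").split()

def random_word(rng, min_len=2, max_len=4):
    """Rules plus randomness: pick units at random, but (as an example
    rule) never start a word with 'n' or put two 'n' in a row."""
    length = rng.randint(min_len, max_len)
    word = []
    while len(word) < length:
        unit = rng.choice(UNITS)
        if unit == "n" and (not word or word[-1] == "n"):
            continue  # rejected by the rule, draw again
        word.append(unit)
    return word

# Assumption: a toy seed list, segmented by hand into units.
SEED_WORDS = [
    ["sa", "ku", "ra"], ["ka", "ta", "na"], ["ki", "mo", "no"],
    ["su", "shi"], ["ka", "ra", "te"], ["ta", "ta", "mi"],
]

def build_chain(words):
    """First-order Markov chain: maps each unit to the units seen after it.
    '^' marks the start of a word and '$' the end."""
    chain = defaultdict(list)
    for units in words:
        padded = ["^"] + units + ["$"]
        for cur, nxt in zip(padded, padded[1:]):
            chain[cur].append(nxt)
    return chain

def markov_word(chain, rng, max_len=6):
    """Walk the chain from '^' until '$' is drawn or max_len is reached."""
    word, cur = [], "^"
    while len(word) < max_len:
        cur = rng.choice(chain[cur])
        if cur == "$":
            break
        word.append(cur)
    return word

rng = random.Random(42)  # fixed seed so runs are repeatable
chain = build_chain(SEED_WORDS)
print("random:", ["".join(random_word(rng)) for _ in range(5)])
print("markov:", ["".join(markov_word(chain, rng)) for _ in range(5)])
```

The purely random words only obey the unit inventory, while the Markov words only use unit-to-unit transitions that occur in the seed list, so they tend to sound closer to existing vocabulary. With a seed list this small, many Markov outputs will simply be real words again.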
Further analysis or post-processing could then assign kanji to the words and, in turn, possible meanings to them. Meaning could also be left unclear or ambiguous, or it could be defined entirely by the textural qualities of the words’ phonetics. For example, Japanese has an abundance of onomatopoeia such as:
ちかちか – chika chika (flickering or twinkling)
ざあざあ – zaa zaa (the sound of rain)
よぼよぼ – yobo yobo (wobbly-legged, or weak from old age)
Using the simple double repetition form could be an interesting exercise in and of itself.
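The double repetition form is easy to try out. The sketch below is a minimal illustration, not a finished method: it draws a two-unit stem from the basic unit set and repeats it, and it also maps the result to hiragana, which is possible here because each unit corresponds to exactly one kana. The function names are my own, and the unit set has no voiced sounds, so words like ざあざあ or よぼよぼ are out of its reach.

```python
import random

# Romanized units and their hiragana, listed in the same order.
UNITS = ("a i u e o n ka ki ku ke ko sa shi su se so ta chi tsu te to "
         "na ni nu ne no ha hi fu he ho ma mi mu me mo ya yu yo "
         "ra ri ru re ro wa").split()
KANA = "あいうえおんかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわ"
TO_KANA = dict(zip(UNITS, KANA))

def to_kana(units):
    """Convert a list of romanized units to a hiragana string."""
    return "".join(TO_KANA[u] for u in units)

def reduplicate(rng, stem_len=2):
    """Pick a random stem of stem_len units and repeat it once."""
    stem = [rng.choice(UNITS) for _ in range(stem_len)]
    return stem + stem

rng = random.Random(7)  # fixed seed so runs are repeatable
for _ in range(5):
    word = reduplicate(rng)
    print(to_kana(word), "-", " ".join(word))
```

Feeding the stem of a real example through the same mapping, `to_kana(["chi", "ka"] * 2)`, gives back ちかちか.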
As a culminating written form, a new set of words could be drawn on freely, without computation, and incorporated into a new text. Alternatively, the words could be inserted into an existing corpus, and further n-gram and Markov chain methods could then produce a new text incorporating the generated words.
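The computational variant could look roughly like the following sketch. Everything concrete in it is a placeholder: CORPUS is a three-sentence toy corpus that is already romanized and split by spaces, and NEW_WORDS maps existing nouns to two invented words ("mirako", "tsuhone") that simply replace them. A word-level first-order Markov chain is then built from the modified corpus.

```python
import random
from collections import defaultdict

# Assumption: a toy corpus, romanized and already separated into words.
CORPUS = [
    "neko wa sakana o taberu",
    "inu wa niku o taberu",
    "neko wa inu o miru",
]
# Assumption: invented words, each standing in for one existing noun.
NEW_WORDS = {"sakana": "mirako", "niku": "tsuhone"}

def insert_new_words(sentences, replacements):
    """Tokenize on spaces and swap in the generated words."""
    return [[replacements.get(w, w) for w in s.split()] for s in sentences]

def build_word_chain(tokenized):
    """Map each word to the words that follow it ('<s>' and '</s>'
    mark the start and end of a sentence)."""
    chain = defaultdict(list)
    for words in tokenized:
        padded = ["<s>"] + words + ["</s>"]
        for cur, nxt in zip(padded, padded[1:]):
            chain[cur].append(nxt)
    return chain

def generate_sentence(chain, rng, max_words=12):
    """Walk the chain from '<s>' until '</s>' or max_words is reached."""
    out, cur = [], "<s>"
    while len(out) < max_words:
        cur = rng.choice(chain[cur])
        if cur == "</s>":
            break
        out.append(cur)
    return out

rng = random.Random(3)  # fixed seed so runs are repeatable
chain = build_word_chain(insert_new_words(CORPUS, NEW_WORDS))
for _ in range(3):
    print(" ".join(generate_sentence(chain, rng)))
```

Direct substitution is the simplest way to get new words into a corpus; it keeps the grammar of the surrounding sentence intact, at the cost of tying each new word to the role of the word it replaced.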
To illustrate how n-grams over discrete romanized phonetic units would differ from the character n-grams in the basic example shown in class (the word “condescendences”), consider:
流体力学 – Ryūtairikigaku (hydrodynamics)
The 2-gram units (ignoring the long vowel in ryū) could be:
ryu ta
ta i
i ri
ri ki
ki ga
ga ku
This is in contrast to the much longer character-based version. On one hand, this shows how compact Asian languages such as Japanese can be at the character or unit level, though they require a lot more energy to encode and decode on the human level. A possible issue is that, unlike in most Western languages, words in Japanese text are in most cases not separated by spaces, which makes generating a romanized corpus a more involved endeavor.
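The difference between the two kinds of 2-grams can be checked with a small sketch. It assumes a greedy longest-match segmenter (my own choice, not a standard tool) and extends the basic unit set with "ga" and "ryu", which this word needs but the basic set lacks. The long vowel in ryū is dropped, as in the list above.

```python
# Basic units plus two extras ("ga", "ryu") needed for this example.
UNITS = set(("a i u e o n ka ki ku ke ko sa shi su se so ta chi tsu te to "
             "na ni nu ne no ha hi fu he ho ma mi mu me mo ya yu yo "
             "ra ri ru re ro wa ga ryu").split())

def segment(romaji, units):
    """Split a romanized word into units, always taking the longest
    unit that matches at the current position."""
    longest = max(len(u) for u in units)
    out, i = [], 0
    while i < len(romaji):
        for size in range(longest, 0, -1):
            piece = romaji[i:i + size]
            if piece in units:
                out.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no unit matches at position {i} in {romaji!r}")
    return out

def ngrams(seq, n=2):
    """All runs of n consecutive items in seq, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

word = "ryutairikigaku"
units = segment(word, UNITS)
print(units)                      # the phonetic units
print(ngrams(units))              # 2-grams over units
print(len(ngrams(list(word))))    # number of 2-grams over characters
```

For this word the segmenter yields seven units and therefore six unit 2-grams, the same pairs listed above, against thirteen character 2-grams. Greedy matching also decides ambiguous cases silently: an "n" followed by a vowel is always read as na, ni, nu, ne or no, which is not always the right reading.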
