Saturday, 17 September 2016

LANGANA ENGLISH TO TURKISH TRANSLATOR


Ali Riza SARAL / ÇİZGİ AŞ.




ABSTRACT
LANGANA is an English to Turkish translation framework that can be extended to translate any
English sentence automatically.

INTRODUCTION
Many programs on the internet claim to translate English to Turkish correctly.
These programs use statistical methods [1] to solve the translation problem.  Unfortunately,
each of them fails in many cases.  LANGANA is an English to
Turkish translation program that uses rule-based methods and parsing.

NOMENCLATURE
Translation engine, parser, language processing, English to Turkish translation


LANGANA is still in the feasibility phase, so it cannot yet guarantee a correct
translation of arbitrary English text.  It correctly translates English text that is similar
to the example sentences it has been developed for.  It may also translate sentences outside
the range of these examples, thanks to the built-in flexibility of the languages.  Feasibility
here means that LANGANA can be extended to handle any new example case, that is, a new
prototype sentence.

Translation from English to Turkish has many facets.  First, the English text is parsed to
extract words and punctuation marks.  Then the type of each word is identified.  LANGANA uses
a word-type dictionary that is created by parsing the Webster dictionary.  Because a word
may have several possible types, the type is chosen according to the word's context
in the sentence structure.  A parsing algorithm then checks the correctness of
the chosen type.
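
The two steps described above (splitting the text into words and punctuation, then choosing a word type from context) can be sketched roughly as follows.  This is only an illustration in Python, not LANGANA's own code: the tiny WORD_TYPES table, the function names and the single context rule are assumptions made for the example, whereas the real word-type dictionary is derived from Webster.

```python
import re

# Hypothetical miniature word-type dictionary (assumption for this sketch);
# LANGANA's real dictionary is built by parsing the Webster dictionary.
WORD_TYPES = {
    "the": ["def. art."],
    "old": ["adj.", "n."],
    "teacher": ["n."],
    "he": ["nom. pron."],
    "decided": ["v."],
}

PUNCTUATION = r"[.,;:!?]"


def tokenize(text):
    """Split text into lowercase words and single punctuation marks."""
    return re.findall(r"[a-z']+|" + PUNCTUATION, text.lower())


def candidate_types(token):
    """Return every word type the dictionary lists for a token."""
    if re.fullmatch(PUNCTUATION, token):
        return ["punct."]
    return WORD_TYPES.get(token, ["unknown"])


def pick_type(tokens, i):
    """Choose one type from context.

    Illustrative rule only: a word that can be an adjective and stands
    directly before a noun is read as an adjective; otherwise the first
    candidate type is taken.
    """
    options = candidate_types(tokens[i])
    following = candidate_types(tokens[i + 1]) if i + 1 < len(tokens) else []
    if "adj." in options and "n." in following:
        return "adj."
    return options[0]


tokens = tokenize("The old teacher decided.")
tagged = [(tok, pick_type(tokens, i)) for i, tok in enumerate(tokens)]
for word, word_type in tagged:
    print(word, "||", word_type)
```

A real system would need many such context rules, plus the parsing check mentioned above, to reject a type that does not fit the sentence structure.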

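Translating from English to Turkish requires finding the smallest units that the two
languages have in common and that can be translated directly.
For example, 'the old teacher' is translated as 'yaşlı öğretmen' (Turkish has no definite
article, so 'the' is dropped).  The word sequence
within the two phrases is the same, so the translation is done word by word.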

Accordingly, to prepare for the translation, LANGANA identifies word groups
such as noun phrases, preposition phrases etc.  The output for a sample sentence is shown below.

*******************************************************************

---------------------------------------------------
 He decided to telephone Mrs Jackson, who he had read about in the newspaper
--------------------------------------------------

        0  he ||nom. pron.  SUBJECT0      
0 1 he <-----subject noun phrase
               1 S decided ||+reg. v.||v. t.||imp. & p. p. Decided||p. pr. & vb. n. Deciding||v. i.||v. t.||v. i.  VERB0

2 S to ||to-inf.  INFINITIVE TO
3 S telephone ||+reg. v.||v. t.  INFINITIVE VERB

                    4 S mrs ||pn.                    
                    5 S jackson ||pn.                    
4 6 mrs jackson <-----object noun phrase after infinitive
6 ,S who ||rel. pron.
6 14 who <-----------relative clause with comma
        7 S he ||nom. pron.  SUBJECT0      
7 8 he <-----subject noun phrase
               8 S had ||+irreg. v. neutral||v. t.||imp. & p. p. Had||p. pr. & vb. n. Having  VERB0
               9 S read ||+irreg. v. neutral||+irreg. v. imp.||+irreg. v. participle||v. t.||p. pr. & vb. n. Reading  VERB0

                    10 S about ||+prep.  PREPOSITION PHRASE BEGINNING
11 S in ||+prep.
                    12 S the ||+def. art.                    
                    13 S newspaper ||n.                    
10 14 about in the newspaper <-----preposition phrase
*******************************************************************
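
The grouping shown in the output above (typed words collected into phrases with a start index and an exclusive end index) can be sketched as follows.  This is a minimal illustration in Python under simplifying assumptions: the tag set, the function names and the three grouping rules are invented for the example and are not LANGANA's actual rules.

```python
# Tags that may appear inside a noun phrase (assumption for this sketch).
NOUN_TAGS = {"def. art.", "adj.", "n.", "pn.", "nom. pron."}


def run_end(tagged, start, accepted):
    """Return the index just past the run of accepted tags starting at `start`."""
    end = start
    while end < len(tagged) and tagged[end][1] in accepted:
        end += 1
    return end


def group_phrases(tagged):
    """Group (word, type) pairs into (start, end, text, label) spans, end exclusive."""
    spans, i = [], 0
    while i < len(tagged):
        tag = tagged[i][1]
        if tag == "prep.":
            # One or more prepositions followed by the noun phrase they govern.
            j = run_end(tagged, run_end(tagged, i, {"prep."}), NOUN_TAGS)
            label = "preposition phrase"
        elif tag in NOUN_TAGS:
            j, label = run_end(tagged, i, NOUN_TAGS), "noun phrase"
        elif tag == "v.":
            j, label = run_end(tagged, i, {"v."}), "verb"
        else:
            j, label = i + 1, tag
        spans.append((i, j, " ".join(word for word, _ in tagged[i:j]), label))
        i = j
    return spans


tagged = [("he", "nom. pron."), ("had", "v."), ("read", "v."),
          ("about", "prep."), ("in", "prep."), ("the", "def. art."), ("newspaper", "n.")]
spans = group_phrases(tagged)
for start, end, text, label in spans:
    print(start, end, text, "<-----", label)
```

Note that this sketch would merge two adjacent noun phrases into one; separating a subject from an object needs the sentence-level information that the next phase provides.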

The functions of the words as subject, object, verb etc. are determined in the next phase.
This produces the English Parser output.

*******************************************************************

SENTENCE 0 -------------------->ENGLISH PARSER OUTPUT
0 1 he <-----subject noun phrase
1 2 decided <------ verb
2 4 to telephone  <-------------infinitive verb
4 6 mrs jackson <-----object noun phrase after infinitive
6 14 who <-----------relative clause with comma
7 8 he <-----subject noun phrase
8 10 had read <------ verb
10 14 about in the newspaper <-----preposition phrase
*******************************************************************

The English Parser output may also be produced separately for related purposes.

English to Turkish translation requires translating not only the words,
phrases etc. but also the word sequence of the sentence.  In fact, even
punctuation must be translated in sentences that contain clauses.

LANGANA uses a dedicated section to translate the word sequence.
It translates the sentence structure for the positive, negative, question,
interrogative and imperative cases and some of their combinations.

Sequence translation is done for the main part of the sentence first.
Then the word sequence of the clause section is translated.  A separate structure
keeps the output Turkish word sequence:

*******************************************************************

getEngStructLength=8
------------------------------------>setTranslationSEQforConjunction 5 8
EngStruct==>0 = 0 1 he <-----subject noun phrase
EngStruct==>3 = 4 6 mrs jackson <-----object noun phrase after infinitive
EngStruct==>2 = 2 4 to telephone  <-------------infinitive verb
EngStruct==>1 = 1 2 decided <------ verb
EngStruct==>4 = 6 14 who <-----------relative clause with comma
EngStruct==>5 = 7 8 he <-----subject noun phrase
EngStruct==>7 = 10 14 about in the newspaper <-----preposition phrase
EngStruct==>6 = 8 10 had read <------ verb
---------------------------------------------->setTranslationSEQforConjunction 0 8

relClauseInsertPOS=1
EngStruct==>0 = 0 1 he <-----subject noun phrase
EngStruct==>5 = 7 8 he <-----subject noun phrase
EngStruct==>7 = 10 14 about in the newspaper <-----preposition phrase
EngStruct==>6 = 8 10 had read <------ verb
EngStruct==>4 = 6 14 who <-----------relative clause with comma
EngStruct==>3 = 4 6 mrs jackson <-----object noun phrase after infinitive
EngStruct==>2 = 2 4 to telephone  <-------------infinitive verb
EngStruct==>1 = 1 2 decided <------ verb
*******************************************************************
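
The reordering visible in the output above can be reproduced for this one sentence with a very small rule set, sketched here in Python.  The function names and the rule "keep the subject first and reverse the rest" are assumptions made for the illustration; they happen to match this positive sentence, whereas LANGANA's own sequence translation also covers negative, question and imperative structures.

```python
def to_sov(units):
    """Keep the first unit (the subject) in place and reverse the rest,
    so that the verb ends up last, as in Turkish word order."""
    return units[:1] + units[:0:-1]


def reorder_sentence(main, clause, head):
    """Return the Turkish unit order for a main clause plus one relative clause.

    main   -- main-clause units in English order, subject first
    clause -- relative-clause units in English order, relative pronoun first
    head   -- the main-clause unit that the relative clause modifies
    """
    turkish_main = to_sov(main)
    # The relative pronoun moves to the end of its clause.
    turkish_clause = to_sov(clause[1:]) + clause[:1]
    # The whole clause is placed directly before the unit it modifies.
    insert_pos = turkish_main.index(head)
    return turkish_main[:insert_pos] + turkish_clause + turkish_main[insert_pos:]


main = ["he", "decided", "to telephone", "mrs jackson"]
clause = ["who", "he", "had read", "about in the newspaper"]
order = reorder_sentence(main, clause, "mrs jackson")
for unit in order:
    print(unit)
```

For this sentence the clause is inserted at position 1 of the reordered main clause, which corresponds to relClauseInsertPOS=1 in the output above.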

Translation of the wording itself (as opposed to the word sequence) is done in two phases.
The first phase translates the units (words, phrases etc.) one by one.

*******************************************************************

translation is ENABLED
translateEngTOTurk----------TRANSLATION PHASE----->

:
0 1 he <-----subject noun phrase
o
:
7 8 he <-----subject noun phrase
:
10 14 about in the newspaper <-----preposition phrase
dahili gazete hakkında
:
8 10 had read <------ verb
okumuştu
:
6 14 who <-----------relative clause with comma
olduğu
:
4 6 mrs jackson <-----object noun phrase after infinitive
Bayan jackson
:
2 4 to telephone  <-------------infinitive verb
telefon etme
:
1 2 decided <------ verb
karar verdi

*******************************************************************

The second phase produces and corrects the word endings (suffixes).

*******************************************************************

translateEngTOTurk-----------TRANSLATION PHASE 2nd PROCESSING----->

==================>0 1 he <-----subject noun phrase
==================>7 8 he <-----subject noun phrase
==================>10 14 about in the newspaper <-----preposition phrase
==================>8 10 had read <------ verb
==================>6 14 who <-----------relative clause with comma
==================>4 6 mrs jackson <-----object noun phrase after infinitive
++++++OBJ NOUN PHRASE---->Bayan jackson
findFragment fr==0
verbPOS engStruct[j]= decided=1
verbPOS+1 engStruct[j+1]= to telephone=2
infVB exists =telephone
found inf-verb=telephone
found verb=decided
OBJ in infinitive phrase verbOfOBJ==infVerbOfObject= telephone
verbOfOBJ=telephone objDIR=-e
objDIR of OBJECT NOUN PHRASE=-e
Bayan jacksona
==================>2 4 to telephone  <-------------infinitive verb
telefon etme
found verb=decided
decide
-e
telefon etmeye
==================>1 2 decided <------ verb
*******************************************************************
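
In the output above, the governing verb determines the direction suffix of its object ('-e' for both 'telephone' and 'decide'), and the suffix is then adapted to the word it attaches to: 'Bayan jackson' becomes 'Bayan jacksona' and 'telefon etme' becomes 'telefon etmeye'.  The sketch below shows one way to do this in Python using Turkish vowel harmony.  The OBJECT_CASE table and the function names are hypothetical, and consonant changes at the end of a word (for example 'kitap' becoming 'kitaba') are deliberately left out.

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOWELS = BACK_VOWELS | FRONT_VOWELS

# Hypothetical lookup (assumption for this sketch): the case that a verb
# requires of its object.  Both entries correspond to objDIR=-e in the output.
OBJECT_CASE = {"telephone": "dative", "decide": "dative"}


def add_dative(phrase):
    """Append the dative suffix: -a after a back vowel, -e after a front vowel,
    with the buffer letter 'y' when the phrase itself ends in a vowel."""
    letters = phrase.lower()
    vowels = [ch for ch in letters if ch in VOWELS]
    if not vowels:
        return phrase
    suffix = "a" if vowels[-1] in BACK_VOWELS else "e"
    buffer = "y" if letters[-1] in VOWELS else ""
    return phrase + buffer + suffix


def inflect_object(phrase, governing_verb):
    """Inflect an object phrase according to the case its verb requires."""
    if OBJECT_CASE.get(governing_verb) == "dative":
        return add_dative(phrase)
    return phrase


print(inflect_object("Bayan jackson", "telephone"))  # Bayan jacksona
print(inflect_object("telefon etme", "decide"))      # telefon etmeye
```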

The final translation is output below.

*******************************************************************

----------------------------------------------------TÜRKÇE
 He decided to telephone Mrs Jackson, who he had read about in the newspaper.

 O dahili gazete hakkında okumuş olduğu Bayan jacksona telefon etmeye karar verdi.

Done
Done
*******************************************************************

As you may notice, the verb phrase 'read about' is translated incorrectly, and this
causes further complications with the preposition 'in'.  Phrasal expressions
made up of more than one word are not handled yet.  These include verb + preposition
combinations, multi-word verbs, multi-word adjectives, multi-word prepositions etc.

Currently, LANGANA is at the translation phase for relative clauses ('who' is almost finished).
The English parse phases for subordinating conjunctions and correlative conjunctions are
completed.

After that, improving the English to Turkish dictionary is the most urgent task.  Improvement
of the Webster-based dictionary and phrase processing will follow.

ACKNOWLEDGMENTS
Birol BAŞARAN, Ali Tamer ÜNAL, Mehmet Niyazi SARAL have provided financial and intellectual
support by ordering related projects.

REFERENCES
[1] Alexey Borisov and Irina Galinskaya (Yandex School of Data Analysis, 16, Leo Tolstoy street, Moscow, Russia),
"Yandex School of Data Analysis Russian-English Machine Translation System for WMT14",
Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 66–70,
Baltimore, Maryland USA, June 26–27, 2014.
[2] Purdue OWL https://owl.english.purdue.edu/owl
[3] British Council https://learnenglish.britishcouncil.org/en
[4] 5000 test sentences regression test outputs https://sourceforge.net/projects/turkishlanguageparser/files/English%20to%20Turkish%20Translation%20Engine/20160605ARS%20English%20Translator%20Regression%20Test%205000%20test%20sentences%20_%20translation.txt/download
[5] English Parser outputs https://sourceforge.net/projects/turkishlanguageparser/files/English%20Language%20Syntax%20Parser/
[6] LANGANA testcases for Correlational Conjunctions in comparison with YANDEX http://tekne-techne.blogspot.com.tr/2016/03/langana-testcases-for-correlational.html