Sunday, 5 April 2015

LANGANA-e English Language syntax parser completes first phase successfully

Incredible! Excellent news!


LANGANA-e, my English language syntax parser, has completed its first phase.
It can parse completely new texts that it encounters for the first time with 98.2 percent precision.  It is now possible to correct the remaining mistakes by hand and produce tangible parse outputs.  Below is a sample parse output produced by LANGANA-e.


AN EXAMPLE SENTENCE OUTPUT:


Sentence=17188-------------------------------------------------------------->
. An inward current flows through cGMP-gated channels, which are confined to the photoreceptor's outer  segment, while an outward K+ current flows through nongated K+-selective channels, which are like those of other neurons and are confined to the inner segment.


.
An
||+indef. art.
inward
||a.
current
||a.
flows
||n.||pl.
through
||+prep.
cGMP-gated
||a.
channels
||n.||pl.
,
which
||nom. pron.
are
||+reg. v.
confined
||v. t.||imp. & p. p. Confined||p. pr. & vb. n. Confining
to
||+prep.
the
||+def. art.
photoreceptor's
||poss. n.
outer
||a.
segment
||n.
,
while
||+conj.
an
||+indef. art.
outward
||a.
K+
||pn.
current
||a.
flows
||n.||pl.
through
||+prep.
nongated

K+-selective
||a.
channels
||n.||pl.
,
which
||nom. pron.
are
||+reg. v.
like
||+prep.
those
||+obliq. pron.
of
||+prep.
other
||a.
neurons
||n.||pl.
and
||+conj.
are
||+reg. v.
confined
||v. t.||imp. & p. p. Confined||p. pr. & vb. n. Confining
to
||+prep.
the
||+def. art.
inner
||a.
segment
||n.
#word=38
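
The listing above puts each word on its own line, followed by a line of tags that starts with `||`. Below is a minimal sketch of how such a listing could be read back into (word, tags) pairs. It is not LANGANA-e code: the function name `read_parse_output` is hypothetical, and it assumes the input holds only the word and tag lines (the full sentence text is removed beforehand). Words without a tag line, such as punctuation, get an empty tag list.

```python
def read_parse_output(text):
    """Turn a word-per-line parse listing into a list of (word, tags) pairs."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the header / word-count marker lines.
        if not line or line.startswith("Sentence=") or line.startswith("#word="):
            continue
        if line.startswith("||"):
            # A tag line belongs to the most recently read word.
            tags = [t.strip() for t in line.split("||") if t.strip()]
            if entries:
                word, _ = entries[-1]
                entries[-1] = (word, tags)
            continue
        # Any other line is a word (or punctuation mark) with no tags yet.
        entries.append((line, []))
    return entries


sample = """\
the
||+def. art.
inner
||a.
segment
||n.
confined
||v. t.||imp. & p. p. Confined||p. pr. & vb. n. Confining
,
#word=5
"""

entries = read_parse_output(sample)
for word, tags in entries:
    print(word, tags)
```

The tags are kept exactly as they appear in the listing, including the leading `+`, so nothing is interpreted at this stage.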


I)


I used various excerpts from KANDEL's 'Principles of Neural Science', 50 sentences and 1057 words in total, and published the results in my techne blog.
 
PREVIOUS TEST CASE RESULTS:
http://tekne-techne.blogspot.com.tr/2015/02/langana-e-english-parser-control-set.html


TOTAL
# of sentences = 50
# of words = 1057
# of ERRORs = 28
% of word errors in words = % 2.6
# of word errors per sentence = 0.56
 The average sentence length is 21 words.
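
The three derived figures follow from the raw counts alone. Here is a minimal sketch of the arithmetic; the function name `error_stats` is hypothetical and is not part of LANGANA-e. The values it prints are unrounded, so they may differ in the last digit from the figures quoted in this post.

```python
def error_stats(sentences, words, errors):
    """Derive the reported figures from the three raw counts."""
    return {
        "error_pct": 100.0 * errors / words,       # % of word errors in words
        "errors_per_sentence": errors / sentences,  # word errors per sentence
        "avg_sentence_length": words / sentences,   # words per sentence
    }


# Counts taken from the control set above.
stats = error_stats(sentences=50, words=1057, errors=28)
for name, value in stats.items():
    print(f"{name} = {value:.2f}")
```

The same function can be applied to the counts of the later test ranges.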


II)

Using the experience and knowledge accumulated in LANGANA-e, I tested the text between the 17045th and 17200th sentences.  The trial numbers indicate the test results obtained at the end of each programming-teaching phase.

Please note that LANGANA-e continues to learn and improve when more effort is put in.
 
 17045 - 17200


# of sentences = 155
# of words = 2947

Successive trials (1st, 2nd, 3rd, 4th, 5th, ...), from left to right:

# of ERRORs =                      155     106      89      74      69      57      48      33      13
% of word errors in words =     % 5.29  % 3.59  % 3.02  % 2.51  % 2.34  % 1.93  % 1.62  % 1.12  % 0.44
# of word errors per sentence =   1.00    0.68    0.58    0.48    0.45    0.38    0.31    0.21    0.08
The average sentence length is 19 words.


III)


After reducing the error rate to roughly 4 in one thousand words (13 errors in 2947 words), I decided to complete the 'Visual Processing by the Retina' chapter of KANDEL's 'Principles of Neural Science'.

The results given below are without any correction. They are solely what LANGANA-e understands syntactically according to what I taught her in the previous phase.

17200 - 17434


17200 + # of sentences :              100      200      234
17200 + # of words :                 1972     4005     4597
17200 + # of ERRORs :                  35       72       86
% of word errors in words =        % 1.77   % 1.79   % 1.87
# of word errors per sentence =      0.35     0.36     0.35
The average sentence length is 20 words.


IV)


I have corrected the remaining mistakes by hand.  You can find an almost mistake-free, precise word-based syntactical parse of the 26th chapter of KANDEL's 'Principles of Neural Science' at this SourceForge address:
https://sourceforge.net/projects/turkishlanguageparser/files/English%20Language%20Syntax%20Parser/

V) WHAT'S NEXT?


First, the higher-level parse to detect subject, object, adverbial and noun phrases, etc. has to be done.
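
To give an idea of what such a higher-level step could look like, here is a very rough sketch that groups word-level tags into noun phrases. It is only an illustration under assumptions, not the method LANGANA-e will use: the function name `noun_phrases` is hypothetical, and the sketch assumes that `a.` marks an adjective, `poss. n.` a possessive noun, `n.` a noun, and `+indef. art.` / `+def. art.` an article, as the sample output suggests. A noun phrase is taken to be an optional article, any number of modifiers, and then one or more nouns.

```python
ARTICLES = {"+indef. art.", "+def. art."}
MODIFIERS = {"a.", "poss. n."}


def noun_phrases(entries):
    """Greedily collect 'article? modifier* noun+' runs from (word, tags) pairs."""
    phrases = []
    i = 0
    while i < len(entries):
        start = i
        # Optional leading article.
        if ARTICLES & set(entries[i][1]):
            i += 1
        # Zero or more modifiers (adjectives, possessive nouns).
        while i < len(entries) and MODIFIERS & set(entries[i][1]):
            i += 1
        # One or more head nouns.
        head_start = i
        while i < len(entries) and "n." in entries[i][1]:
            i += 1
        if i > head_start:
            phrases.append(" ".join(word for word, _ in entries[start:i]))
        elif i == start:
            # Nothing matched at this position; move on to the next word.
            i += 1
    return phrases


tagged = [
    ("are", ["+reg. v."]),
    ("confined", ["v. t.", "imp. & p. p. Confined", "p. pr. & vb. n. Confining"]),
    ("to", ["+prep."]),
    ("the", ["+def. art."]),
    ("inner", ["a."]),
    ("segment", ["n."]),
    (",", []),
    ("other", ["a."]),
    ("neurons", ["n.", "pl."]),
]

print(noun_phrases(tagged))
```

A real higher-level parse would also need the corrected word-level tags as input, since any word-level mistake carries over into the phrases.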


After this, various applications may be developed:


1- A given text may be parsed offline. Online questions may be answered according to this reference text.


2- A translation engine for English to Turkish may be developed.


3- An advanced search engine for medical references may be developed to get very quick answers from big books.


4- Aviation maintenance reference books may be searched in the same way.


5- Very long legal documents may be processed more easily.


6- Legal voice recordings (telephone companies, banks, etc.) may be scanned.