TEKNE - TECHNE: April 2015

Incredible! Excellent news!

LANGANA-e, my English-language syntax parser, has completed its first phase.
It can parse completely new texts that it encounters for the first time with 98.2% precision. It is now possible to correct the remaining mistakes by hand and produce tangible parse output. Below is a sample parse output produced by LANGANA-e.

AN EXAMPLE SENTENCE OUTPUT:

Sentence=17188-------------------------------------------------------------->
. An inward current flows through cGMP-gated channels, which are confined to the photoreceptor's outer segment, while an outward K+ current flows through nongated K+-selective channels, which are like those of other neurons and are confined to the inner segment.

.
An
||+indef. art.
inward
||a.
current
||a.
flows
||n.||pl.
through
||+prep.
cGMP-gated
||a.
channels
||n.||pl.
,
which
||nom. pron.
are
||+reg. v.
confined
||v. t.||imp. & p. p. Confined||p. pr. & vb. n. Confining
to
||+prep.
the
||+def. art.
photoreceptor's
||poss. n.
outer
||a.
segment
||n.
,
while
||+conj.
an
||+indef. art.
outward
||a.
K+
||pn.
current
||a.
flows
||n.||pl.
through
||+prep.
nongated

K+-selective
||a.
channels
||n.||pl.
,
which
||nom. pron.
are
||+reg. v.
like
||+prep.
those
||+obliq. pron.
of
||+prep.
other
||a.
neurons
||n.||pl.
and
||+conj.
are
||+reg. v.
confined
||v. t.||imp. & p. p. Confined||p. pr. & vb. n. Confining
to
||+prep.
the
||+def. art.
inner
||a.
segment
||n.
#word=38
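
For readers who want to process output like the sample above, here is a minimal sketch of a reader for it. It assumes only what the sample shows: each token sits on its own line, and an optional following line starting with `||` carries its tags, separated by `||`. The function name `read_parse_output` is hypothetical and not part of LANGANA-e, and the sketch expects input that starts after the full-sentence line.

```python
from typing import List, Tuple


def read_parse_output(text: str) -> List[Tuple[str, List[str]]]:
    """Pair each token line with the '||'-prefixed tag line that follows it."""
    pairs: List[Tuple[str, List[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the bookkeeping lines seen in the sample.
        if not line or line.startswith("#word=") or line.startswith("Sentence="):
            continue
        if line.startswith("||"):
            # A tag line: split on '||' and attach the tags to the previous token.
            tags = [t.strip() for t in line.split("||") if t.strip()]
            if pairs:
                pairs[-1][1].extend(tags)
        else:
            # Any other line is a token (word or punctuation) with no tags yet.
            pairs.append((line, []))
    return pairs


sample = """\
the
||+def. art.
photoreceptor's
||poss. n.
outer
||a.
segment
||n.
,
confined
||v. t.||imp. & p. p. Confined||p. pr. & vb. n. Confining
"""

for word, tags in read_parse_output(sample):
    print(word, tags)
```

Tokens without a tag line, such as the comma, simply come back with an empty tag list.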

I)

I had used various excerpts from KANDEL's 'Principles of Neural Science', 50 sentences / 1057 words long in total, and published the results on my techne blog.

PREVIOUS TEST CASE RESULTS:
http://tekne-techne.blogspot.com.tr/2015/02/langana-e-english-parser-control-set.html

TOTAL
# of sentences = 50
# of words = 1057
# of ERRORs = 28
% of word errors in words = % 2.6
# of word errors per sentence = 0.56
The average sentence length is 21 words.
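
The figures above follow from three counts. The sketch below recomputes them; the function name `error_stats` and the dictionary keys are hypothetical, chosen only for this illustration.

```python
def error_stats(sentences: int, words: int, errors: int) -> dict:
    """Derive the summary figures from the raw counts."""
    return {
        "word_error_pct": 100.0 * errors / words,   # % of word errors in words
        "errors_per_sentence": errors / sentences,  # word errors per sentence
        "avg_sentence_length": words / sentences,   # words per sentence
    }


# Counts taken from the control-set results above.
stats = error_stats(sentences=50, words=1057, errors=28)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The same function can be applied to the counts of any later test run.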

II)

Using the experience and knowledge accumulated in LANGANA-e, I tested the text between the 17045th and 17200th sentences. The trial numbers indicate the test results obtained at the end of each programming-teaching effort phase.

Please note that LANGANA-e continues to learn and improve when more effort is put in.

17045 - 17200

# of sentences = 155
# of words = 2947

Results after each successive trial (1st trial, 2nd, 3rd, 4th, 5th, ...), left to right:
# of ERRORs =                       155     106      89      74      69      57      48      33      13
% of word errors in words =      % 5.29  % 3.59  % 3.02  % 2.51  % 2.34  % 1.93  % 1.62  % 1.12  % 0.44
# of word errors per sentence =    1.00    0.68    0.58    0.48    0.45    0.38    0.31    0.21    0.08
The average sentence length is 17 words.

III)

After reducing the error rate to below half a percent, I decided to complete the 'Visual Processing by the Retina' chapter of KANDEL's 'Principles of Neural Science'.

The results given below are without any correction; they are solely what LANGANA-e understands syntactically according to what I taught her in the previous phase.

17200 - 17434

17200 + # of sentences :             100      200      234
17200 + # of words :                1972     4005     4597
17200 + # of ERRORs :                 35       72       86
% of word errors in words =       % 1.77   % 1.79   % 1.87
# of word errors per sentence =     0.35     0.36     0.35
The average sentence length is 20 words.
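
If the three columns above are running totals at 100, 200 and 234 sentences (an assumption suggested by the "17200 +" labels), the error rate of each stretch on its own can be recovered by differencing. The sketch below does that; `per_block` is a hypothetical helper name.

```python
# (sentences, words, errors) at each checkpoint, read as running totals.
checkpoints = [(100, 1972, 35), (200, 4005, 72), (234, 4597, 86)]


def per_block(cumulative):
    """Turn running totals into per-block counts plus a word error percentage."""
    blocks = []
    prev = (0, 0, 0)
    for sent, words, errs in cumulative:
        d_sent = sent - prev[0]
        d_words = words - prev[1]
        d_errs = errs - prev[2]
        blocks.append((d_sent, d_words, d_errs, 100.0 * d_errs / d_words))
        prev = (sent, words, errs)
    return blocks


blocks = per_block(checkpoints)
for d_sent, d_words, d_errs, pct in blocks:
    print(f"{d_sent} sentences, {d_words} words, {d_errs} errors, {pct:.2f} % word errors")
```

Looking at blocks rather than totals makes it easier to see whether the rate is stable across the chapter.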

IV)

I have corrected the remaining mistakes by hand. You can find an almost zero-mistake, precise word-based syntactical parse of the 26th chapter of KANDEL's 'Principles of Neural Science' at this SourceForge address:
https://sourceforge.net/projects/turkishlanguageparser/files/English%20Language%20Syntax%20Parser/

V) WHAT'S NEXT?

First, the higher-level parse to detect subject, object, adverbial and noun phrases etc. has to be done.
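
To give a rough idea of what such a higher-level step could look like, here is a toy sketch that groups word-level tags into simple noun phrases. It is not LANGANA-e's method: the rule (an article, then any adjectives or possessive nouns, then a noun) and the names `noun_phrases`, `DETERMINERS`, `MODIFIERS` and `NOUNS` are assumptions made for this illustration, and only the first tag of each word is used.

```python
from typing import List, Tuple

Token = Tuple[str, List[str]]

# Tag strings as they appear in the sample parse output.
DETERMINERS = {"+def. art.", "+indef. art."}
MODIFIERS = {"a.", "poss. n."}
NOUNS = {"n."}


def noun_phrases(tokens: List[Token]) -> List[List[str]]:
    """Collect runs of the form: article, modifiers (optional), noun."""
    phrases: List[List[str]] = []
    current: List[str] = []
    for word, tags in tokens:
        first = tags[0] if tags else ""
        if first in DETERMINERS:
            current = [word]          # an article always opens a new phrase
        elif first in MODIFIERS and current:
            current.append(word)      # extend an open phrase
        elif first in NOUNS and current:
            current.append(word)      # a noun closes the phrase
            phrases.append(current)
            current = []
        else:
            current = []              # anything else abandons the open phrase
    return phrases


tokens = [
    ("to", ["+prep."]),
    ("the", ["+def. art."]),
    ("photoreceptor's", ["poss. n."]),
    ("outer", ["a."]),
    ("segment", ["n."]),
    (",", []),
    ("the", ["+def. art."]),
    ("inner", ["a."]),
    ("segment", ["n."]),
]

for phrase in noun_phrases(tokens):
    print(" ".join(phrase))
```

A real phrase detector would need many more rules, for example for noun phrases without an article and for words with several candidate tags.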

After this, various applications may be developed:

1- A given text may be parsed offline. Online questions may then be answered according to this reference text.

2- A translation engine for English to Turkish may be developed.

3- An advanced search engine for medical references may be developed to get very quick answers from big books.

4- The same may be done for aviation maintenance reference books.

5- Very long legal documents may be processed more easily.

6- Legal voice recordings of telephone companies, banks etc. may be scanned.

Sunday, 5 April 2015

LANGANA-e English Language syntax parser completes first phase successfully