Incredible! Excellent news!
LANGANA-e, my English language syntax parser, has completed its first phase.
It can parse completely new texts that it encounters for the first time with 98.2 percent precision. It is now possible to correct the remaining mistakes by hand and produce tangible parse outputs. Below is a sample parse output produced by LANGANA-e.
AN EXAMPLE SENTENCE OUTPUT:
Sentence=17188-------------------------------------------------------------->
. An inward current flows through cGMP-gated channels, which are confined to the photoreceptor's outer segment, while an outward K+ current flows through nongated K+-selective channels, which are like those of other neurons and are confined to the inner segment.
.
An
||+indef. art.
inward
||a.
current
||a.
flows
||n.||pl.
through
||+prep.
cGMP-gated
||a.
channels
||n.||pl.
,
which
||nom. pron.
are
||+reg. v.
confined
||v. t.||imp. & p. p. Confined||p. pr. & vb. n. Confining
to
||+prep.
the
||+def. art.
photoreceptor's
||poss. n.
outer
||a.
segment
||n.
,
while
||+conj.
an
||+indef. art.
outward
||a.
K+
||pn.
current
||a.
flows
||n.||pl.
through
||+prep.
nongated
K+-selective
||a.
channels
||n.||pl.
,
which
||nom. pron.
are
||+reg. v.
like
||+prep.
those
||+obliq. pron.
of
||+prep.
other
||a.
neurons
||n.||pl.
and
||+conj.
are
||+reg. v.
confined
||v. t.||imp. & p. p. Confined||p. pr. & vb. n. Confining
to
||+prep.
the
||+def. art.
inner
||a.
segment
||n.
#word=38
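The listing above alternates token lines with tag lines that begin with "||". As a minimal sketch of how such a listing could be read back into (token, tags) pairs, the Python below assumes exactly that layout: tokens with no following "||" line (punctuation, or untagged words such as "nongated") get an empty tag list. The function name read_tagged_tokens is hypothetical and is not part of LANGANA-e; the raw sentence line is assumed to have been removed before the call.

```python
from typing import List, Tuple


def read_tagged_tokens(lines: List[str]) -> List[Tuple[str, List[str]]]:
    """Pair each token line with the '||' tag line that follows it, if any.

    Hypothetical helper: assumes one token per line, an optional tag line
    starting with '||', and tags separated by '||' (layout assumed from the
    sample listing, not taken from LANGANA-e itself).
    """
    tokens: List[Tuple[str, List[str]]] = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and the header/footer lines of the listing.
        if not line or line.startswith("Sentence=") or line.startswith("#word="):
            continue
        if line.startswith("||"):
            # A tag line belongs to the most recent token.
            if tokens:
                tokens[-1][1].extend(tag for tag in line.split("||") if tag)
            continue
        tokens.append((line, []))
    return tokens


sample = """An
||+indef. art.
inward
||a.
channels
||n.||pl.
,
nongated
K+-selective
||a.
#word=6""".splitlines()

for token, tags in read_tagged_tokens(sample):
    print(token, tags)
```

Tags are kept verbatim, including the leading "+", because the listing does not say what that marker means.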
I)
I had previously used various excerpts from KANDEL's 'Principles of Neural Science', 50 sentences / 1057 words long in total, and published the results in my techne blog.
PREVIOUS TEST CASE RESULTS:
http://tekne-techne.blogspot.com.tr/2015/02/langana-e-english-parser-control-set.html
TOTAL
# of sentences = 50
# of words = 1057
# of ERRORs = 28
% of word errors in words = % 2.6
# of word errors per sentence = 0.56
The average sentence length is 21 words.
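The derived figures in this summary follow directly from the three counts. A small Python sketch of that arithmetic is given here; the function name error_stats and the dictionary keys are illustrative choices, not names used by LANGANA-e.

```python
def error_stats(sentences: int, words: int, errors: int) -> dict:
    """Derive the summary figures from raw counts (illustrative helper)."""
    return {
        # Share of words that were tagged wrongly, in percent.
        "word_error_pct": 100.0 * errors / words,
        # Average number of wrongly tagged words per sentence.
        "errors_per_sentence": errors / sentences,
        # Average number of words per sentence.
        "avg_sentence_length": words / sentences,
    }


# Counts taken from the control set above: 50 sentences, 1057 words, 28 errors.
stats = error_stats(sentences=50, words=1057, errors=28)
print(f"% of word errors in words     = {stats['word_error_pct']:.1f}")       # 2.6
print(f"# of word errors per sentence = {stats['errors_per_sentence']:.2f}")  # 0.56
print(f"average sentence length       = {stats['avg_sentence_length']:.0f}")  # 21
```

The same function can be applied to the counts of each later trial to check the reported percentages.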
II)
Using the experience and knowledge accumulated in LANGANA-e, I tested the text between the 17045th and 17200th sentences. Each trial number marks the test result obtained at the end of one programming-teaching phase.
Please note that LANGANA-e continues to learn and improve when more effort is put in.
17045 - 17200
# of sentences = 155
# of words = 2947
Trial                         : 1st    2nd    3rd    4th    5th    6th    7th    8th    9th
# of ERRORs                   : 155    106    89     74     69     57     48     33     13
% of word errors in words     : 5.29   3.59   3.02   2.51   2.34   1.93   1.62   1.12   0.44
# of word errors per sentence : 1.00   0.68   0.58   0.48   0.45   0.38   0.31   0.21   0.08
The average sentence length is 17 words.
III)
After reducing the error rate to roughly 4 in one thousand (13 errors in 2947 words), I decided to complete the 'Visual Processing by the Retina' chapter of KANDEL's 'Principles of Neural Science'.
The results given below are without any correction; they are solely what LANGANA-e understands syntactically according to what I taught her in the previous phase.
17200 - 17434
17200 + # of sentences        : 100    200    234
17200 + # of words            : 1972   4005   4597
17200 + # of ERRORs           : 35     72     86
% of word errors in words     : 1.77   1.79   1.87
# of word errors per sentence : 0.35   0.36   0.35
The average sentence length is 20 words.
IV)
I have corrected the remaining mistakes by hand. You can find an almost mistake-free, precise word-based syntactical parse of the 26th chapter of KANDEL's 'Principles of Neural Science' at this SourceForge address:
https://sourceforge.net/projects/turkishlanguageparser/files/English%20Language%20Syntax%20Parser/
V) WHAT'S NEXT?
First, the higher-level parse that detects subjects, objects, adverbials, noun phrases etc. has to be done.
After this, various applications may be developed:
1- A given text may be parsed offline. Online questions may then be answered according to this reference text.
2- A translation engine from English to Turkish may be developed.
3- An advanced search engine for medical references may be developed to get very quick answers from big books.
4- Aviation maintenance reference books
5- Easier processing of very long legal documents.
6- Scanning of legal voice recordings for telephone companies, banks etc.
Sunday, 5 April 2015