Tuesday 10 February 2015

LANGANA-e English parser control set test results




I had mentioned that I tested my solution on 30 sentences from KANDEL’s book.  I corrected all the mistakes of my English parser program by working on these examples.  Here is the statistical information about them.

For each sample, min and max give the sequence numbers of its first and last sentences in the book.

min=20890;max=20895;        94 words             5 sentences

min=20885;max=20890;        82 words             5 sentences

min=20880;max=20885;        70 words             5 sentences

min=20875;max=20880;        135 words           5 sentences

min=15035; max=15045;       228 words           10 sentences

 

A control set test is necessary to check how the program behaves when it is used to parse ‘new’ or unexpected texts.  This shows the fitting level of the program, i.e. how well it generalizes beyond the examples used to build it.  When a solution is produced using a fixed number of test cases, its success depends on the quality and the quantity of those cases.  If you select too few cases, your solution will not work.  Even if you choose many cases, they have to be linearly independent so that together they cover the input space as a whole.

 

I chose examples from different parts of KANDEL’s book so that they may have been written by different authors.

min=15062; max=15072      227 words     10 sentences     5 mistakes     2.2%

min=15125; max=15135      200 words     10 sentences     5 mistakes     2.5%

min=33948; max=33958      245 words     10 sentences     8 mistakes     3.3%

min=2679; max=2689        153 words     10 sentences     5 mistakes     3.3%

min=42150; max=42160      232 words     10 sentences     5 mistakes     2.2%

 

The result: about 2.6% of the words (28 of 1,057) are parsed wrongly.  More importantly, almost half of the sentences have at least one mistake, possibly the only one in that sentence.
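The figures in the control set table can be recomputed with a few lines of Python. This is only a minimal sketch: the tuple layout and the function names are my own choices for illustration, and the overall rate is assumed to be pooled, that is, total mistakes divided by total words. An average weighted differently would give a slightly different figure.

```python
# (min, max, words, sentences, mistakes), copied from the control set table.
CONTROL_SET = [
    (15062, 15072, 227, 10, 5),
    (15125, 15135, 200, 10, 5),
    (33948, 33958, 245, 10, 8),
    (2679, 2689, 153, 10, 5),
    (42150, 42160, 232, 10, 5),
]


def error_rate(mistakes, words):
    """Percentage of words that were parsed wrongly."""
    return 100.0 * mistakes / words


def pooled_summary(samples):
    """Totals over all samples, plus the pooled error rate (assumed definition)."""
    words = sum(s[2] for s in samples)
    sentences = sum(s[3] for s in samples)
    mistakes = sum(s[4] for s in samples)
    return {
        "words": words,
        "sentences": sentences,
        "mistakes": mistakes,
        "error_rate_pct": error_rate(mistakes, words),
        "avg_sentence_length": words / sentences,
    }


if __name__ == "__main__":
    for lo, hi, words, _sentences, mistakes in CONTROL_SET:
        print(f"{lo}-{hi}: {error_rate(mistakes, words):.1f}%")
    print(pooled_summary(CONTROL_SET))
```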

 

The average sentence length is 21 words.  Shorter sentences tend to have fewer errors.

 

This shows that my model has to be improved so that there is at most 1 mistake in ten sentences.  This corresponds to approximately 1 / 200 = 0.5% of the words.
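The link between the word-level error rate and the share of sentences with a mistake can be sketched as follows. This assumes, purely for illustration, that word errors are independent of each other and that every sentence has the average length of 21 words; neither holds exactly for real text, and the function names are hypothetical. Under these assumptions the measured word error rate gives a sentence-level figure somewhat under one half, and a target of one faulty sentence in ten gives a word error rate of roughly 0.5%.

```python
def sentence_error_probability(word_error_rate, sentence_length):
    """Chance that a sentence has at least one wrongly parsed word,
    assuming independent word errors (an illustrative assumption)."""
    return 1.0 - (1.0 - word_error_rate) ** sentence_length


def required_word_error_rate(target_sentence_error, sentence_length):
    """Word error rate needed to reach a target share of faulty sentences,
    i.e. the inverse of sentence_error_probability."""
    return 1.0 - (1.0 - target_sentence_error) ** (1.0 / sentence_length)


if __name__ == "__main__":
    n = 21  # average sentence length reported above
    measured = 28 / 1057  # pooled control set result: mistakes / words
    print(f"faulty sentences expected: {sentence_error_probability(measured, n):.0%}")
    print(f"word error rate needed:    {required_word_error_rate(0.1, n):.2%}")
```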

 

I will continue testing on examples, but I will also do theoretical work to outline the linearly independent test cases, so that I can catch cases that I might not encounter by chance.

 

 

Saturday 7 February 2015

News about my LANGANA-e English parser project

The first stage of parsing is determining the type of each word in the context of the given sentence.  This is done by rule-based programming, which detects impossible cases and eliminates candidate types of a word that has more than one.  For example, an adjective cannot precede an adverb, except for words such as 'most' used as adjectives.  Another example: an adjective cannot be the last word in a compound noun.
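The elimination idea can be sketched in Python for the first rule only. This is not the actual LANGANA-e code: the function name, the data layout (one set of candidate tags per word) and the exceptions list are assumptions made for illustration. The tags 'a.' and 'adv.' follow the abbreviations used in the parse output.

```python
# Hypothetical exceptions: words that may stay adjectives before an adverb.
ADJ_BEFORE_ADV_EXCEPTIONS = {"most"}


def eliminate_adj_before_adv(words, candidates):
    """Drop the adjective reading of a word when the next word can only be
    an adverb. A word is never left without any candidate tag."""
    result = [set(tags) for tags in candidates]  # work on copies
    for i in range(len(words) - 1):
        next_is_adverb_only = result[i + 1] == {"adv."}
        is_exception = words[i].lower() in ADJ_BEFORE_ADV_EXCEPTIONS
        if next_is_adverb_only and not is_exception and len(result[i]) > 1:
            result[i].discard("a.")
    return result


if __name__ == "__main__":
    words = ["More", "importantly"]
    candidates = [{"a.", "adv."}, {"adv."}]
    # 'More' loses its adjective reading because 'importantly' is an adverb.
    print(eliminate_adj_before_adv(words, candidates))
```

Each further rule would be another pass of the same shape over the candidate sets, and words that still carry several tags after all passes are the 'multiple cases' left for later stages.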


I have done 30 sentences from Kandel's book using this approach with 0 mistakes.  I will now do some cleaning, abstract what I do, and write down the principles.  I will continue with questions, passive sentences and inverted sentences.  After this stage, I will leave the multiple cases in place if I cannot resolve them completely.  At the end, I will add a phase in which idioms, phrases, clauses, and compound and complex sentences are parsed using the output of the previous phases.

You may find below an example parse output from KANDEL's 'Principles of Neural Science' book.

Sentence=20891-------------------------------------------------------------->
. More importantly, the transmitter-induced increase in membrane conductance perturbs the critically tuned resonant circuit in the hair cell's membrane, thus decreasing both the sharpness of frequency selectivity and the gain of electrical amplification.


More
||adv. manner


importantly
||adv.


the
||+def. art.


transmitter-induced
||a.


increase
||n.


in
||+prep.


membrane
||n.


conductance
||n.


perturbs
||v. t.||present t.


the
||+def. art.


critically
||adv. manner


tuned
||v. i.||imp. & p. p. Tunned||p. pr. & vb. n. Tunning


resonant
||a.


circuit
||n.


in
||+prep.


the
||+def. art.


hair
||n.


cell's
||poss. n.


membrane
||n.


thus
||adv. manner


decreasing
||v. i.||imp. & p. p. Decreased||p. pr. & vb. n. Decreasing||v. t.


both
||conj.


the
||+def. art.


sharpness
||n.


of
||+prep.


frequency
||n.


selectivity
||n.


and
||+conj.


the
||+def. art.


gain
||n.


of
||+prep.


electrical
||a.


amplification
||n.