LANGANA-E English Language Parser project progresses May 2014

Friday, 9 May 2014


This is part of a dictionary that lists only the types of English words. So far it covers only the words beginning with the letters W, X, Y and Z. The other letters will be posted as the work progresses.

This effort runs in parallel with my Turkish language processing package, LANGANA. I have two aims for LANGANA. The first is to build a program that reads texts and parses and converts them into a pseudo-language output, which it can later use to answer questions about the text. The second is to build a quality translation engine for Turkish-English and vice versa.

I parsed the last 30 000 lines of the Webster dictionary, which is publicly available. Then I wrote a small converter to extract the word names and types. My parser is approx. 1000 lines long. In the beginning I progressed by 30-40 successfully parsed lines at a time, and that took many hours. Recently I have seen 2000 lines parse successfully in about 10 minutes. I am looking forward to more improvements and to finishing this dictionary in a couple of months at most.
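The extraction step described above can be pictured with a minimal sketch. This is not the LANGANA-E parser itself. It assumes a simplified entry layout that this post does not confirm: the headword stands alone on a line in capitals, and the next line repeats the word, followed by a comma and an abbreviated type such as `n.` or `v. i.`. The names `extract_word_types` and `TYPE_NAMES` are hypothetical.

```python
import re

# Assumed layout (not confirmed by the post): a headword alone on a line in
# capitals, then a line that repeats the word, a comma, and a type abbreviation.
HEADWORD_RE = re.compile(r"^[A-Z][A-Z' -]*$")
TYPE_RE = re.compile(r"^[^,]+,\s*((?:[a-z]+\.\s?)+)")

# Hypothetical mapping from abbreviations to readable type names.
TYPE_NAMES = {
    "n.": "noun",
    "v. t.": "verb (transitive)",
    "v. i.": "verb (intransitive)",
    "a.": "adjective",
    "adv.": "adverb",
    "prep.": "preposition",
    "conj.": "conjunction",
    "pron.": "pronoun",
    "interj.": "interjection",
}


def extract_word_types(lines):
    """Return (word, type) pairs found in an iterable of dictionary lines."""
    results = []
    headword = None
    for raw in lines:
        line = raw.strip()
        if HEADWORD_RE.match(line):
            # Remember the headword and look for its type on the next line.
            headword = line.lower()
            continue
        if headword is not None:
            match = TYPE_RE.match(line)
            if match:
                abbrev = match.group(1).strip()
                # Unknown abbreviations are kept as they appear in the text.
                results.append((headword, TYPE_NAMES.get(abbrev, abbrev)))
            # Only the line right after the headword is inspected.
            headword = None
    return results
```

Entries that do not fit the assumed layout are skipped silently here; the real parser has to handle many more cases than this sketch does.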

-----------------------------------------------
The second group of letters, namely S-T-U-V, has been added. This has been a considerable endeavour, as these letters take up approx. 240 000 lines in Webster (1910 version). My current parser handles approx. 270 000 lines and lists the word types of 25 000 to 30 000 English words. The whole of Webster is 1 000 000 lines. The development of the parser has reached a point of saturation, and proceeding has become fairly straightforward, if not easy. I am looking forward to finishing the parser in 1-2 months' time.

After the parser is finished, I will do some fine tuning to decide which items will be included in the output. Afterwards I will put the output into a MySQL database and proceed with the rest of my plans.
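As a rough illustration of the database step, here is a minimal sketch of loading (word, type) pairs into a table and querying them. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without a server; it is not MySQL. The table name `word_types`, its columns, and both function names are hypothetical.

```python
import sqlite3


def store_word_types(conn, rows):
    """Create the (hypothetical) word_types table and insert (word, type) pairs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_types ("
        " word TEXT NOT NULL,"
        " word_type TEXT NOT NULL,"
        " PRIMARY KEY (word, word_type))"
    )
    # INSERT OR IGNORE is SQLite syntax; it skips pairs that are already stored.
    conn.executemany(
        "INSERT OR IGNORE INTO word_types (word, word_type) VALUES (?, ?)",
        rows,
    )
    conn.commit()


def types_of(conn, word):
    """Return all stored types of a word, sorted alphabetically."""
    cur = conn.execute(
        "SELECT word_type FROM word_types WHERE word = ? ORDER BY word_type",
        (word,),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_word_types(conn, [("water", "noun"), ("water", "verb (transitive)")])
    print(types_of(conn, "water"))
```

A word can have several types, so the pair (word, type) serves as the key rather than the word alone. In MySQL the details would differ: the equivalent statement is `INSERT IGNORE`, and the common Python drivers expect `%s` placeholders rather than `?`.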

I will make the output publicly available, as the Webster 1910 version is, but the letter S will be provided only by e-mail, and only to requests clearly identified as non-profit.

TEKNE - TECHNE
