The purpose of this project was to create a parser that extends its lexicon by interacting with the user to learn new words whenever it tries to parse a word it does not know. The parser is based on my previous work for PSYC-526, which is available here. The project was broken up into three areas:

1. Extend the existing parser with a user-friendly interface. The underlying Lisp interpreter should be relatively transparent to the user.
2. Improve the lexicon to hold more than just the meaning of each lexeme, including the part of speech, verb form, and other relevant information.
3. Have the parser learn new words when it encounters a word it does not recognize.

All three parts were successfully completed. The improved engine is available in Appendix A and the improved dictionary is available in Appendix B.

Part one was intended to be the simplest part of the project, but it turned out to be the most difficult. Lisp was not designed to input strings, only Lisp expressions (atoms and lists). In the end, though, I was able to accomplish everything I intended. The read-line function is the workhorse behind the input; however, I had to use a function from the Eliza program to convert the string into a list to be parsed. Output was much easier, as the princ function did most of what I needed. I did, however, need to write a function to print a list of strings so that only the contents of the strings were printed. I simplified my implementation by wrapping the actual parsing "engine" in another function. All that is required of the user is to execute the parser function from the Lisp prompt. My test output, which demonstrates the interface, is in Appendix C.

Part two actually turned out to be the easiest part of the project. The dictionary is implemented as an a-list.
While this would prove unwieldy and slow for a large dictionary, it works efficiently for a small research program such as this one. Originally, the second item of each a-list entry was a string representing the meaning of the word or phrase. In part two I changed it into a Lisp structure, which stores the meaning as a string. It also stores a possible action that could be performed (stored as a Lisp function call), the part of speech the word or phrase belongs to, the agreement (such as first person singular), and, for verbs, the verb form and sub-category. Using a structure also makes it easy to modify and extend the entry during future development, for instance to allow for truth and context information.

While most of the fields of the structure are not used in this version of the parser, they allow for the storage of important information that may be used by future versions. The field that shows the most promise is the action. This field would allow the storage of Lisp functions to be executed when the sentence is parsed. It would most likely be extended through the use of variables. For example, the word "set" may have the following action:

    (setq (? N) (? O))

so that when the sentence "set x to 25" is parsed, the parser executes the function:

    (setq x 25)

Finally, learning was implemented. Learning was the main reason for implementing a user interface at this stage: since the user must interact with the parser during learning, it made sense for the rest of the parsing process to be interactive as well. The basic idea is that if the sentence is parsed down to a word that is not in the dictionary, the parser asks for the vital information on that word (almost all of the information in the lexeme structure) and adds it to the dictionary. The learn function, which does all this, is easily modified to accommodate new fields for the lexeme.
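
As a rough illustration, the lexeme structure, the a-list dictionary, and the variable-based action described above might look something like the following sketch. The structure, slot, and function names (lexeme, *dictionary*, lookup, add-word, variable-p, fill-action) and the sample entries are assumptions made for illustration only; the actual definitions are those in Appendix A and Appendix B.

```lisp
;; Hypothetical sketch: every name below is illustrative and is not
;; taken from the engine in Appendix A or the dictionary in Appendix B.

(defstruct lexeme
  meaning         ; string describing the word or phrase
  action          ; optional Lisp form, may contain (? NAME) variables
  part-of-speech  ; e.g. NOUN or VERB
  agreement       ; e.g. FIRST-PERSON-SINGULAR
  verb-form       ; verbs only
  subcategory)    ; verbs only

;; The dictionary is an a-list of (word . lexeme) pairs.
(defvar *dictionary*
  (list (cons 'set (make-lexeme :meaning "assign a value to a name"
                                :action '(setq (? n) (? o))
                                :part-of-speech 'verb
                                :verb-form 'base
                                :subcategory 'transitive))
        (cons 'x (make-lexeme :meaning "the variable x"
                              :part-of-speech 'noun))))

(defun lookup (word)
  "Return the lexeme stored for WORD, or NIL if the word is unknown."
  (cdr (assoc word *dictionary*)))

(defun add-word (word lexeme)
  "Push a new (WORD . LEXEME) pair onto the front of the a-list."
  (push (cons word lexeme) *dictionary*)
  lexeme)

(defun variable-p (form)
  "True if FORM is a two-element list of the shape (? NAME)."
  (and (consp form)
       (eq (first form) '?)
       (consp (rest form))
       (null (cddr form))))

(defun fill-action (template bindings)
  "Replace each (? NAME) in TEMPLATE with NAME's value from the a-list BINDINGS.
Variables without a binding are left in place."
  (cond ((variable-p template)
         (let ((binding (assoc (second template) bindings)))
           (if binding (cdr binding) template)))
        ((consp template)
         (mapcar (lambda (part) (fill-action part bindings)) template))
        (t template)))
```

With these definitions, (fill-action (lexeme-action (lookup 'set)) '((n . x) (o . 25))) builds the form (setq x 25), which a parser could then evaluate. Keeping the entry as a structure means a new field only requires one more slot in the defstruct.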

Through very basic flow control, the learn function prompts only for the required fields; for instance, the verb form is requested only if the word is a verb.

Appendix C demonstrates the learning parser in action. In the first sentence, the entire sentence is understood, and the information on it is printed out. In the second, which is similar to the first, a few words are not understood, so after they are added to the lexicon, the meaning of the entire sentence is printed. In the final sentence, nothing is previously known, so all the words in the sentence are learned and the meaning of the entire sentence is again returned. This sentence also demonstrates how the verb form and sub-category are requested only in the case of a verb.

Overall, this project turned out as I expected. Very useful additions were made to the parser, and I learned a great deal about Natural Language Processing, Lisp, and the field of AI in general.
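
The conditional prompting described above, where verb-only fields are requested only for verbs, could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the learn function from Appendix A: the names lexeme, *lexicon*, ask, and learn, the wording of the prompts, and the choice to store answers as strings are all hypothetical, and the sketch skips the agreement and action fields for brevity.

```lisp
;; Hypothetical sketch; none of these names or prompts come from Appendix A.

(defstruct lexeme
  meaning         ; string describing the word or phrase
  action          ; optional Lisp form, may contain (? NAME) variables
  part-of-speech  ; e.g. NOUN or VERB
  agreement       ; e.g. FIRST-PERSON-SINGULAR
  verb-form       ; verbs only
  subcategory)    ; verbs only

(defvar *lexicon* '()
  "A-list of (word . lexeme) pairs.")

(defun ask (question &optional (in *standard-input*) (out *standard-output*))
  "Print QUESTION on OUT and return the next line read from IN as a string."
  (format out "~&~a " question)
  (finish-output out)
  (read-line in))

(defun learn (word &optional (in *standard-input*) (out *standard-output*))
  "Ask about WORD, store the answers in *LEXICON*, and return the new lexeme."
  (let* ((meaning (ask (format nil "What does ~(~a~) mean?" word) in out))
         (pos (ask "What part of speech is it?" in out))
         (verb-p (string-equal pos "verb"))
         ;; Verb-only fields are requested only when the word is a verb.
         (form (when verb-p (ask "What verb form is it?" in out)))
         (subcat (when verb-p (ask "What sub-category is it?" in out)))
         (entry (make-lexeme :meaning meaning
                             :part-of-speech pos
                             :verb-form form
                             :subcategory subcat)))
    (push (cons word entry) *lexicon*)
    entry))
```

Calling (learn 'run) at the Lisp prompt would ask four questions if the answer to the second is "verb" and only two otherwise. Passing the input and output streams as optional arguments is a design choice of this sketch that makes the function easy to exercise without a live user.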
History:
12/2/96 - Original paper finished
1/17/97 - converted to HTML
Last Modified: 5/22/2000