The Lancaster Stemming Algorithm
Background to Stemming:
Morphology is the area of linguistics concerned with the internal structure of words. It has two subclasses, derivational and inflectional. It is the second of these that is of interest when attempting to reduce a word to its morphological root or stem. Some of the basic rules of inflexion involve the plural and possessive forms of nouns and the past and progressive forms of verbs. Conflation attempts to 'reverse' the inflexion process by performing the inverse of the basic inflexion rules. There has been much research into the improvements that can be made to information retrieval systems by matching variants of words, with results suggesting that retrieval performance can be improved.

There are inherent problems in attempting to conflate English words. One of the more difficult is the existence of strong verbs, which follow no set pattern for inflexion and change their stem when forming tenses, e.g. throw, threw, thrown. Other verbs are completely irregular, e.g. go, went, gone. These non-formulaic changes are unpredictable, and conflating such forms without a lexicon is virtually impossible without introducing errors.

This complexity leads to two kinds of stemming error: unrelated words being conflated together, and related words failing to be matched. The methods proposed to tackle this issue are complex, however, so to keep conflation efficient and effective, stemming algorithms have been developed that accept that some errors will occur in exchange for improved performance.
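As a rough illustration of 'reversing' the basic inflexion rules, the following Python sketch strips possessive, progressive, past and plural endings. It is not the Lancaster algorithm: the function name `naive_stem`, the suffix list and the minimum stem length of three letters are all assumptions made for this example.

```python
def naive_stem(word: str) -> str:
    """Strip a few common inflectional endings (illustrative only)."""
    word = word.lower()

    # Possessive forms: "dog's" -> "dog", "dogs'" -> "dogs".
    if word.endswith("'s"):
        word = word[:-2]
    elif word.endswith("'"):
        word = word[:-1]

    # Progressive and past forms of regular verbs, then plural nouns.
    # The length check (an assumed threshold) leaves short words such
    # as "sing" or "red" untouched.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    groups = (
        ["walk", "walks", "walked", "walking"],  # regular verb
        ["throw", "threw", "thrown"],            # strong verb
        ["go", "went", "gone"],                  # irregular verb
        ["new", "news"],                         # unrelated words
    )
    for group in groups:
        print({w: naive_stem(w) for w in group})
```

Under these assumptions the regular forms of "walk" all reduce to the same stem, while the forms of "throw" and "go" stay distinct, so related words go unmatched. The unrelated words "new" and "news" are conflated. Both kinds of error are the price of avoiding a lexicon.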