Stemming

The Lovins Stemmer



The Lovins Stemmer is a single-pass, context-sensitive, longest-match stemmer developed by Julie Beth Lovins of the Massachusetts Institute of Technology in 1968. This early stemmer was aimed at both the IR and Computational Linguistics uses of stemming.

Though innovative for its time, this stemmer has the difficult task of trying to serve two masters (IR and Linguistics) and excels at neither. From the linguistic viewpoint it is not complex enough: many suffixes cannot be stemmed because they are absent from the rule list. This is notable because Lovins's rule list was derived by processing and studying a word sample; repeating that process with a much larger sample might yield a more satisfactory rule list. There are also known problems with the reformation of words. This stage uses the recoding rules to reform stems into words, so that they match the stems of other words with similar meanings. It has been found to be highly unreliable: it frequently fails either to form words from the stems or to match the stems of like-meaning words. The stemmer does not excel from the IR viewpoint either, because its large rule set and its recoding stage slow its execution, and the unreliability of the recoding stage counts against it here as well.

The Lovins Stemmer removes at most one suffix from a word, because it is a single-pass algorithm. It uses a list of about 250 different suffixes and removes the longest suffix attached to the word, ensuring that the stem left after removal is always at least 3 characters long. The ending of the stem may then be reformed (e.g., by un-doubling a final consonant, if applicable) by referring to a list of recoding transformations.
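
The two stages described above (longest-match suffix removal, then recoding) can be illustrated with a minimal Python sketch. It is not Lovins's algorithm as published: the suffix list `TOY_SUFFIXES`, the set `UNDOUBLE`, and the function names are hypothetical stand-ins chosen for illustration, the per-suffix context conditions are omitted, and the only recoding rule shown is the un-doubling of a final consonant. The minimum stem length follows the figure given in the text.

```python
# Toy illustration only: these suffixes are NOT Lovins's actual rule list.
TOY_SUFFIXES = ["ational", "ation", "ness", "ing", "ed", "s"]

# Minimum stem length, taken from the description in the text above.
MIN_STEM_LENGTH = 3

# Assumption: final consonants that the toy recoding step may un-double.
UNDOUBLE = set("bdglmnprst")


def strip_longest_suffix(word):
    """Remove at most one suffix, trying the longest candidates first."""
    for suffix in sorted(TOY_SUFFIXES, key=len, reverse=True):
        stem_length = len(word) - len(suffix)
        if word.endswith(suffix) and stem_length >= MIN_STEM_LENGTH:
            return word[:stem_length]
    return word  # no suffix leaves a long enough stem


def recode(stem):
    """Toy recoding stage: un-double a final doubled consonant."""
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] in UNDOUBLE:
        return stem[:-1]
    return stem


def toy_stem(word):
    """Single pass: one suffix removal followed by one recoding step."""
    return recode(strip_longest_suffix(word.lower()))


if __name__ == "__main__":
    for w in ["sitting", "relational", "cats", "sing"]:
        print(w, "->", toy_stem(w))
```

In this sketch "sitting" loses "ing" and is then un-doubled to "sit", while "sing" is left unchanged because removing "ing" would leave a stem shorter than three characters. "relational" is reduced to "rel", which shows how a longest-match rule can strip more than a reader might want when no context conditions restrain it.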

J.B. Lovins, 1968: "Development of a stemming algorithm," Mechanical Translation and Computational Linguistics 11, 22-31.
