The Lancaster Stemming Algorithm

Stemming Algorithms
Affix removal conflation techniques are referred to as stemming algorithms and can be implemented in a variety of ways. All of them remove suffixes and/or prefixes in an attempt to reduce a word to its stem. The algorithms discussed in the following sections, and those implemented in this project, are all suffix removal stemmers.

Two issues must be addressed when developing a stemmer: iteration and context awareness. Suffixes are usually attached to words in a certain order, so that a set of order-classes exists among suffixes. An iterative stemming algorithm removes suffixes one at a time, starting at the end of the word and working towards the beginning.

A stemmer can also be either context-free or context-sensitive. A context-sensitive algorithm applies a number of qualitative contextual restrictions, developed to prevent the removal of endings that would, in certain situations, produce erroneous stems. A context-free algorithm removes endings with no restrictions placed on the circumstances of the removal.

The linked sections give explanations, including flowcharts, of the three major stemming algorithms (Porter, Lovins and Paice/Husk), together with a brief explanation of two other stemmers.
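The difference between these design choices can be illustrated with a minimal sketch. The suffix lists, the order-classes and the two restrictions below are hypothetical and chosen only for illustration; they are not the rule tables of the Porter, Lovins or Paice/Husk stemmers. The function name `stem_word` and its `context_sensitive` flag are likewise assumptions made for this example.

```python
# Hypothetical contextual restrictions: each one receives the stem that
# would remain after removal and returns True if the removal is allowed.
def _min_length(stem):
    # Keep at least three letters in the stem.
    return len(stem) >= 3


def _not_after_s(stem):
    # Do not strip a final "s" that follows another "s" (e.g. "glass").
    return len(stem) >= 3 and not stem.endswith("s")


# Illustrative order-classes, listed from the end of the word inwards.
# Each entry pairs a suffix with its contextual restriction.
ORDER_CLASSES = [
    [("s", _not_after_s)],
    [("ness", _min_length), ("ing", _min_length)],
    [("less", _min_length), ("ful", _min_length)],
]


def stem_word(word, context_sensitive=True):
    """Iteratively remove at most one suffix per order-class."""
    stem = word.lower()
    for order_class in ORDER_CLASSES:
        for suffix, is_allowed in order_class:
            if stem.endswith(suffix):
                candidate = stem[: -len(suffix)]
                # A context-free stemmer ignores the restriction entirely.
                if not context_sensitive or is_allowed(candidate):
                    stem = candidate
                break  # move on to the next order-class
    return stem


print(stem_word("hopefulness"))                           # hope
print(stem_word("sing"))                                  # sing
print(stem_word("sing", context_sensitive=False))         # s
print(stem_word("hopefulness", context_sensitive=False))  # hopefulnes
```

With the restrictions switched on, "hopefulness" loses "ness" and then "ful", one suffix per pass. With them switched off, the same rules strip the final "s" first and reduce "sing" to "s", which are the kinds of erroneous stems that contextual restrictions are meant to prevent.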