The Lancaster Stemming Algorithm

Paice/Husk:
The Paice/Husk stemmer was first published in 1990 and was developed by Chris Paice with the assistance of Gareth Husk. It is a conflation-based iterative stemmer. Although it remains efficient and easy to implement, it is known to be very strong and aggressive [4]. The stemmer utilises a single table of rules, each of which may specify the removal or replacement of an ending. Replacement is used to avoid the problem of spelling exceptions described earlier: by replacing endings rather than simply removing them, the stemmer does without a separate stage in the stemming process, i.e. no recoding or partial matching is required. This helps to keep the algorithm efficient whilst still being effective. The rules are indexed by the last letter of the ending to allow efficient searching, and each rule has the following form: the ending to match, written in reverse order; an optional `*` flag, meaning the rule applies only if the word is still intact (has not yet been stemmed); a digit giving the number of characters to remove; an optional string to append; and a final `>` (continue stemming) or `.` (stop).
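As a sketch of how such a rule table might be read, the Python below parses rule strings in the format just described and indexes them by the last letter of their ending. The names `Rule`, `parse_rule` and `index_rules` are hypothetical, and the regular expression assumes lowercase endings and a single-digit removal count; it is an illustration, not the original implementation.

```python
import re
from dataclasses import dataclass

# Assumed rule syntax: reversed ending, optional '*', one-digit removal count,
# optional append string, then '>' (continue) or '.' (stop).
RULE_RE = re.compile(r"^([a-z]+)(\*?)(\d)([a-z]*)([>.])$")


@dataclass(frozen=True)
class Rule:
    ending: str        # ending in normal order, e.g. "sion"
    intact_only: bool  # apply only if the word has not been stemmed yet
    remove: int        # number of characters to strip from the end
    append: str        # string added after stripping ("" if none)
    cont: bool         # True for '>' (keep going), False for '.' (stop)


def parse_rule(text: str) -> Rule:
    """Turn a rule string such as 'nois4j>' into a Rule."""
    m = RULE_RE.match(text)
    if m is None:
        raise ValueError(f"malformed rule: {text!r}")
    rev_ending, star, count, append, term = m.groups()
    return Rule(rev_ending[::-1], star == "*", int(count), append, term == ">")


def index_rules(texts):
    """Group parsed rules by the last letter of their ending."""
    table = {}
    for text in texts:
        rule = parse_rule(text)
        table.setdefault(rule.ending[-1], []).append(rule)
    return table
```

For example, `parse_rule("nois4j>")` gives `Rule(ending='sion', intact_only=False, remove=4, append='j', cont=True)`, and the rule is filed under the letter `n`, the last letter of "sion".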
The following example demonstrates how the rules are stored and used, and how replacement removes the need for recoding. The rule nois4j> causes -sion endings to be replaced by j. This j acts as a pointer to the j section of the rules, leading to the transformation provision -> provij -> provid. The j-transformation ensures that the terms provision and provide are correctly conflated to the stem provid. The algorithm has four main steps, detailed below and presented in the flowchart.
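The iteration can be sketched in Python with a deliberately tiny, hypothetical rule subset: `nois4j>` together with the j-section rule `ji1d.` (replace a final -ij with -id and stop) reproduces the provision example, and `e1>` (drop a final e and continue) lets provide reach the same stem. The loop follows the usual description of the four steps (select the section for the final letter, check whether a rule applies, apply it, otherwise try the next rule). The `acceptable` function is a simplified stand-in for the algorithm's acceptability conditions, and none of this is the full Lancaster rule table.

```python
# Sketch of the Paice/Husk iteration using a tiny HYPOTHETICAL rule subset;
# the real Lancaster table contains many more rules.
RULES = [
    "nois4j>",  # -sion -> -j, then continue
    "ji1d.",    # -ij -> -id, then stop
    "e1>",      # -e -> -, then continue
]
VOWELS = "aeiou"


def parse(rule):
    """Split a rule such as 'nois4j>' into (ending, intact_only, remove, append, cont)."""
    i = 0
    while rule[i].isalpha():
        i += 1
    ending = rule[:i][::-1]  # stored reversed, e.g. 'nois' -> 'sion'
    intact_only = rule[i] == "*"
    if intact_only:
        i += 1
    remove = int(rule[i])
    append = rule[i + 1:-1]
    return ending, intact_only, remove, append, rule[-1] == ">"


def acceptable(stem):
    """Simplified acceptability test (an assumption, not the exact original):
    a vowel-initial stem keeps >= 2 letters; a consonant-initial stem keeps
    >= 3 letters including a vowel or 'y'."""
    if not stem:
        return False
    if stem[0] in VOWELS:
        return len(stem) >= 2
    return len(stem) >= 3 and any(c in VOWELS + "y" for c in stem)


# Index rules by the last letter of their ending.
TABLE = {}
for text in RULES:
    rule = parse(text)
    TABLE.setdefault(rule[0][-1], []).append(rule)


def stem(word):
    intact = True
    while True:
        # Step 1: select the section for the word's final letter.
        for ending, intact_only, remove, append, cont in TABLE.get(word[-1:], []):
            # Step 2: is this rule applicable? If not, step 4: try the next rule.
            if not word.endswith(ending) or (intact_only and not intact):
                continue
            candidate = word[:len(word) - remove]
            if not acceptable(candidate):
                continue
            # Step 3: apply the rule, then stop ('.') or restart at step 1 ('>').
            word = candidate + append
            intact = False
            if not cont:
                return word
            break
        else:
            return word  # no rule in this section applied: terminate


print(stem("provision"), stem("provide"))  # prints: provid provid
```

Because `nois4j>` ends in `>`, the intermediate form provij is fed back into step 1, where the j section finishes the job; this is the replacement-instead-of-recoding idea in miniature.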