The Lancaster Stemming Algorithm
Paice/Husk

The Paice/Husk stemmer was first published in 1990 and was developed by Chris Paice with the assistance of Gareth Husk. It is a conflation-based, iterative stemmer. Although it remains efficient and easy to implement, it is known to be very strong and aggressive [4].

The stemmer utilises a single table of rules, each of which may specify the removal or replacement of an ending. Replacement is used to avoid the problem of spelling exceptions described earlier: by replacing endings rather than simply removing them, the stemmer manages without a separate stage in the stemming process, i.e. no recoding or partial matching is required. This helps to keep the algorithm efficient while still being effective. The rules are indexed by the last letter of the ending to allow efficient searching, and each rule has the following form:

  • An ending of one or more characters, held in reverse order
  • An optional intact flag '*'
  • A digit specifying the removal total (zero or more)
  • An optional append string of one or more characters
  • A continuation symbol, '>' or '.'
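As an illustration of the rule format listed above, the following is a minimal sketch of how one rule string might be split into its parts. The names Rule and parse_rule are hypothetical, and the sketch assumes lower-case endings and a single removal digit; it is not taken from the original implementation.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    ending: str    # ending in normal reading order (the rule stores it reversed)
    intact: bool   # True if the rule may only fire on a word not yet modified
    remove: int    # number of trailing characters to remove
    append: str    # string appended after removal (may be empty)
    cont: bool     # True for '>' (continue stemming), False for '.' (stop)


# reversed ending, optional '*', one digit, optional append string, '>' or '.'
_RULE_RE = re.compile(r"^([a-z]+)(\*?)(\d)([a-z]*)([>.])$")


def parse_rule(text: str) -> Rule:
    """Split a rule string such as 'nois4j>' into its five components."""
    m = _RULE_RE.match(text)
    if m is None:
        raise ValueError(f"not a valid rule: {text!r}")
    rev_ending, star, digit, append, symbol = m.groups()
    return Rule(rev_ending[::-1], star == "*", int(digit), append, symbol == ">")


print(parse_rule("nois4j>"))
```

Storing the ending reversed means the first character of every rule is the last letter of the ending, which is what lets the table be indexed by that letter.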

The following example demonstrates how the rules are stored and used, and how replacement removes the need for recoding. The rule nois4j> causes sion endings to be replaced by j. The j then acts as a pointer to the j section of the rules, leading to the following transformation: provision -> provij -> provid. The j-transformation ensures that the terms provision and provide are correctly conflated to the stem provid.
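The transformation above can be traced with a short sketch. The helper apply_rule is hypothetical, and the two follow-up rules used here (ji1d. in the j section and e1> in the e section) are assumed for illustration; only nois4j> is given in the text.

```python
def apply_rule(word, rev_ending, remove, append):
    """Apply one rule if the word ends with the (reversed-stored) ending."""
    ending = rev_ending[::-1]          # rules hold the ending back to front
    if not word.endswith(ending):
        return word                    # rule does not match, word unchanged
    return word[:len(word) - remove] + append


step1 = apply_rule("provision", "nois", 4, "j")   # rule nois4j>
step2 = apply_rule(step1, "ji", 1, "d")           # assumed rule ji1d.
other = apply_rule("provide", "e", 1, "")         # assumed rule e1>

print(step1, step2, other)
```

Under these assumed rules both provision and provide end up as provid, which is the conflation the j-transformation is meant to achieve.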

The algorithm has four main steps, detailed below and presented in the flowchart:
1. Select relevant section: Inspect the final letter of the term and, if the rule table has a section for that letter, consider the first rule of that section; otherwise terminate.
2. Check applicability of rule: If the final letters of the term do not match the rule, or the intact settings are violated, or the acceptability conditions are not satisfied, go to step 4.
3. Apply rule: Remove or reform the ending as required, then check the termination symbol and either terminate or return to step 1.
4. Look for another rule: Move to the next rule in the table; if the section letter has changed then terminate, else go to step 2.
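The four steps above can be sketched as a small loop. This is a minimal sketch and not the published implementation: the three-rule table is a hypothetical excerpt, the names build_table, acceptable and stem are made up for the example, and the acceptability test is an assumption (the text does not define it), namely that a remainder starting with a vowel needs at least two letters, and one starting with a consonant needs at least three letters including a vowel or y.

```python
import re
from collections import defaultdict

RULE_RE = re.compile(r"^([a-z]+)(\*?)(\d)([a-z]*)([>.])$")


def build_table(rule_strings):
    """Index rules by the last letter of the ending (first char of the rule)."""
    table = defaultdict(list)
    for text in rule_strings:
        rev, star, digit, append, symbol = RULE_RE.match(text).groups()
        table[rev[0]].append((rev[::-1], star == "*", int(digit), append, symbol == ">"))
    return table


def acceptable(remainder):
    """Assumed acceptability conditions; see the lead-in paragraph."""
    if not remainder:
        return False
    if remainder[0] in "aeiou":
        return len(remainder) >= 2
    return len(remainder) >= 3 and any(c in "aeiouy" for c in remainder)


def stem(word, table):
    intact = True                                   # no rule has fired yet
    while True:
        # Step 1: select the section for the final letter, if there is one.
        section = table.get(word[-1], []) if word else []
        for ending, intact_only, remove, append, cont in section:
            # Step 2: check applicability; on failure fall through to step 4.
            remainder = word[:len(word) - remove]
            if (not word.endswith(ending)
                    or (intact_only and not intact)
                    or not acceptable(remainder)):
                continue                            # Step 4: try the next rule
            # Step 3: apply the rule, then inspect the continuation symbol.
            word, intact = remainder + append, False
            if cont:
                break                               # '>' : return to step 1
            return word                             # '.' : terminate
        else:
            return word                             # Step 4: section exhausted


# Hypothetical excerpt of a rule table, for illustration only.
table = build_table(["nois4j>", "ji1d.", "e1>"])
print(stem("provision", table), stem("provide", table))
```

The for/else construct mirrors step 4: running out of rules in the current section ends the process, while a break after a '>' rule restarts at step 1 with the new final letter.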
