The Lancaster Stemming Algorithm
Other Stemmers:

Dawson:

The Dawson stemmer was developed by John Dawson and was first presented in 1974. Like Lovins, it is a single-pass, context-sensitive suffix removal stemmer, and it was developed at the Literary and Linguistics Centre, Cambridge. The main aim of the stemmer was to take the original algorithm proposed by Lovins, refine its rule sets and techniques, and correct any basic errors in it. The first step was to include all plurals and combinations of the simple suffixes, which increased the size of the ending list to approximately five hundred. The second phase was to apply what Dawson called the completion principle, in which any suffix contained within the ending list is completed by adding all of its variants, flexions and combinations to the ending list. This increased the ending list once more, to approximately one thousand two hundred terms, although no record of this list is available.
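The two expansion phases can be illustrated with a small Python sketch. Because Dawson's actual ending list is not available, the base endings, plural table and combination table below (`BASE_ENDINGS`, `PLURALS`, `COMBINATIONS`) are hypothetical stand-ins, and `complete_endings` is an illustrative name rather than part of any published implementation. The result is sorted longest first, which is the order a longest-match stemmer would scan it in.

```python
from typing import Dict, List

# Hypothetical simple suffixes; Dawson's real list is not available.
BASE_ENDINGS: List[str] = ["ion", "ive", "ness"]

# Hypothetical plural forms of suffixes.
PLURALS: Dict[str, str] = {"ion": "ions", "ive": "ives", "ness": "nesses"}

# Hypothetical suffixes that may follow a given suffix (combinations).
COMBINATIONS: Dict[str, List[str]] = {"ive": ["ness", "ly"], "ion": ["al"]}


def complete_endings(base: List[str],
                     plurals: Dict[str, str],
                     combinations: Dict[str, List[str]]) -> List[str]:
    endings = set(base)
    # Phase 1: plurals and combinations of the simple suffixes.
    for ending in base:
        if ending in plurals:
            endings.add(plurals[ending])
        for follower in combinations.get(ending, []):
            endings.add(ending + follower)
    # Phase 2 (completion): give every listed ending, including the new
    # combinations, the plural variant of any suffix it ends with.
    for ending in list(endings):
        for suffix, plural in plurals.items():
            if ending.endswith(suffix):
                endings.add(ending[: -len(suffix)] + plural)
    # Longest first, so a longest-match stemmer can scan the list in order.
    return sorted(endings, key=lambda e: (-len(e), e))


print(complete_endings(BASE_ENDINGS, PLURALS, COMBINATIONS))
# → ['ivenesses', 'iveness', 'nesses', 'ional', 'ively', 'ions', 'ives', 'ness', 'ion', 'ive']
```

Even with three base suffixes the list grows to ten entries, which gives a feel for how roughly five hundred endings could grow to about twelve hundred.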

As in the Lovins stemmer, every ending in the list is associated with a number that is used as an index into a list of exceptions, which enforce certain conditions upon the removal of the associated ending. These conditions are similar to those of the Lovins stemmer: they may enforce either a minimum length for the remaining stem (with a minimum length of two for all stems) or a rule that the ending can only be removed, or must not be removed, when specified letters are present in the remaining stem.
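A minimal single-pass, longest-match sketch of how an ending list indexed into a condition table might be applied is shown below. The endings, the condition numbers and the helpers `min_length`, `only_after` and `not_after` are hypothetical; in particular, it assumes (as in Lovins's conditions) that letter conditions are tested against the end of the remaining stem.

```python
from typing import Callable, Dict, List

# A condition receives the candidate stem and says whether removal is allowed.
Condition = Callable[[str], bool]

MIN_STEM = 2  # minimum stem length applied to every removal


def min_length(n: int) -> Condition:
    return lambda stem: len(stem) >= n


def only_after(letters: str) -> Condition:
    # Assumption: "present in the stem" is tested at the end of the stem.
    return lambda stem: stem.endswith(tuple(letters))


def not_after(letters: str) -> Condition:
    return lambda stem: not stem.endswith(tuple(letters))


# Hypothetical condition table: index -> predicates that must all hold.
CONDITIONS: Dict[int, List[Condition]] = {
    1: [],                  # only the global minimum stem length
    2: [min_length(3)],
    3: [not_after("e")],
    4: [only_after("ln")],
}

# Hypothetical ending list: ending -> condition index.
ENDINGS: Dict[str, int] = {
    "ational": 2,
    "ation": 2,
    "ness": 1,
    "ing": 3,
    "ly": 4,
    "s": 1,
}


def stem(word: str) -> str:
    # Single pass: try endings longest first and remove the first one
    # whose conditions hold; at most one ending is ever removed.
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if not word.endswith(ending):
            continue
        candidate = word[: -len(ending)]
        if len(candidate) < MIN_STEM:
            continue
        if all(cond(candidate) for cond in CONDITIONS[ENDINGS[ending]]):
            return candidate
    return word


for w in ["kindness", "being", "fully", "quickly", "is"]:
    print(w, "->", stem(w))
```

Here "being" is kept because condition 3 forbids removing "ing" after "e", "quickly" is kept because condition 4 only allows "ly" after "l" or "n", and "is" is kept by the global two-letter minimum.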

The major difference between the Dawson and Lovins stemmers is the technique used to solve the problem of spelling exceptions. The Lovins stemmer utilises the technique known as recoding. This process is seen as part of the main algorithm and performs a number of transformations based on the letters within the stem. In contrast, the Dawson stemmer utilises partial matching which, as described above, attempts to match stems that are equal within certain limits. This process is not seen as part of the stemming algorithm and must therefore be implemented within the information retrieval system. Dawson warns that without this additional processing the stemmer would produce many errors.
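Dawson's exact matching limits are not given here, so the following sketch assumes one plausible form of partial matching: two stems match if they share a prefix of at least `min_prefix` letters and neither has more than `max_tail` letters beyond that prefix. The function names, default limits and example stems are hypothetical. Note that the matching runs on the retrieval side, not inside the stemmer.

```python
from typing import Iterable, List


def common_prefix_length(a: str, b: str) -> int:
    # Count the leading characters shared by both stems.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def partial_match(a: str, b: str, min_prefix: int = 4, max_tail: int = 2) -> bool:
    # Hypothetical limits: a long enough shared prefix and short differing tails.
    p = common_prefix_length(a, b)
    if p < min_prefix:
        return False
    return len(a) - p <= max_tail and len(b) - p <= max_tail


def matching_terms(query_stem: str, index_stems: Iterable[str]) -> List[str]:
    # Performed by the retrieval system: find index stems that
    # partially match the query stem.
    return [s for s in index_stems if partial_match(query_stem, s)]


print(matching_terms("absorb", ["absorpt", "absorb", "abstain", "absent"]))
# → ['absorpt', 'absorb']
```

This lets a stem such as "absorb" meet a differently spelled stem such as "absorpt" without any recoding inside the stemmer, which is the division of labour the paragraph above describes.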

Krovetz:

The Krovetz stemmer was presented in 1993 by Robert Krovetz and is a linguistic lexical validation stemmer. It is a very complicated, low-strength algorithm because of the processes involved in linguistic morphology and its inflectional nature. The stemmer uses dictionary lookup to verify all removals that occur in the following steps:
1. Transformation of plural forms to singular forms
2. Conversion of past forms to present forms
3. Removal of "ing"
The dictionary lookup also performs any transformations required by spelling exceptions, and converts any stem produced into a real word whose meaning can be understood.
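The three steps can be sketched as dictionary-validated suffix removal. This is not Krovetz's actual rule set: the tiny `LEXICON`, the candidate generators, the name `dictionary_stem`, and the choice to leave any word already in the lexicon unchanged are assumptions made for illustration. Each step only accepts a candidate the dictionary confirms, and the restored-"e" and undoubled-consonant candidates show how lookup can turn a truncated stem back into a real word.

```python
from typing import Iterator, Set

# Hypothetical miniature dictionary standing in for a full lexicon.
LEXICON: Set[str] = {"cat", "city", "fly", "walk", "bake", "stop", "sing"}


def plural_candidates(word: str) -> Iterator[str]:
    # Step 1: plural -> singular.
    if word.endswith("ies"):
        yield word[:-3] + "y"
    if word.endswith("es"):
        yield word[:-2]
    if word.endswith("s") and not word.endswith("ss"):
        yield word[:-1]


def base_variants(base: str) -> Iterator[str]:
    # Candidate spellings of a stem: as is, with a restored "e",
    # or with a doubled final consonant undone.
    yield base
    yield base + "e"
    if len(base) >= 2 and base[-1] == base[-2]:
        yield base[:-1]


def past_candidates(word: str) -> Iterator[str]:
    # Step 2: past -> present.
    if word.endswith("ied"):
        yield word[:-3] + "y"
    if word.endswith("ed"):
        yield from base_variants(word[:-2])


def ing_candidates(word: str) -> Iterator[str]:
    # Step 3: removal of "ing".
    if word.endswith("ing"):
        yield from base_variants(word[:-3])


def dictionary_stem(word: str, lexicon: Set[str] = LEXICON) -> str:
    for step in (plural_candidates, past_candidates, ing_candidates):
        if word in lexicon:
            # Assumption: a word already in the dictionary is left alone.
            return word
        for candidate in step(word):
            if candidate in lexicon:  # only dictionary-verified removals
                word = candidate
                break
    return word


for w in ["cities", "cats", "baked", "stopping", "sing", "xyzing"]:
    print(w, "->", dictionary_stem(w))
```

Words with no verified candidate, such as "xyzing", pass through unchanged, which reflects the low strength but high accuracy described in the next paragraph.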

Krovetz proposes that, because the stemmer is highly accurate but weak, it could be useful within information retrieval as a form of pre-processing performed before the main stemming algorithm. This would give the main stemmer partly stemmed input in which common situations have already been dealt with accurately and effectively, and could therefore reduce common errors [9].
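The proposed arrangement is simply function composition: a light, accurate stage runs first and its output feeds the main stemmer. In the sketch below both stages are hypothetical toys (`light_stage` is a small lookup table standing in for a Krovetz-style pre-processor, and `main_stage` is a crude suffix stripper standing in for the main algorithm); only the `pipeline` structure is the point.

```python
from typing import Callable, Dict

Stemmer = Callable[[str], str]

# Hypothetical light pre-processor: maps a few inflected forms to
# dictionary words and leaves everything else alone.
LIGHT_FORMS: Dict[str, str] = {"studies": "study", "mice": "mouse", "ran": "run"}


def light_stage(word: str) -> str:
    return LIGHT_FORMS.get(word, word)


def main_stage(word: str) -> str:
    # Hypothetical stand-in for the main stemmer: remove the first
    # matching suffix if at least three letters remain.
    for suffix in ("ing", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def pipeline(*stages: Stemmer) -> Stemmer:
    # Apply each stage to the output of the previous one.
    def run(word: str) -> str:
        for stage in stages:
            word = stage(word)
        return word
    return run


combined = pipeline(light_stage, main_stage)
print(main_stage("studies"), main_stage("study"))  # studie study
print(combined("studies"), combined("study"))      # study study
```

On its own the toy main stemmer fails to conflate "studies" with "study"; with the light stage in front, the common case is resolved before the aggressive stage ever sees it.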
