The Lancaster Stemming Algorithm
  Lovins Stemmer:    
Home

The Lovins stemming algorithm was first presented in 1968 by Julie Beth Lovins. It is a single pass, context sensitive stemmer, which removes endings based on the longest-match principle. The stemmer was the first to be published and was extremely well developed considering the date of it's release and has been the main influence on a large amount of the future work in the area

The stemmer is considered complex for its age and utilises a large list (297) of endings, each of which is associated with one of a number of qualitative contextual restrictions that prevent the removal of endings in certain circumstances. The stemmer utilises a number of rules that are designed to cope with the most common exceptions. All endings are associated with the default exception, that a stem must be at least two letters long, which is designed to prevent the production of ambiguous stems. Other rules assert one of the following conditions on the ending's removal,

  • Increasing the minimum length of a stem following a ending's removal.
  • Preventing the removal of endings when certain letters are present in the remaining stem.
  • Combinations of the above restrictions.

When developing the stemmer Lovins described the most desirable form of context sensitive rule as one that can be generalised to apply in numerous situations. During the development of the stemmer it was discovered that few examples of such rules could be found. For each ending a number of special cases exist that cause erroneous stems to be produced, these are often unique to the ending and a number of rules would have to be developed that would prevent only a small number of errors. This process would require large amounts of time and data, and would offer diminishing improvements in performance over time. For this reason it was decided to deal with the more obvious exceptions and to hopefully limit the number of errors that remain unaccounted for in the exception list.

The flowchart details the complete algorithm proposed by Lovins and illustrates the two main stages of the algorithm. The stemming phase has been discussed above and includes the removal of endings and the testing of associated exceptions among other steps. The second part of the algorithm is the recoding phase.

The term spelling exception is used to cover all the circumstances in which a stem may be spelled in more than one way. The majority of these exceptions that occur in English are due to “Latinate derivations” such as matrix and matrices. Other types of exceptions occur that can be attributed to differences in British and American spellings, such as analysed and analyzed, or to basic inflexion rules that cause the doubling of certain consonants when a suffix is added. Two ways are proposed by Lovins to deal with this problem which are called recoding and partial matching. The original paper explains the decision process that led to the implementation of a recoding phase in the algorithm.

Introduction
Background Information
Stemming Algorithms
Porter
Lovins
Paice/Husk
Others
Algorithm Implemenatations
Evaluation Techniques
Evaluation Program
Resources
Bibliography