The Lancaster Stemming Algorithm |
| Evaluation Program : | |||
| Home | The evaluation program, developed by Rob Hooper at Lancaster University, uses the technique presented in Paice, C.D. (1996) Method for Evaluation of Stemming Algorithms based on Error Counting, JASIS, 47(8): 632-649 to determine the performance of various stemmers. The program assists the user in creating the grouped word files that the evaluation technique requires, and presents the calculated results for the stemming algorithm selected by the user. The program was developed in Java and uses the Swing API to provide a GUI front-end that gives the user access to all of its functionality; a screenshot of the GUI is provided below. The program was used to perform tests on the three algorithms and, together with an additional optimisation package, to create a new rule set for the Lovins stemmer, which is included in the download. All the code is available for download through the following link, together with the report detailing the optimisation of the rule sets: Evaluation Program and Rule Sets. The aim of the project was to develop a systematic procedure for optimising stemming algorithms, based on retaining information about the behaviour of a stemmer during the conflation process. This information allowed the endings/rules involved in a high proportion of errors to be identified, and steps to be taken to reduce the number of incorrect operations performed. The project focussed initially on the Lovins stemmer; the following graph demonstrates the performance gains achieved with the new rule sets. The values plotted are the ERRT (error rate relative to truncation, defined in Paice, 1996) values for the listed stemmers, together with the results for the new rule sets at the different optimisation steps. A full presentation of the results is available in the project report. |
||
| Introduction | |||
| Stemming Algorithms | |||
| Algorithm Implementations | |||
| Evaluation Techniques | |||
| Evaluation Program | |||
| Resources | |||
| Bibliography | |||
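
The error-counting evaluation described above can be illustrated with a small Java sketch. It is not taken from the downloadable program: the class name `PaiceSketch`, its methods, the toy word groups and the toy stemmers are all hypothetical. The pair-counting definitions of the understemming index (UI) and overstemming index (OI), and of ERRT as the ratio of distances along the line from the origin through the stemmer's (UI, OI) point to the truncation line, follow our reading of Paice (1996) and should be checked against the paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of error counting; not the code of the evaluation program.
final class PaiceSketch {

    /** Returns {UI, OI} for a stemmer applied to hand-grouped words. */
    static double[] indices(List<List<String>> groups, UnaryOperator<String> stemmer) {
        int totalWords = 0;
        for (List<String> g : groups) totalWords += g.size();

        double gdmt = 0, gdnt = 0, gumt = 0, gwmt = 0;
        // stem -> (index of concept group -> number of words with that stem)
        Map<String, Map<Integer, Integer>> byStem = new HashMap<>();

        for (int gi = 0; gi < groups.size(); gi++) {
            List<String> group = groups.get(gi);
            int n = group.size();
            gdmt += 0.5 * n * (n - 1);          // pairs that should be merged
            gdnt += 0.5 * n * (totalWords - n); // pairs that should stay apart

            Map<String, Integer> stemCounts = new HashMap<>();
            for (String word : group) {
                String stem = stemmer.apply(word);
                stemCounts.merge(stem, 1, Integer::sum);
                byStem.computeIfAbsent(stem, k -> new HashMap<>()).merge(gi, 1, Integer::sum);
            }
            // Pairs in this group that ended up with different stems.
            for (int u : stemCounts.values()) gumt += 0.5 * u * (n - u);
        }
        // Pairs sharing a stem although they come from different groups.
        for (Map<Integer, Integer> counts : byStem.values()) {
            int ns = 0;
            for (int v : counts.values()) ns += v;
            for (int v : counts.values()) gwmt += 0.5 * v * (ns - v);
        }
        // NaN results if every group has one word or there is only one group.
        return new double[] { gumt / gdmt, gwmt / gdnt };
    }

    /** Toy baseline stemmer: keep only the first n letters of a word. */
    static UnaryOperator<String> truncate(int n) {
        return w -> w.length() <= n ? w : w.substring(0, n);
    }

    /** ERRT = |OP| / |OT|, where T is where the ray O->P meets the truncation line. */
    static double errt(double[] p, List<double[]> truncationLine) {
        if (p[0] == 0 && p[1] == 0) return 0.0; // no errors at all
        for (int i = 0; i + 1 < truncationLine.size(); i++) {
            double[] a = truncationLine.get(i), b = truncationLine.get(i + 1);
            double dx = b[0] - a[0], dy = b[1] - a[1];
            double det = dx * p[1] - dy * p[0];
            if (Math.abs(det) < 1e-12) continue; // ray parallel to this segment
            double t = (dx * a[1] - dy * a[0]) / det;     // T = t * P
            double s = (p[0] * a[1] - p[1] * a[0]) / det; // position along the segment
            if (t > 0 && s >= -1e-9 && s <= 1 + 1e-9) return 1.0 / t;
        }
        return Double.NaN; // ray does not cross the supplied line
    }

    public static void main(String[] args) {
        // Toy grouped word file: each inner list is one group of related words.
        List<List<String>> groups = Arrays.asList(
                Arrays.asList("run", "runs", "running"),
                Arrays.asList("rune", "runes"));

        List<double[]> line = new ArrayList<>();
        line.add(indices(groups, truncate(3)));
        line.add(indices(groups, truncate(4)));

        // Toy stemmer under test: strip a single trailing "s".
        double[] p = indices(groups, w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w);
        System.out.println("UI=" + p[0] + " OI=" + p[1] + " ERRT=" + errt(p, line));
    }
}
```

On this toy data, truncation to three letters merges everything (UI = 0, OI = 1) and truncation to four letters merges too little (UI = 0.75, OI = 0). The "s"-stripping stemmer reaches UI = 0.5 and OI = 0, so its ERRT is 0.5 / 0.75, roughly 0.67. A lower ERRT means the stemmer lies further inside the truncation line, which is the sense in which the optimised rule sets are compared above.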