Saturday, June 25, 2005

The Burrows-Wheeler Transform: Theory and Practice

The Burrows-Wheeler Transform: Theory and Practice: "In this paper we describe the Burrows-Wheeler Transform (BWT), a completely new approach to data compression which is the basis of some of the best compressors available today."
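
For readers who have not met the transform before, here is a minimal sketch of the textbook formulation: sort all rotations of the input and keep the last column. It is not taken from the paper. The function names `bwt` and `inverse_bwt` and the `$` end-of-string sentinel are assumptions made for illustration, and the quadratic rotation sort is far slower than the suffix-sorting methods real compressors use.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler Transform via sorted rotations (illustrative only)."""
    if sentinel in text:
        raise ValueError("input must not contain the sentinel character")
    s = text + sentinel  # the sentinel marks the end so the transform can be inverted
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last character of each sorted rotation.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Invert the naive transform by repeatedly prepending and re-sorting."""
    n = len(transformed)
    table = [""] * n
    for _ in range(n):
        # Prepend the transformed column to every row, then sort the rows again.
        table = sorted(transformed[i] + table[i] for i in range(n))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    encoded = bwt("banana")
    print(encoded)
    print(inverse_bwt(encoded))
```

The output tends to group identical characters together, which is what makes it a useful front end for a later compression stage.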

Modern Information Retrieval - Porter's Algorithm

Modern Information Retrieval - Porter's Algorithm: "The rules in the Porter algorithm are separated into five distinct phases numbered from 1 to 5. They are applied to the words in the text starting from phase 1 and moving on to phase 5. Further, they are applied sequentially one after the other as commands in a program. Thus, in the immediately following, we specify the Porter algorithm in a pseudo programming language whose commands take the form of rules for suffix substitution (as above). This pseudo language adopts the following (semi-formal) conventions: "
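
The idea of phases applied in order, each made of suffix-substitution rules, can be sketched roughly as below. This is only an illustration, not the algorithm as the book specifies it: it has two phases instead of five, it leaves out the conditions the real rules carry, and the names `PHASES`, `apply_phase` and `stem_sketch` are hypothetical. The first rule list follows the well-known plural rules of Porter's step 1a; the second borrows a few suffix pairs from a later step with their conditions dropped.

```python
# Each phase is a list of (suffix, replacement) rules.
# Assumption: within a phase, only the longest matching suffix is rewritten.
PHASES = [
    # Plural handling, modelled on Porter's step 1a.
    [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")],
    # A few later-step suffix pairs, with their conditions omitted for brevity.
    [("ational", "ate"), ("tional", "tion"), ("izer", "ize")],
]


def apply_phase(word, rules):
    """Apply the rule with the longest matching suffix, or return the word unchanged."""
    for suffix, replacement in sorted(rules, key=lambda rule: len(rule[0]), reverse=True):
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


def stem_sketch(word, phases=PHASES):
    """Run the phases in order, feeding each phase the output of the previous one."""
    for rules in phases:
        word = apply_phase(word, rules)
    return word


if __name__ == "__main__":
    for example in ["caresses", "ponies", "cats", "relational"]:
        print(example, "->", stem_sketch(example))
```

Because the conditions are missing, this sketch over-stems some words that the full algorithm would leave alone; it is only meant to show the control flow of phases and rules.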