Computer Science Department
PARE - An Automatic Text Summarizer
My research is mainly in artificial intelligence, particularly natural language processing. Lately I have focused on text summarization: compressing the information contained in one or more documents into a shorter summary.
Our system is built on a word graph and follows an algorithm similar to Google's PageRank. We create links between words, then weight the importance of each word based on its links to other important words. The project is currently implemented in Java.
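As a rough illustration of the idea, here is a minimal Java sketch of PageRank-style scoring over a word graph. It is not PARE's actual code: the class and method names, the co-occurrence window used to create links, the damping factor, and the fixed iteration count are all assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of word-graph ranking; names and parameters are illustrative.
public class WordGraphRank {

    // Assumption: two distinct words are linked when they occur
    // within `window` positions of each other in the text.
    static Map<String, Set<String>> buildGraph(List<String> words, int window) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (String w : words) {
            graph.putIfAbsent(w, new LinkedHashSet<>());
        }
        for (int i = 0; i < words.size(); i++) {
            for (int j = i + 1; j < words.size() && j <= i + window; j++) {
                String a = words.get(i);
                String b = words.get(j);
                if (!a.equals(b)) {
                    graph.get(a).add(b); // links are undirected,
                    graph.get(b).add(a); // so record both directions
                }
            }
        }
        return graph;
    }

    // PageRank-style update: a word's score is a damped sum of its
    // neighbours' scores, each divided by that neighbour's link count.
    static Map<String, Double> rank(Map<String, Set<String>> graph,
                                    double damping, int iterations) {
        Map<String, Double> score = new HashMap<>();
        for (String w : graph.keySet()) {
            score.put(w, 1.0);
        }
        for (int it = 0; it < iterations; it++) {
            Map<String, Double> next = new HashMap<>();
            for (String w : graph.keySet()) {
                double sum = 0.0;
                for (String n : graph.get(w)) {
                    sum += score.get(n) / graph.get(n).size();
                }
                next.put(w, (1.0 - damping) + damping * sum);
            }
            score = next;
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> words = List.of(
            "graph", "ranking", "weights", "graph", "words", "graph", "links");
        Map<String, Double> scores = rank(buildGraph(words, 1), 0.85, 30);
        scores.forEach((w, s) -> System.out.printf("%s %.3f%n", w, s));
    }
}
```

In this toy input, "graph" is linked to every other word, so it ends up with the highest score, while words linked only to "graph" score lowest.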
The summaries the system produces are not yet perfect. My goals for the project in the coming years are:
- Clean up the code for the project, making it more readable.
- Make sure everything is implemented in Java and included in the same build.
- Work on the interface, making it easier to use.
- Expand the set of links available in the word graph.
As longer-term work, I would like to explore the possibility of a generative summarization system. Currently, we use the sentence extraction method, where the summary is produced by pulling out the more interesting sentences from the source text. A generative system instead produces new text and sentences as the summary, and is a much more challenging project.
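To make the sentence extraction approach concrete, here is a small Java sketch that selects the highest-scoring sentences given a table of word weights. The scoring rule (the mean weight of a sentence's words), the tokenization, and all names are assumptions for illustration; the way PARE actually scores sentences is not described here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of extractive summarization; not PARE's actual code.
public class ExtractiveSketch {

    // Assumption: a sentence's score is the mean weight of its words,
    // with words missing from the weight table counting as 0.
    static double score(String sentence, Map<String, Double> weights) {
        String[] tokens = sentence.toLowerCase().split("[^a-z]+");
        double total = 0.0;
        int count = 0;
        for (String t : tokens) {
            if (t.isEmpty()) {
                continue; // split can yield a leading empty token
            }
            total += weights.getOrDefault(t, 0.0);
            count++;
        }
        return count == 0 ? 0.0 : total / count;
    }

    // Pick the k best-scoring sentences and return them in document order.
    static List<String> summarize(List<String> sentences,
                                  Map<String, Double> weights, int k) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            order.add(i);
        }
        // Sort indices by descending score; the sort is stable, so ties
        // keep their original document order.
        order.sort(Comparator.comparingDouble(
            (Integer i) -> -score(sentences.get(i), weights)));
        int keep = Math.max(0, Math.min(k, order.size()));
        List<Integer> chosen = new ArrayList<>(order.subList(0, keep));
        chosen.sort(Comparator.naturalOrder());
        List<String> summary = new ArrayList<>();
        for (int i : chosen) {
            summary.add(sentences.get(i));
        }
        return summary;
    }
}
```

The output contains only sentences copied verbatim from the input, which is what separates extraction from a generative system that writes new sentences.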