GitHub - ivmarkp/text_summarization_major: Part 1 of our Major academic project on Automatic Text Summarization

Text Summarization Algorithms

A Comparative Study

Developed as part of our Semester 7 Major Project, this repository contains scripts and code to run and test the performance of popular text summarization algorithms. The algorithms studied are LexRank, LSA, and TextRank.

DataSet

For our experiments, the Opinosis dataset was used. It can be obtained here

@inproceedings{ganesan2010opinosis,
 title={Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions},
 author={Ganesan, Kavita and Zhai, ChengXiang and Han, Jiawei},
 booktitle={Proceedings of the 23rd International Conference on Computational Linguistics},
 pages={340--348},
 year={2010},
 organization={Association for Computational Linguistics}
}

Performance Metric

To compare the relative performance of the algorithms, a simple Python implementation of the ROUGE-1 metric was used.
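For readers unfamiliar with the metric, the following is a minimal sketch of a unigram-overlap score of this kind. It is not the code in rogue_one.py; the function name `rouge_1` and the lowercase whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter


def rouge_1(reference, candidate):
    """Return (recall, precision, f1) for unigram overlap between two strings."""
    # Assumption: lowercase whitespace tokenization; the repository's
    # script may tokenize and normalize text differently.
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: a unigram counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((ref_counts & cand_counts).values())
    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())

    # float() keeps the division exact on Python 2.7 as well as Python 3.
    recall = overlap / float(ref_total) if ref_total else 0.0
    precision = overlap / float(cand_total) if cand_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


if __name__ == "__main__":
    gold = "the battery life is great"
    generated = "battery life is short"
    print(rouge_1(gold, generated))
```

Recall is the share of reference unigrams recovered by the generated summary, and precision is the share of generated unigrams that appear in the reference.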

Replicating project results

To reproduce the results of our project, do the following:

- Clone this repository and ensure that the Opinosis dataset is present. If it is not, download it from the link above and extract it into data/.
- Run the run_project script:
```
$ sh run_project.sh
```
This script cleans the dataset, extracts keywords, runs the algorithms on the dataset, and prints their respective running times and ROUGE-1 scores.
- To compute the performance of a single algorithm, first run the $algorithm/$algorithm.py script, then run the rogue_one script with:
```
$ python rogue_one.py --gold data/summaries_keywords --test $algorithm/results
```
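The running times reported by the project script could be measured along the following lines. This is only a sketch under stated assumptions: `time_summarizer` and `first_sentence` are hypothetical names, and `first_sentence` is a trivial stand-in, not one of the algorithms in this repository.

```python
import time


def time_summarizer(summarize, documents):
    """Run summarize() on every document; return (summaries, elapsed_seconds)."""
    start = time.time()
    summaries = {name: summarize(text) for name, text in documents.items()}
    return summaries, time.time() - start


def first_sentence(text):
    # Hypothetical stand-in for a real summarizer: keep the first sentence only.
    return text.split(".")[0].strip() + "."


if __name__ == "__main__":
    docs = {"battery": "The battery lasts two days. Charging is slow."}
    summaries, elapsed = time_summarizer(first_sentence, docs)
    print("first_sentence: %.4f s" % elapsed)
    print(summaries["battery"])
```

Timing the whole batch rather than each document keeps the measurement comparable across algorithms that have a fixed start-up cost.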

Dependencies

Python 2.7+
nltk
sumy
networkx

