d0r1h/ILC

Dataset

We have scraped and compiled a corpus of more than 3,000 Indian legal judgments, each paired with a summary. It can be loaded with the Hugging Face datasets library:

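import pandas as pd
from datasets import load_dataset

# Download the corpus from the Hugging Face Hub
dataset = load_dataset("d0r1h/ILC")

# Convert each split to a pandas DataFrame
train_set = pd.DataFrame(dataset['train'])
test_set = pd.DataFrame(dataset['test'])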
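
The extractive command further down reads a CSV through --data_file with text and summary columns, while the LED command refers to Case and Summary columns. A minimal sketch of bridging the two is shown here. It assumes the dataset rows expose Case and Summary fields (inferred from the LED flags, not confirmed), and write_summarization_csv is a hypothetical helper that is not part of this repository.

```python
import csv


def write_summarization_csv(rows, path, text_key="Case", summary_key="Summary"):
    """Write (text, summary) pairs to a CSV file and return how many were written.

    `rows` is any iterable of dict-like records. The default key names are an
    assumption taken from the LED command's --text_column / --summary_column.
    """
    written = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["text", "summary"])
        writer.writeheader()
        for row in rows:
            text, summary = row.get(text_key), row.get(summary_key)
            if not text or not summary:
                continue  # skip records missing either side of the pair
            writer.writerow({"text": text, "summary": summary})
            written += 1
    return written
```

If the column names match, something like write_summarization_csv(dataset["test"], "data.csv") should produce a file that fits the --text_column text and --summary_column summary flags.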

Code

git clone https://github.com/d0r1h/ILC.git
cd ILC
pip install -r requirement.txt

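Summarizing with the extractive approach

The leading ! runs the command from a notebook cell (for example in Colab); omit it in a regular shell.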

!python Code/Models/extractive.py \
        --output_dir dir_name \
        --text_column text \
        --summary_column summary \
        --data_file data.csv \
        --sentence_count 3 
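
To make the idea behind --sentence_count concrete, here is a minimal sketch of extractive summarization: score every sentence, keep the top few, and return them in their original order. It is a simplified word-frequency scorer and not the code in Code/Models/extractive.py, nor any of the specific algorithms listed in the results. The function names are hypothetical, and the naive sentence splitter will mis-split legal abbreviations such as "v." or "Art.".

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, sentence_count=3):
    """Return the `sentence_count` highest-scoring sentences in document order."""
    sentences = split_sentences(text)
    # Document-level word frequencies (no stop-word removal in this sketch).
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Average word frequency, so long sentences are not favoured by length alone.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:sentence_count])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

Graph-based methods such as TextRank and LexRank replace the scoring function with a centrality score over a sentence-similarity graph, but the select-top-k step is the same.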

Training LED with the abstractive approach

!python Code/Models/led_summarization.py \
        --model_name allenai/led-base-16384 \
        --text_column Case \
        --summary_column Summary \
        --max_input_length 8192 \
        --max_output_length 600 \
        --batch_size 2 \
        --num_beams 2 \
        --output_dir output_dir_name

Inference on the test set using the led-base-ilc model

| Notebook | Colab |
| --- | --- |
| led-base-ilc | Open In Colab |

Results:

The following ROUGE scores were obtained on the test set with transformer-based models and extractive methods.

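| Approach | Algorithm / model | ROUGE-1 | ROUGE-2 | ROUGE-L |
| --- | --- | --- | --- | --- |
| Extractive | SumBasic | 15.69 | 6.02 | 14.48 |
| Extractive | LSA | 21.20 | 7.37 | 19.76 |
| Extractive | KLSum | 21.40 | 10.19 | 19.66 |
| Extractive | LexRank | 33.09 | 16.81 | 22.99 |
| Extractive | TextRank | 34.54 | 18.10 | 31.11 |
| Abstractive | LedBase | 4.31 | 1.08 | 4.11 |
| Abstractive | Led-ilc | 42.24 | 23.18 | 39.30 |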
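
For readers unfamiliar with the metrics, the sketch below shows roughly what ROUGE-1, ROUGE-2 and ROUGE-L measure: n-gram overlap and longest-common-subsequence overlap between a generated summary and the reference, reported here as F1. It is a simplified illustration with whitespace tokenization and no stemming. The scorer actually used for the numbers above is not stated in this README, so these functions (all hypothetical names) are not expected to reproduce the table exactly. The table values appear to be on a 0 to 100 scale, whereas the sketch returns values between 0 and 1.

```python
from collections import Counter


def _ngrams(tokens, n):
    # Multiset of n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap, cand_total, ref_total):
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision, recall = overlap / cand_total, overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """F1 over clipped n-gram overlap between candidate and reference."""
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # per-n-gram minimum count
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def rouge_l(candidate, reference):
    """F1 based on the longest common subsequence of tokens."""
    a, b = candidate.lower().split(), reference.lower().split()
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(a), len(b))
```

For example, rouge_n("the appeal is allowed", "the appeal was allowed by the court", n=1) shares three unigrams out of four candidate and seven reference tokens, giving an F1 of 6/11.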

About

Indian Legal Corpus for Summarization
