ROUGE score is not matched with Pythonrouge when stemming=True #20

kan-bayashi · 2020-09-29T03:39:54Z

Hi @icoxfog417. Thank you for providing a great tool!
I found that Pythonrouge and RougeCalculator give different results when stemming=True.
Here is test code that reproduces it (the only change is the stemming option):

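import json
import os
import unittest

# Third-party import paths below are assumed to match sumeval's own tests/test_rouge.py.
from pythonrouge.pythonrouge import Pythonrouge
from rougescore import rouge_n
from sumeval.metrics.rouge import RougeCalculator


class TestRouge(unittest.TestCase):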
    DATA_DIR = os.path.join(os.path.dirname(__file__), "data/rouge")

    def load_test_data(self):
        test_file = os.path.join(self.DATA_DIR, "ROUGE-test.json")
        with open(test_file, encoding="utf-8") as f:
            data = json.load(f)
        return data

    def test_rouge_with_stemming(self):
        data = self.load_test_data()
        rouge = RougeCalculator(stopwords=False, stemming=True)
        for eval_id in data:
            summaries = data[eval_id]["summaries"]
            references = data[eval_id]["references"]
            for n in [1, 2]:
                for s in summaries:
                    baseline = Pythonrouge(
                                summary_file_exist=False,
                                summary=[[s]],
                                reference=[[[r] for r in references]],
                                n_gram=n, recall_only=False,
                                length_limit=False,
                                stemming=True, stopwords=False)
                    b1_v = baseline.calc_score()
                    b2_v = rouge_n(rouge.tokenize(s),
                                   [rouge.tokenize(r) for r in references],
                                   n, 0.5)
                    v = rouge.rouge_n(s, references, n)
                    self.assertLess(abs(b2_v - v), 1e-5)
                    self.assertLess(abs(b1_v["ROUGE-{}-F".format(n)] - v), 1e-5) # noqa

Is this expected?
If so, is there any solution to match the results?


tjh1997 · 2022-02-28T12:59:41Z

@icoxfog417 I ran into the same problem. The "stemming" argument does not seem to have any effect:

from sumeval.metrics.rouge import RougeCalculator

rouge = RougeCalculator(stopwords=False, stemming=True, lang="en")
summary = 'a little long way away from the crowd'
reference = 'hidden getaway from crowds'
print(rouge.rouge_1(summary, reference))  # 0.16666666666666666: the word "crowds" was not stemmed!
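
For reference, here is a minimal sketch of the arithmetic behind that number. It is not sumeval's implementation: `toy_stem` and `rouge_1_f` are hypothetical helpers that use whitespace tokenization and a stand-in stemmer which only strips a trailing "s". With one shared unigram ("from") the F1 comes out to 1/6, matching the value above. If "crowds" were reduced to "crowd", there would be two shared unigrams and the F1 would be 1/3.

```python
from collections import Counter


def toy_stem(token):
    # Hypothetical stand-in for a real stemmer: strips one trailing "s".
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def rouge_1_f(summary, reference, stem=False):
    """Unigram-overlap F1 on whitespace tokens (illustration only)."""
    s_tokens = summary.lower().split()
    r_tokens = reference.lower().split()
    if stem:
        s_tokens = [toy_stem(t) for t in s_tokens]
        r_tokens = [toy_stem(t) for t in r_tokens]
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((Counter(s_tokens) & Counter(r_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(s_tokens)
    recall = overlap / len(r_tokens)
    return 2 * precision * recall / (precision + recall)


summary = "a little long way away from the crowd"
reference = "hidden getaway from crowds"
no_stem = rouge_1_f(summary, reference)               # 1 shared token: "from"
with_stem = rouge_1_f(summary, reference, stem=True)  # 2 shared tokens: "from", "crowd"
print(no_stem, with_stem)
```

So a score near 0.1667 is what one would expect if no stemming is applied to either side, which is consistent with the report above.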
