
Possible to do sub-sentence level extractive summarization? #41

Open
Hellisotherpeople opened this issue Dec 21, 2020 · 1 comment
Hellisotherpeople commented Dec 21, 2020

After reading the documentation, it looks like the extractive summarization components only score sentences. While this is how the vast majority of extractive summarization papers work, some extractive summarization systems and datasets operate at the word level of granularity (my own work, for instance, is exclusively word-level extractive summarization).

Is there some way to make TransformerSum work at the word level of granularity out of the box? When I trained word-level extractive models, I used a final token classification head. Maybe that could be implemented here alongside the current sentence-scoring heads?
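For illustration, a minimal sketch of the kind of token classification head described above. This is not part of TransformerSum; `TokenExtractiveHead` is a hypothetical name, and the only assumption is an encoder that produces one hidden vector per token:

```python
import torch
import torch.nn as nn

class TokenExtractiveHead(nn.Module):
    """Hypothetical word-level extractive head (not in TransformerSum):
    scores each token for inclusion in the summary."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # One logit per token, in place of a per-sentence score
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, token_vectors: torch.Tensor) -> torch.Tensor:
        # token_vectors: (batch, seq_len, hidden_size) from the encoder
        logits = self.classifier(token_vectors).squeeze(-1)  # (batch, seq_len)
        return torch.sigmoid(logits)  # per-token keep probability

# Example: score 8 tokens from a toy 16-dim encoder output
head = TokenExtractiveHead(hidden_size=16)
scores = head(torch.randn(1, 8, 16))
print(scores.shape)  # torch.Size([1, 8])
```

Training would then use a binary label per token (keep/drop) rather than per sentence.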

HHousen (Owner) commented Dec 22, 2020

@Hellisotherpeople Out of the box, TransformerSum only supports extractive summarization at the sentence level; it doesn't yet support word-level granularity. This could be added to the library, but there are no plans to integrate it at the moment since I'm not familiar with word-level extractive summarization. One possibility is to add an option to the pooling module that passes the token vectors through a classifier without condensing them into sentence vectors. We may also need to change the testing method to work at the word level. I will look into this sometime this week.
