
Why does the generated sentence have little to do with the original? #4

Open
uhauha2929 opened this issue Apr 27, 2018 · 1 comment

@uhauha2929

I tested with a Chinese corpus, trying both word-level and character-level input. The generated sentences seem readable, but they differ substantially from the original text. I'm not sure why; is the model too simple?

@uhauha2929 uhauha2929 changed the title from "为什么生成的摘要和原文没有太大的关系?" ("Why does the generated summary have little to do with the original text?") to "Why does the generated sentence have little to do with the original?" Apr 27, 2018
@chen0040 (Owner) commented May 3, 2018

@uhauha2929 The text body and the summarized text use different vocabularies, which might explain what you observed. Also, max_sequence_length is applied to the text body: nothing more is read from the text body after max_sequence_length words have been consumed. One way to address the issue you mentioned is to use a single shared vocabulary for both the text body and the summarized text, or to read more text from the body by raising max_sequence_length. The language of the text body also matters; for example, Chinese requires a Chinese tokenizer, which I think may give a better result.
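To illustrate the first two suggestions, here is a minimal sketch (not the project's actual code; the function names and special tokens are made up for this example) of building one vocabulary shared by the text body and the summaries, and of the truncation that happens once max_sequence_length tokens have been read:

```python
from collections import Counter

def build_shared_vocab(body_texts, summary_texts, max_vocab_size=5000):
    """Build a single vocabulary covering both the text bodies and the summaries."""
    counter = Counter()
    for text in list(body_texts) + list(summary_texts):
        # Whitespace tokenization for illustration; Chinese text would need
        # a proper Chinese tokenizer (e.g. a word segmenter) instead.
        counter.update(text.split())
    vocab = {"<PAD>": 0, "<UNK>": 1}  # reserved ids for padding and unknown words
    for word, _ in counter.most_common(max_vocab_size - len(vocab)):
        vocab[word] = len(vocab)
    return vocab

def encode(text, vocab, max_sequence_length):
    """Map tokens to ids, keeping only the first max_sequence_length tokens."""
    ids = [vocab.get(tok, vocab["<UNK>"]) for tok in text.split()]
    return ids[:max_sequence_length]

vocab = build_shared_vocab(["the cat sat on the mat"], ["cat on mat"])
print(encode("the cat sat on the mat again", vocab, max_sequence_length=4))
```

With a shared vocabulary, summary tokens that also occur in the body map to the same ids in the encoder and decoder, and a small max_sequence_length makes it obvious how much of the body is simply never seen by the model.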
