1. I want to know whether the "Preprocessed Data" is generated through "Generate Candidate Summaries". 2. If "Own Data" is processed through preprocess.py, where do the test.source, test.source.tokenized, and other files in src_dir come from? Were they created manually? #27

Open
cq-cdy opened this issue Nov 16, 2022 · 1 comment

Comments

cq-cdy commented Nov 16, 2022

No description provided.

@thaokimctu

I think you have to create the source and target files manually from the raw data, using the code in facebookresearch/fairseq#1391, which was modified from the code provided by the author at https://github.com/abisee/cnn-dailymail. After that, gen_candidate.py is used to create the .out files, and then you create the tokenized files by following the instructions in the Evaluate section of the README. Finally, use preprocess.py to create the new dataset.
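
For reference, here is a minimal sketch of a sanity check you could run on src_dir before calling preprocess.py. The file list beyond test.source and test.source.tokenized is an assumption inferred from the workflow described above (source/target files built manually, .out candidate files from gen_candidate.py, and tokenized versions per the README's Evaluate section), not something confirmed by the repo; adjust the names to whatever your version of the code actually expects.

```python
# Sketch only: verify that src_dir contains the files the pipeline above should
# have produced, before running preprocess.py. The exact file names below are
# an assumption based on this comment, not taken from the repository.
from pathlib import Path

EXPECTED_FILES = [
    "test.source",            # one article per line, built manually from the raw data
    "test.source.tokenized",  # tokenized version (Evaluate section of the README)
    "test.target",            # reference summaries, built manually from the raw data
    "test.target.tokenized",  # assumed tokenized counterpart
    "test.out",               # candidate summaries produced by gen_candidate.py
    "test.out.tokenized",     # assumed tokenized counterpart
]

def check_src_dir(src_dir: str) -> bool:
    """Print any expected input files that are missing from src_dir."""
    root = Path(src_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).exists()]
    for name in missing:
        print(f"missing: {root / name}")
    return not missing

if __name__ == "__main__":
    # "./src_dir" is a hypothetical path; point this at your own src_dir.
    if check_src_dir("./src_dir"):
        print("src_dir looks complete; you can run preprocess.py next.")
```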
