Original corpora from which char-rnn models were created #39

Open
dhowe opened this issue Jun 19, 2018 · 1 comment

Comments

dhowe commented Jun 19, 2018

Are these available somewhere (either as links or as a download)? Thanks.

@dhowe dhowe changed the title Original corpora from which models were created Original corpora from which char-rnn models were created Jun 19, 2018
Owner

memo commented Jun 19, 2018

I haven't got round to posting the original corpora yet (it's still on my todo list), but in the meantime you can find most of it on http://www.gutenberg.org/ (though you may have to clean the files a bit to remove the erroneous characters and disclaimers).

There are a number of Trump corpora around, e.g. http://www.thegrammarlab.com/?nor-portfolio=corpus-of-presidential-speeches-cops-and-a-clintontrump-corpus and https://github.com/ryanmcdermott/trump-speeches

The Linux kernel source code is at https://github.com/torvalds/linux

The love song lyrics I scraped from a number of websites (I can't remember which ones right now, but I will try to dig them up).
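
For the Gutenberg cleanup mentioned above, here is a minimal Python sketch of one way it might be done. It is not the script used to prepare the original corpora. It assumes the license text is delimited by `*** START OF THE/THIS PROJECT GUTENBERG EBOOK ...` and `*** END OF ...` marker lines, which is common but not guaranteed, so spot-check a few files by hand. The function names are hypothetical.

```python
import re

# Assumed marker format; real Project Gutenberg files vary between releases.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Keep only the text between the START and END markers, if present."""
    start = START_RE.search(text)
    if start:
        text = text[start.end():]
    end = END_RE.search(text)
    if end:
        text = text[:end.start()]
    return text.strip()


def drop_stray_characters(text: str) -> str:
    """Normalize newlines and drop the BOM and non-printable control characters."""
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\ufeff", "")
    return "".join(
        ch for ch in text if ch in ("\n", "\t") or ch.isprintable()
    )


def clean_gutenberg(text: str) -> str:
    """Hypothetical helper: tidy characters first, then strip the disclaimers."""
    return strip_gutenberg_boilerplate(drop_stray_characters(text))
```

To use it, read each downloaded file with something like `open(path, encoding="utf-8", errors="replace")`, pass the contents through `clean_gutenberg`, and concatenate the results into a single training text file. If a file has no markers, the text is returned with only the character cleanup applied.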
