Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning #2110

Open
icoxfog417 opened this issue Jan 5, 2023 · 0 comments

icoxfog417 commented Jan 5, 2023

In a nutshell

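A study suggesting that the data needed to train large-scale models may run out. It estimates that raw text, such as social media posts and Common Crawl, grows at 6-17% per year, while curated corpora built from such text grow at 4-5% per year. On those trends, the former would stop meeting the amount needed for model training around 2030 to 2040, and the latter around 2023 to 2027.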
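The reasoning behind such an estimate can be sketched as a crossover between two exponential curves: a data stock that grows slowly and a training-data demand that grows faster. The snippet below is only a minimal illustration of that arithmetic, not the paper's model. The growth rates for the stock are the ones quoted in the summary; `STOCK_TO_DEMAND_RATIO` and `DEMAND_GROWTH` are hypothetical placeholders chosen for illustration and do not come from the paper.

```python
import math


def years_until_exhaustion(stock, demand, stock_growth, demand_growth):
    """Years until exponentially growing demand overtakes an exponentially growing stock.

    Returns 0.0 if demand already meets or exceeds the stock, and math.inf if
    demand grows no faster than the stock and therefore never catches up.
    """
    if demand >= stock:
        return 0.0
    if demand_growth <= stock_growth:
        return math.inf
    # Solve stock * (1 + g)^t == demand * (1 + r)^t for t.
    return math.log(stock / demand) / math.log((1 + demand_growth) / (1 + stock_growth))


# Annual stock growth rates quoted in the summary (low and high ends).
RAW_TEXT_GROWTH = (0.06, 0.17)
CURATED_GROWTH = (0.04, 0.05)

# Hypothetical placeholders, NOT figures from the paper:
STOCK_TO_DEMAND_RATIO = 100.0  # assume the stock is 100x the current training-set size
DEMAND_GROWTH = 0.50           # assume training-set size grows 50% per year

for label, rates in (("raw text", RAW_TEXT_GROWTH), ("curated corpora", CURATED_GROWTH)):
    for g in rates:
        t = years_until_exhaustion(STOCK_TO_DEMAND_RATIO, 1.0, g, DEMAND_GROWTH)
        print(f"{label}: stock growth {g:.0%}/yr -> crossover after {t:.1f} years")
```

Under these assumptions a faster-growing stock pushes the crossover later, which matches the direction of the estimates above: raw text lasts longer than curated corpora. The actual years depend on the real stock sizes and demand trend, which this sketch does not attempt to reproduce.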

Paper link

https://arxiv.org/abs/2211.04325

Authors / Affiliations

Pablo Villalobos, Jaime Sevilla, Lennart Heim, Tamay Besiroglu, Marius Hobbhahn, Anson Ho

  • Epoch
  • University of Aberdeen
  • MIT Computer Science & Artificial Intelligence Laboratory
  • Centre for the Governance of AI
  • University of Tübingen

Submission date (yyyy/MM/dd)

2022/10/26

Overview

Novelty / Differences

Method

Results

Comments
