
Parallelism problem when loading an LMDB dataset during fine-tuning #287

Open
jakeallen123 opened this issue Apr 9, 2024 · 1 comment

Comments

@jakeallen123

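When fine-tuning on my own dataset (the train split has 700 records) by running bash run_scripts/muge_finetune_vit-b-16_rbt-base.sh, I ran into the following problem:
With the default NUM_WORKERS, when the LMDB is read (/Chinese-CLIP/cn_clip/training/data.py), there are several places where two records end up mixed together: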
140010034420 397 等,坐享公园纯氧,让您健康无忧. 140010026690140010044832 281 但又看着怀里活泼可爱的儿子,我只能默默流眼泪……我特别想离开但又555 宜春没有办理的抓紧时间! 140010045148 570 高杉真宙,间宫祥太朗加盟《tori girl》演情敌 ... 140010039142 465 140010039108雄鹿夺冠,金靴奖该给谁?字母哥名场面多,但洛佩兹才是 463 戴手表不仅彰显地位身份,更是成熟自信象征!
Why does this happen, and how can it be fixed?

@ChesonHuang

Most likely there are no newline characters (\n) between these records.
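
One way to check for missing newlines is sketched below. It assumes the text annotations are in a JSONL-style file, with one JSON object per line. The field names `text_id`, `text` and `image_ids` in the sample are only illustrative, and `find_bad_lines` is a hypothetical helper, not part of Chinese-CLIP. It flags any line that holds more than one record, which is what a missing `\n` would produce.

```python
import json


def find_bad_lines(lines):
    """Return (line_number, reason) pairs for lines that are not exactly one JSON object."""
    problems = []
    decoder = json.JSONDecoder()
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # blank lines are skipped
        try:
            # raw_decode parses the first JSON value and reports where it ended
            obj, end = decoder.raw_decode(text)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if text[end:].strip():
            # leftover text after the first value: records were not separated by "\n"
            problems.append((number, "more than one record on this line"))
        elif not isinstance(obj, dict):
            problems.append((number, "record is not a JSON object"))
    return problems


# Illustrative data only: the second line holds two records with no newline between them.
sample = [
    '{"text_id": 397, "text": "a", "image_ids": [1]}\n',
    '{"text_id": 281, "text": "b", "image_ids": [2]}{"text_id": 555, "text": "c", "image_ids": [3]}\n',
]
print(find_bad_lines(sample))  # [(2, 'more than one record on this line')]
```

To run it on a real file, pass an open file handle instead of `sample`, for example `find_bad_lines(open(path, encoding="utf-8"))`, where `path` points to your own annotation file.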
