
Add re-sft scripts #14

Open
wants to merge 12 commits into main
Conversation

so298
Collaborator

@so298 so298 commented Feb 16, 2024

Configured with:

  • max_seq_length 4096
  • bf16
  • lr_scheduler_type cosine
  • warmup_ratio 0.1
  • learning_rate 2e-5 or 1e-4
  • global_batch_size = gradient_accumulation_steps * n_gpu * per_device_train_batch_size = 64
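
For reference, a minimal sketch of how these settings might map onto Hugging Face `TrainingArguments`, assuming the scripts use the transformers Trainer (the `output_dir` and the per-device/accumulation split below are illustrative, not taken from this PR; `max_seq_length` is applied at tokenization time and is not a `TrainingArguments` field):

```python
from transformers import TrainingArguments

# Illustrative values only -- the actual settings live in the added scripts.
# global_batch_size = gradient_accumulation_steps * n_gpu * per_device_train_batch_size = 64
training_args = TrainingArguments(
    output_dir="/model/7B_HF_RE/lora-all-jaster",  # one of the runs listed below
    bf16=True,
    learning_rate=2e-5,               # 1e-4 for the *-lr_1e-4 runs
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=8,    # single GPU: 8 * 8 * 1 = 64 (example split)
    gradient_accumulation_steps=8,
)
```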

Outputs are written under /model/7B_HF_RE/*. The following runs have been added:

  • lora-all-jaster
  • lora-all-jaster-lr_1e-4
  • lora-all-gpt4-self-inst
  • lora-all-gpt4-self-inst-lr_1e-4

@so298 so298 changed the title from "Add exp1 sft scripts" to "Add re-sft scripts" on Feb 16, 2024
@so298
Collaborator Author

so298 commented Feb 16, 2024

Added exp2 and exp3.

@so298
Collaborator Author

so298 commented Feb 17, 2024

For the gpt4-self-inst datasets in exp6 and exp8, single-GPU training ran out of memory, so training was switched to 1 node with 8 GPUs.

Adjusted to per_device_batch_size = 1 and gradient_accumulation_steps = 8 so that
$\text{global batch size} = N_\text{GPUs} \times \text{per device batch size} \times \text{gradient accumulation steps} = 8 \times 1 \times 8 = 64$.
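
As a quick sanity check, a minimal sketch of the arithmetic behind that adjustment (the helper name is illustrative, not from this repo):

```python
# Derive gradient_accumulation_steps for a fixed global batch size.
def grad_accum_steps(global_batch_size: int, n_gpus: int, per_device_batch_size: int) -> int:
    assert global_batch_size % (n_gpus * per_device_batch_size) == 0
    return global_batch_size // (n_gpus * per_device_batch_size)

print(grad_accum_steps(64, n_gpus=8, per_device_batch_size=1))  # -> 8
```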

@so298 so298 marked this pull request as ready for review May 22, 2024 13:53
@so298 so298 requested a review from a team as a code owner May 22, 2024 13:53
@so298
Collaborator Author

so298 commented May 22, 2024

@llm-jp/modelwg
I left this sitting as a draft for a while.
It's rather late, but could someone review and merge this?

@hiroshi-matsuda-rit
Member

I plan to start reviewing this tomorrow or later, but since I am not up to date on the state of many of these experiments, I may ask a number of clarifying questions. Thank you for your understanding.
