
GPU usage increasing as training progresses #7

Open
thechargedneutron opened this issue Jun 13, 2022 · 7 comments

@thechargedneutron

Hi,

Thank you for the good work.

  1. How much GPU memory is required to train this model?
  2. I am currently training the model on eight 32 GB GPUs. Memory usage increases as training progresses and eventually exceeds 32 GB, causing an out-of-memory error. Is there a workaround for this? I see that the code has del statements to remove tensors that are no longer needed. Is there anything else that also needs to be deleted?

Thank you!

@yixinL7
Owner

yixinL7 commented Jun 15, 2022

Hi,

How much GPU memory is required to train this model?

We used an RTX 3090 with 24 GB of GPU memory.

I am currently training the model on eight 32 GB GPUs. Memory usage increases as training progresses and eventually exceeds 32 GB, causing an out-of-memory error. Is there a workaround for this? I see that the code has del statements to remove tensors that are no longer needed. Is there anything else that also needs to be deleted?

32 GB GPUs should be enough to train the model. But if you are still facing this problem, you could consider using Adafactor instead of Adam as the optimizer. Also, depending on where the overflow occurs, it may help to reduce the batch size of the dataloader for the evaluation set:

BRIO/main.py, line 370 at a32b78e:

val_gen_dataloader = DataLoader(val_set, batch_size=8, shuffle=False, num_workers=4, collate_fn=collate_fn_val, sampler=val_sampler)
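
If it helps, here is a minimal sketch of swapping Adam for Adafactor via the transformers implementation (the model below is just a stand-in, not the repo's actual optimizer setup in main.py):

```python
# Hypothetical sketch: Adafactor keeps far less optimizer state than Adam,
# which can noticeably reduce GPU memory during training.
import torch.nn as nn
from transformers.optimization import Adafactor

model = nn.Linear(8, 8)       # stand-in for the BRIO model

optimizer = Adafactor(
    model.parameters(),       # same parameters you would otherwise pass to Adam
    lr=2e-3,                  # fixed learning rate; tune for your setup
    scale_parameter=False,
    relative_step=False,      # set both to True with lr=None to use Adafactor's own schedule
    warmup_init=False,
)
```

Reducing batch_size=8 in the dataloader line above to a smaller value is the other one-line change.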

Please let me know if you have more questions.

@RoyZhanyi

Can I train the model with two 24 GB 3090Ti GPUs?
The data comes from your pipeline ---> Generate Candidate Summaries --> Preprocess Your Own Data --> Train.
When I tried, I found that even four GPUs could not train the model; the number of candidate summaries (can_num) is 16. @yixinL7

@tiennvcs

Hi,

How much GPU memory is required to train this model?

We used an RTX 3090 with 24 GB of GPU memory.

I am currently training the model on eight 32 GB GPUs. Memory usage increases as training progresses and eventually exceeds 32 GB, causing an out-of-memory error. Is there a workaround for this? I see that the code has del statements to remove tensors that are no longer needed. Is there anything else that also needs to be deleted?

32 GB GPUs should be enough to train the model. But if you are still facing this problem, you could consider using Adafactor instead of Adam as the optimizer. Also, depending on where the overflow occurs, it may help to reduce the batch size of the dataloader for the evaluation set:

BRIO/main.py, line 370 at a32b78e:

val_gen_dataloader = DataLoader(val_set, batch_size=8, shuffle=False, num_workers=4, collate_fn=collate_fn_val, sampler=val_sampler)

Please let me know if you have more questions.

Could you show me how we can train the brio-cnndm-uncased model on one 11 GB GPU?

@yixinL7
Owner

yixinL7 commented Jul 20, 2022

Can I train the model with two 24 GB 3090Ti GPUs?

Yes, you should be able to train the model using two GPUs, but you will need to increase the gradient accumulation steps to keep the same effective batch size.
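
As an illustration only (the names per_gpu_batch and accumulate_step below are stand-ins, not necessarily the exact config options), the effective batch size is the product of the per-GPU batch size, the number of GPUs, and the accumulation steps, so halving the GPU count means doubling the accumulation steps:

```python
# Toy arithmetic, not repo code: keep the product constant when the GPU count changes.
per_gpu_batch = 1        # examples processed per GPU per forward pass (placeholder value)
num_gpus = 2             # e.g. two 3090Ti instead of four GPUs
accumulate_step = 16     # doubled from 8 so that 1 * 2 * 16 == 1 * 4 * 8

effective_batch = per_gpu_batch * num_gpus * accumulate_step
print(effective_batch)   # 32 in both configurations
```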

Could you show me how we can train the brio-cnndm-uncased model on one 11 GB GPU?

I'm not sure if there's a workaround for training the model with 11 GB. Unfortunately, an 11 GB GPU is barely enough for training the baseline model (BART). There are two things you could try (a rough sketch follows below):

  1. Reduce the number of candidates used in training to a smaller number, such as 4. That should still be enough to improve the model's performance.
    https://github.com/yixinL7/BRIO/blob/main/config.py#L25
  2. Use gradient checkpointing. I wouldn't really recommend this because it slows down training a lot.
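
A rough sketch of both options, assuming the underlying model is a Hugging Face BART (gradient_checkpointing_enable() is the standard transformers call; facebook/bart-large-cnn is used only as a stand-in checkpoint, and the way BRIO wires in max_num may differ):

```python
# Hypothetical sketch, not verbatim repo code.
from transformers import BartForConditionalGeneration

# (1) Fewer candidates: in BRIO this corresponds to max_num in config.py (linked above),
#     e.g. 4 instead of the default 16. Shown here only as a plain value.
max_num = 4

# (2) Gradient checkpointing on the underlying BART model: recomputes activations in the
#     backward pass, trading extra compute for a large reduction in activation memory.
bart = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
bart.gradient_checkpointing_enable()
```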

@tiennvcs

Can I train the model with two 24 GB 3090Ti GPUs?

Yes, you should be able to train the model using two GPUs, but you will need to increase the gradient accumulation steps to keep the same effective batch size.

Could you show me how we can train the brio-cnndm-uncased model on one 11 GB GPU?

I'm not sure if there's a workaround for training the model with 11 GB. Unfortunately, an 11 GB GPU is barely enough for training the baseline model (BART). There are two things you could try:

  1. Reduce the number of candidates used in training to a smaller number, such as 4. That should still be enough to improve the model's performance.
    https://github.com/yixinL7/BRIO/blob/main/config.py#L25
  2. Use gradient checkpointing. I wouldn't really recommend this because it slows down training a lot.

Thank you for your reply. I reduced args.max_num from 16 to 2 and it worked well. I also changed args.total_len from 1024 to 512 to fit an 11 GB GPU.
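
For anyone following along, a sketch of those two settings (the attribute names are taken from this thread; in the repo they live in config.py and may be wired in differently):

```python
# Sketch only: stand-in namespace for the two memory-saving settings reported above.
from argparse import Namespace

args = Namespace(
    max_num=2,      # candidate summaries per training example (was 16)
    total_len=512,  # maximum total input length in tokens (was 1024)
)
print(args)
```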

Thank you again :)

@ruili33

ruili33 commented Nov 1, 2022

Can I train the model with two 24 GB 3090Ti GPUs?

Yes, you should be able to train the model using two GPUs, but you will need to increase the gradient accumulation steps to keep the same effective batch size.

Could you show me how we can train the brio-cnndm-uncased model on one 11 GB GPU?

I'm not sure if there's a workaround for training the model with 11 GB. Unfortunately, an 11 GB GPU is barely enough for training the baseline model (BART). There are two things you could try:

  1. Reduce the number of candidates used in training to a smaller number, such as 4. That should still be enough to improve the model's performance.
    https://github.com/yixinL7/BRIO/blob/main/config.py#L25
  2. Use gradient checkpointing. I wouldn't really recommend this because it slows down training a lot.

Thank you for the great work. Could you kindly explain why we should increase the gradient accumulation steps when training on multiple GPUs?

@hoboyu11

Can I train the model with two 24 GB 3090Ti GPUs?

Yes, you should be able to train the model using two GPUs, but you will need to increase the gradient accumulation steps to keep the same effective batch size.

Could you show me how we can train the brio-cnndm-uncased model on one 11 GB GPU?

I'm not sure if there's a workaround for training the model with 11 GB. Unfortunately, an 11 GB GPU is barely enough for training the baseline model (BART). There are two things you could try:

  1. Reduce the number of candidates used in training to a smaller number, such as 4. That should still be enough to improve the model's performance.
    https://github.com/yixinL7/BRIO/blob/main/config.py#L25
  2. Use gradient checkpointing. I wouldn't really recommend this because it slows down training a lot.

Thank you for your reply. I reduced args.max_num from 16 to 2 and it worked well. I also changed args.total_len from 1024 to 512 to fit an 11 GB GPU.

Thank you again :)

Hi, I'd like to ask how long it takes to train one epoch on an 11 GB GPU. Thanks!
