
GPU usage increasing as training progresses #7

Open
thechargedneutron opened this issue Jun 13, 2022 · 7 comments

@thechargedneutron

Hi,

Thank you for the good work.

  1. How much GPU memory is required to train this model?
  2. I am currently training the model on eight 32 GB GPUs. Memory usage increases as training progresses and eventually exceeds 32 GB, causing an out-of-memory error. Is there a workaround for this? I see that the code has del statements to remove tensors that are no longer needed. Is there anything else that also needs to be deleted?

Thank you!

@yixinL7
Owner

yixinL7 commented Jun 15, 2022

Hi,

How much GPU memory is required to train this model?

We used an RTX 3090 with 24 GB of GPU memory.

I am currently training the model on eight 32 GB GPUs. Memory usage increases as training progresses and eventually exceeds 32 GB, causing an out-of-memory error. Is there a workaround for this? I see that the code has del statements to remove tensors that are no longer needed. Is there anything else that also needs to be deleted?

32 GB GPUs should be enough to train the model. But if you are still facing this problem, you could consider using Adafactor instead of Adam as the optimizer. Also, depending on where the overflow occurs, it may help to reduce the batch size of the dataloader for the evaluation set:

BRIO/main.py, line 370 at a32b78e:

val_gen_dataloader = DataLoader(val_set, batch_size=8, shuffle=False, num_workers=4, collate_fn=collate_fn_val, sampler=val_sampler)
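
If it helps, here is a minimal sketch of swapping Adam for Adafactor via the transformers implementation (the model below is just a stand-in, not the repo's actual optimizer setup in main.py):

```python
# Hypothetical sketch: Adafactor keeps far less optimizer state than Adam,
# which can noticeably reduce GPU memory during training.
import torch.nn as nn
from transformers.optimization import Adafactor

model = nn.Linear(8, 8)       # stand-in for the BRIO model

optimizer = Adafactor(
    model.parameters(),       # same parameters you would otherwise pass to Adam
    lr=2e-3,                  # fixed learning rate; tune for your setup
    scale_parameter=False,
    relative_step=False,      # set both to True with lr=None to use Adafactor's own schedule
    warmup_init=False,
)
```

Reducing batch_size=8 in the dataloader line above to a smaller value is the other one-line change.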

Please let me know if you have more questions.

@RoyZhanyi

Can I train the model with two 24 GB 3090Ti GPUs?
The data comes from your pipeline ---> Generate Candidate Summaries --> Preprocess Your Own Data --> Train.
When I tried, I found that even four GPUs could not train the model; the number of candidate summaries (can_num) is 16. @yixinL7

@tiennvcs

Hi,

How much GPU memory is required to train this model?

We used an RTX 3090 with 24 GB of GPU memory.

I am currently training the model on eight 32 GB GPUs. Memory usage increases as training progresses and eventually exceeds 32 GB, causing an out-of-memory error. Is there a workaround for this? I see that the code has del statements to remove tensors that are no longer needed. Is there anything else that also needs to be deleted?

32 GB GPUs should be enough to train the model. But if you are still facing this problem, you could consider using Adafactor instead of Adam as the optimizer. Also, depending on where the overflow occurs, it may help to reduce the batch size of the dataloader for the evaluation set:

BRIO/main.py, line 370 at a32b78e:

val_gen_dataloader = DataLoader(val_set, batch_size=8, shuffle=False, num_workers=4, collate_fn=collate_fn_val, sampler=val_sampler)

Please let me know if you have more questions.

Could you show me how we can train the brio-cnndm-uncased model on one 11 GB GPU?

@yixinL7
Owner

yixinL7 commented Jul 20, 2022

Can I train the model with two 24 GB 3090Ti GPUs?

Yes, you should be able to train the model using two GPUs, but you will need to increase the gradient accumulation steps to keep the same effective batch size.
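
As an illustration only (the names per_gpu_batch and accumulate_step below are stand-ins, not necessarily the exact config options), the effective batch size is the product of the per-GPU batch size, the number of GPUs, and the accumulation steps, so halving the GPU count means doubling the accumulation steps:

```python
# Toy arithmetic, not repo code: keep the product constant when the GPU count changes.
per_gpu_batch = 1        # examples processed per GPU per forward pass (placeholder value)
num_gpus = 2             # e.g. two 3090Ti instead of four GPUs
accumulate_step = 16     # doubled from 8 so that 1 * 2 * 16 == 1 * 4 * 8

effective_batch = per_gpu_batch * num_gpus * accumulate_step
print(effective_batch)   # 32 in both configurations
```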

Could you show me how we can train the brio-cnndm-uncased model on one 11 GB GPU?

I'm not sure if there's a workaround for training the model with 11 GB. Unfortunately, an 11 GB GPU is barely enough for training the baseline model (BART). There are two things you could try (a rough sketch follows below):

  1. Reduce the number of candidates used in training to a smaller number, such as 4. That should still be enough to improve the model's performance.
    https://github.com/yixinL7/BRIO/blob/main/config.py#L25
  2. Use gradient checkpointing. I wouldn't really recommend this because it slows down training a lot.
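
A rough sketch of both options, assuming the underlying model is a Hugging Face BART (gradient_checkpointing_enable() is the standard transformers call; facebook/bart-large-cnn is used only as a stand-in checkpoint, and the way BRIO wires in max_num may differ):

```python
# Hypothetical sketch, not verbatim repo code.
from transformers import BartForConditionalGeneration

# (1) Fewer candidates: in BRIO this corresponds to max_num in config.py (linked above),
#     e.g. 4 instead of the default 16. Shown here only as a plain value.
max_num = 4

# (2) Gradient checkpointing on the underlying BART model: recomputes activations in the
#     backward pass, trading extra compute for a large reduction in activation memory.
bart = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
bart.gradient_checkpointing_enable()
```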

@tiennvcs

Can I train the model with two 24 GB 3090Ti GPUs?

Yes, you should be able to train the model using two GPUs, but you will need to increase the gradient accumulation steps to keep the same effective batch size.

Could you show me how we can train the brio-cnndm-uncased model on one 11 GB GPU?

I'm not sure if there's a workaround for training the model with 11 GB. Unfortunately, an 11 GB GPU is barely enough for training the baseline model (BART). There are two things you could try:

  1. Reduce the number of candidates used in training to a smaller number, such as 4. That should still be enough to improve the model's performance.
    https://github.com/yixinL7/BRIO/blob/main/config.py#L25
  2. Use gradient checkpointing. I wouldn't really recommend this because it slows down training a lot.

Thank you for your reply. I reduced args.max_num from 16 to 2 and it worked well. I also changed args.total_len from 1024 to 512 to fit an 11 GB GPU.
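
For anyone following along, a sketch of those two settings (the attribute names are taken from this thread; in the repo they live in config.py and may be wired in differently):

```python
# Sketch only: stand-in namespace for the two memory-saving settings reported above.
from argparse import Namespace

args = Namespace(
    max_num=2,      # candidate summaries per training example (was 16)
    total_len=512,  # maximum total input length in tokens (was 1024)
)
print(args)
```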

Thank you again :)

@ruili33

ruili33 commented Nov 1, 2022

Can I train the model with two 24 GB 3090Ti GPUs?

Yes, you should be able to train the model using two GPUs, but you will need to increase the gradient accumulation steps to keep the same effective batch size.

Could you show me how we can train the brio-cnndm-uncased model on one 11 GB GPU?

I'm not sure if there's a workaround for training the model with 11 GB. Unfortunately, an 11 GB GPU is barely enough for training the baseline model (BART). There are two things you could try:

  1. Reduce the number of candidates used in training to a smaller number, such as 4. That should still be enough to improve the model's performance.
    https://github.com/yixinL7/BRIO/blob/main/config.py#L25
  2. Use gradient checkpointing. I wouldn't really recommend this because it slows down training a lot.

Thank you for the great work. Could you kindly explain why we should increase the gradient accumulation steps when training on multiple GPUs?

@hoboyu11

Can I train the model with two 24 GB 3090Ti GPUs?

Yes, you should be able to train the model using two GPUs, but you will need to increase the gradient accumulation steps to keep the same effective batch size.

Could you show me how we can train the brio-cnndm-uncased model on one 11 GB GPU?

I'm not sure if there's a workaround for training the model with 11 GB. Unfortunately, an 11 GB GPU is barely enough for training the baseline model (BART). There are two things you could try:

  1. Reduce the number of candidates used in training to a smaller number, such as 4. That should still be enough to improve the model's performance.
    https://github.com/yixinL7/BRIO/blob/main/config.py#L25
  2. Use gradient checkpointing. I wouldn't really recommend this because it slows down training a lot.

Thank you for your reply. I reduced args.max_num from 16 to 2 and it worked well. I also changed args.total_len from 1024 to 512 to fit an 11 GB GPU.

Thank you again :)

Hi, I'd like to ask how long it takes to train one epoch on an 11 GB GPU. Thanks!
