Error when running the finetune workflow #305

Open
Cupies opened this issue Apr 23, 2024 · 1 comment

Comments

@Cupies
Cupies commented Apr 23, 2024

[2024-04-23 12:06:31,944] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\launch.py:183: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects --local-rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

warnings.warn(
Traceback (most recent call last):
  File "E:\JupyterNotebookSpace\Chinese-CLIP\cn_clip\training\main.py", line 17, in <module>
    from cn_clip.clip.model import convert_weights, convert_state_dict, resize_pos_embed, CLIP
ImportError: cannot import name 'convert_state_dict' from 'cn_clip.clip.model' (D:\Anaconda3\envs\Mlearn\lib\site-packages\cn_clip\clip\model.py)
[2024-04-23 12:06:37,013] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 27588) of binary: D:\Anaconda3\envs\Mlearn\python.exe
Traceback (most recent call last):
  File "D:\Anaconda3\envs\Mlearn\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Anaconda3\envs\Mlearn\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\launch.py", line 198, in <module>
    main()
  File "D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\launch.py", line 194, in main
    launch(args)
  File "D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\launch.py", line 179, in launch
    run(args)
  File "D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\run.py", line 803, in run
    elastic_launch(
  File "D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\launcher\api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\launcher\api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

cn_clip/training/main.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2024-04-23_12:06:37
host : Jarvis
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 27588)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
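
Separately from the ImportError, the FutureWarning at the top of the log suggests reading the local rank from os.environ['LOCAL_RANK'] rather than relying only on a --local-rank argument. The warning is not what makes the run fail, but a minimal sketch of that pattern could look like the following. The helper name resolve_local_rank is hypothetical and is not taken from Chinese-CLIP; it only assumes that the launcher sets LOCAL_RANK, as the warning states for torchrun.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Return the local rank, preferring the LOCAL_RANK environment variable.

    Hypothetical helper for illustration. Falls back to a --local-rank /
    --local_rank command-line argument, and finally to 0.
    """
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser()
    # Accept both spellings so older launch commands keep working.
    parser.add_argument("--local-rank", "--local_rank",
                        dest="local_rank", type=int, default=0)
    # Ignore any other arguments the training script defines itself.
    args, _unknown = parser.parse_known_args(argv)

    # The environment variable wins when the launcher provides it.
    return int(environ.get("LOCAL_RANK", args.local_rank))


if __name__ == "__main__":
    print("local rank:", resolve_local_rank())
```

Passing argv and environ explicitly is only there to make the function easy to test; in a training script both would normally be left at their defaults.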

@byraid218
One of the errors is this:
ImportError: cannot import name 'convert_state_dict' from 'cn_clip.clip.model'
The missing convert_state_dict function is answered in this issue:
#185
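
The path in parentheses at the end of the ImportError points into site-packages, while main.py runs from the cloned repository under E:\JupyterNotebookSpace\Chinese-CLIP. That suggests, although the log alone does not confirm it, that the installed cn_clip package differs from the checkout and does not define convert_state_dict. A minimal sketch for checking this is below; locate_module and has_attribute are hypothetical helper names, not part of cn_clip, and they rely only on the standard importlib module.

```python
import importlib
import importlib.util


def locate_module(module_name):
    """Return the file path a module would be imported from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return None
    return spec.origin if spec is not None else None


def has_attribute(module_name, attr_name):
    """Import the module and report whether it defines attr_name."""
    module = importlib.import_module(module_name)
    return hasattr(module, attr_name)


if __name__ == "__main__":
    name = "cn_clip.clip.model"
    path = locate_module(name)
    print("imported from:", path)
    if path is not None:
        print("has convert_state_dict:", has_attribute(name, "convert_state_dict"))
```

If the printed path is under site-packages and the second line reports False, the interpreter is picking up the installed package instead of the cloned source. The answer referenced in #185 is the place to check for the recommended fix.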
