
Some questions about training, please #197

Open
glorioushonor opened this issue Apr 7, 2023 · 4 comments

Comments

@glorioushonor

I have trained the normal and implicit networks separately and tested them on the CAPE dataset. Along the way, a few questions came up that I would like to ask and confirm. I sincerely hope to get your help.

  1. The ModelCheckpoint saved after training the implicit network is from the second epoch, so the last eight epochs brought no improvement. Does this reflect the data efficiency of ICON's training?
  2. When testing on the CAPE dataset, I found there was no test.txt file, so I renamed test150.txt to test.txt; I am not sure whether that is correct. test150.txt contains 150 models (easy: 50, hard: 100), but I found the following lines in the code:
accu_outputs = accumulate(
    outputs,
    rot_num=3,
    split={
        "cape-easy": (0, 50),
        "cape-hard": (50, 100)
    },
)

So I may have done something wrong here. How many CAPE models did you use for testing? Also, I would like to know how to derive cape-NC from cape-easy-NC and cape-hard-NC.
  3. I also ran the pre-trained model you provided on the CAPE dataset and found that the results differ from run to run. Is this normal?
  4. Your pre-trained model was not trained on THuman2.0. Could you update the benchmark (train on THuman2.0, test on CAPE)? You mentioned that those test results are better than the reported ones: https://github.com/YuliangXiu/ICON/issues/183#issuecomment-1445002583

@YuliangXiu
Owner

YuliangXiu commented Apr 13, 2023

  1. The checkpoint is saved automatically when the validation error is smallest.
  2. You are doing it right; I just fixed the bug (02cdce4), thanks for your feedback.
  3. error(cape-NC) = [2.0 * error(cape-easy-NC) + error(cape-hard-NC)] / 3.0 (see the example after this list)
  4. I will update the benchmark later; after fixing CAPE, both PaMIR and ICON achieve a chamfer distance of 8~9 mm.
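
For example, with made-up numbers just to illustrate the 2:1 weighting (the values below are hypothetical, not real benchmark results):

    # Hypothetical per-split normal-consistency errors, for illustration only.
    error_cape_easy_nc = 0.045
    error_cape_hard_nc = 0.060

    # Overall CAPE error: the easy split is weighted twice as heavily.
    error_cape_nc = (2.0 * error_cape_easy_nc + error_cape_hard_nc) / 3.0
    print(error_cape_nc)  # 0.05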

@glorioushonor
Author

> 1. The checkpoint is saved automatically when the validation error is smallest.
> 2. You are doing it right; I just fixed the bug (02cdce4), thanks for your feedback.
> 3. error(cape-NC) = [2.0 * error(cape-easy-NC) + error(cape-hard-NC)] / 3.0
> 4. I will update the benchmark later; after fixing CAPE, both PaMIR and ICON achieve a chamfer distance of 8~9 mm.

Hello, thanks for your kind reply and precise answers. But you seem to have missed my third question. In my experience, getting different test results from the same model is caused by random sampling. However, the seed of the random number generator is fixed in the code, so the random sequence should be repeatable:

    if not cfg.test_mode:
        trainer.fit(model=model, datamodule=datamodule)
        trainer.test(model=model, datamodule=datamodule)
    else:
        np.random.seed(1993)  # fixed seed, so test-time sampling should repeat
        trainer.test(model=model, datamodule=datamodule)

@YuliangXiu
Owner

@glorioushonor

The random seed handling in PyTorch-Lightning changed frequently across versions; sometimes it only applies to the trainer but not the dataloader workers.

You can check trainer.html#reproducibility to set a fixed seed for both the trainer and the dataloaders.
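
A minimal sketch along those lines (the seed value simply mirrors your snippet; deterministic=True is optional and only forces deterministic ops where available):

    import pytorch_lightning as pl

    # Seeds Python's random module, NumPy, and torch in one call; workers=True
    # also seeds each dataloader worker process, so sampling repeats across runs.
    pl.seed_everything(1993, workers=True)

    trainer = pl.Trainer(deterministic=True)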

@msverma101

msverma101 commented Jul 22, 2023

I also can't find the test.txt file, nor the test150.txt.
I downloaded the files for the THuman2 dataset and followed the process for the SMPL-X fits as well.
One other problem is running headless; I keep getting this error:

(base) root@3fb49ee304fe:/home# python -m scripts.render_batch -headless -out_dir data/
Start Rendering thuman2 with 36 views, 512x512 size.
Output dir: data//thuman2_36views
Rendering types: ['light', 'normal', 'depth']
  0%|                                                                                                                         | 0/2 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/scripts/render_batch.py", line 268, in <module>
    render_subject( subject=subject,
  File "/home/scripts/render_batch.py", line 86, in render_subject
    rndr_smpl = ColorRender(width=size, height=size, egl=egl)
  File "/home/lib/renderer/gl/color_render.py", line 34, in __init__
    CamRender.__init__(
  File "/home/lib/renderer/gl/cam_render.py", line 33, in __init__
    Render.__init__(
  File "/home/lib/renderer/gl/render.py", line 200, in __init__
    GLUT.glutDisplayFunc(self.display)
  File "/opt/conda/lib/python3.8/site-packages/OpenGL/GLUT/special.py", line 147, in __call__
    contextdata.setValue( self.CONTEXT_DATA_KEY, cCallback )
  File "/opt/conda/lib/python3.8/site-packages/OpenGL/contextdata.py", line 58, in setValue
    context = getContext( context )
  File "/opt/conda/lib/python3.8/site-packages/OpenGL/contextdata.py", line 40, in getContext
    raise error.Error(
OpenGL.error.Error: Attempt to retrieve context when no valid context
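
(For what it's worth, this error usually means GLUT could not create a GL context because no display server is available; a common workaround, untested here, is to run the script under a virtual framebuffer:)

    # Requires the xvfb package; -a picks a free virtual display automatically.
    xvfb-run -a python -m scripts.render_batch -headless -out_dir data/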
