pytorch3d versions #9

Closed
Yacovitch opened this issue May 2, 2024 · 21 comments

Comments

@Yacovitch

Hi, whenever I try to install PyTorch3D with pip3 install pytorch3d==0.7.4, I get the following error:

```
ERROR: Could not find a version that satisfies the requirement pytorch3d==0.7.4 (from versions: 0.1.1, 0.2.0, 0.2.5, 0.3.0)
ERROR: No matching distribution found for pytorch3d==0.7.4
```

Do you have any idea what the problem is?

@nuneslu
Collaborator

nuneslu commented May 3, 2024

Hi! Actually, we only use the chamfer_distance loss from pytorch3d, so you can try an older version. If that doesn't work, let me know and we can try to figure it out.
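Since chamfer_distance is the only thing taken from pytorch3d, here is a brute-force, pure-Python sketch of the symmetric Chamfer distance that can be used to sanity-check values on tiny inputs. It assumes the formulation we believe pytorch3d.loss.chamfer_distance uses by default (squared Euclidean nearest-neighbour distances, averaged over the points of each cloud and summed over both directions); it is not a drop-in replacement for the batched GPU version. The nested loops also show why this loss is expensive: each direction costs N * M distance evaluations.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def chamfer_distance(x: Sequence[Point], y: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between two point clouds (brute force).

    For each point in x, take the squared distance to its nearest neighbour
    in y and average over x; do the same from y to x; return the sum.
    """
    if not x or not y:
        raise ValueError("both point clouds must be non-empty")
    x_to_y = sum(min(_sq_dist(p, q) for q in y) for p in x) / len(x)
    y_to_x = sum(min(_sq_dist(q, p) for p in x) for q in y) / len(y)
    return x_to_y + y_to_x


if __name__ == "__main__":
    cloud_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    cloud_b = [(0.0, 0.0, 0.0)]
    print(chamfer_distance(cloud_a, cloud_b))  # 0.5
```

In the example, the extra point at (1, 0, 0) is 1.0 away (squared) from its nearest neighbour, so the a-to-b mean is 0.5 while the b-to-a term is 0.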

@Yacovitch
Author

Thank you for your reply. I used pytorch3d version 0.3.0, and pip3 does not let me install later versions because of compatibility with my PyTorch version. Is there any way to work around it?

@nuneslu
Collaborator

nuneslu commented May 3, 2024

You could also install it from source. Which CUDA version are you using? From here you can download it, and installing it works the same way as installing this repo: pip3 install -e .
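Before building from source, it can help to record exactly which versions are installed. The sketch below is not part of this repo; it uses importlib.metadata (standard library from Python 3.8; on Python 3.7, as in the traceback later in this thread, it assumes the importlib-metadata backport is installed) and torch.version.cuda, which reports the CUDA version PyTorch was built with. As a rule of thumb, that version needs to be compatible with the CUDA toolkit nvcc uses when compiling pytorch3d's extensions.

```python
from typing import Dict, Iterable, Optional

try:
    from importlib import metadata as importlib_metadata  # standard library, Python 3.8+
except ImportError:  # Python 3.7: assumes the importlib-metadata backport is installed
    import importlib_metadata  # type: ignore


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return importlib_metadata.version(dist_name)
    except importlib_metadata.PackageNotFoundError:
        return None


def environment_report(
    packages: Iterable[str] = ("torch", "pytorch3d", "pytorch-lightning", "setuptools", "ninja"),
) -> Dict[str, Optional[str]]:
    """Collect package versions plus the CUDA version PyTorch was built with."""
    report = {name: installed_version(name) for name in packages}
    try:
        import torch
        # None for CPU-only builds of PyTorch.
        report["torch CUDA build"] = torch.version.cuda
    except ImportError:
        report["torch CUDA build"] = None
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name:>18}: {version if version is not None else 'n/a'}")
```

Pasting this output into an issue usually answers the "which CUDA/PyTorch version?" question in one go.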

@nuneslu
Collaborator

nuneslu commented May 3, 2024

One further question: pytorch3d is only used in the refinement network. Did you manage to finish training the diffusion part?

@Yacovitch
Author

Yacovitch commented May 3, 2024

I am using CUDA version 11.4.

And yes, I was able to finish the training of the diffusion part. Now I am trying to train the refinement network.

I tried your suggested method, but it gives me an error. The error is quite long, so I am attaching only the last part.

```

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/nas2/jacob/LiDiff/pytorch3d-0.7.4/setup.py", line 179, in <module>
    "": ["*.json"],
  File "/opt/conda/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/develop.py", line 34, in run
    self.install_for_development()
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/develop.py", line 114, in install_for_development
    self.run_command('build_ext')
  File "/opt/conda/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/opt/conda/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 709, in build_extensions
    build_ext.build_extensions(self)
  File "/opt/conda/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
    _build_ext.build_ext.build_extensions(self)
  File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
    _build_ext.build_extension(self, ext)
  File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
    depends=ext.depends)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 539, in unix_wrap_ninja_compile
    with_cuda=with_cuda)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1360, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1682, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
----------------------------------------

```

@nuneslu
Collaborator

nuneslu commented May 3, 2024

Try running pip3 install -v -e .

@Yacovitch
Author

It gives me the same error. Could it be because I am trying to set up the environment in JupyterLab?

@nuneslu
Collaborator

nuneslu commented May 3, 2024

The error doesn't provide much information. Could you uninstall ninja and try running the command again? Ninja compiles the sources in parallel, which makes it hard to see exactly where the error is.
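As an alternative to uninstalling ninja, torch.utils.cpp_extension reads the MAX_JOBS environment variable to decide how many parallel compile jobs ninja runs, so MAX_JOBS=1 keeps the compiler output in order. The sketch below (hypothetical helper names; the source directory name is taken from the traceback above) runs a verbose editable install with a single job, saves the log, and pulls out the first compiler error line. The "error:" and "FAILED:" heuristics are assumptions about typical gcc, nvcc, and ninja output, not a parser for every toolchain.

```python
import os
import subprocess
import sys
from typing import Optional

SOURCE_DIR = "pytorch3d-0.7.4"  # unpacked source tree, as in the traceback above
LOG_PATH = "build.log"


def build_serially(source_dir: str, log_path: str) -> int:
    """Run `pip install -v -e .` in source_dir with one compile job; log everything."""
    # torch.utils.cpp_extension passes MAX_JOBS to ninja as its job count.
    env = dict(os.environ, MAX_JOBS="1")
    with open(log_path, "w") as log:
        proc = subprocess.run(
            [sys.executable, "-m", "pip", "install", "-v", "-e", "."],
            cwd=source_dir, env=env, stdout=log, stderr=subprocess.STDOUT,
        )
    return proc.returncode


def first_compiler_error(log_text: str) -> Optional[str]:
    """Return the first gcc/nvcc-style 'error:' line, else ninja's first 'FAILED:' line."""
    lines = log_text.splitlines()
    for line in lines:
        if "error:" in line:
            return line.strip()
    for line in lines:
        if line.startswith("FAILED:"):
            return line.strip()
    return None


if __name__ == "__main__" and os.path.isdir(SOURCE_DIR):
    if build_serially(SOURCE_DIR, LOG_PATH) != 0:
        with open(LOG_PATH) as f:
            print(first_compiler_error(f.read()) or "no compiler error line found")
```

The generic "RuntimeError: Error compiling objects for extension" line is deliberately not matched (it has a capital "E"), since it only says that some compile step failed.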

@Yacovitch
Author

Yacovitch commented May 3, 2024

I created a separate environment, followed the instructions at https://github.com/facebookresearch/pytorch3d/releases/tag/v0.7.1, and it worked. It seems some dependencies were missing. Also, based on your requirements, pytorch3d 0.7.1 seems more suitable.

@nuneslu
Collaborator

nuneslu commented May 4, 2024

Glad you managed to make it work. I will also update requirements.txt, since version 0.7.1 seems easier to install alongside the other dependencies.

@Yacovitch
Author

Yacovitch commented May 4, 2024

Hi, I was able to get train_refine.py running!
One additional note: pytorch-lightning gave me an error, which I fixed by downgrading setuptools to 58.2.0. I found this solution in pytorch/pytorch#69894.
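A quick way to see whether an environment still has a newer setuptools than the one reported as working here is a small version check. This is a sketch with a deliberately simplified parser (it keeps only the leading numeric components and ignores pre-release or post-release tags); for real dependency logic, the third-party packaging library's version handling is the better tool. The 58.2.0 value comes from the comment above; the helper names are hypothetical.

```python
import re
from typing import Tuple

# Version reported in this thread as working with pytorch-lightning.
KNOWN_GOOD_SETUPTOOLS = "58.2.0"


def version_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric release components, e.g. '59.6.0.post1' -> (59, 6, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def _strip_trailing_zeros(parts: Tuple[int, ...]) -> Tuple[int, ...]:
    """Treat '58.2' and '58.2.0' as equal by dropping trailing zero components."""
    items = list(parts)
    while len(items) > 1 and items[-1] == 0:
        items.pop()
    return tuple(items)


def newer_than_known_good(installed: str, known_good: str = KNOWN_GOOD_SETUPTOOLS) -> bool:
    """True if `installed` is a later release than `known_good`."""
    return _strip_trailing_zeros(version_tuple(installed)) > _strip_trailing_zeros(
        version_tuple(known_good)
    )


if __name__ == "__main__":
    try:
        import setuptools
    except ImportError:
        print("setuptools is not installed")
    else:
        if newer_than_known_good(setuptools.__version__):
            print(f"setuptools {setuptools.__version__} is newer than {KNOWN_GOOD_SETUPTOOLS}; "
                  f"consider: pip3 install setuptools=={KNOWN_GOOD_SETUPTOOLS}")
        else:
            print(f"setuptools {setuptools.__version__} is not newer than {KNOWN_GOOD_SETUPTOOLS}")
```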

Now train_refine.py seems to be stuck at the validation sanity check. Is this expected behaviour?

@nuneslu
Collaborator

nuneslu commented May 4, 2024

It is not expected. You can try setting num_sanity_val_steps=0 for the Trainer in train_refine.py so it skips the sanity check.

@nuneslu
Collaborator

nuneslu commented May 4, 2024

Hi again, I figured out why validation was getting stuck: I was using the wrong collation function for the refinement network (a mistake made while refactoring the code before releasing it). Sorry about that. You can pull the repo to get the corrected version, which uses the same collation as the experiments reported in the paper.

@Yacovitch
Author

Yacovitch commented May 5, 2024

It is working now, thank you! I have another question: how long did each epoch take for you?

@nuneslu
Collaborator

nuneslu commented May 6, 2024

In our experiments, we used batch_size: 8 on two GPUs, and it took around 5 hours per epoch. I am now rerunning it with batch_size: 2 on one GPU, and it takes around 20 hours per epoch. It is slow mainly because of the chamfer_distance computation.

@Yacovitch
Author

In your config_refine, it looks like the refinement network is trained for up to 100 epochs. Is that what you did?

@nuneslu
Collaborator

nuneslu commented May 6, 2024

No, it only needs 5 epochs to reach the results from the paper.

@nuneslu
Collaborator

nuneslu commented May 6, 2024

During development, I let it train longer to compare results, but 5 epochs was already enough. That is why the config has max_epoch: 100.
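Combining the per-epoch timings above with the 5-epoch figure gives a back-of-envelope estimate of total training time, which shows why lowering max_epoch in config_refine (or stopping training manually) is worth it. This sketch assumes the time per epoch stays constant; the setup labels are just the configurations quoted in this thread.

```python
# Approximate wall-clock hours per epoch, as quoted in this thread.
HOURS_PER_EPOCH = {
    "batch_size: 8, two GPUs": 5.0,
    "batch_size: 2, one GPU": 20.0,
}
EPOCHS_FOR_PAPER_RESULTS = 5   # "it just needs 5 epochs"
MAX_EPOCH_IN_CONFIG = 100      # max_epoch in config_refine


def total_hours(hours_per_epoch: float, epochs: int) -> float:
    """Back-of-envelope total training time, assuming constant time per epoch."""
    return hours_per_epoch * epochs


if __name__ == "__main__":
    for setup, hours in HOURS_PER_EPOCH.items():
        print(
            f"{setup}: ~{total_hours(hours, EPOCHS_FOR_PAPER_RESULTS):.0f} h for "
            f"{EPOCHS_FOR_PAPER_RESULTS} epochs vs ~{total_hours(hours, MAX_EPOCH_IN_CONFIG):.0f} h "
            f"for {MAX_EPOCH_IN_CONFIG}"
        )
```

With these numbers, 5 epochs come to roughly 25 h on two GPUs or 100 h on one GPU, versus 500 h or 2000 h if training ran all the way to max_epoch: 100.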

@Yacovitch
Author

Thank you very much! Lastly, if I want to check the diffusion results without the refinement model, which part of diff_completion_pipeline.py should I look at?

@nuneslu
Collaborator

nuneslu commented May 6, 2024

You can use the provided refinement network weights and run the pipeline with your own trained diffusion weights. The pipeline output directory will contain two directories, diff/ and refine/; the point clouds in diff/ are the ones generated without refinement.
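To compare the two stages side by side, a small helper can pair files across diff/ and refine/. This is a sketch under assumptions the thread does not confirm: that each directory holds one file per scan and that corresponding files share a file name (the .ply names in the demo are made up). pair_outputs is a hypothetical helper, not part of diff_completion_pipeline.py.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def pair_outputs(output_dir) -> Tuple[List[Tuple[Path, Path]], List[str]]:
    """Pair files in <output_dir>/diff and <output_dir>/refine by file name.

    Returns (pairs, unmatched): pairs holds (diff_path, refine_path) tuples in
    sorted name order; unmatched lists names found in only one directory.
    """
    root = Path(output_dir)
    diff = {p.name: p for p in (root / "diff").iterdir() if p.is_file()}
    refine = {p.name: p for p in (root / "refine").iterdir() if p.is_file()}
    pairs = [(diff[name], refine[name]) for name in sorted(diff.keys() & refine.keys())]
    unmatched = sorted(diff.keys() ^ refine.keys())
    return pairs, unmatched


if __name__ == "__main__":
    # Demo on a throwaway layout; the file names here are made up.
    with tempfile.TemporaryDirectory() as tmp:
        for sub, name in [("diff", "000.ply"), ("refine", "000.ply"), ("diff", "001.ply")]:
            (Path(tmp) / sub).mkdir(exist_ok=True)
            (Path(tmp) / sub / name).touch()
        pairs, unmatched = pair_outputs(tmp)
        print([(d.name, r.name) for d, r in pairs])  # [('000.ply', '000.ply')]
        print(unmatched)  # ['001.ply']
```

On real output, call pair_outputs with the pipeline's output directory and load each (diff, refine) pair with whatever point cloud reader matches the files the pipeline writes.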

@Yacovitch
Author

Thank you very much for all the help!


2 participants