[BUG]: Conda installs a redundant CPU build of torch #1943

Closed
2 tasks done
dagardner-nv opened this issue Oct 14, 2024 · 1 comment · Fixed by #1974
Assignees
Labels
bug Something isn't working

Comments

@dagardner-nv
Contributor

Version

24.10

Which installation method(s) does this occur on?

Source

Describe the bug.

We install torch via pip, which is how we get the 2.4.0+cu124 version. However, conda also installs CPU builds of libtorch and torchvision alongside it.
I believe the sentence-transformers package is pulling in torchvision, which in turn pulls in libtorch.

$ conda list | grep torch
libtorch                  2.4.0           cpu_generic_h4a3044c_1    conda-forge
torch                     2.4.0+cu124              pypi_0    pypi
torchdata                 0.8.0                    pypi_0    pypi
torchvision               0.19.1          cpu_py310hd9679db_0    conda-forge
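
A minimal sketch for spotting this situation automatically, assuming the whitespace-separated column layout of the `conda list` output above. The helper name `find_cpu_torch_builds` is hypothetical, and treating a build string that starts with `cpu` as a CPU build is a heuristic based only on the build strings shown in this issue:

```python
def find_cpu_torch_builds(conda_list_output):
    """Return (name, version, build, channel) rows for torch-related
    packages whose build string starts with 'cpu'.

    Assumption: build strings such as 'cpu_generic_...' or 'cpu_py310...'
    mark CPU-only builds, as in the output shown in this issue.
    """
    flagged = []
    for line in conda_list_output.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines that conda prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        name, version, build = fields[:3]
        channel = fields[3] if len(fields) > 3 else ""
        if "torch" in name and build.startswith("cpu"):
            flagged.append((name, version, build, channel))
    return flagged


# Sample input copied from the `conda list | grep torch` output above.
SAMPLE = """\
libtorch                  2.4.0           cpu_generic_h4a3044c_1    conda-forge
torch                     2.4.0+cu124              pypi_0    pypi
torchdata                 0.8.0                    pypi_0    pypi
torchvision               0.19.1          cpu_py310hd9679db_0    conda-forge
"""

for row in find_cpu_torch_builds(SAMPLE):
    print(row)
```

On the sample above this flags `libtorch` and `torchvision` and leaves the pip-installed `torch` and `torchdata` alone.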

Minimum reproducible example

CONDA_ALWAYS_YES=true conda env create --solver=libmamba -n morpheus -y --file conda/environments/all_cuda-125_arch-x86_64.yaml


Code of Conduct

  • I agree to follow Morpheus' Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
@efajardo-nv
Contributor

efajardo-nv commented Oct 22, 2024

@dagardner-nv The sentence-transformers conda install (via examples_cuda-125_arch-x86_64.yaml) in the release container results in the following pytorch packages in the container:

# packages in environment at /opt/conda/envs/morpheus:
#
# Name                    Version                   Build  Channel
libtorch                  2.4.1           cpu_generic_hb3b73e9_0    conda-forge
pytorch                   2.4.1           cpu_generic_py310hcbfaffa_0    conda-forge
torch                     2.4.0+cu124              pypi_0    pypi
torchvision               0.19.1          cpu_py310h0339c84_1    conda-forge

The VDB embedding stage then chooses the CPU version. Replacing it with the pip package switches the stage back to GPU, but it's not quite as fast (~3 min vs. ~2 min in our example).
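
To confirm which build a given environment actually imports, something like the following could be run inside it. This is only a sketch: the helper name `describe_torch_build` is hypothetical, and it relies on `torch.version.cuda` being `None` for CPU-only builds. It does not inspect how the VDB embedding stage itself selects a device.

```python
import importlib


def describe_torch_build():
    """Report which torch build gets imported in the current environment."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # torch is not installed in this environment.
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,            # e.g. a "+cu124" suffix on the pip wheel
        "cuda_build": torch.version.cuda,         # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "location": torch.__file__,               # shows which install was picked up
    }


for key, value in describe_torch_build().items():
    print(f"{key}: {value}")
```

If `cuda_build` is `None` while the pip wheel is also installed, the conda CPU package is likely shadowing it.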

@morpheus-bot-test morpheus-bot-test bot moved this from Todo to Review - Ready for Review in Morpheus Boards Oct 24, 2024
rapids-bot bot pushed a commit that referenced this issue Oct 24, 2024
…encies (#1974)

- Update `dependencies.yaml` and re-generate environment yaml's
- Avoids installing the `pytorch` CPU packages, which examples like DFP would otherwise try to use.

Closes #1943 

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Eli Fajardo (https://github.com/efajardo-nv)
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #1974
@github-project-automation github-project-automation bot moved this from Review - Ready for Review to Done in Morpheus Boards Oct 28, 2024
@dagardner-nv dagardner-nv added this to the 24.10 - Release milestone Oct 28, 2024