cannot work with lambdalabs gpu #1612

Open

abeatbeyondlab opened this issue Apr 9, 2024 · 5 comments

Comments

abeatbeyondlab commented Apr 9, 2024

I am following this tutorial
https://replicate.com/docs/guides/get-a-gpu-machine

I run
sudo cog predict r8.im/stability-ai/stable-diffusion@sha256:ac732df83cea7fff18b8472768c88ad041fa750ff7682a21affe81863cbe77e4 -i prompt="a pot of gold"

And I get the following error:


Starting Docker image r8.im/stability-ai/stable-diffusion@sha256:ac732df83cea7fff18b8472768c88ad041fa750ff7682a21affe81863cbe77e4 and running setup()...
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/server/http.py", line 354, in <module>
    app = create_app(
          ^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/server/http.py", line 71, in create_app
    predictor = load_predictor_from_ref(predictor_ref)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/predictor.py", line 155, in load_predictor_from_ref
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/src/predict.py", line 17, in <module>
    from dynamic_sd.src.pipeline_stable_diffusion_ait_alt import StableDiffusionAITPipeline
  File "/src/dynamic_sd/src/pipeline_stable_diffusion_ait_alt.py", line 40, in <module>
    from .compile_lib.compile_vae_alt import map_vae
  File "/src/dynamic_sd/src/compile_lib/compile_vae_alt.py", line 21, in <module>
    from ..modeling.vae import AutoencoderKL as ait_AutoencoderKL
  File "/src/dynamic_sd/src/modeling/vae.py", line 22, in <module>
    from .unet_blocks import get_up_block, UNetMidBlock2D
  File "/src/dynamic_sd/src/modeling/unet_blocks.py", line 36, in <module>
    from .clip import SpatialTransformer
  File "/src/dynamic_sd/src/modeling/clip.py", line 24, in <module>
    USE_CUDA = detect_target().name() == "cuda"
               ^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/aitemplate/testing/detect_target.py", line 132, in detect_target
    raise RuntimeError("Unsupported platform")
RuntimeError: Unsupported platform
ⅹ Failed to get container status: exit status 1
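
For reference, here is a minimal sketch of the check that raises this error, based only on the traceback above (the import path and the detect_target() call are taken from the traceback; running it manually with Python inside the same container is an assumption, not something Cog does for you):

  # Hypothetical reproduction: open a Python shell inside the image and run
  # the check that fails at /src/dynamic_sd/src/modeling/clip.py line 24.
  from aitemplate.testing.detect_target import detect_target

  try:
      target = detect_target()                      # probes for a usable GPU backend
      print("AITemplate target:", target.name())    # "cuda" on a working NVIDIA setup
  except RuntimeError as err:
      # "Unsupported platform" suggests AITemplate could not detect a usable
      # CUDA (or ROCm) device, which usually points at the container not
      # seeing the GPU at all.
      print("detect_target failed:", err)

If the same RuntimeError appears there, the problem is likely GPU visibility inside the container rather than anything in predict.py itself.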


Any feedback?

@ReadCommitPush

I am getting the same issue.

@abeatbeyondlab (Author)

Any feedback on this?

@Jordan-Lambda

Hi,

I am a software engineer working for Lambda, and I decided to look into this problem. I know next to nothing about Cog, but following the directions linked in the original report I can confirm that the problem reproduces.

I did find, though, that the following steps on a freshly launched instance successfully generated a file output.0.png:

  1. git clone https://github.com/replicate/cog-stable-diffusion.git
  2. cd cog-stable-diffusion/
  3. sudo cog run script/download-weights && clear (output from the script left my terminal in a bad state, hence the clear)
  4. sudo cog predict -i prompt="a pot of gold"

Is the version of CUDA provided by Lambda Stack not supported? I ask because the first line of output from that last command is the following:
⚠ Cog doesn't know if CUDA 11.8 is compatible with PyTorch 1.13.0. This might cause CUDA problems.

Note that I don't know where the "CUDA 11.8" is coming from:

Mon May  6 23:53:09 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10                     On  | 00000000:08:00.0 Off |                    0 |
|  0%   36C    P8              16W / 150W |      3MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
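
A hedged way to check which CUDA build the container itself carries (as opposed to the driver-side CUDA 12.2 that nvidia-smi reports) is to inspect the installed PyTorch wheel from inside the Cog environment; the wheel's build version may be where the "CUDA 11.8" figure comes from, though that is an assumption:

  # Run inside the image / Cog environment (assumption on my part).
  import torch

  print(torch.__version__)            # e.g. 1.13.0
  print(torch.version.cuda)           # CUDA toolkit the wheel was built against
  print(torch.cuda.is_available())    # whether PyTorch can actually reach the GPU

Note that the "CUDA Version: 12.2" shown by nvidia-smi is the highest version the installed driver supports; a wheel built against 11.8 running under a 12.2 driver is normally fine, so the version warning alone seems unlikely to explain the "Unsupported platform" failure.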

If there is anything that I can do to help troubleshoot this, or if there's a change to our on-demand VM base image that might prevent this in the future, please let me know.

@abeatbeyondlab (Author)

No news on this from the Replicate team?

alessandromorandi commented May 23, 2024

Hey, I have the same issue here! Any news?
@Jordan-Lambda
