cuda : clear error after buffer allocation failure #7376

Merged: slaren merged 2 commits into master from sl/cudamalloc-clear-error on May 19, 2024


slaren (Collaborator) commented May 18, 2024

A buffer allocation failure should be a recoverable error, but the CUDA error was not cleared afterwards, which may cause the next operation to fail.
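To illustrate the failure mode described above, here is a minimal sketch that runs without a GPU. It assumes the CUDA runtime behaviour that a failed `cudaMalloc` leaves its error recorded until `cudaGetLastError()` is called, which returns the error and resets it. Everything prefixed `mock_`, along with `alloc_buffer` and `later_operation_ok`, is a hypothetical stand-in written for this example and not code from the PR.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for CUDA runtime error codes (not the real enum).
enum mock_error { MOCK_SUCCESS = 0, MOCK_ERROR_MEMORY_ALLOCATION = 2 };

// Models the runtime's recorded "last error" state.
static mock_error g_last_error = MOCK_SUCCESS;

// Pretend device capacity in bytes; any larger request fails.
static const size_t MOCK_DEVICE_CAPACITY = 1024;

// Models cudaMalloc: on failure, returns the error and records it as the last error.
static mock_error mock_malloc(void ** ptr, size_t size) {
    if (size > MOCK_DEVICE_CAPACITY) {
        *ptr = nullptr;
        g_last_error = MOCK_ERROR_MEMORY_ALLOCATION;
        return g_last_error;
    }
    *ptr = std::malloc(size);
    return MOCK_SUCCESS;
}

// Models cudaGetLastError: returns the recorded error and resets it to success.
static mock_error mock_get_last_error() {
    mock_error err = g_last_error;
    g_last_error = MOCK_SUCCESS;
    return err;
}

// Allocation wrapper that reports failure by returning nullptr.
// With clear_error set, the recorded error is consumed before returning,
// so the failure does not leak into unrelated later checks.
static void * alloc_buffer(size_t size, bool clear_error) {
    void * ptr = nullptr;
    if (mock_malloc(&ptr, size) != MOCK_SUCCESS) {
        if (clear_error) {
            (void) mock_get_last_error();
        }
        return nullptr;
    }
    return ptr;
}

// Models a later, unrelated operation whose success is judged by
// checking the last error (as is commonly done after a kernel launch).
static bool later_operation_ok() {
    return mock_get_last_error() == MOCK_SUCCESS;
}
```

In this model, calling `alloc_buffer(4096, false)` followed by `later_operation_ok()` reports a failure that the later operation did not cause, whereas `alloc_buffer(4096, true)` leaves the next check clean.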

JohannesGaessler (Collaborator) left a comment

My understanding is that this error occurs either when running out of memory (OOM) or when no CUDA device is available.

github-actions bot (Contributor) commented May 19, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 537 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8696.17ms p(95)=21079.86ms fails=, finish reason: stop=484 truncated=53
  • Prompt processing (pp): avg=100.53tk/s p(95)=472.82tk/s
  • Token generation (tg): avg=32.15tk/s p(95)=45.02tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=sl/cudamalloc-clear-error commit=f3803dcc9692623f3200a28ac03917710bc5f711

prompt_tokens_seconds

---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3
 duration=10m 537 iterations"
    y-axis "llamacpp:prompt_tokens_seconds"
    x-axis "llamacpp:prompt_tokens_seconds" 1716135551 --> 1716136181
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 655.88, 655.88, 655.88, 655.88, 655.88, 650.66, 650.66, 650.66, 650.66, 650.66, 649.74, 649.74, 649.74, 649.74, 649.74, 669.21, 669.21, 669.21, 669.21, 669.21, 712.54, 712.54, 712.54, 712.54, 712.54, 712.32, 712.32, 712.32, 712.32, 712.32, 737.67, 737.67, 737.67, 737.67, 737.67, 754.74, 754.74, 754.74, 754.74, 754.74, 771.44, 771.44, 771.44, 771.44, 771.44, 773.11, 773.11, 773.11, 773.11, 773.11, 793.8, 793.8, 793.8, 793.8, 793.8, 792.92, 792.92, 792.92, 792.92, 792.92, 806.23, 806.23, 806.23, 806.23, 806.23, 811.01, 811.01, 811.01, 811.01, 811.01, 828.09, 828.09, 828.09, 828.09, 828.09, 822.58, 822.58, 822.58, 822.58, 822.58, 829.5, 829.5, 829.5, 829.5, 829.5, 826.96, 826.96, 826.96, 826.96, 826.96, 813.63, 813.63, 813.63, 813.63, 813.63, 814.84, 814.84, 814.84, 814.84, 814.84, 822.65, 822.65, 822.65, 822.65, 822.65, 822.11, 822.11, 822.11, 822.11, 822.11, 823.74, 823.74, 823.74, 823.74, 823.74, 821.55, 821.55, 821.55, 821.55, 821.55, 819.79, 819.79, 819.79, 819.79, 819.79, 821.87, 821.87, 821.87, 821.87, 821.87, 832.53, 832.53, 832.53, 832.53, 832.53, 837.01, 837.01, 837.01, 837.01, 837.01, 833.83, 833.83, 833.83, 833.83, 833.83, 834.44, 834.44, 834.44, 834.44, 834.44, 839.72, 839.72, 839.72, 839.72, 839.72, 839.39, 839.39, 839.39, 839.39, 839.39, 838.61, 838.61, 838.61, 838.61, 838.61, 840.74, 840.74, 840.74, 840.74, 840.74, 833.2, 833.2, 833.2, 833.2, 833.2, 837.05, 837.05, 837.05, 837.05, 837.05, 834.53, 834.53, 834.53, 834.53, 834.53, 832.28, 832.28, 832.28, 832.28, 832.28, 832.76, 832.76, 832.76, 832.76, 832.76, 835.6, 835.6, 835.6, 835.6, 835.6, 838.22, 838.22, 838.22, 838.22, 838.22, 843.78, 843.78, 843.78, 843.78, 843.78, 820.92, 820.92, 820.92, 820.92, 820.92, 811.71, 811.71, 811.71, 811.71, 811.71, 811.6, 811.6, 811.6, 811.6, 811.6, 811.04, 811.04, 811.04, 811.04, 811.04, 813.92, 813.92, 813.92, 813.92, 813.92, 815.93, 815.93, 815.93, 815.93, 815.93, 815.58, 815.58, 
815.58, 815.58, 815.58, 822.16, 822.16, 822.16, 822.16, 822.16, 821.01, 821.01, 821.01, 821.01, 821.01, 825.85, 825.85, 825.85, 825.85, 825.85, 819.05, 819.05, 819.05, 819.05, 819.05, 823.55, 823.55, 823.55, 823.55, 823.55, 825.37, 825.37, 825.37, 825.37, 825.37, 826.43, 826.43, 826.43, 826.43, 826.43, 826.35, 826.35, 826.35, 826.35, 826.35, 826.73, 826.73, 826.73, 826.73, 826.73, 825.92, 825.92, 825.92, 825.92, 825.92, 827.77, 827.77, 827.77, 827.77, 827.77, 827.78]
                    
predicted_tokens_seconds
---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3
 duration=10m 537 iterations"
    y-axis "llamacpp:predicted_tokens_seconds"
    x-axis "llamacpp:predicted_tokens_seconds" 1716135551 --> 1716136181
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 41.6, 41.6, 41.6, 41.6, 41.6, 33.32, 33.32, 33.32, 33.32, 33.32, 28.81, 28.81, 28.81, 28.81, 28.81, 27.42, 27.42, 27.42, 27.42, 27.42, 28.76, 28.76, 28.76, 28.76, 28.76, 30.7, 30.7, 30.7, 30.7, 30.7, 31.46, 31.46, 31.46, 31.46, 31.46, 32.05, 32.05, 32.05, 32.05, 32.05, 32.58, 32.58, 32.58, 32.58, 32.58, 32.52, 32.52, 32.52, 32.52, 32.52, 32.27, 32.27, 32.27, 32.27, 32.27, 31.89, 31.89, 31.89, 31.89, 31.89, 31.26, 31.26, 31.26, 31.26, 31.26, 31.22, 31.22, 31.22, 31.22, 31.22, 30.39, 30.39, 30.39, 30.39, 30.39, 29.33, 29.33, 29.33, 29.33, 29.33, 28.77, 28.77, 28.77, 28.77, 28.77, 29.14, 29.14, 29.14, 29.14, 29.14, 29.19, 29.19, 29.19, 29.19, 29.19, 29.17, 29.17, 29.17, 29.17, 29.17, 29.18, 29.18, 29.18, 29.18, 29.18, 29.13, 29.13, 29.13, 29.13, 29.13, 29.16, 29.16, 29.16, 29.16, 29.16, 29.35, 29.35, 29.35, 29.35, 29.35, 29.21, 29.21, 29.21, 29.21, 29.21, 29.36, 29.36, 29.36, 29.36, 29.36, 29.54, 29.54, 29.54, 29.54, 29.54, 29.65, 29.65, 29.65, 29.65, 29.65, 29.85, 29.85, 29.85, 29.85, 29.85, 30.2, 30.2, 30.2, 30.2, 30.2, 30.19, 30.19, 30.19, 30.19, 30.19, 30.28, 30.28, 30.28, 30.28, 30.28, 30.33, 30.33, 30.33, 30.33, 30.33, 30.51, 30.51, 30.51, 30.51, 30.51, 30.55, 30.55, 30.55, 30.55, 30.55, 30.36, 30.36, 30.36, 30.36, 30.36, 30.3, 30.3, 30.3, 30.3, 30.3, 29.84, 29.84, 29.84, 29.84, 29.84, 29.98, 29.98, 29.98, 29.98, 29.98, 30.15, 30.15, 30.15, 30.15, 30.15, 30.33, 30.33, 30.33, 30.33, 30.33, 30.39, 30.39, 30.39, 30.39, 30.39, 30.36, 30.36, 30.36, 30.36, 30.36, 30.18, 30.18, 30.18, 30.18, 30.18, 29.93, 29.93, 29.93, 29.93, 29.93, 28.85, 28.85, 28.85, 28.85, 28.85, 28.76, 28.76, 28.76, 28.76, 28.76, 28.74, 28.74, 28.74, 28.74, 28.74, 28.8, 28.8, 28.8, 28.8, 28.8, 28.89, 28.89, 28.89, 28.89, 28.89, 28.94, 28.94, 28.94, 28.94, 28.94, 29.07, 29.07, 29.07, 29.07, 29.07, 29.05, 29.05, 29.05, 29.05, 29.05, 29.06, 29.06, 29.06, 29.06, 29.06, 28.89, 28.89, 28.89, 28.89, 28.89, 28.93, 28.93, 
28.93, 28.93, 28.93, 29.04, 29.04, 29.04, 29.04, 29.04, 29.11, 29.11, 29.11, 29.11, 29.11, 29.19, 29.19, 29.19, 29.19, 29.19, 29.24, 29.24, 29.24, 29.24, 29.24, 29.31]
                    

Details

kv_cache_usage_ratio

---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3
 duration=10m 537 iterations"
    y-axis "llamacpp:kv_cache_usage_ratio"
    x-axis "llamacpp:kv_cache_usage_ratio" 1716135551 --> 1716136181
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.27, 0.27, 0.27, 0.27, 0.27, 0.4, 0.4, 0.4, 0.4, 0.4, 0.29, 0.29, 0.29, 0.29, 0.29, 0.13, 0.13, 0.13, 0.13, 0.13, 0.2, 0.2, 0.2, 0.2, 0.2, 0.13, 0.13, 0.13, 0.13, 0.13, 0.15, 0.15, 0.15, 0.15, 0.15, 0.14, 0.14, 0.14, 0.14, 0.14, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.24, 0.24, 0.24, 0.24, 0.24, 0.23, 0.23, 0.23, 0.23, 0.23, 0.22, 0.22, 0.22, 0.22, 0.22, 0.3, 0.3, 0.3, 0.3, 0.3, 0.35, 0.35, 0.35, 0.35, 0.35, 0.23, 0.23, 0.23, 0.23, 0.23, 0.17, 0.17, 0.17, 0.17, 0.17, 0.14, 0.14, 0.14, 0.14, 0.14, 0.25, 0.25, 0.25, 0.25, 0.25, 0.22, 0.22, 0.22, 0.22, 0.22, 0.23, 0.23, 0.23, 0.23, 0.23, 0.2, 0.2, 0.2, 0.2, 0.2, 0.15, 0.15, 0.15, 0.15, 0.15, 0.34, 0.34, 0.34, 0.34, 0.34, 0.14, 0.14, 0.14, 0.14, 0.14, 0.13, 0.13, 0.13, 0.13, 0.13, 0.21, 0.21, 0.21, 0.21, 0.21, 0.13, 0.13, 0.13, 0.13, 0.13, 0.09, 0.09, 0.09, 0.09, 0.09, 0.16, 0.16, 0.16, 0.16, 0.16, 0.15, 0.15, 0.15, 0.15, 0.15, 0.21, 0.21, 0.21, 0.21, 0.21, 0.15, 0.15, 0.15, 0.15, 0.15, 0.16, 0.16, 0.16, 0.16, 0.16, 0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.2, 0.2, 0.39, 0.39, 0.39, 0.39, 0.39, 0.12, 0.12, 0.12, 0.12, 0.12, 0.18, 0.18, 0.18, 0.18, 0.18, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.2, 0.2, 0.2, 0.2, 0.2, 0.51, 0.51, 0.51, 0.51, 0.51, 0.6, 0.6, 0.6, 0.6, 0.6, 0.53, 0.53, 0.53, 0.53, 0.53, 0.38, 0.38, 0.38, 0.38, 0.38, 0.19, 0.19, 0.19, 0.19, 0.19, 0.21, 0.21, 0.21, 0.21, 0.21, 0.19, 0.19, 0.19, 0.19, 0.19, 0.21, 0.21, 0.21, 0.21, 0.21, 0.14, 0.14, 0.14, 0.14, 0.14, 0.26, 0.26, 0.26, 0.26, 0.26, 0.18, 0.18, 0.18, 0.18, 0.18, 0.23, 0.23, 0.23, 0.23, 0.23, 0.21, 0.21, 0.21, 0.21, 0.21, 0.17, 0.17, 0.17, 0.17, 0.17, 0.2, 0.2, 0.2, 0.2, 0.2, 0.17, 0.17, 0.17, 0.17, 0.17, 0.13, 0.13, 0.13, 0.13, 0.13, 0.21, 0.21, 0.21, 0.21, 0.21, 0.24]
                    
requests_processing
---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3
 duration=10m 537 iterations"
    y-axis "llamacpp:requests_processing"
    x-axis "llamacpp:requests_processing" 1716135551 --> 1716136181
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 4.0, 4.0, 4.0, 4.0, 4.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 3.0, 3.0, 3.0, 3.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 4.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 6.0, 6.0, 6.0, 6.0, 6.0, 4.0, 4.0, 4.0, 4.0, 4.0, 6.0, 6.0, 6.0, 6.0, 6.0, 4.0, 4.0, 4.0, 4.0, 4.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 4.0, 6.0, 6.0, 6.0, 6.0, 6.0, 3.0, 3.0, 3.0, 3.0, 3.0, 7.0, 7.0, 7.0, 7.0, 7.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 6.0, 6.0, 6.0, 6.0, 6.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 6.0, 6.0, 6.0, 6.0, 6.0, 4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 8.0, 8.0, 8.0, 8.0, 8.0, 6.0, 6.0, 6.0, 6.0, 6.0, 4.0, 4.0, 4.0, 4.0, 4.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 7.0, 7.0, 7.0, 7.0, 7.0, 6.0, 6.0, 6.0, 6.0, 6.0, 1.0]
                    

slaren (Collaborator, Author) commented May 19, 2024

It should only happen on OOM, but the goal is to let applications load a different model, or the same model with fewer layers offloaded, without crashing the process or causing further errors.
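The recovery path mentioned above could look roughly like the following on the application side. This is only a sketch under stated assumptions: `try_load` and `load_with_fallback` are hypothetical helpers, and the per-layer memory model is invented for illustration. It is not the llama.cpp API.

```cpp
#include <cstddef>

// Hypothetical loader: succeeds only if the requested number of offloaded
// layers fits in the free device memory. A real loader would attempt the
// buffer allocation and return false when it fails.
static bool try_load(int gpu_layers, size_t bytes_per_layer, size_t free_bytes) {
    return (size_t) gpu_layers * bytes_per_layer <= free_bytes;
}

// Retries with progressively fewer offloaded layers instead of aborting.
// Returns the number of layers offloaded; 0 means nothing was offloaded.
// This only works if a failed attempt leaves no stale error state behind.
static int load_with_fallback(int max_layers, size_t bytes_per_layer, size_t free_bytes) {
    for (int n = max_layers; n > 0; --n) {
        if (try_load(n, bytes_per_layer, free_bytes)) {
            return n;
        }
    }
    return 0;
}
```

For example, with 33 requested layers of 100 bytes each and 1000 bytes free, `load_with_fallback(33, 100, 1000)` settles on 10 offloaded layers rather than terminating the process.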

slaren merged commit ab33f7a into master on May 19, 2024
46 of 52 checks passed
slaren deleted the sl/cudamalloc-clear-error branch on May 19, 2024 12:19