ggml-opencl, llama: using reserve() if count already known #7272

Merged
merged 1 commit into ggerganov:master on May 20, 2024

Conversation

GermanAizek
Contributor

This mainly affects the ggml_cl_mul_mat_q_f32 function.
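
For readers unfamiliar with the pattern, here is a minimal, hypothetical sketch of what the PR applies in general (the function and variable names below are illustrative, not from the llama.cpp sources): when the number of elements to be appended is known in advance, a single reserve() call avoids repeated reallocations during push_back().

```cpp
#include <vector>

// Illustrative only: reserve capacity once when the final count is known,
// then append elements without triggering intermediate reallocations.
void collect_squares(std::vector<long long> & out, int count) {
    out.reserve(out.size() + static_cast<std::size_t>(count)); // single allocation up front
    for (int i = 0; i < count; ++i) {
        out.push_back(static_cast<long long>(i) * i);          // stand-in for the real per-item work
    }
}
```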

@mofosyne added the refactoring and review complexity : high labels on May 14, 2024
Contributor

github-actions bot commented May 14, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 547 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8563.44ms p(95)=20815.89ms fails=, finish reason: stop=478 truncated=69
  • Prompt processing (pp): avg=105.13tk/s p(95)=469.6tk/s
  • Token generation (tg): avg=33.15tk/s p(95)=46.6tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=reserve-vec commit=4ee29e5e1caf29e1bc7b094226faa890ae0e98d6

[Chart: llamacpp:prompt_tokens_seconds — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 547 iterations]

[Chart: llamacpp:predicted_tokens_seconds — same benchmark run]

[Chart: llamacpp:kv_cache_usage_ratio — same benchmark run]

[Chart: llamacpp:requests_processing — same benchmark run]

llama.cpp Outdated
@@ -6116,6 +6116,7 @@ static bool llm_load_tensors(
 mlock_buf->init (ggml_backend_buffer_get_base(buf));
 mlock_buf->grow_to(ggml_backend_buffer_get_size(buf));
 }
+bufs.reserve(ml.files.size());
Owner

Already reserved on line 6060

Contributor Author

Fixed in 4ee29e5.
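
For context on why the duplicate call was dropped rather than kept: a second reserve() with the same (or smaller) requested size is a no-op, so it is harmless but adds noise. A minimal, self-contained sketch, not taken from llama.cpp:

```cpp
#include <cassert>
#include <vector>

int main() {
    std::vector<int> bufs;
    bufs.reserve(8);                // capacity raised once
    const auto cap = bufs.capacity();
    bufs.reserve(8);                // no-op: requested capacity is not larger
    assert(bufs.capacity() == cap); // the duplicate call changes nothing
    return 0;
}
```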

ggml-opencl.cpp Outdated
int64_t i12 = i02 * r2;
int64_t e12 = i12 + r2;
events.reserve(e12 - i12);
while (i12 < e12) {
Owner

Better to keep the for loop

Contributor Author

Fixed in 4ee29e5.
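
A hedged sketch of what the suggested shape might look like — keeping the for loop while still reserving once the iteration count (r2) is known. The Event alias and the push_back body are placeholders for the real OpenCL event handling; the variable names simply follow the snippet above:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Event = int; // stand-in for cl_event; illustrative only

static void enqueue_range(std::vector<Event> & events, int64_t i02, int64_t r2) {
    const int64_t i12_begin = i02 * r2;
    const int64_t i12_end   = i12_begin + r2;
    // reserve once, since the number of events to be appended is known up front
    events.reserve(events.size() + static_cast<std::size_t>(i12_end - i12_begin));
    for (int64_t i12 = i12_begin; i12 < i12_end; i12++) {
        events.push_back(static_cast<Event>(i12)); // placeholder for the enqueued event
    }
}
```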

@GermanAizek marked this pull request as ready for review on May 20, 2024 02:25
@ggerganov merged commit 213e90e into ggerganov:master on May 20, 2024
66 checks passed
Labels
refactoring — Refactoring
review complexity : high — Generally requires in-depth knowledge of LLMs or GPUs

3 participants