ggml-opencl, llama: using reserve() if count already known #7272

Merged
merged 1 commit into ggerganov:master on May 20, 2024

Conversation

GermanAizek
Contributor

This mainly affects the ggml_cl_mul_mat_q_f32 function.
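
For readers unfamiliar with the pattern, here is a minimal, hypothetical sketch of what the PR applies in general (the function and variable names below are illustrative, not from the llama.cpp sources): when the number of elements to be appended is known in advance, a single reserve() call avoids repeated reallocations during push_back().

```cpp
#include <vector>

// Illustrative only: reserve capacity once when the final count is known,
// then append elements without triggering intermediate reallocations.
void collect_squares(std::vector<long long> & out, int count) {
    out.reserve(out.size() + static_cast<std::size_t>(count)); // single allocation up front
    for (int i = 0; i < count; ++i) {
        out.push_back(static_cast<long long>(i) * i);          // stand-in for the real per-item work
    }
}
```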

@mofosyne added the refactoring and review complexity : high labels on May 14, 2024
Contributor

github-actions bot commented May 14, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 547 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8563.44ms p(95)=20815.89ms fails=, finish reason: stop=478 truncated=69
  • Prompt processing (pp): avg=105.13tk/s p(95)=469.6tk/s
  • Token generation (tg): avg=33.15tk/s p(95)=46.6tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=reserve-vec commit=4ee29e5e1caf29e1bc7b094226faa890ae0e98d6

[Chart: llamacpp:prompt_tokens_seconds — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 547 iterations]

[Chart: llamacpp:predicted_tokens_seconds — same benchmark run]

[Chart: llamacpp:kv_cache_usage_ratio — same benchmark run]

[Chart: llamacpp:requests_processing — same benchmark run]

llama.cpp Outdated
@@ -6116,6 +6116,7 @@ static bool llm_load_tensors(
 mlock_buf->init (ggml_backend_buffer_get_base(buf));
 mlock_buf->grow_to(ggml_backend_buffer_get_size(buf));
 }
+bufs.reserve(ml.files.size());
Owner

Already reserved on line 6060

Contributor Author

Fixed in 4ee29e5.
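
For context on why the duplicate call was dropped rather than kept: a second reserve() with the same (or smaller) requested size is a no-op, so it is harmless but adds noise. A minimal, self-contained sketch, not taken from llama.cpp:

```cpp
#include <cassert>
#include <vector>

int main() {
    std::vector<int> bufs;
    bufs.reserve(8);                // capacity raised once
    const auto cap = bufs.capacity();
    bufs.reserve(8);                // no-op: requested capacity is not larger
    assert(bufs.capacity() == cap); // the duplicate call changes nothing
    return 0;
}
```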

ggml-opencl.cpp Outdated
int64_t i12 = i02 * r2;
int64_t e12 = i12 + r2;
events.reserve(e12 - i12);
while (i12 < e12) {
Owner

Better to keep the for loop

Contributor Author

Fixed in 4ee29e5.
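
A hedged sketch of what the suggested shape might look like — keeping the for loop while still reserving once the iteration count (r2) is known. The Event alias and the push_back body are placeholders for the real OpenCL event handling; the variable names simply follow the snippet above:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Event = int; // stand-in for cl_event; illustrative only

static void enqueue_range(std::vector<Event> & events, int64_t i02, int64_t r2) {
    const int64_t i12_begin = i02 * r2;
    const int64_t i12_end   = i12_begin + r2;
    // reserve once, since the number of events to be appended is known up front
    events.reserve(events.size() + static_cast<std::size_t>(i12_end - i12_begin));
    for (int64_t i12 = i12_begin; i12 < i12_end; i12++) {
        events.push_back(static_cast<Event>(i12)); // placeholder for the enqueued event
    }
}
```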

@GermanAizek marked this pull request as ready for review on May 20, 2024 02:25
@ggerganov merged commit 213e90e into ggerganov:master on May 20, 2024
66 checks passed
Labels
refactoring — Refactoring
review complexity : high — Generally requires in-depth knowledge of LLMs or GPUs

3 participants