
Tokenizer SPM fixes for phi-3 and llama-spm #7375

Merged
6 commits merged on May 20, 2024

Conversation

jaime-m-p
Collaborator

Modifications to make SPM tokenizer match AutoTokenizer.

Tested with vocabs from models phi-3 and llama-spm.

jaime-m-p added 3 commits May 19, 2024 00:13
The file 'added_tokens.json' does not exist for phi-3 or llama-spm, so read added tokens from 'tokenizer_config.json' first, then fall back to 'tokenizer.json'.
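The fallback order in the commit message above can be sketched as follows. This is an illustrative, simplified loader, not the PR's actual conversion code; the three file schemas are the usual Hugging Face layouts, reduced to the fields needed here.

```python
import json
from pathlib import Path

def load_added_tokens(model_dir: str) -> dict[str, int]:
    """Sketch of the fallback order: added_tokens.json, then
    tokenizer_config.json, then tokenizer.json (assumption: simplified
    versions of the real HF file schemas)."""
    d = Path(model_dir)

    # 1. added_tokens.json maps token text directly to id.
    p = d / "added_tokens.json"
    if p.is_file():
        return {tok: int(tid) for tok, tid in json.loads(p.read_text()).items()}

    # 2. tokenizer_config.json: 'added_tokens_decoder' maps id -> {'content': tok, ...}.
    p = d / "tokenizer_config.json"
    if p.is_file():
        dec = json.loads(p.read_text()).get("added_tokens_decoder")
        if dec:
            return {v["content"]: int(tid) for tid, v in dec.items()}

    # 3. tokenizer.json: 'added_tokens' is a list of {'id': ..., 'content': ...}.
    p = d / "tokenizer.json"
    if p.is_file():
        entries = json.loads(p.read_text()).get("added_tokens", [])
        return {e["content"]: int(e["id"]) for e in entries}

    return {}
```

For phi-3 and llama-spm only steps 2 and 3 apply, since the first file is absent.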
@github-actions github-actions bot added testing Everything test related python python script changes labels May 18, 2024
Contributor

github-actions bot commented May 19, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 534 iterations 🚀

Details (performance-related PRs only):
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8766.06ms p(95)=22403.73ms fails=, finish reason: stop=481 truncated=53
  • Prompt processing (pp): avg=106.84tk/s p(95)=485.61tk/s
  • Token generation (tg): avg=34.39tk/s p(95)=48.1tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=tokenizer-spm-fixes commit=7fb66eb58cdd417552c55d992e17c49e4af22440

prompt_tokens_seconds

[chart omitted: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 534 iterations"; y-axis: llamacpp:prompt_tokens_seconds]
predicted_tokens_seconds

[chart omitted: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 534 iterations"; y-axis: llamacpp:predicted_tokens_seconds]

kv_cache_usage_ratio

[chart omitted: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 534 iterations"; y-axis: llamacpp:kv_cache_usage_ratio]
requests_processing

[chart omitted: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 534 iterations"; y-axis: llamacpp:requests_processing]

llama.cpp — 2 review comments (outdated, resolved)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@mofosyne mofosyne added the Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level label May 20, 2024
@ggerganov
Owner

The server tests are now failing:

 Then 8 tokens are predicted matching (read|going)+                        # features/steps/steps.py:254
      Assertion Failed: /(read|going)+/ must match ``` pretty. E```
      Captured stderr:

Seems like the tokenizer fixes have changed the contents of the generated text and the regex might need some adjusting

@jaime-m-p jaime-m-p merged commit 917dc8c into ggerganov:master May 20, 2024
77 checks passed
@jaime-m-p
Collaborator Author

Seems like the tokenizer fixes have changed the contents of the generated text and the regex might need some adjusting

Oh, sorry. I found the problem.
When calling tokenizer(add_special=True), the pre-inserted BOS needs rtrim too.

prompt = "Write a joke about AI from a very long prompt which will not be truncated"

Before (len: 46):
1   -> '<s>'
410 -> ' '
448 -> 'W'
420 -> 'r'
275 -> 'it'
411 -> 'e'
261 -> ' a'
410 -> ' '
...

After (len: 45):
1   -> '<s>'
ERROR   (the ' ' token after BOS is missing)
448 -> 'W'
420 -> 'r'
275 -> 'it'
411 -> 'e'
261 -> ' a'
410 -> ' '
...

I'm fixing...
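The rtrim effect described above can be sketched with a toy encoder. This is NOT llama.cpp's actual code: it assumes a simplified tokenizer where the pre-tokenizer prepends a dummy space (as SPM's add_dummy_prefix does) and where BOS carries an rtrim attribute, meaning the whitespace to its right must be consumed. Without the rtrim, a spurious ' ' token (id 410 in the dump above) follows BOS.

```python
BOS = "<s>"

def encode(text: str, add_special: bool, apply_rtrim: bool) -> list[str]:
    """Toy encoder (assumption: simplified stand-in for SPM). Emits BOS,
    a ' ' pseudo-token for any remaining leading space, then crude word
    tokens; real SPM emits subword pieces."""
    text = " " + text  # mimic SPM's add_dummy_prefix
    out = []
    if add_special:
        out.append(BOS)
        if apply_rtrim:
            text = text.lstrip(" ")  # rtrim: BOS eats the whitespace after it
    if text.startswith(" "):
        out.append(" ")  # the spurious token seen in the 'Before' dump
        text = text.lstrip(" ")
    out.extend(text.split())
    return out

print(encode("Write a joke", add_special=True, apply_rtrim=False))
# -> ['<s>', ' ', 'Write', 'a', 'joke']   (one token too many)
print(encode("Write a joke", add_special=True, apply_rtrim=True))
# -> ['<s>', 'Write', 'a', 'joke']
```

The one-token difference matches the len 46 vs len 45 outputs shown above.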

Labels: examples; python (python script changes); Review Complexity: Medium (generally require more time to grok but manageable by beginner to medium expertise level); server; testing (everything test related)

3 participants