Hugging Face Fast Tokenizer
Specification
- tokenizer class: PreTrainedTokenizerFast
- model: Unigram byte-fallback
- vocab size: 50,570
Requirements
transformers>=4.34.0
tokenizers>=0.14.0
Usage
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-13b-v1.0")
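
Once loaded, the tokenizer follows the standard transformers API. A minimal sketch of an encode/decode round trip (the sample sentence is illustrative):

text = "こんにちは、世界"  # illustrative input
ids = tokenizer.encode(text)                    # text -> token IDs
tokens = tokenizer.convert_ids_to_tokens(ids)   # inspect the subword pieces
print(tokens)
print(tokenizer.decode(ids))                    # token IDs -> text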
- The tokenizer configuration files are bundled with the LLM-jp models distributed on the Hugging Face Hub.
- The tokenizer can be instantiated in the usual way with AutoTokenizer.from_pretrained(model_name_or_path).
- The minimal set of files for the HF tokenizer is located in /hf/ver2.1/code10k_en20k_ja30k.ver2.1_hf_fast; it can also be loaded directly from that directory, as sketched below.
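
A minimal sketch of loading from that local file set instead of the Hub, assuming the path above resolves to a directory in your checkout of this repository:

from transformers import AutoTokenizer

# Path assumed to point at the bundled tokenizer files; adjust to your checkout.
tokenizer = AutoTokenizer.from_pretrained("/hf/ver2.1/code10k_en20k_ja30k.ver2.1_hf_fast")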
SentencePiece Tokenizer
Specification
- SentencePiece Unigram Byte-fallback model
- vocab size: 50,570
Requirements
sentencepiece>=0.1.99
protobuf<3.21.0
Usage
from sentencepiece import SentencePieceProcessor
sp = SentencePieceProcessor("models/ver2.1/code10k_en20k_ja30k.ver2.1.model")
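
A minimal sketch of encoding and decoding with the loaded processor (the sample text is illustrative):

# Encode to subword pieces and to token IDs, then decode back.
pieces = sp.encode("こんにちは、世界", out_type=str)
ids = sp.encode("こんにちは、世界", out_type=int)
print(pieces)
print(sp.decode(ids))  # round-trips to the original text
# Byte-fallback: characters absent from the vocabulary are split into byte pieces such as <0xE3>.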