Function load_tokenizer 

pub(super) fn load_tokenizer(
    tokenizer_path: &Path,
    max_seq_length: usize,
) -> Result<Tokenizer>

Loads the BGE-M3 tokenizer from tokenizer_path and configures it to truncate inputs to max_seq_length tokens, with padding disabled. Padding is applied later, per chunk, in build_chunk_arrays.
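
The split between truncation at load time and padding per chunk can be illustrated with a small standard-library sketch. It is not the crate's implementation: `truncate_ids` and `pad_chunk` are hypothetical helpers, the pad ID of 0 is a placeholder rather than the model's actual pad token, and special tokens are ignored. It only shows why deferring padding lets each chunk be padded to its own longest sequence instead of to max_seq_length.

```rust
/// Hypothetical helper: cap a token-ID sequence at `max_seq_length`.
/// No padding is added here, mirroring the "truncation but no padding" setup.
fn truncate_ids(mut ids: Vec<u32>, max_seq_length: usize) -> Vec<u32> {
    ids.truncate(max_seq_length);
    ids
}

/// Hypothetical helper: pad every sequence in one chunk to the length of the
/// longest sequence in that chunk. Returns (padded IDs, attention masks).
fn pad_chunk(chunk: &[Vec<u32>], pad_id: u32) -> (Vec<Vec<u32>>, Vec<Vec<u8>>) {
    let width = chunk.iter().map(Vec::len).max().unwrap_or(0);
    let mut ids = Vec::with_capacity(chunk.len());
    let mut masks = Vec::with_capacity(chunk.len());
    for seq in chunk {
        let mut row = seq.clone();
        let mut mask = vec![1u8; seq.len()]; // 1 marks a real token
        row.resize(width, pad_id);
        mask.resize(width, 0); // 0 marks padding
        ids.push(row);
        masks.push(mask);
    }
    (ids, masks)
}

fn main() {
    let max_seq_length = 4;
    let raw = vec![vec![5, 6, 7, 8, 9, 10], vec![11, 12], vec![13]];

    // Step 1: truncate each sequence independently, without padding.
    let truncated: Vec<Vec<u32>> = raw
        .into_iter()
        .map(|seq| truncate_ids(seq, max_seq_length))
        .collect();

    // Step 2: pad per chunk (chunk size 2 is an arbitrary choice for the demo).
    for chunk in truncated.chunks(2) {
        let (ids, masks) = pad_chunk(chunk, 0); // 0 is a placeholder pad ID
        println!("ids = {:?}, masks = {:?}", ids, masks);
    }
}
```

In this sketch the second chunk holds a single one-token sequence, so it is padded to width 1 rather than 4, which is the saving that per-chunk padding is meant to provide.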