

Function build_chunk_arrays 

pub(super) fn build_chunk_arrays(
    all_encodings: &[Encoding],
    indices: &[usize],
    pad_to: usize,
) -> Result<(Array2<i64>, Array2<i64>)>

Builds input_ids and attention_mask arrays for a single chunk.

indices selects which encodings from all_encodings belong to this chunk. pad_to is the chunk-local maximum sequence length: every sequence in the chunk is right-padded to pad_to tokens with pad_id = 1 (the XLM-RoBERTa <pad> token).
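The padding scheme described above can be illustrated with a minimal, std-only sketch. It is not the crate's implementation. Enc (a stand-in for tokenizers::Encoding), Matrix (a row-major stand-in for ndarray::Array2<i64>), and build_chunk_arrays_sketch are hypothetical names. The sketch also assumes the usual attention-mask convention of 1 for real tokens and 0 for padding, and it returns an error when a sequence is longer than pad_to.

```rust
/// Hypothetical stand-in for `tokenizers::Encoding`: just the token ids.
struct Enc {
    ids: Vec<i64>,
}

/// XLM-RoBERTa `<pad>` token id, as stated in the description.
const PAD_ID: i64 = 1;

/// Hypothetical row-major 2-D buffer standing in for `ndarray::Array2<i64>`.
#[derive(Debug, PartialEq)]
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<i64>,
}

/// Sketch: one row per selected encoding, right-padded to `pad_to` columns.
/// Assumption: the mask is 1 for real tokens and 0 for padding.
fn build_chunk_arrays_sketch(
    all: &[Enc],
    indices: &[usize],
    pad_to: usize,
) -> Result<(Matrix, Matrix), String> {
    let rows = indices.len();
    // Pre-fill with padding values; real tokens overwrite each row's prefix.
    let mut ids = vec![PAD_ID; rows * pad_to];
    let mut mask = vec![0i64; rows * pad_to];
    for (row, &i) in indices.iter().enumerate() {
        let enc = all.get(i).ok_or_else(|| format!("index {i} out of range"))?;
        let len = enc.ids.len();
        if len > pad_to {
            return Err(format!("sequence {i} has {len} tokens, more than pad_to = {pad_to}"));
        }
        let start = row * pad_to;
        ids[start..start + len].copy_from_slice(&enc.ids);
        mask[start..start + len].fill(1);
    }
    Ok((
        Matrix { rows, cols: pad_to, data: ids },
        Matrix { rows, cols: pad_to, data: mask },
    ))
}

fn main() {
    let all = vec![
        Enc { ids: vec![0, 10, 2] },
        Enc { ids: vec![0, 11, 12, 13, 2] },
        Enc { ids: vec![0, 2] },
    ];
    // The chunk holds encodings 0 and 2. pad_to is the chunk-local max (3),
    // not the global max (5), so the longest encoding does not inflate this chunk.
    let indices = [0, 2];
    let pad_to = indices.iter().map(|&i| all[i].ids.len()).max().unwrap_or(0);
    let (ids, mask) = build_chunk_arrays_sketch(&all, &indices, pad_to).unwrap();
    println!("shape = ({}, {})", ids.rows, ids.cols); // shape = (2, 3)
    println!("{:?}", ids.data); // [0, 10, 2, 0, 2, 1]
    println!("{:?}", mask.data); // [1, 1, 1, 1, 1, 0]
}
```

Padding to the chunk-local maximum rather than a global maximum keeps each chunk's arrays as narrow as possible. This is the reason pad_to is computed per chunk.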