pub(super) fn build_chunk_arrays(
    all_encodings: &[Encoding],
    indices: &[usize],
    pad_to: usize,
) -> Result<(Array2<i64>, Array2<i64>)>
Builds the input_ids and attention_mask arrays for a single chunk.
indices selects which encodings from all_encodings belong to this chunk.
pad_to is the chunk-local maximum sequence length; every sequence is
right-padded to that length with pad_id = 1 (the XLM-RoBERTa <pad> token).
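
The padding scheme described above can be sketched with the standard library alone. This is not the crate's implementation: the `Encoding` struct here is a hypothetical stand-in that holds only token ids, the function name `build_chunk_rows` is made up for illustration, the two arrays are returned as flat row-major `Vec<i64>` buffers of shape `(indices.len(), pad_to)` rather than `Array2<i64>`, and the error type is a plain `String`. It also assumes the usual convention that the attention mask is 1 for real tokens and 0 for padding, which the description does not state.

```rust
/// Hypothetical stand-in for the tokenizer's `Encoding`; only token ids are kept.
struct Encoding {
    ids: Vec<i64>,
}

/// Pad id given in the description (XLM-RoBERTa `<pad>`).
const PAD_ID: i64 = 1;

/// Returns `(input_ids, attention_mask)` as row-major buffers of shape
/// `(indices.len(), pad_to)`.
/// Assumption: mask is 1 for real tokens and 0 for padded positions.
fn build_chunk_rows(
    all_encodings: &[Encoding],
    indices: &[usize],
    pad_to: usize,
) -> Result<(Vec<i64>, Vec<i64>), String> {
    let rows = indices.len();
    // Start fully padded, then overwrite the leading positions of each row.
    let mut input_ids = vec![PAD_ID; rows * pad_to];
    let mut attention_mask = vec![0i64; rows * pad_to];

    for (row, &idx) in indices.iter().enumerate() {
        let enc = all_encodings
            .get(idx)
            .ok_or_else(|| format!("encoding index {idx} is out of range"))?;
        let len = enc.ids.len();
        if len > pad_to {
            return Err(format!(
                "encoding {idx} has length {len}, which exceeds pad_to = {pad_to}"
            ));
        }
        let start = row * pad_to;
        // Right-padding: real tokens occupy the front of the row.
        input_ids[start..start + len].copy_from_slice(&enc.ids);
        attention_mask[start..start + len].fill(1);
    }

    Ok((input_ids, attention_mask))
}
```

With the ndarray crate, each flat buffer could presumably be turned into a two-dimensional array with `Array2::from_shape_vec((rows, pad_to), buffer)`. Rejecting sequences longer than pad_to is a choice made for this sketch; the real function may handle that case differently.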