Function synthesize_texts

pub(super) fn synthesize_texts(
    corpus: &[String],
    batch: usize,
    target_seq: usize,
) -> Vec<String>


Synthesizes batch texts, each approximately target_seq tokens long.

Token estimation: roughly 4 characters per token for natural English text, so the target length is about target_seq × 4 characters. Corpus texts are repeated or trimmed to reach that character count.
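The repeat-or-trim strategy described above could look roughly like the sketch below. This is not the actual implementation: the name synthesize_texts_sketch, the CHARS_PER_TOKEN constant, the round-robin choice of corpus entry, the space used as a separator between repetitions, and the handling of an empty corpus are all assumptions made for illustration.

```rust
/// Heuristic from the description: about 4 characters per token.
/// (The constant name is hypothetical.)
const CHARS_PER_TOKEN: usize = 4;

/// Hypothetical sketch: builds `batch` texts of `target_seq * 4` characters
/// each by repeating short corpus texts and trimming long ones.
fn synthesize_texts_sketch(corpus: &[String], batch: usize, target_seq: usize) -> Vec<String> {
    let target_chars = target_seq * CHARS_PER_TOKEN;
    (0..batch)
        .map(|i| {
            // Assumption: with nothing to draw from, return an empty text.
            if corpus.is_empty() {
                return String::new();
            }
            // Assumption: pick corpus entries round-robin.
            let source = &corpus[i % corpus.len()];
            if source.is_empty() {
                return String::new();
            }
            // Repeat the source (separated by a space) indefinitely, then
            // trim by character count so multi-byte UTF-8 is never split.
            source
                .chars()
                .chain(std::iter::once(' '))
                .cycle()
                .take(target_chars)
                .collect::<String>()
        })
        .collect()
}
```

Under these assumptions, a one-entry corpus containing "hello world" with batch = 2 and target_seq = 4 yields two texts of 16 characters each. Trimming by characters rather than bytes avoids a panic on non-ASCII input, at the cost of the byte length no longer matching the estimate exactly.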
