

Function synthesize_texts 

pub(super) fn synthesize_texts(
    corpus: &[String],
    batch: usize,
    target_seq: usize,
) -> Vec<String>

Synthesizes `batch` texts, each approximately `target_seq` tokens long.

Token estimation: natural English text averages roughly 4 characters per token, so the target length is about 4 × `target_seq` characters. Corpus texts are repeated or trimmed to hit that character count.
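This page does not show the function body. What follows is a hypothetical sketch of the repeat/trim strategy described above, not the crate's actual implementation. It rests on several assumptions: corpus texts are picked round-robin (text `i` comes from `corpus[i % corpus.len()]`), repetitions are joined with a single space, trimming counts Unicode characters so it never splits a UTF-8 sequence, and an empty corpus or empty source text yields an empty string. The constant name `CHARS_PER_TOKEN` is illustrative, and `pub(super)` is dropped so the snippet stands alone.

```rust
/// Hypothetical constant: approximate characters per token for English text.
const CHARS_PER_TOKEN: usize = 4;

/// Sketch only: builds `batch` texts of roughly `target_seq` tokens each by
/// repeating a corpus text until it reaches the target character count,
/// then trimming the excess.
fn synthesize_texts(corpus: &[String], batch: usize, target_seq: usize) -> Vec<String> {
    let target_chars = target_seq * CHARS_PER_TOKEN;

    // Assumption: with no corpus to draw from, return empty strings.
    if corpus.is_empty() {
        return vec![String::new(); batch];
    }

    (0..batch)
        .map(|i| {
            // Assumption: cycle through the corpus round-robin.
            let src = &corpus[i % corpus.len()];
            // An empty source can never reach the target; avoid looping forever.
            if src.is_empty() {
                return String::new();
            }

            let src_chars = src.chars().count();
            let mut out = String::new();
            let mut n_chars = 0;

            // Repeat the source, space-separated, until the target is reached.
            while n_chars < target_chars {
                if n_chars > 0 {
                    out.push(' ');
                    n_chars += 1;
                }
                out.push_str(src);
                n_chars += src_chars;
            }

            // Trim to exactly `target_chars` characters on char boundaries.
            out.chars().take(target_chars).collect()
        })
        .collect()
}
```

For example, with the corpus `["hello world", "rust"]`, `batch = 3` and `target_seq = 5`, the target is 20 characters, and this sketch returns `"hello world hello wo"`, `"rust rust rust rust "` and `"hello world hello wo"` again. Trimming by character rather than by byte keeps the output valid UTF-8 for non-ASCII corpora, at the cost of a linear scan per text.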