Module binpack

Expand description

Bin-packing algorithm that groups tokenized sequences into session.run() calls that each fit within the per-worker workspace budget.

The central type is CostModel, which captures the quadratic memory scaling of BGE-M3 attention and is used by bin_pack to partition an incoming batch into chunks that are safe to run in a single ORT call.