pub(super) const PROBE_SHAPES: &[(usize, usize)];
Shapes swept by the probe.
6 static shapes plus a dynamic (1, max_seq) shape added at runtime for
the quadratic anchor at the configured upper bound:
- (1, 64) and (1, 256) anchor the linear term at low seq.
- (4, 64) shares x1 = batch*seq = 256 with (1, 256) but has a different x2 = batch*seq² = 16384 vs 65536, giving a near-direct measurement of b independent of a.
- (1, 1024) and (1, 2048) provide mid-range leverage.
- (1, 4096) anchors the quadratic regime.
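The pairing of (4, 64) with (1, 256) can be illustrated with a short sketch. It assumes the fitted model has the form mem = a*x1 + b*x2 + c, which the text implies but does not state; the helper names `regressors` and `quadratic_coeff_from_pair` are hypothetical and not part of this module.

```rust
/// Regressors for one probe shape: x1 = batch*seq, x2 = batch*seq².
fn regressors(shape: (usize, usize)) -> (f64, f64) {
    let (batch, seq) = shape;
    let x1 = (batch * seq) as f64;
    let x2 = (batch * seq * seq) as f64;
    (x1, x2)
}

/// Estimate `b` from two measured shapes that share the same x1.
///
/// ASSUMPTION: memory follows mem = a*x1 + b*x2 + c, so subtracting two
/// measurements with equal x1 cancels both the `a` and `c` terms.
/// Returns None when the pair does not share x1 or has identical x2.
fn quadratic_coeff_from_pair(
    p: ((usize, usize), f64),
    q: ((usize, usize), f64),
) -> Option<f64> {
    let (px1, px2) = regressors(p.0);
    let (qx1, qx2) = regressors(q.0);
    if px1 != qx1 || px2 == qx2 {
        return None;
    }
    Some((p.1 - q.1) / (px2 - qx2))
}
```

With real measurements the two readings carry noise, which is why the estimate is near-direct rather than exact; a full fit would still use all shapes.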
Safety against OOM
ORT’s memory arena retains pages across session.run() calls, so
cumulative process RSS grows with each successive probe shape. Three
independent mechanisms keep the sweep within the container’s cgroup limit:
- Arena warm-up at the start of run_probe runs a (1, 64) session.run() BEFORE the sweep, so the lazy ORT arena initialisation does not appear as a ~1 GB constant offset on every per-shape delta.
- Conservative fits() gate rejects any shape whose per-call workspace estimate exceeds rss_ceiling (the safety-discounted budget).
- Absolute-RSS guard rejects any shape whose projected arena growth would push process RSS above 87.5% of the cgroup ceiling, regardless of the conservative model's estimate.
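The two rejection gates can be pictured as consecutive checks. This is a minimal sketch only: the `Budget` struct, the `Verdict` enum, and the `admit` function are hypothetical names, and the real fits() and RSS guard may compute their inputs differently. All quantities are assumed to be in bytes.

```rust
/// Hypothetical inputs to the two gates, all in bytes.
#[derive(Debug, Clone, Copy)]
struct Budget {
    /// Safety-discounted budget (`rss_ceiling` in the text).
    rss_ceiling: u64,
    /// Memory limit of the container's cgroup.
    cgroup_limit: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Run,
    RejectedByFits,
    RejectedByRssGuard,
}

/// Decide whether a probe shape may run (illustrative, not the real API).
fn admit(
    budget: Budget,
    workspace_estimate: u64,
    current_rss: u64,
    projected_growth: u64,
) -> Verdict {
    // Gate 1: conservative model. Reject if the per-call workspace
    // estimate exceeds the safety-discounted budget.
    if workspace_estimate > budget.rss_ceiling {
        return Verdict::RejectedByFits;
    }
    // Gate 2: absolute-RSS guard at 87.5% (7/8) of the cgroup limit,
    // applied regardless of what gate 1 concluded.
    let hard_cap = budget.cgroup_limit - budget.cgroup_limit / 8;
    if current_rss.saturating_add(projected_growth) > hard_cap {
        return Verdict::RejectedByRssGuard;
    }
    Verdict::Run
}
```

Keeping the second check independent of the first means an optimistic workspace estimate cannot by itself push the process past the cgroup limit.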
The dynamic (1, max_seq) shape is added at runtime by run_probe. If
the model variant cannot run at max_seq, the shape is skipped and the
error surfaces on the first real embedding request.
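A rough sketch of how the sweep list might be assembled and run follows. The names `STATIC_SHAPES`, `sweep_shapes`, and `run_sweep` are hypothetical, the order of the static shapes is assumed, and skipping a duplicate when max_seq matches a static shape is an assumption rather than documented behaviour. For simplicity the sketch skips any shape that fails, whereas the text only describes skipping the dynamic one.

```rust
/// The six static shapes named in the text (order here is an assumption).
const STATIC_SHAPES: &[(usize, usize)] = &[
    (1, 64),
    (1, 256),
    (4, 64),
    (1, 1024),
    (1, 2048),
    (1, 4096),
];

/// Static shapes plus the dynamic (1, max_seq) shape added at runtime.
fn sweep_shapes(max_seq: usize) -> Vec<(usize, usize)> {
    let mut shapes = STATIC_SHAPES.to_vec();
    let dynamic = (1, max_seq);
    // ASSUMPTION: do not probe the same shape twice.
    if !shapes.contains(&dynamic) {
        shapes.push(dynamic);
    }
    shapes
}

/// Run every shape through `run_shape`, keeping only the successes.
/// `run_shape` stands in for a real session.run() wrapper and returns
/// the measured per-shape delta in bytes, or an error message.
fn run_sweep<F>(max_seq: usize, mut run_shape: F) -> Vec<((usize, usize), u64)>
where
    F: FnMut((usize, usize)) -> Result<u64, String>,
{
    sweep_shapes(max_seq)
        .into_iter()
        .filter_map(|shape| match run_shape(shape) {
            Ok(delta) => Some((shape, delta)),
            // A failing shape is skipped; the sweep itself does not abort.
            Err(_) => None,
        })
        .collect()
}
```

Because a failed shape is dropped silently here, a caller that needs to know about it would have to log the error inside `run_shape`.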
Estimated probe time: ~120 s on aarch64 MLAS fp16 at max_seq=8192.