Constant PROBE_SHAPES 

pub(super) const PROBE_SHAPES: &[(usize, usize)];

Shapes swept by the probe.

Six static shapes, plus a dynamic (1, max_seq) shape that is added at runtime to anchor the quadratic term at the configured upper bound:

  • (1, 64) and (1, 256) anchor the linear term at low seq.
  • (4, 64) shares x1 = batch*seq = 256 with (1, 256) but differs in x2 = batch*seq² (16384 vs 65536), so the difference between the two measurements gives a near-direct estimate of b that is independent of a.
  • (1, 1024) and (1, 2048) provide mid-range leverage.
  • (1, 4096) anchors the quadratic regime.
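
The arithmetic behind the (4, 64) / (1, 256) pairing can be checked with a minimal sketch. It assumes the fitted model has the form mem = c + a*x1 + b*x2, which is implied but not stated above; `STATIC_SHAPES`, `regressors`, and `estimate_b` are hypothetical names for illustration and are not part of this module.

```rust
/// The six static shapes listed above, as (batch, seq).
const STATIC_SHAPES: &[(usize, usize)] = &[
    (1, 64),
    (1, 256),
    (4, 64),
    (1, 1024),
    (1, 2048),
    (1, 4096),
];

/// Regressors for one shape: x1 = batch*seq, x2 = batch*seq^2.
fn regressors(batch: usize, seq: usize) -> (f64, f64) {
    let x1 = (batch * seq) as f64;
    let x2 = (batch * seq * seq) as f64;
    (x1, x2)
}

/// Estimate b from two measurements that share the same x1.
/// ASSUMPTION: mem = c + a*x1 + b*x2, so with equal x1 the c and a terms
/// cancel in the difference and b = (mem_q - mem_p) / (x2_q - x2_p).
/// Returns None if the shapes do not share x1 or have identical x2.
fn estimate_b(
    shape_p: (usize, usize),
    mem_p: f64,
    shape_q: (usize, usize),
    mem_q: f64,
) -> Option<f64> {
    let (x1_p, x2_p) = regressors(shape_p.0, shape_p.1);
    let (x1_q, x2_q) = regressors(shape_q.0, shape_q.1);
    if x1_p != x1_q || x2_p == x2_q {
        return None;
    }
    Some((mem_q - mem_p) / (x2_q - x2_p))
}
```

In practice the measurements are noisy, so a difference like this would only seed or sanity-check a least-squares fit over all shapes rather than replace it.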

§Safety against OOM

ORT’s memory arena retains pages across session.run() calls, so cumulative process RSS grows with each successive probe shape. Three independent mechanisms keep the sweep within the container’s cgroup limit:

  1. Arena warm-up at the start of run_probe runs a (1, 64) session.run() BEFORE the sweep, so the lazy ORT arena initialisation does not appear as a ~1 GB constant offset on every per-shape delta.
  2. Conservative fits() gate rejects any shape whose per-call workspace estimate exceeds rss_ceiling (the safety-discounted budget).
  3. Absolute-RSS guard rejects any shape whose projected arena growth would push process RSS above 87.5% of the cgroup ceiling, regardless of the conservative model’s estimate.
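
Mechanisms 2 and 3 can be pictured as two independent predicates that must both pass before a shape is run. The sketch below is illustrative only: the `Budget` struct, the function signatures, and the byte units are assumptions, and the real `fits()` gate may take different inputs.

```rust
/// Hypothetical memory budget, in bytes.
struct Budget {
    /// Safety-discounted budget used by the conservative gate.
    rss_ceiling: u64,
    /// The container's cgroup memory limit.
    cgroup_ceiling: u64,
}

/// Conservative gate: reject a shape whose per-call workspace estimate
/// exceeds the safety-discounted budget.
fn fits(workspace_estimate: u64, budget: &Budget) -> bool {
    workspace_estimate <= budget.rss_ceiling
}

/// Absolute-RSS guard: reject a shape whose projected arena growth would
/// push process RSS above 87.5% (7/8) of the cgroup ceiling.
fn under_absolute_guard(current_rss: u64, projected_growth: u64, budget: &Budget) -> bool {
    // Widen to u128 so neither the sum nor the product can overflow.
    let limit = (budget.cgroup_ceiling as u128) * 7 / 8;
    (current_rss as u128) + (projected_growth as u128) <= limit
}

/// A shape is admitted only if both checks pass.
fn admit(
    workspace_estimate: u64,
    current_rss: u64,
    projected_growth: u64,
    budget: &Budget,
) -> bool {
    fits(workspace_estimate, budget)
        && under_absolute_guard(current_rss, projected_growth, budget)
}
```

Keeping the two checks separate means the absolute guard still protects the process when the conservative model underestimates a shape's footprint.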

The dynamic (1, max_seq) shape is added at runtime by run_probe. If the model variant cannot run at max_seq, the shape is skipped and the error surfaces on the first real embedding request.
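
The skip-on-failure behaviour for the dynamic shape might look like the sketch below. The `probe_dynamic_shape` function and the closure standing in for a single `session.run()` measurement are hypothetical; the actual `run_probe` internals are not shown in this documentation.

```rust
/// Try the dynamic (1, max_seq) shape once.
/// `run` is a stand-in for one measured inference call: it returns the
/// observed memory delta in bytes, or an error message if the model
/// cannot run at that shape.
fn probe_dynamic_shape<F>(max_seq: usize, mut run: F) -> Option<((usize, usize), u64)>
where
    F: FnMut((usize, usize)) -> Result<u64, String>,
{
    let shape = (1, max_seq);
    match run(shape) {
        Ok(delta) => Some((shape, delta)),
        // The probe does not propagate the failure; the shape is simply
        // left out of the sweep and the fit proceeds without it.
        Err(_) => None,
    }
}
```

A caller would append the returned sample to the static-shape measurements when it is `Some` and fit on the static shapes alone otherwise.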

Estimated probe time: ~120 s on aarch64 MLAS fp16 at max_seq=8192.