pub struct CostModel {
    pub a: f64,
    pub b: f64,
    pub max_workspace_bytes: usize,
}
Quadratic-aware workspace cost model for ONNX attention inference.
BGE-M3 uses multi-head attention whose intermediate tensor footprint scales
as O(batch * seq^2) (attention score matrix) plus O(batch * seq)
(FFN intermediates, projection matrices). The total peak workspace is
approximately:
    peak ≈ a * (batch * seq) + b * (batch * seq^2)
where a (bytes/token-position) captures the FFN / projection contribution
and b (bytes/token-position^2) captures the attention contribution.
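At sequence length 512 the attention term is small relative to the FFN term,
so a linear approximation works. The ratio of the quadratic term to the linear
term is b * seq / a, so it is 16× larger at 8192 than at 512 (0.25 versus 4
with the conservative defaults below). At 8192 the b * N^2 term dominates, and
budgeting with a alone would under-estimate peak workspace several-fold.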
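To make the crossover concrete, the sketch below evaluates both terms of the model separately. It plugs in the conservative defaults documented on this page (a = 16 384, b = 8); coefficients measured by the probe will differ. The function names `term_split` and `quadratic_share` are illustrative and not part of the crate.

```rust
/// Returns (linear_bytes, quadratic_bytes) for the model
/// peak ≈ a * (batch * seq) + b * (batch * seq^2).
fn term_split(a: f64, b: f64, batch: usize, seq: usize) -> (f64, f64) {
    let positions = (batch * seq) as f64;
    let linear = a * positions;
    let quadratic = b * positions * seq as f64;
    (linear, quadratic)
}

/// Ratio of the quadratic term to the linear term.
/// Algebraically this is b * seq / a, independent of batch size.
fn quadratic_share(a: f64, b: f64, batch: usize, seq: usize) -> f64 {
    let (linear, quadratic) = term_split(a, b, batch, seq);
    quadratic / linear
}
```

With these coefficients the ratio is 0.25 at seq = 512 and 4.0 at seq = 8192, so a linear-only budget at 8192 would cover only one fifth of the estimated peak.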
Coefficients are derived at startup by crate::probe or set
conservatively from compile-time defaults when measurement is unavailable.
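This page does not describe how crate::probe derives the coefficients. As a purely hypothetical illustration, two peak-workspace measurements taken at different sequence lengths are enough to solve for a and b, because the model is linear in both. The `Sample` type and the `fit_coefficients` and `features` functions below are assumptions made for this sketch, not the crate's API.

```rust
/// One hypothetical probe observation: (batch, seq, measured peak bytes).
type Sample = (usize, usize, f64);

/// Maps a (batch, seq) shape to the two model features:
/// x = batch * seq and y = batch * seq^2.
fn features(batch: usize, seq: usize) -> (f64, f64) {
    let x = (batch * seq) as f64;
    (x, x * seq as f64)
}

/// Solves a * x + b * y = peak for two samples using Cramer's rule.
/// Returns None when the samples are collinear (e.g. same seq length),
/// in which case a and b cannot be separated.
fn fit_coefficients(s1: Sample, s2: Sample) -> Option<(f64, f64)> {
    let (x1, y1) = features(s1.0, s1.1);
    let (x2, y2) = features(s2.0, s2.1);
    let det = x1 * y2 - x2 * y1;
    if det.abs() < f64::EPSILON {
        return None;
    }
    let a = (s1.2 * y2 - s2.2 * y1) / det;
    let b = (x1 * s2.2 - x2 * s1.2) / det;
    Some((a, b))
}
```

A real probe would likely take more than two measurements and guard against noise; the point here is only that the two samples must use different sequence lengths for the quadratic term to be identifiable.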
Fields
a: f64
Bytes per token-position (linear term: FFN intermediates, projections).
b: f64
Bytes per token-position-squared (quadratic term: attention scores).
max_workspace_bytes: usize
Maximum workspace bytes available per worker for a single session.run() call.
Implementations
impl CostModel
pub const CONSERVATIVE_A: f64 = 16_384.0
Conservative static defaults calibrated so a (16, 512) chunk lands at
~160 MiB workspace (see the formula check below), roughly matching the old
static budget at the previous defaults BGE_M3_ONNX_BATCH_SIZE = 16,
MAX_SEQ_LENGTH = 512.
These are used when the probe cannot run (no ORT, no model, macOS without
cgroup support) or when BGE_M3_DISABLE_AUTO_BUDGET is set.
Formula check: 16 KiB/token × 16 × 512 + 8 B/token² × 16 × 512² = 16 384 × 8 192 + 8 × 16 × 262 144 = 134 217 728 + 33 554 432 = 167 772 160 bytes = 160 MiB per chunk (chunks run sequentially inside one worker).
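The same check can be reproduced in integer arithmetic. This is a minimal sketch that restates the two documented constants as `u64` for exactness; the function name `conservative_chunk_bytes` is illustrative only.

```rust
const CONSERVATIVE_A: u64 = 16_384; // bytes per token-position
const CONSERVATIVE_B: u64 = 8; // bytes per token-position squared

/// Evaluates a * (batch * seq) + b * (batch * seq^2) with the
/// conservative coefficients, in exact integer arithmetic.
fn conservative_chunk_bytes(batch: u64, seq: u64) -> u64 {
    let linear = CONSERVATIVE_A * batch * seq;
    let quadratic = CONSERVATIVE_B * batch * seq * seq;
    linear + quadratic
}
```

For a (16, 512) chunk this evaluates to 167 772 160 bytes, which is exactly 160 × 1024 × 1024.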
pub const CONSERVATIVE_B: f64 = 8.0
Conservative quadratic coefficient (bytes per token-position squared).
pub const DEFAULT_MAX_WORKSPACE: usize
Default maximum workspace per worker when memory cannot be detected.
2 GiB is conservatively safe for the Fargate 28 GiB task with 7 workers
(28 GiB × 0.7 safety factor / 7 workers ≈ 2.8 GiB); we round down to 2 GiB
for headroom.
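The arithmetic behind the 2 GiB default can be sketched as follows. The helper names `per_worker_budget` and `round_down_to_gib` are hypothetical and only restate the calculation in the text above.

```rust
const GIB: u64 = 1 << 30;

/// Per-worker share of task memory after applying a safety factor.
fn per_worker_budget(total_bytes: u64, safety: f64, workers: u64) -> u64 {
    ((total_bytes as f64 * safety) / workers as f64) as u64
}

/// Rounds a byte count down to a whole number of GiB.
fn round_down_to_gib(bytes: u64) -> u64 {
    (bytes / GIB) * GIB
}
```

For a 28 GiB task with a 0.7 safety factor and 7 workers, the per-worker share is about 2.8 GiB, and rounding down gives the 2 GiB default.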
pub fn conservative(max_workspace_bytes: usize) -> Self
Constructs a CostModel with conservative defaults and the given workspace ceiling.
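A minimal sketch of how this constructor might be used as a fallback, assuming a local mirror of the struct. The literal for DEFAULT_MAX_WORKSPACE is an assumption (the page describes it as 2 GiB but does not show the value), and `model_for` is a hypothetical helper, not part of the crate.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct CostModel {
    pub a: f64,
    pub b: f64,
    pub max_workspace_bytes: usize,
}

impl CostModel {
    pub const CONSERVATIVE_A: f64 = 16_384.0;
    pub const CONSERVATIVE_B: f64 = 8.0;
    // Assumed value: documented as 2 GiB, literal not shown on this page.
    pub const DEFAULT_MAX_WORKSPACE: usize = 2 * 1024 * 1024 * 1024;

    /// Conservative coefficients with a caller-supplied workspace ceiling.
    pub fn conservative(max_workspace_bytes: usize) -> Self {
        Self {
            a: Self::CONSERVATIVE_A,
            b: Self::CONSERVATIVE_B,
            max_workspace_bytes,
        }
    }
}

/// Hypothetical helper: use the detected per-worker memory when available,
/// otherwise fall back to the default ceiling.
fn model_for(detected_bytes: Option<usize>) -> CostModel {
    CostModel::conservative(detected_bytes.unwrap_or(CostModel::DEFAULT_MAX_WORKSPACE))
}
```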
pub fn chunk_cost(&self, count: usize, max_seq: usize) -> u128
Estimated peak workspace (bytes) for a single session.run() call with
count texts and max_seq as the padded sequence length.
Uses saturating arithmetic on u128 to avoid overflow at large inputs.
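The crate's implementation is not shown on this page. One way to realise the description above is sketched below, under the assumption that the f64 coefficients are rounded up to whole bytes before the integer math. The `max_count` helper is hypothetical and illustrates how the estimate might be compared against max_workspace_bytes.

```rust
pub struct CostModel {
    pub a: f64,
    pub b: f64,
    pub max_workspace_bytes: usize,
}

impl CostModel {
    /// Estimated peak bytes: a * (count * seq) + b * (count * seq^2),
    /// computed with saturating u128 operations so it never wraps.
    pub fn chunk_cost(&self, count: usize, max_seq: usize) -> u128 {
        let count = count as u128;
        let seq = max_seq as u128;
        // Float-to-int `as` casts saturate and map NaN to 0.
        let a = self.a.ceil() as u128;
        let b = self.b.ceil() as u128;
        let positions = count.saturating_mul(seq);
        let linear = a.saturating_mul(positions);
        let quadratic = b.saturating_mul(positions).saturating_mul(seq);
        linear.saturating_add(quadratic)
    }

    /// Hypothetical helper: largest count in 1..=upper whose estimated
    /// cost still fits under the workspace ceiling (0 if none fits).
    pub fn max_count(&self, max_seq: usize, upper: usize) -> usize {
        let ceiling = self.max_workspace_bytes as u128;
        (1..=upper)
            .take_while(|&n| self.chunk_cost(n, max_seq) <= ceiling)
            .count()
    }
}
```

Because the cost grows monotonically with count, the linear scan stops at the first batch size that exceeds the ceiling. With the conservative coefficients and a 2 GiB ceiling, this sketch admits 204 texts at seq = 512 but only 3 at seq = 8192.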
Trait Implementations
Auto Trait Implementations
impl Freeze for CostModel
impl RefUnwindSafe for CostModel
impl Send for CostModel
impl Sync for CostModel
impl Unpin for CostModel
impl UnsafeUnpin for CostModel
impl UnwindSafe for CostModel
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.