pub(crate) fn fit_cost_model(data: &[DataPoint]) -> Option<(f64, f64)>
Fits peak = a * (batch * seq) + b * (batch * seq^2) via ordinary least
squares (no intercept — workspace at batch=0 is 0 by definition).
The design matrix X has columns [batch*seq, batch*seq^2] and the
response y is rss_delta for each observation.
Normalization: columns are scaled to [0, 1] before solving
(ξ1 = x1 / max(x1), ξ2 = x2 / max(x2)). Without this, x2 at
max_seq=8192 exceeds x1 by ~8000×, making the Gram matrix effectively
rank-1 under the naïve det threshold and causing the fit to silently fall
back to conservative defaults despite valid data.
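The conditioning problem described above can be illustrated with a small sketch. The helper names `gram_det_ratio` and `scale_to_unit` are hypothetical and not part of this crate, and the threshold is assumed to compare `det` against `1e-6 * max_diagonal²`. For batch = 1 and seq in {512, 1024, 2048, 4096, 8192}, the raw columns should give a ratio on the order of 1e-9, while the scaled columns give a ratio of a few percent.

```rust
/// Returns det(G) / max(G11, G22)^2 for the 2x2 Gram matrix G = XᵀX,
/// where X has the two given columns. Hypothetical helper for illustration.
fn gram_det_ratio(x1: &[f64], x2: &[f64]) -> f64 {
    let (mut g11, mut g12, mut g22) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (&u, &v) in x1.iter().zip(x2.iter()) {
        g11 += u * u;
        g12 += u * v;
        g22 += v * v;
    }
    let det = g11 * g22 - g12 * g12;
    let max_diag = g11.max(g22);
    det / (max_diag * max_diag)
}

/// Divides a column by its maximum so the largest entry becomes 1.
/// Assumes the column has a positive maximum (not checked here).
fn scale_to_unit(col: &[f64]) -> Vec<f64> {
    let max = col.iter().copied().fold(0.0_f64, f64::max);
    col.iter().map(|v| v / max).collect()
}
```

Comparing `gram_det_ratio(&x1, &x2)` with `gram_det_ratio(&scale_to_unit(&x1), &scale_to_unit(&x2))` on the same data shows why a fixed relative threshold rejects the raw matrix but accepts the scaled one.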
Normal equations solved in normalized space via 2×2 matrix inverse
(Cramer’s rule), then unscaled: a = α / x1_max, b = β / x2_max.
Returns None when:
- Fewer than 2 data points (under-determined system).
- x1_max or x2_max is zero (degenerate data).
- The normalized Gram matrix is nearly singular (det < 1e-6 of max diagonal²).
- Either coefficient is negative (physically impossible workspace).
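The procedure and the four `None` conditions can be sketched end to end as follows. This is a minimal illustration, not the crate's implementation: the `DataPoint` field names (`batch`, `seq`, `rss_delta`) and their `f64` types are assumptions, the function is renamed `fit_cost_model_sketch` to make that clear, and the singularity test is assumed to be `det < 1e-6 * max_diagonal²`.

```rust
/// Hypothetical observation type; the real `DataPoint` fields may differ.
#[derive(Debug, Clone, Copy)]
struct DataPoint {
    batch: f64,
    seq: f64,
    rss_delta: f64,
}

/// Sketch of a no-intercept least-squares fit of
/// peak = a * (batch * seq) + b * (batch * seq^2).
fn fit_cost_model_sketch(data: &[DataPoint]) -> Option<(f64, f64)> {
    // Two unknowns need at least two observations.
    if data.len() < 2 {
        return None;
    }
    // Build the design columns x1 = batch*seq, x2 = batch*seq^2 and response y.
    let rows: Vec<(f64, f64, f64)> = data
        .iter()
        .map(|d| {
            let x1 = d.batch * d.seq;
            (x1, x1 * d.seq, d.rss_delta)
        })
        .collect();
    let x1_max = rows.iter().map(|r| r.0).fold(0.0_f64, f64::max);
    let x2_max = rows.iter().map(|r| r.1).fold(0.0_f64, f64::max);
    if x1_max == 0.0 || x2_max == 0.0 {
        return None; // degenerate data
    }
    // Accumulate the Gram matrix and right-hand side in normalized space.
    let (mut g11, mut g12, mut g22) = (0.0_f64, 0.0_f64, 0.0_f64);
    let (mut r1, mut r2) = (0.0_f64, 0.0_f64);
    for &(x1, x2, y) in &rows {
        let (u, v) = (x1 / x1_max, x2 / x2_max);
        g11 += u * u;
        g12 += u * v;
        g22 += v * v;
        r1 += u * y;
        r2 += v * y;
    }
    let det = g11 * g22 - g12 * g12;
    let max_diag = g11.max(g22);
    if det < 1e-6 * max_diag * max_diag {
        return None; // nearly singular
    }
    // Cramer's rule for the 2x2 system, then undo the column scaling.
    let alpha = (g22 * r1 - g12 * r2) / det;
    let beta = (g11 * r2 - g12 * r1) / det;
    let (a, b) = (alpha / x1_max, beta / x2_max);
    if a < 0.0 || b < 0.0 {
        return None; // negative workspace is physically impossible
    }
    Some((a, b))
}
```

On noise-free data generated from known coefficients (for example `rss_delta = 2.0 * seq + 0.001 * seq * seq` at batch 1), this sketch should recover `a` and `b` to within floating-point error.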