pub struct AppState {
    pub pool: EmbedPool,
    pub ready: AtomicBool,
    pub max_batch: usize,
    pub total_workers: usize,
    pub max_seq_length: usize,
    pub tuning: OnceLock<TuningInfo>,
    pub cost_model: Arc<ArcSwap<CostModel>>,
    pub probe_status: AtomicU8,
    pub request_permits: Arc<Semaphore>,
}
Shared application state injected into every request handler via [axum::extract::State].
Fields
pool: EmbedPool
The embedding worker pool. Handles dense and sparse embedding requests.
ready: AtomicBool
Atomic flag set to true once model warm-up and readiness probes complete.
Handlers check this flag before dispatching to the pool and return 503
while models are still loading.
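The readiness check described above might look roughly like the following. This is a minimal sketch, not the crate's actual handler code: ReadyGate, gate_status, and mark_ready are hypothetical names, and the memory orderings are an assumption.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for the part of AppState that a handler consults.
pub struct ReadyGate {
    pub ready: AtomicBool,
}

/// Returns the HTTP status code a handler would answer with before
/// touching the pool: 503 while models are loading, 200 once ready.
pub fn gate_status(state: &ReadyGate) -> u16 {
    if state.ready.load(Ordering::Acquire) {
        200
    } else {
        503
    }
}

/// Called once by the (assumed) warm-up task after readiness probes pass.
pub fn mark_ready(state: &ReadyGate) {
    state.ready.store(true, Ordering::Release);
}
```

A real axum handler would return a status type rather than a bare u16; the plain integer keeps the sketch free of external crates.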
max_batch: usize
Maximum batch size enforced by the handler layer.
total_workers: usize
Total number of workers configured at startup.
Used by the /health endpoint to report degraded state when
live_workers < total_workers.
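As an illustration of the degraded-state rule, the comparison can be sketched as a pure function. The Health enum and health_state function are hypothetical; only the live_workers < total_workers condition comes from the description above.

```rust
/// Hypothetical health classification; the real /health payload may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum Health {
    Ok,
    Degraded,
}

/// Reports Degraded when fewer workers are alive than were configured.
pub fn health_state(live_workers: usize, total_workers: usize) -> Health {
    if live_workers < total_workers {
        Health::Degraded
    } else {
        Health::Ok
    }
}
```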
max_seq_length: usize
Maximum tokenized sequence length in use.
tuning: OnceLock<TuningInfo>
Static memory-detection info written once before the probe starts.
The value is set as soon as memory detection completes (before the
background probe finishes), so /health can show memory_source,
available_bytes, and model_rss_bytes_per_worker even while the probe
is still running.
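The write-once behavior can be sketched with std::sync::OnceLock. The three field names are taken from the description above, but their types, and the helper functions publish_tuning and health_available_bytes, are assumptions made for illustration.

```rust
use std::sync::OnceLock;

/// Assumed shape of TuningInfo; only the field names come from the docs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TuningInfo {
    pub memory_source: String,
    pub available_bytes: u64,
    pub model_rss_bytes_per_worker: u64,
}

/// Stores the detection result. Returns false if a value was already set,
/// in which case the first value is kept unchanged.
pub fn publish_tuning(slot: &OnceLock<TuningInfo>, info: TuningInfo) -> bool {
    slot.set(info).is_ok()
}

/// What a /health handler could read: None until detection has completed.
pub fn health_available_bytes(slot: &OnceLock<TuningInfo>) -> Option<u64> {
    slot.get().map(|t| t.available_bytes)
}
```

Because OnceLock::get never blocks, a reader simply sees None until the single write has happened, which matches the "visible while the probe is still running" behavior described above.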
cost_model: Arc<ArcSwap<CostModel>>
Live cost-model coefficients.
Initialized to conservative defaults at startup. Updated atomically by
the background probe (or cache-hit path) once fitted coefficients are
available. All workers share this same handle and observe the update
lock-free on their next session.run() call.
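The snapshot-and-replace pattern behind this field can be approximated with the standard library alone. Note the substitution: the sketch below uses RwLock<Arc<CostModel>>, which takes a brief lock, whereas the real field uses ArcSwap, which is lock-free. The CostModelHandle type and the bytes_per_token field are hypothetical.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical coefficients; the real CostModel fields are not shown here.
#[derive(Debug, Clone, PartialEq)]
pub struct CostModel {
    pub bytes_per_token: f64,
}

/// Std-only stand-in for Arc<ArcSwap<CostModel>>.
pub struct CostModelHandle(RwLock<Arc<CostModel>>);

impl CostModelHandle {
    /// Starts with conservative defaults chosen by the caller.
    pub fn new(initial: CostModel) -> Self {
        Self(RwLock::new(Arc::new(initial)))
    }

    /// Takes a cheap snapshot; the caller keeps it for one unit of work.
    pub fn load(&self) -> Arc<CostModel> {
        let guard = self.0.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Replaces the whole model; existing snapshots are unaffected.
    pub fn store(&self, next: CostModel) {
        *self.0.write().unwrap() = Arc::new(next);
    }
}
```

A worker that loads a snapshot before an update keeps the old coefficients for that call and picks up the new ones on its next load, which is the same visibility behavior the field description gives.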
probe_status: AtomicU8
Current state of the background memory probe.
Updated atomically from the background probe task. Read by /health
to expose probe_status in the tuning block.
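Storing an enum in an AtomicU8 usually means mapping each variant to a byte. The sketch below assumes such a mapping: the terminal variant names (Disabled, CacheHit, Complete, Failed) appear in the request_permits description, while the Running variant, the numeric values, and the helper functions are hypothetical.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Assumed encoding; the numeric values and Running are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum ProbeStatus {
    Running = 0, // hypothetical non-terminal state
    Disabled = 1,
    CacheHit = 2,
    Complete = 3,
    Failed = 4,
}

impl ProbeStatus {
    /// Decodes a stored byte; unknown values yield None.
    pub fn from_u8(raw: u8) -> Option<Self> {
        match raw {
            0 => Some(Self::Running),
            1 => Some(Self::Disabled),
            2 => Some(Self::CacheHit),
            3 => Some(Self::Complete),
            4 => Some(Self::Failed),
            _ => None,
        }
    }

    /// True for the four terminal states.
    pub fn is_terminal(self) -> bool {
        !matches!(self, Self::Running)
    }
}

/// Writer side: the background probe task publishes a new state.
pub fn set_status(cell: &AtomicU8, status: ProbeStatus) {
    cell.store(status as u8, Ordering::Release);
}

/// Reader side: what a /health handler could decode.
pub fn get_status(cell: &AtomicU8) -> Option<ProbeStatus> {
    ProbeStatus::from_u8(cell.load(Ordering::Acquire))
}
```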
request_permits: Arc<Semaphore>
Concurrency gate for in-flight embedding requests.
Initialized to max(cfg_workers - 1, 1) permits, reserving one worker
slot for the background auto-budget probe. Raised to cfg_workers
atomically on every terminal probe-status transition (Disabled,
CacheHit, Complete, Failed) so full concurrency is available once
the probe no longer needs a reserved worker.
Test helpers set this to usize::MAX (effectively uncapped) so that
existing tests do not need to acquire a permit.
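The permit arithmetic can be worked through with two small helpers. Both function names are hypothetical, and using saturating subtraction for cfg_workers = 0 is an assumption, since the description does not say how that case is handled.

```rust
/// Permits available while the probe still holds a reserved worker:
/// max(cfg_workers - 1, 1), with saturating subtraction so 0 does not underflow.
pub fn initial_permits(cfg_workers: usize) -> usize {
    cfg_workers.saturating_sub(1).max(1)
}

/// Number of permits to add on a terminal probe transition so that the
/// total reaches cfg_workers (never negative).
pub fn permits_to_release(cfg_workers: usize) -> usize {
    cfg_workers.saturating_sub(initial_permits(cfg_workers))
}
```

With four configured workers the gate starts at three permits and gains one when the probe finishes. With a single worker it starts at one and gains none. If the Semaphore here is tokio's, the second value is what one would pass to its add_permits method, though that is an assumption about the implementation.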
Auto Trait Implementations
impl !Freeze for AppState
impl RefUnwindSafe for AppState
impl Send for AppState
impl Sync for AppState
impl Unpin for AppState
impl UnsafeUnpin for AppState
impl UnwindSafe for AppState
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.