pub struct EmbedPool {
tx: Sender<EmbedRequest>,
live_workers: Arc<AtomicUsize>,
loaded_workers: Arc<AtomicUsize>,
model_rss_per_worker_bytes: Arc<AtomicUsize>,
}Expand description
Async handle to the embedding worker thread pool.
Wraps a bounded mpsc channel shared by n spawn_blocking worker threads.
Each worker owns its own ORT session and tokenizer; the pool dispatches
EmbedRequest variants to whichever worker is free next.
Clone is cheap — the underlying channel sender and atomic counters are reference-counted.
Fields§
§tx: Sender<EmbedRequest>§live_workers: Arc<AtomicUsize>§loaded_workers: Arc<AtomicUsize>Number of workers that currently have model instances loaded in memory.
model_rss_per_worker_bytes: Arc<AtomicUsize>Median RSS delta (bytes) measured across all workers during sequential model load.
Workers load one at a time (leader first, then followers in sequence).
Each reports its own RSS before/after load_models() via ready_tx.
The pool stores the median once all workers have signaled ready — robust
to one outlier from page-cache settling or ORT arena init jitter.
Used by run_readiness_probe to correctly deduct the model-weight
footprint from the available workspace before computing per-worker
budget. Returns 0 on non-Linux targets where RSS measurement is
unavailable, or before the init task has completed.
Implementations§
Source§impl EmbedPool
impl EmbedPool
Sourcepub fn spawn(
n: usize,
cache_dir: PathBuf,
config: WorkerConfig,
) -> (Self, JoinHandle<Result<()>>)
pub fn spawn( n: usize, cache_dir: PathBuf, config: WorkerConfig, ) -> (Self, JoinHandle<Result<()>>)
Spawns n embedding worker threads and returns the pool plus an init
handle that resolves once all workers have finished loading their models.
Sourcepub async fn dense(
&self,
texts: Vec<String>,
) -> Result<(Vec<Vec<f32>>, EmbedStats)>
pub async fn dense( &self, texts: Vec<String>, ) -> Result<(Vec<Vec<f32>>, EmbedStats)>
Runs dense (float32) embedding inference on texts.
§Errors
- Returns
Errif the worker channel has closed (pool shut down). - Returns
Errif the worker drops the reply sender before responding. - Returns
Errif the ORT session fails during inference.
Sourcepub async fn sparse(
&self,
texts: Vec<String>,
) -> Result<(Vec<SparseEmbedding>, EmbedStats)>
pub async fn sparse( &self, texts: Vec<String>, ) -> Result<(Vec<SparseEmbedding>, EmbedStats)>
Runs sparse (SPLADE-style) embedding inference on texts.
§Errors
- Returns
Errif the worker channel has closed (pool shut down). - Returns
Errif the worker drops the reply sender before responding. - Returns
Errif the ORT session fails during inference.
Sourcepub async fn both(
&self,
texts: Vec<String>,
) -> Result<(Vec<DualEmbedding>, EmbedStats)>
pub async fn both( &self, texts: Vec<String>, ) -> Result<(Vec<DualEmbedding>, EmbedStats)>
Runs a single forward pass that yields both dense and sparse embeddings.
Equivalent to calling Self::dense and Self::sparse back-to-back,
but uses one session.run() per chunk instead of two — at near-zero
marginal GPU cost.
§Errors
- Returns
Errif the worker channel has closed (pool shut down). - Returns
Errif the worker drops the reply sender before responding. - Returns
Errif the ORT session fails during inference.
Sourcepub(crate) async fn probe(&self, texts: Vec<String>) -> Result<ProbeResult>
pub(crate) async fn probe(&self, texts: Vec<String>) -> Result<ProbeResult>
Sends a probe request to a single worker and returns the result.
Only called during init before ready is set.
Sourcepub fn live_worker_count(&self) -> usize
pub fn live_worker_count(&self) -> usize
Returns the number of worker threads currently alive (not yet exited).
Sourcepub fn loaded_worker_count(&self) -> usize
pub fn loaded_worker_count(&self) -> usize
Returns the number of workers that currently have model instances loaded in memory.
A worker transitions from loaded to unloaded after the crate::config::Config::idle_timeout
elapses with no incoming requests, and back to loaded on the next request.
Sourcepub fn queue_depth(&self) -> usize
pub fn queue_depth(&self) -> usize
Returns the number of requests currently queued but not yet picked up by a worker. Uses the channel’s current vs max capacity.
Sourcepub fn model_rss_per_worker_bytes(&self) -> usize
pub fn model_rss_per_worker_bytes(&self) -> usize
Returns the median RSS delta (bytes) measured across all workers during sequential model load.
This is the per-worker model-weight footprint used by
run_readiness_probe to compute the per-worker workspace budget.
Returns 0 on non-Linux targets where RSS measurement is unavailable,
or before the init task has completed.
Trait Implementations§
Auto Trait Implementations§
impl Freeze for EmbedPool
impl RefUnwindSafe for EmbedPool
impl Send for EmbedPool
impl Sync for EmbedPool
impl Unpin for EmbedPool
impl UnsafeUnpin for EmbedPool
impl UnwindSafe for EmbedPool
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> CloneToUninit for Twhere
T: Clone,
impl<T> CloneToUninit for Twhere
T: Clone,
§impl<T> Instrument for T
impl<T> Instrument for T
§fn instrument(self, span: Span) -> Instrumented<Self>
fn instrument(self, span: Span) -> Instrumented<Self>
§fn in_current_span(self) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read more