Struct Config 

pub struct Config {
    pub cache_dir: String,
    pub bind_addr: String,
    pub workers: usize,
    pub intra_threads: usize,
    pub max_batch: usize,
    pub max_seq_length: usize,
    pub idle_timeout: Option<Duration>,
    pub model_variant: ModelVariant,
    pub memory_safety_factor: f64,
    pub cost_model_override: Option<CostModel>,
    pub heartbeat_secs: u64,
}

Runtime configuration loaded from environment variables.

All fields are read once at startup via Config::from_env. Changes to environment variables after startup have no effect.

Fields§

§cache_dir: String

Path to the directory where ONNX model files are cached.

Set with BGE_M3_CACHE_DIR. Defaults to /cache.

§bind_addr: String

TCP bind address for the HTTP server.

Set with BGE_M3_BIND. Defaults to 0.0.0.0:8081. The 0.0.0.0 default is intentional for Docker container deployments.

§workers: usize

Number of embedding worker threads to spawn.

Set with BGE_M3_WORKERS. Defaults to 2. Minimum effective value is 1. Each worker loads its own model instance.

§intra_threads: usize

Number of intra-op threads each ORT session may use for a single session.run() call (matmul / attention kernels).

Set with BGE_M3_INTRA_THREADS. Defaults to 1. Minimum effective value is 1.

The default of 1 keeps per-worker RSS predictable, because the workspace probe and quadratic cost model are calibrated against single-threaded MLAS runs. Raise it on under-utilized hosts as long as workers * intra_threads <= num_cpus. For example, on an 8 vCPU task with workers=2, setting intra_threads=4 lets each worker fan out to four cores during inference, taking CPU utilization under load from ~25% to ~100%. Going above floor(num_cpus / workers) oversubscribes threads and hurts throughput.

Re-run the startup probe (do not pin coefficients) after changing this value so the cost model captures any new scratch-buffer overhead.
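The sizing rule above can be checked with a little arithmetic. The following is a minimal sketch: `max_intra_threads` and `peak_utilization` are hypothetical helpers written for illustration, not part of this crate, and they assume every intra-op thread saturates exactly one core.

```rust
/// Largest per-worker intra-op thread count that avoids oversubscription:
/// floor(num_cpus / workers), never below 1.
/// Hypothetical helper for illustration; not part of this crate.
fn max_intra_threads(num_cpus: usize, workers: usize) -> usize {
    // `workers` has a minimum effective value of 1, so guard against 0.
    let workers = workers.max(1);
    (num_cpus / workers).max(1)
}

/// Idealized fraction of cores kept busy under load.
/// Assumes one busy core per intra-op thread and `num_cpus > 0`.
fn peak_utilization(num_cpus: usize, workers: usize, intra_threads: usize) -> f64 {
    // Threads beyond the core count cannot add utilization, so cap at num_cpus.
    let busy = (workers.max(1) * intra_threads.max(1)).min(num_cpus);
    busy as f64 / num_cpus as f64
}
```

With 8 vCPUs and 2 workers, `max_intra_threads` yields 4, which matches the example in the text: one thread per worker keeps 2 of 8 cores busy, four threads per worker keeps all 8 busy.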

§max_batch: usize

Maximum number of input texts accepted in a single request.

Set with BGE_M3_MAX_BATCH. Defaults to 256. Minimum effective value is 1.

§max_seq_length: usize

Maximum sequence length (tokens) for a single text.

Set with BGE_M3_MAX_SEQ_LENGTH. Defaults to 8192 (BGE-M3’s published max). Range: 1..=8192. Set lower to reduce memory footprint on constrained hardware.

The tokenizer will silently truncate any input exceeding this length. The probe and bin-packer use this as the upper bound when computing workspace costs.

§idle_timeout: Option<Duration>

Duration of inactivity after which workers unload their model instances from memory.

Set with BGE_M3_IDLE_TIMEOUT_SECS. Defaults to 300 (5 minutes). Set to 0 to disable idle unloading entirely.

When unloaded, models are automatically reloaded on the next incoming request. The reload blocks the request until complete (~5–10 s from CoreML compiled cache; ~15–30 s cold).
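Because the field is an `Option<Duration>` while the variable is a number of seconds, some mapping between the two has to exist. The sketch below shows one plausible mapping. It assumes that `None` means idle unloading is disabled and that an unparseable value falls back to the default; `parse_idle_timeout` is a hypothetical name, not the crate's own function.

```rust
use std::time::Duration;

/// Default idle timeout in seconds, as documented above.
const DEFAULT_IDLE_TIMEOUT_SECS: u64 = 300;

/// Maps the raw `BGE_M3_IDLE_TIMEOUT_SECS` value to an `Option<Duration>`.
/// Assumptions (not confirmed by the source): `None` represents
/// "idle unloading disabled", and a value that does not parse as an
/// integer is treated the same as a missing one.
fn parse_idle_timeout(raw: Option<&str>) -> Option<Duration> {
    let secs = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_IDLE_TIMEOUT_SECS);
    if secs == 0 {
        // 0 disables idle unloading entirely.
        None
    } else {
        Some(Duration::from_secs(secs))
    }
}
```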

§model_variant: ModelVariant

ONNX model variant to load.

Set with BGE_M3_MODEL. Accepts "fp32", "fp16", or "int8". Defaults to "fp16" for fleet-wide embedding consistency and reduced RAM on Linux/Intel deployments. Set BGE_M3_MODEL=fp32 on Apple Silicon to recover CoreML GPU acceleration. See ModelVariant for per-variant performance and memory trade-offs.

§memory_safety_factor: f64

Fraction of estimated available workspace to actually use per worker.

Set with BGE_M3_MEMORY_SAFETY_FACTOR. Defaults to 0.7 (30% headroom for ORT arena fragmentation and spike overhead not captured by the probe). Range: 0.1..=1.0.
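As a rough illustration of how the factor scales a worker's budget, the sketch below multiplies an estimated available workspace by the factor. Both function names are hypothetical, and clamping out-of-range values (rather than rejecting them) is an assumption; the source only states the valid range.

```rust
/// Forces a requested safety factor into the documented range 0.1..=1.0.
/// Assumption: out-of-range values are clamped rather than rejected.
fn clamp_safety_factor(requested: f64) -> f64 {
    requested.clamp(0.1, 1.0)
}

/// Workspace bytes a worker may actually use, given an estimate of the
/// workspace available to it. The result is truncated toward zero.
fn usable_workspace_bytes(estimated_available: u64, safety_factor: f64) -> u64 {
    (estimated_available as f64 * clamp_safety_factor(safety_factor)) as u64
}
```

With the default of 0.7, a worker whose estimated available workspace is 10 GiB would plan for roughly 7 GiB and leave the rest as headroom.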

§cost_model_override: Option<CostModel>

If Some, skip the startup probe and use this cost model directly.

Populated when:

  • BGE_M3_DISABLE_AUTO_BUDGET=1 is set (uses conservative defaults), or
  • BGE_M3_TOKEN_BUDGET is set (translates the legacy token count to a max_workspace_bytes using conservative a/b coefficients), or
  • BGE_M3_COST_MODEL_A and BGE_M3_COST_MODEL_B are both set with BGE_M3_AVAILABLE_MEMORY_BYTES (full explicit override).
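The three triggers above could be resolved along the following lines. This is only a sketch: `OverrideSource` and `override_source` are hypothetical names, and the precedence shown (explicit coefficients first, then the legacy token budget, then the disable flag) is an assumption, since the source does not state which wins when several variables are set. It deliberately stops at identifying the trigger and does not build a `CostModel`, because the conservative coefficients are not given here.

```rust
/// Which documented trigger caused the startup probe to be skipped.
/// Hypothetical type for illustration; not part of this crate.
#[derive(Debug, PartialEq)]
enum OverrideSource {
    Explicit { a: f64, b: f64, available_memory_bytes: u64 },
    LegacyTokenBudget { tokens: u64 },
    ConservativeDefaults,
}

/// Returns `None` when no override variable is set, meaning the probe runs.
/// The order of the checks is an assumption.
fn override_source<F: Fn(&str) -> Option<String>>(lookup: F) -> Option<OverrideSource> {
    let float = |key: &str| lookup(key).and_then(|v| v.trim().parse::<f64>().ok());
    let uint = |key: &str| lookup(key).and_then(|v| v.trim().parse::<u64>().ok());

    // Full explicit override: all three variables must be present and parse.
    if let (Some(a), Some(b), Some(mem)) = (
        float("BGE_M3_COST_MODEL_A"),
        float("BGE_M3_COST_MODEL_B"),
        uint("BGE_M3_AVAILABLE_MEMORY_BYTES"),
    ) {
        return Some(OverrideSource::Explicit { a, b, available_memory_bytes: mem });
    }
    // Legacy token budget, translated elsewhere into max_workspace_bytes.
    if let Some(tokens) = uint("BGE_M3_TOKEN_BUDGET") {
        return Some(OverrideSource::LegacyTokenBudget { tokens });
    }
    // Auto-budget disabled: fall back to conservative defaults.
    if lookup("BGE_M3_DISABLE_AUTO_BUDGET").as_deref() == Some("1") {
        return Some(OverrideSource::ConservativeDefaults);
    }
    None
}
```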

§heartbeat_secs: u64

Interval (seconds) between periodic heartbeat log events.

Set with BGE_M3_HEARTBEAT_SECS. Defaults to 60. Set to 0 to disable heartbeat logging entirely.

Heartbeat events log RSS, live/loaded worker counts, queue depth, available request permits, and current probe status. They are useful for detecting slow memory leaks or queue saturation between requests.

Implementations§

impl Config

pub fn from_env() -> Self

Creates a Config by reading environment variables.

Missing variables, and variables whose values are not recognized, fall back to their defaults.

pub(crate) fn from_lookup<F: Fn(&str) -> Option<String>>(lookup: F) -> Self

Creates a Config by resolving each setting through lookup.

lookup receives an env-var name and returns its value if set, or None to fall back to the default for that setting. Used by Config::from_env with the real environment and in tests with a closure over a HashMap.
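The lookup-closure pattern is easiest to see on a reduced example. The sketch below is not the crate's code: `MiniConfig` is a hypothetical stand-in carrying three of the fields documented above, and `map_lookup` is a hypothetical helper. It assumes unparseable values are treated like missing ones and that the "minimum effective value" is enforced by clamping.

```rust
use std::collections::HashMap;

/// Reduced, hypothetical stand-in for `Config` with three of its fields.
#[derive(Debug, PartialEq)]
struct MiniConfig {
    cache_dir: String,
    workers: usize,
    max_batch: usize,
}

impl MiniConfig {
    /// Resolves each setting through `lookup`, mirroring the shape of
    /// `Config::from_lookup`.
    fn from_lookup<F: Fn(&str) -> Option<String>>(lookup: F) -> Self {
        // Parse a usize, fall back to `default` when the variable is missing
        // or unparseable, then enforce the minimum effective value of 1.
        let usize_or = |key: &str, default: usize| {
            lookup(key)
                .and_then(|v| v.trim().parse::<usize>().ok())
                .unwrap_or(default)
                .max(1)
        };
        MiniConfig {
            cache_dir: lookup("BGE_M3_CACHE_DIR").unwrap_or_else(|| "/cache".to_string()),
            workers: usize_or("BGE_M3_WORKERS", 2),
            max_batch: usize_or("BGE_M3_MAX_BATCH", 256),
        }
    }
}

/// Builds a lookup closure over a map, as a test might do in place of
/// reading the real environment.
fn map_lookup(vars: HashMap<&'static str, &'static str>) -> impl Fn(&str) -> Option<String> {
    move |key: &str| vars.get(key).map(|v| v.to_string())
}
```

A test can then fill a `HashMap` with only the variables it cares about and call `MiniConfig::from_lookup(map_lookup(vars))`, which keeps the test independent of the process environment.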

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.