Enum ModelVariant 

pub enum ModelVariant {
    Fp32,
    Fp16,
    Int8,
}

ONNX model variant to load.

Controlled by BGE_M3_MODEL. Defaults to ModelVariant::Fp16.
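The selection rule above can be sketched as follows. This is a minimal illustration, not the crate's actual parser: the enum is mirrored locally so the snippet compiles on its own, and `parse_variant` and `variant_from_env` are hypothetical names. Case-insensitive matching, the `int8` spelling, treating an empty value as unset, and rejecting unknown values are all assumptions; only `fp32`, `fp16`, and the FP16 default come from this page.

```rust
use std::env;

/// Local mirror of the documented enum so the sketch is self-contained.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ModelVariant {
    Fp32,
    Fp16,
    Int8,
}

/// Hypothetical parser for the raw BGE_M3_MODEL value.
/// Assumptions: matching is case-insensitive, "int8" selects INT8,
/// an empty value counts as unset, and unknown values are errors.
pub fn parse_variant(raw: Option<&str>) -> Result<ModelVariant, String> {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        None | Some("") => Ok(ModelVariant::Fp16), // documented default
        Some("fp32") => Ok(ModelVariant::Fp32),
        Some("fp16") => Ok(ModelVariant::Fp16),
        Some("int8") => Ok(ModelVariant::Int8),
        Some(other) => Err(format!("unsupported BGE_M3_MODEL value: {other}")),
    }
}

/// Reads BGE_M3_MODEL from the process environment and parses it.
pub fn variant_from_env() -> Result<ModelVariant, String> {
    parse_variant(env::var("BGE_M3_MODEL").ok().as_deref())
}
```

Keeping the parsing in a pure function makes the default and error paths testable without touching the process environment.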

Variants

Fp32

BAAI/bge-m3 FP32 model (~2.16 GB per session).

Set BGE_M3_MODEL=fp32 to enable. Recommended for Apple Silicon CoreML deployments where latency is the primary constraint. The FP32 ONNX graph contains no Cast nodes, so ORT can dispatch the entire multi-head attention + FFN block to the GPU as one contiguous CoreML subgraph, which delivers 20–61% lower latency than the MLAS CPU baseline.

Not the default. Linux/Intel (MLAS-only) deployments should prefer ModelVariant::Fp16 for lower RAM and fleet-wide embedding consistency.


Fp16

Xenova/bge-m3 FP16 model (~1.08 GB per session). Default. Halves per-session memory relative to FP32 (~1.08 GB vs ~2.16 GB).

This is the fleet default: all Apple Silicon LaunchAgent deployments set BGE_M3_MODEL=fp16 explicitly, and the server default matches so that Linux/Docker deployments produce consistent embeddings without any additional configuration.

Latency caveat (CoreML only). The Xenova FP16 ONNX model contains FP16↔FP32 Cast nodes at every transformer-layer boundary. ORT’s CoreML EP cannot fuse these into the attention/FFN subgraphs: each Cast executes on the CPU, so the transformer block never forms a single contiguous GPU subgraph. As a result, FP16 + CoreML EP runs 6–10× slower than FP32 + CoreML EP. The Cast overhead is also present on the MLAS/CPU EP (Linux, Intel), where the FP16 penalty (~6–9×) is accepted as the trade-off for lower RAM and fleet consistency. Use BGE_M3_MODEL=fp32 on Apple Silicon to recover CoreML GPU acceleration.


Int8

Xenova/bge-m3 INT8 quantized model (~568 MB per session). Weights-only quantization; ORT dequantizes to f32 internally. Reduces peak memory by ~74% per worker vs FP32.
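The memory figures quoted for the three variants can be checked with a little arithmetic. The sketch below is illustrative only: the enum is mirrored locally, the helper names are hypothetical, and it assumes 1 GB = 1000 MB and that model weights are the only memory cost per session.

```rust
/// Local mirror of the documented enum so the sketch is self-contained.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ModelVariant {
    Fp32,
    Fp16,
    Int8,
}

/// Approximate per-session footprint in MB, using the figures on this page
/// (assumption: 1 GB is treated as 1000 MB).
pub fn per_session_mb(variant: ModelVariant) -> u64 {
    match variant {
        ModelVariant::Fp32 => 2160,
        ModelVariant::Fp16 => 1080,
        ModelVariant::Int8 => 568,
    }
}

/// Fractional memory saving relative to FP32 (0.5 means "half the memory").
pub fn reduction_vs_fp32(variant: ModelVariant) -> f64 {
    1.0 - per_session_mb(variant) as f64 / per_session_mb(ModelVariant::Fp32) as f64
}

/// Number of sessions that fit in a RAM budget, counting model weights only.
pub fn max_sessions(budget_mb: u64, variant: ModelVariant) -> u64 {
    budget_mb / per_session_mb(variant)
}
```

With these inputs, INT8 comes out at roughly 0.737 of FP32's footprint saved, which matches the ~74% figure, and FP16 at exactly 0.5.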

Embedding quality has been validated: dense cosine similarity against the FP32 reference is ≥ 0.963 across a 184-text corpus, which is suitable for ANN search and semantic ranking. Avoid INT8 for applications that must rank results whose similarity scores differ by less than 0.05.
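A check of this kind could look like the sketch below. It is not the validation harness used for the figures above; the function names are hypothetical, and only the 0.963 threshold and the 0.05 margin are taken from this page.

```rust
/// Minimum dense cosine similarity reported for INT8 vs the FP32 reference.
pub const MIN_COSINE: f64 = 0.963;
/// Score gaps below this margin are considered too small to rank reliably with INT8.
pub const MIN_SAFE_MARGIN: f64 = 0.05;

/// Cosine similarity of two embeddings, accumulated in f64.
/// Returns None for mismatched lengths, empty input, or a zero-norm vector.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// True when a candidate embedding stays within the documented threshold
/// of its FP32 reference embedding.
pub fn passes_validation(reference: &[f32], candidate: &[f32]) -> bool {
    cosine_similarity(reference, candidate).map_or(false, |s| s >= MIN_COSINE)
}

/// True when two similarity scores are far enough apart for their order to be trusted.
pub fn ranking_is_stable(score_a: f64, score_b: f64) -> bool {
    (score_a - score_b).abs() >= MIN_SAFE_MARGIN
}
```

Running `passes_validation` over pairs of FP32 and INT8 embeddings for the same texts is one way to reproduce such a check on a different corpus.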

Use with MLAS (CPU EP) only. DequantizeLinear nodes fragment the CoreML execution plan in the same way as FP16 Cast nodes do; INT8 + CoreML EP runs 42–79% slower than INT8 + MLAS and gains no GPU benefit.
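The variant and execution-provider guidance on this page can be condensed into a small lookup. The sketch below is hypothetical: `ExecutionProvider` and `pairing_warning` are illustrative names rather than part of this crate or of ORT, and the enum is mirrored locally so the snippet compiles on its own.

```rust
/// Local mirror of the documented enum so the sketch is self-contained.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ModelVariant {
    Fp32,
    Fp16,
    Int8,
}

/// Hypothetical tag for the execution provider a session will run on.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ExecutionProvider {
    CoreMl,
    Mlas,
}

/// Returns a warning for pairings this page advises against, or None otherwise.
pub fn pairing_warning(variant: ModelVariant, ep: ExecutionProvider) -> Option<&'static str> {
    match (variant, ep) {
        (ModelVariant::Fp16, ExecutionProvider::CoreMl) => Some(
            "FP16 Cast nodes fragment the CoreML plan; expect 6-10x slower than FP32 + CoreML",
        ),
        (ModelVariant::Int8, ExecutionProvider::CoreMl) => Some(
            "DequantizeLinear nodes fragment the CoreML plan; expect 42-79% slower than INT8 + MLAS",
        ),
        // FP32 + CoreML is the recommended low-latency pairing; MLAS pairings carry no caveat here.
        _ => None,
    }
}
```

A server could log the returned message at startup so that a slow pairing is visible without blocking the deployment.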

Trait Implementations

impl Clone for ModelVariant

fn clone(&self) -> ModelVariant

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for ModelVariant

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for ModelVariant

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl PartialEq for ModelVariant

fn eq(&self, other: &ModelVariant) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Copy for ModelVariant

impl Eq for ModelVariant

impl StructuralPartialEq for ModelVariant
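In practice, the traits listed above mean a variant can be copied freely, compared with ==, and formatted. The sketch below mirrors the enum locally to show this. The Display output of the real type is not shown on this page, so the lowercase names printed here are an assumption, and `fleet_is_consistent` and `describe` are hypothetical helpers.

```rust
use std::fmt;

/// Local mirror of the documented enum with the same derivable traits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ModelVariant {
    Fp32,
    Fp16,
    Int8,
}

impl fmt::Display for ModelVariant {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Assumption: lowercase names matching the BGE_M3_MODEL values.
        let name = match self {
            ModelVariant::Fp32 => "fp32",
            ModelVariant::Fp16 => "fp16",
            ModelVariant::Int8 => "int8",
        };
        f.write_str(name)
    }
}

/// True when every worker uses the same variant (relies on Copy + PartialEq).
pub fn fleet_is_consistent(workers: &[ModelVariant]) -> bool {
    match workers.first() {
        Some(&first) => workers.iter().all(|&v| v == first),
        None => true,
    }
}

/// Combines the Display and Debug renderings of a variant in one string.
pub fn describe(variant: ModelVariant) -> String {
    format!("{variant} ({variant:?})")
}
```

Because the type is Copy, passing a variant by value leaves the original usable, and the blanket ToString implementation follows from Display.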

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

Checks if this value is equivalent to the given key.

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

Compare self to key and return true if they are equal.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> FromRef<T> for T
where T: Clone,

fn from_ref(input: &T) -> T

Converts to this type from a reference to the input type.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> ToCompactString for T
where T: Display,

fn try_to_compact_string(&self) -> Result<CompactString, ToCompactStringError>

Fallible version of ToCompactString::to_compact_string().

fn to_compact_string(&self) -> CompactString

Converts the given value to a CompactString.
impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.