Skip to main content

Crate bge_m3_embedding_server

Crate bge_m3_embedding_server 

Source
Expand description

Library crate for the bge-m3 embedding server.

main.rs is a 20–30 line entry point that calls run; all real orchestration logic lives here so it can be unit-tested and reused from integration tests without spawning the binary.

Modules§

binpack
Bin-packing algorithm that groups tokenized sequences into session.run() calls that each fit within the per-worker workspace budget.
bootstrap
Server-startup orchestration: routing, workspace budget, readiness probe, and the background probe task that fits the cost model on first start.
config
Server configuration loaded from environment variables at startup.
embedder
Worker-pool–driven BGE-M3 embedding service.
error
Application-level error types that map to HTTP status codes.
handler
HTTP handlers for the embedding service.
models
Request and response model types for the embedding API endpoints.
probe
Startup memory probe and cost-model coefficient fitter.
state
Shared application state threaded through Axum handlers via Arc<AppState>.
sysinfo
Memory detection for auto-budget computation.
weights
Bundled BGE-M3 sparse-linear projection weights.

Functions§

run
Runs the embedding server end-to-end: load config, spawn the worker pool, install the readiness probe, start the heartbeat, and serve HTTP traffic.