Crate bge_m3_embedding_server

Expand description

Library crate for the bge-m3 embedding server.

main.rs is a 20–30 line entry point that calls run; all real orchestration logic lives here so it can be unit-tested and reused from integration tests without spawning the binary.

Modules§

binpack: Bin-packing algorithm that groups tokenized sequences into session.run() calls that each fit within the per-worker workspace budget.
bootstrap: Server-startup orchestration: routing, workspace budget, readiness probe, and the background probe task that fits the cost model on first start.
config: Server configuration loaded from environment variables at startup.
embedder: Worker-pool–driven BGE-M3 embedding service.
error: Application-level error types that map to HTTP status codes.
handler: HTTP handlers for the embedding service.
models: Request and response model types for the embedding API endpoints.
probe: Startup memory probe and cost-model coefficient fitter.
state: Shared application state threaded through Axum handlers via Arc<AppState>.
sysinfo: Memory detection for auto-budget computation.
weights: Bundled BGE-M3 sparse-linear projection weights.

Functions§

run: Runs the embedding server end-to-end: load config, spawn the worker pool, install the readiness probe, start the heartbeat, and serve HTTP traffic.

Crate bge_m3_embedding_server

Crate bge_m3_embedding_server Copy item path

Modules§

Functions§

Crate bge_m3_embedding_server