Library crate for the bge-m3 embedding server.
main.rs is a 20–30-line entry point that calls run; all real
orchestration logic lives here, so it can be unit-tested and reused from
integration tests without spawning the binary.
Modules
- binpack
- Bin-packing algorithm that groups tokenized sequences into session.run() calls that each fit within the per-worker workspace budget.
- bootstrap
- Server-startup orchestration: routing, workspace budget, readiness probe, and the background probe task that fits the cost model on first start.
- config
- Server configuration loaded from environment variables at startup.
- embedder
- Worker-pool–driven BGE-M3 embedding service.
- error
- Application-level error types that map to HTTP status codes.
- handler
- HTTP handlers for the embedding service.
- models
- Request and response model types for the embedding API endpoints.
- probe
- Startup memory probe and cost-model coefficient fitter.
- state
- Shared application state threaded through Axum handlers via Arc<AppState>.
- sysinfo
- Memory detection for auto-budget computation.
- weights
- Bundled BGE-M3 sparse-linear projection weights.
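The binpack summary can be illustrated with a small greedy packer. This is a minimal sketch, not the crate's algorithm: it assumes (hypothetically) that a padded batch costs batch size × longest sequence × bytes per token, and it uses a next-fit-decreasing heuristic: sort sequences longest first, then open a new batch whenever the next sequence would push the current one over budget. The names pack and batch_cost are illustrative.

```rust
/// Hypothetical cost model: a padded batch of `n` sequences whose longest
/// member has `max_len` tokens costs `n * max_len * bytes_per_token` bytes.
fn batch_cost(n: usize, max_len: usize, bytes_per_token: usize) -> usize {
    n * max_len * bytes_per_token
}

/// Groups sequence indices into batches whose padded cost stays within `budget`.
/// Sequences are visited longest first, so the first member of each batch
/// fixes its padded length. A sequence too large to fit even on its own still
/// gets a singleton batch here; a real implementation might reject it instead.
fn pack(lengths: &[usize], budget: usize, bytes_per_token: usize) -> Vec<Vec<usize>> {
    let mut order: Vec<usize> = (0..lengths.len()).collect();
    // Descending by length; sort_by is stable, so ties keep input order.
    order.sort_by(|&a, &b| lengths[b].cmp(&lengths[a]));

    let mut batches: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut current_max = 0;
    for idx in order {
        if current.is_empty() {
            current_max = lengths[idx];
            current.push(idx);
            continue;
        }
        // Longest-first order means `current_max` never grows within a batch.
        if batch_cost(current.len() + 1, current_max, bytes_per_token) <= budget {
            current.push(idx);
        } else {
            batches.push(std::mem::take(&mut current));
            current_max = lengths[idx];
            current.push(idx);
        }
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

For example, lengths [10, 50, 20, 40] with a budget of 100 and 1 byte per token pack into [[1, 3], [2, 0]]: the two long sequences share one call padded to 50 tokens, and the two short ones share another padded to 20.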
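A configuration loader of the kind the config module describes might look like the sketch below. The variable names BGE_PORT, BGE_WORKERS and BGE_BUDGET_BYTES, and their defaults, are hypothetical. Reading through a lookup function instead of calling std::env::var directly keeps the parsing testable without mutating the process environment.

```rust
use std::str::FromStr;

/// Hypothetical settings; the crate's real Config fields may differ.
#[derive(Debug, PartialEq)]
struct Config {
    port: u16,
    workers: usize,
    /// None means "compute the budget automatically from available memory".
    budget_bytes: Option<u64>,
}

/// Parses one variable if present; a present-but-malformed value is an error.
fn parse_var<T: FromStr>(
    get: &impl Fn(&str) -> Option<String>,
    key: &str,
) -> Result<Option<T>, String> {
    match get(key) {
        None => Ok(None),
        Some(raw) => raw
            .trim()
            .parse::<T>()
            .map(Some)
            .map_err(|_| format!("{key}: cannot parse {raw:?}")),
    }
}

impl Config {
    /// Builds a Config from any key lookup (real environment or a test stub).
    fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
        Ok(Config {
            port: parse_var::<u16>(&get, "BGE_PORT")?.unwrap_or(8080),
            workers: parse_var::<usize>(&get, "BGE_WORKERS")?.unwrap_or(2),
            budget_bytes: parse_var::<u64>(&get, "BGE_BUDGET_BYTES")?,
        })
    }

    /// Production entry point: reads the real process environment.
    fn from_env() -> Result<Config, String> {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```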
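The error module maps application errors to HTTP status codes. A minimal sketch of that mapping follows; the variants, and the choice of 413 for an over-long input and 503 while the service is not yet ready, are assumptions for illustration. In an Axum service a mapping like this would typically back an IntoResponse implementation; that part is omitted so the example needs only the standard library.

```rust
use std::fmt;

/// Hypothetical error variants; the crate's real error types may differ.
#[derive(Debug)]
enum AppError {
    /// Malformed or empty request body.
    BadRequest(String),
    /// Input longer than the configured token limit.
    TooManyTokens { tokens: usize, limit: usize },
    /// Worker pool not ready yet (for example, the startup probe is still running).
    NotReady,
    /// Unexpected failure inside an inference session.
    Internal(String),
}

impl AppError {
    /// HTTP status code this error should be reported with.
    fn status_code(&self) -> u16 {
        match self {
            AppError::BadRequest(_) => 400,
            AppError::TooManyTokens { .. } => 413,
            AppError::NotReady => 503,
            AppError::Internal(_) => 500,
        }
    }
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::BadRequest(msg) => write!(f, "bad request: {msg}"),
            AppError::TooManyTokens { tokens, limit } => {
                write!(f, "input has {tokens} tokens, limit is {limit}")
            }
            AppError::NotReady => write!(f, "service is not ready"),
            AppError::Internal(msg) => write!(f, "internal error: {msg}"),
        }
    }
}

impl std::error::Error for AppError {}
```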
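The probe module fits cost-model coefficients from startup measurements. The form of that model is not documented here, so the sketch below assumes the simplest possible one, workspace bytes ≈ intercept + slope × padded tokens, fitted by ordinary least squares. A real transformer workload likely also has a term quadratic in sequence length (attention scores), so treat the linear form as illustrative only. Sample, fit_linear and max_tokens_for_budget are hypothetical names.

```rust
/// One probe measurement: padded tokens processed by a session.run() call
/// and the workspace bytes it needed.
struct Sample {
    tokens: f64,
    bytes: f64,
}

/// Ordinary least-squares fit of bytes ≈ intercept + slope * tokens.
/// Returns None with fewer than two samples or when every sample has the
/// same token count, since the slope is then undetermined.
fn fit_linear(samples: &[Sample]) -> Option<(f64, f64)> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let (sx, sy, sxx, sxy) = samples.iter().fold((0.0, 0.0, 0.0, 0.0), |(sx, sy, sxx, sxy), s| {
        (sx + s.tokens, sy + s.bytes, sxx + s.tokens * s.tokens, sxy + s.tokens * s.bytes)
    });
    let denom = n * sxx - sx * sx;
    // Relative tolerance guards against a near-zero denominator from rounding.
    if denom.abs() <= 1e-12 * (n * sxx).max(1.0) {
        return None;
    }
    let slope = (n * sxy - sx * sy) / denom;
    let intercept = (sy - slope * sx) / n;
    Some((intercept, slope))
}

/// Largest whole number of padded tokens whose predicted cost fits `budget`.
/// None if the fitted slope is not positive (the model would be meaningless).
fn max_tokens_for_budget(budget: f64, (intercept, slope): (f64, f64)) -> Option<f64> {
    if slope <= 0.0 {
        return None;
    }
    Some(((budget - intercept) / slope).max(0.0).floor())
}
```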
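For the sysinfo module's auto-budget, one common memory source on Linux is the MemAvailable line of /proc/meminfo, whose "kB" values are in fact KiB (1024 bytes). The sketch below parses that line and splits a share of it across workers; the percentage-split policy and the function names are assumptions, not the crate's documented behavior. Inside a container, the cgroup memory limit may be lower than MemAvailable and would also need checking.

```rust
/// Extracts the MemAvailable value (in KiB) from the text of /proc/meminfo.
fn parse_mem_available_kib(meminfo: &str) -> Option<u64> {
    meminfo.lines().find_map(|line| {
        let rest = line.strip_prefix("MemAvailable:")?;
        rest.split_whitespace().next()?.parse().ok()
    })
}

/// Hypothetical policy: give `percent` of available memory to the pool and
/// split it evenly across `workers` (treated as at least one).
fn per_worker_budget_bytes(available_kib: u64, percent: u64, workers: u64) -> u64 {
    let usable = available_kib.saturating_mul(1024).saturating_mul(percent) / 100;
    usable / workers.max(1)
}

/// Linux-only convenience wrapper; other platforms need a different source.
fn read_mem_available_kib() -> Option<u64> {
    let text = std::fs::read_to_string("/proc/meminfo").ok()?;
    parse_mem_available_kib(&text)
}
```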
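The weights module bundles BGE-M3's sparse-linear projection. As usually described, BGE-M3's sparse head projects each token's final hidden state to a single score, applies ReLU, and keeps the maximum score per token id while skipping special tokens. The sketch below follows that description under stated assumptions (a single dot product plus bias, non-positive scores dropped); the function name and parameters are hypothetical, and it is not the crate's code.

```rust
use std::collections::HashMap;

/// Computes lexical (sparse) weights from per-token hidden states.
/// Assumes the projection is `dot(weight, hidden) + bias` followed by ReLU,
/// keeps the maximum score when a token id repeats, and skips `skip_ids`
/// (e.g. special tokens such as CLS, EOS, PAD).
fn sparse_weights(
    hidden: &[Vec<f32>], // one hidden-state vector per token position
    token_ids: &[u32],   // token id at each position
    weight: &[f32],      // projection weights, same length as each hidden vector
    bias: f32,
    skip_ids: &[u32],
) -> HashMap<u32, f32> {
    let mut out: HashMap<u32, f32> = HashMap::new();
    for (h, &id) in hidden.iter().zip(token_ids) {
        if skip_ids.contains(&id) {
            continue;
        }
        let raw: f32 = h.iter().zip(weight).map(|(x, w)| x * w).sum::<f32>() + bias;
        let score = raw.max(0.0); // ReLU
        if score > 0.0 {
            let entry = out.entry(id).or_insert(0.0);
            *entry = (*entry).max(score);
        }
    }
    out
}
```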
Functions
- run
- Runs the embedding server end-to-end: load config, spawn the worker pool, install the readiness probe, start the heartbeat, and serve HTTP traffic.
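The run summary lists a fixed startup order, and the crate description says the logic lives in the library so integration tests can reuse it without spawning the binary. The sketch below shows one way to keep that order testable: each phase is delegated through a trait, so a test double can record phases or inject a failure. Phase, Lifecycle and Recorder are hypothetical; the real run is presumably async and its serve step would not return until shutdown.

```rust
/// Startup phases in the order the run summary lists them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    LoadConfig,
    SpawnWorkers,
    InstallReadiness,
    StartHeartbeat,
    Serve,
}

/// Hypothetical seam: production code implements each phase for real,
/// tests substitute a double.
trait Lifecycle {
    fn enter(&mut self, phase: Phase) -> Result<(), String>;
}

/// Runs every phase in order, stopping at the first failure and tagging
/// the error with the phase that produced it.
fn run<L: Lifecycle>(app: &mut L) -> Result<(), String> {
    const ORDER: [Phase; 5] = [
        Phase::LoadConfig,
        Phase::SpawnWorkers,
        Phase::InstallReadiness,
        Phase::StartHeartbeat,
        Phase::Serve,
    ];
    for phase in ORDER {
        app.enter(phase).map_err(|e| format!("{phase:?}: {e}"))?;
    }
    Ok(())
}

/// Test double that records each phase and can fail at a chosen one.
struct Recorder {
    seen: Vec<Phase>,
    fail_at: Option<Phase>,
}

impl Lifecycle for Recorder {
    fn enter(&mut self, phase: Phase) -> Result<(), String> {
        self.seen.push(phase);
        if self.fail_at == Some(phase) {
            return Err("boom".into());
        }
        Ok(())
    }
}
```

With this shape, main.rs reduces to building the production Lifecycle, calling run, and turning an Err into a non-zero exit.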