Worker-pool-driven BGE-M3 embedding service.
Submodules:
- types: public DTOs and the internal EmbedRequest enum.
- error: small ort::Error → anyhow::Error adapter.
- model_files: hf-hub download / cache layout for the ONNX model files.
- tokenize: tokenizer load + no-pad tokenization + chunk-array build.
- session: ORT execution-provider config and session loading.
- math: pure dense/sparse math helpers (testable without ORT).
- dense: dense embedding pipeline.
- sparse: BGE-M3 SPLADE-style sparse embedding pipeline.
- dual: paired dense + sparse embedding pipeline (one forward pass).
- worker: blocking worker thread, request dispatch, probe wiring.
- pool: EmbedPool async wrapper and test helpers.
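The math submodule is described as pure helpers that can be tested without ORT. As a rough illustration of what such helpers might look like, here is a minimal sketch. The function names `l2_normalize` and `sparse_max_pool` are hypothetical and not taken from the crate. The sketch also assumes that dense vectors are L2-normalized and that sparse output keeps the largest positive weight per token id, which may differ from what the crate does.

```rust
use std::collections::HashMap;

/// Hypothetical helper: scale `v` to unit L2 norm in place.
/// An all-zero vector is left unchanged to avoid dividing by zero.
pub fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Hypothetical helper: collapse per-token weights into a sparse map,
/// keeping the largest positive weight seen for each token id.
pub fn sparse_max_pool(token_ids: &[u32], weights: &[f32]) -> HashMap<u32, f32> {
    let mut out = HashMap::new();
    for (&id, &w) in token_ids.iter().zip(weights) {
        if w <= 0.0 {
            continue; // non-positive weights carry no lexical signal in this sketch
        }
        let slot = out.entry(id).or_insert(0.0_f32);
        if w > *slot {
            *slot = w;
        }
    }
    out
}
```

Because both functions operate on plain slices and a `HashMap`, they can be unit-tested without loading a model or an ONNX Runtime session.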
Modules
- dense: Dense embedding pipeline.
- dual: Paired dense + sparse embedding pipeline (one forward pass per chunk).
- error: Error helpers for the embedder.
- math: Pure dense/sparse math helpers (testable without ORT).
- model_files: HuggingFace Hub download + cache-layout helpers for the BGE-M3 model files.
- pool: EmbedPool async wrapper around the worker thread pool.
- session: ORT execution-provider configuration and session loading.
- sparse: BGE-M3 SPLADE-style sparse embedding pipeline.
- tokenize: Tokenizer load + no-pad tokenization + chunk-array build helpers.
- types: Public DTOs and the internal EmbedRequest enum exchanged between the pool and the worker threads.
- worker: Blocking worker thread, request dispatch, and probe wiring.
Structs
- EmbedPool: Async handle to the embedding worker thread pool.
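The pool/worker split described above (a handle that sends requests to blocking worker threads) can be sketched with the standard library alone. This is only an illustration of the pattern, not the crate's implementation. The `Request` enum, `spawn_worker`, and `embed_blocking` are hypothetical names, the "embedding" is a placeholder, and the real EmbedPool is async, whereas this sketch blocks on a channel.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Hypothetical request type; the real EmbedRequest variants are not shown here.
pub enum Request {
    Dense { text: String, reply: Sender<Vec<f32>> },
    Shutdown,
}

/// Spawn one blocking worker thread and return the sender used to reach it.
pub fn spawn_worker() -> (Sender<Request>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Request>();
    let handle = thread::spawn(move || {
        // The loop ends on Shutdown or when every sender has been dropped.
        for req in rx {
            match req {
                Request::Dense { text, reply } => {
                    // Placeholder result (text length); a real worker would run the model.
                    let _ = reply.send(vec![text.len() as f32]);
                }
                Request::Shutdown => break,
            }
        }
    });
    (tx, handle)
}

/// Send one request and block until the worker replies.
/// Returns None if the worker has already stopped.
pub fn embed_blocking(tx: &Sender<Request>, text: &str) -> Option<Vec<f32>> {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Request::Dense { text: text.to_string(), reply: reply_tx })
        .ok()?;
    reply_rx.recv().ok()
}
```

Each request carries its own reply channel, so the worker needs no shared state to route results back to the caller. An async wrapper would typically replace the blocking `recv` with an awaitable one-shot channel from whichever runtime is in use.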