Experiments (#68)
* feat: finance bench * docs: results * chore: ignroe data README * feat: fix financebench * feat: laion, also required idmaps support * style: format * style: format * fix: resolve ruff linting errors - Remove unused variables in benchmark scripts - Rename unused loop variables to follow convention * feat: enron email bench * experiments for running DiskANN & BM25 on Arch 4090 * style: format * chore(ci): remove paru-bin submodule and config to fix checkout --recurse-submodules * docs: data * docs: data updated * fix: as package * fix(ci): only run pre-commit * chore: use http url of astchunk; use group for some dev deps * fix(ci): should checkout modules as well since `uv sync` checks * fix(ci): run with lint only * fix: find links to install wheels available * CI: force local wheels in uv install step * CI: install local wheels via file paths * CI: pick wheels matching current Python tag * CI: handle python tag mismatches for local wheels * CI: use matrix python venv and set macOS deployment target * CI: revert install step to match main * CI: use uv group install with local wheel selection * CI: rely on setup-uv for Python and tighten group install * CI: install build deps with uv python interpreter * CI: use temporary uv venv for build deps * CI: add build venv scripts path for wheel repair
This commit is contained in:
@@ -454,6 +454,17 @@ class LeannBuilder:
|
||||
provider_options=self.embedding_options,
|
||||
)
|
||||
string_ids = [chunk["id"] for chunk in self.chunks]
|
||||
# Persist ID map alongside index so backends that return integer labels can remap to passage IDs
|
||||
try:
|
||||
idmap_file = (
|
||||
index_dir
|
||||
/ f"{index_name[: -len('.leann')] if index_name.endswith('.leann') else index_name}.ids.txt"
|
||||
)
|
||||
with open(idmap_file, "w", encoding="utf-8") as f:
|
||||
for sid in string_ids:
|
||||
f.write(str(sid) + "\n")
|
||||
except Exception:
|
||||
pass
|
||||
current_backend_kwargs = {**self.backend_kwargs, "dimensions": self.dimensions}
|
||||
builder_instance = self.backend_factory.builder(**current_backend_kwargs)
|
||||
builder_instance.build(embeddings, string_ids, index_path, **current_backend_kwargs)
|
||||
@@ -573,6 +584,17 @@ class LeannBuilder:
|
||||
|
||||
# Build the vector index using precomputed embeddings
|
||||
string_ids = [str(id_val) for id_val in ids]
|
||||
# Persist ID map (order == embeddings order)
|
||||
try:
|
||||
idmap_file = (
|
||||
index_dir
|
||||
/ f"{index_name[: -len('.leann')] if index_name.endswith('.leann') else index_name}.ids.txt"
|
||||
)
|
||||
with open(idmap_file, "w", encoding="utf-8") as f:
|
||||
for sid in string_ids:
|
||||
f.write(str(sid) + "\n")
|
||||
except Exception:
|
||||
pass
|
||||
current_backend_kwargs = {**self.backend_kwargs, "dimensions": self.dimensions}
|
||||
builder_instance = self.backend_factory.builder(**current_backend_kwargs)
|
||||
builder_instance.build(embeddings, string_ids, index_path)
|
||||
|
||||
Reference in New Issue
Block a user