Coverage for astrocyte/pipeline/spreading_activation.py: 0%
18 statements
« prev ^ index » next coverage.py v7.15.0, created at 2026-07-04 05:24 +0000
"""Fix 3 (conv-run-4): entity spreading activation at retrieval time.

After ``section_recall`` produces an initial top-K of fused hits, we
expand by one hop through entity co-occurrence: every retrieved
section's entities become a probe for other sections in the same bank
that share at least one of those entities. The expanded candidates
are appended to the fused list BEFORE the cross-encoder rerank, so the
rerank sees the full neighborhood and can promote a correct section
that the initial strategies missed.

Why this exists: the failure case is Denver/Disneyland conflation in
LME: the initial top-K surfaces "Disneyland" because it shares
keywords with "Denver" via a noisy LLM-extracted graph edge, but the
correct session ("Red Rocks", which is also in Denver) is not
keyword-adjacent to the question. Both sections share the entity
"Denver" though, so entity co-occurrence bridges them.

Distinct from ``expand_section_links`` (which uses the
``astrocyte_pi_section_links`` table populated by semantic_knn +
LLM-extracted edges): the link table is sparse in conversation ingest,
so graph_expand can't bridge entity-coincident sections. Entity spread
uses the dense ``astrocyte_pi_section_entities`` table that retain
already populates per section.

See:
- ``docs/_design/recall.md`` §6 (recall pipeline)
- ``astrocyte.pipeline.section_recall``
"""

from __future__ import annotations

import logging
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from astrocyte.provider import PageIndexStore

_logger = logging.getLogger("astrocyte.pipeline.spreading_activation")


async def expand_via_shared_entities(
    *,
    store: PageIndexStore,
    bank_id: str,
    seeds: list[tuple[str, int]],
    top_k: int = 20,
    max_seeds: int = 10,
    exclude_seeds: bool = True,
) -> list[tuple[str, int, float]]:
    """One-hop entity-co-occurrence spread from a list of seed sections.

    Args:
        store: PageIndexStore SPI handle.
        bank_id: Scope to the user's bank.
        seeds: ``(document_id, line_num)`` pairs, typically the top
            ``recall.fused`` hits.
        top_k: Maximum number of expanded sections to return.
        max_seeds: Cap the seed count to keep the SQL bounded; the
            top entries in a fused list carry the strongest recall
            signal, so trimming the tail rarely costs precision.
        exclude_seeds: When True (default), filter the seeds themselves
            out of the result, since the caller already has them in
            ``recall.fused`` and doesn't want duplicates.

    Returns:
        ``[(document_id, line_num, score), ...]`` where score is the
        count of distinct shared entities with any seed. Empty list
        on store errors or when the store doesn't implement the
        ``expand_sections_by_shared_entities`` SPI (older test fixtures).
    """
    if not seeds:
        return []
    if len(seeds) > max_seeds:
        seeds = seeds[:max_seeds]
    expander = getattr(store, "expand_sections_by_shared_entities", None)
    if expander is None:
        _logger.debug(
            "spreading_activation: store=%s has no expand_sections_by_shared_entities, skip",
            type(store).__name__,
        )
        return []
    try:
        return await expander(
            bank_id,
            seeds,
            top_k=top_k,
            exclude_seeds=exclude_seeds,
        )
    except Exception as exc:  # noqa: BLE001
        _logger.warning(
            "spreading_activation: bank=%s seeds=%d failed (%s)",
            bank_id,
            len(seeds),
            exc,
        )
        return []
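
The module above delegates the actual spread to the store's ``expand_sections_by_shared_entities`` method, whose implementation is not part of this file. As a rough illustration of the contract the docstrings describe (score = count of distinct entities shared with any seed, scoped to one bank, seeds optionally excluded), here is a minimal in-memory sketch. ``InMemoryEntityStore``, the ``SECTIONS`` toy data, and the tie-breaking order are assumptions made for this example only; the real store presumably queries the ``astrocyte_pi_section_entities`` table and may rank ties differently.

```python
import asyncio


class InMemoryEntityStore:
    """Hypothetical stand-in for a PageIndexStore; for illustration only."""

    def __init__(self, section_entities):
        # section_entities maps (bank_id, document_id, line_num) -> set of entity names.
        self._section_entities = section_entities

    async def expand_sections_by_shared_entities(
        self, bank_id, seeds, *, top_k=20, exclude_seeds=True
    ):
        seed_set = set(seeds)

        # Union of all entities attached to any seed section in this bank.
        seed_entities = set()
        for doc_id, line_num in seed_set:
            key = (bank_id, doc_id, line_num)
            seed_entities |= self._section_entities.get(key, set())

        scored = []
        for (bank, doc_id, line_num), entities in self._section_entities.items():
            if bank != bank_id:
                continue  # never cross bank boundaries
            if exclude_seeds and (doc_id, line_num) in seed_set:
                continue  # the caller already holds the seeds
            shared = len(entities & seed_entities)
            if shared:
                scored.append((doc_id, line_num, float(shared)))

        # Highest overlap first; ties ordered by (document_id, line_num)
        # purely to make this sketch deterministic.
        scored.sort(key=lambda hit: (-hit[2], hit[0], hit[1]))
        return scored[:top_k]


# Toy data loosely modelled on the Denver example in the module docstring.
SECTIONS = {
    ("bank-1", "s1", 10): {"Denver", "Disneyland"},
    ("bank-1", "s2", 4): {"Denver", "Red Rocks"},
    ("bank-1", "s3", 7): {"Seattle"},
    ("bank-2", "s9", 1): {"Denver"},
}


def demo(exclude_seeds=True):
    store = InMemoryEntityStore(SECTIONS)
    return asyncio.run(
        store.expand_sections_by_shared_entities(
            "bank-1", [("s1", 10)], top_k=5, exclude_seeds=exclude_seeds
        )
    )


if __name__ == "__main__":
    print(demo())
```

Because the sketch's method takes ``bank_id`` and ``seeds`` positionally and ``top_k`` / ``exclude_seeds`` by keyword, it matches the call shape used by ``expand_via_shared_entities`` and could be passed as its ``store`` argument in a test. In the toy data, the "Red Rocks" section is reached from the "Disneyland" seed only through the shared "Denver" entity, while the section in ``bank-2`` is never returned.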