Coverage for astrocyte/pipeline/spreading_activation.py: 0%

18 statements  

coverage.py v7.15.0, created at 2026-07-04 05:24 +0000

"""Fix 3 (conv-run-4): entity spreading activation at retrieval time.

After ``section_recall`` produces an initial top-K of fused hits, we
expand by one hop through entity co-occurrence: every retrieved
section's entities become a probe for other sections in the same bank
that share at least one of those entities. The expanded candidates
are appended to the fused list BEFORE the cross-encoder rerank, so the
rerank sees the full neighborhood and can promote a correct section
that the initial strategies missed.

Why this exists: the failure case is Denver/Disneyland conflation in
LME. The initial top-K surfaces "Disneyland" because it shares
keywords with "Denver" via a noisy LLM-extracted graph edge, but the
correct session ("Red Rocks", which is also in Denver) is not
keyword-adjacent to the question. Both sections share the entity
"Denver" though, so entity co-occurrence bridges them.

Distinct from ``expand_section_links`` (which uses the
``astrocyte_pi_section_links`` table populated by semantic_knn +
LLM-extracted edges): the link table is sparse in conversation ingest,
so graph_expand can't bridge entity-coincident sections. Entity spread
uses the dense ``astrocyte_pi_section_entities`` table that retain
already populates per section.

See:
- ``docs/_design/recall.md`` §6 (recall pipeline)
- ``astrocyte.pipeline.section_recall``
"""

from __future__ import annotations

import logging
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from astrocyte.provider import PageIndexStore

_logger = logging.getLogger("astrocyte.pipeline.spreading_activation")


async def expand_via_shared_entities(
    *,
    store: PageIndexStore,
    bank_id: str,
    seeds: list[tuple[str, int]],
    top_k: int = 20,
    max_seeds: int = 10,
    exclude_seeds: bool = True,
) -> list[tuple[str, int, float]]:
    """One-hop entity-co-occurrence spread from a list of seed sections.

    Args:
        store: PageIndexStore SPI handle.
        bank_id: Scope to the user's bank.
        seeds: ``(document_id, line_num)`` pairs, typically the top
            ``recall.fused`` hits.
        top_k: Maximum number of expanded sections to return.
        max_seeds: Cap the seed count to keep the SQL bounded; the
            top entries in a fused list carry the strongest recall
            signal, so trimming the tail rarely costs precision.
        exclude_seeds: When True (default), filter the seeds themselves
            out of the result, since the caller already has them in
            ``recall.fused`` and doesn't want duplicates.

    Returns:
        ``[(document_id, line_num, score), ...]`` where score is the
        count of distinct shared entities with any seed. Empty list
        on store errors or when the store doesn't implement the
        ``expand_sections_by_shared_entities`` SPI (older test fixtures).
    """
    if not seeds:
        return []
    # Keep only the strongest seeds so the store-side query stays bounded.
    if len(seeds) > max_seeds:
        seeds = seeds[:max_seeds]
    # The SPI method is optional: look it up dynamically and skip if absent.
    expander = getattr(store, "expand_sections_by_shared_entities", None)
    if expander is None:
        _logger.debug(
            "spreading_activation: store=%s has no expand_sections_by_shared_entities, skip",
            type(store).__name__,
        )
        return []
    try:
        return await expander(
            bank_id,
            seeds,
            top_k=top_k,
            exclude_seeds=exclude_seeds,
        )
    # Expansion is best-effort: log the failure and return no extra candidates.
    except Exception as exc:  # noqa: BLE001
        _logger.warning(
            "spreading_activation: bank=%s seeds=%d failed (%s)",
            bank_id,
            len(seeds),
            exc,
        )
        return []
94 exc, 

95 ) 

96 return []