What BOTWAVEBOMBA Is
BOTWAVEBOMBA is the public face of a private journalism research
infrastructure called TELOS+PAI. Every article shown here was
ingested by global_ingestor.py, clustered across information
blocs, and tagged with a per-source five-axis bias fingerprint. The
pipeline runs every six hours on a systemd timer.
(bomba_pipeline.sh, botwave-bomba-pipeline.timer: OnCalendar=*-*-* 00,06,12,18:00:00)
492 sources are ingested every six hours
(book_arm/memory/sources_global.json).
496 carry full five-axis bias fingerprints
(data/source_registry.json).
Every source is classified by bloc (Western / Adversarial / Non-Aligned),
factuality (high / mixed / low), and primary-vs-launderer status.
It is not a news aggregator in the Google News sense. It is not a fact-checker. It is a bloc-level coverage-and-bias comparison engine: which information bloc covered each story, where the volume gaps are, and how each source's five-axis bias profile maps across blocs. Sentence-level framing analysis (verb-choice, propaganda lexicon) is on the decomposition roadmap below — it is not in the live pipeline yet.
Why Not Left / Center / Right?
Ground News, AllSides, and similar services organize sources on a left–center–right axis calibrated to American domestic politics. That axis is useful in a country where the main conflict is between two parties managing the same system. It is useless when you are trying to understand what the Iranian press says about the Strait of Hormuz, what the Swedish press says about NATO, or what the Chinese press says about Taiwan.
BOTWAVEBOMBA uses five independent axes, each scored from -1.0 to +1.0
(per-source, in data/source_registry.json under each entry's axis block):
- Atlanticist — does this outlet assume US-led world order is legitimate?
- Interventionist — does this outlet support military intervention?
- Zionist — how does this outlet frame Israeli state action?
- Statist — does this outlet reinforce or challenge state authority?
- Financialized — does this outlet treat financial capitalism as natural?
These axes produce a five-dimensional fingerprint per source — not a simple left/right label. A source can be anti-interventionist and pro-state simultaneously (much of the RT editorial line). A source can be pro-market and anti-zionist simultaneously (some European financial press on Israel/Palestine coverage). The fingerprint captures that complexity. A label cannot.
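As an illustration of what a five-dimensional fingerprint looks like as data, here is a minimal sketch. The class name `BiasFingerprint`, the example values, and the sign convention (positive means the outlet leans toward the stance the axis is named for) are assumptions made for this example, not the registry's actual schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BiasFingerprint:
    """Five-axis fingerprint; each axis is scored from -1.0 to +1.0."""
    atlanticist: float
    interventionist: float
    zionist: float
    statist: float
    financialized: float

    def __post_init__(self) -> None:
        # Reject any axis value outside the documented [-1.0, +1.0] range.
        for f in fields(self):
            value = getattr(self, f.name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{f.name}={value} is outside [-1.0, 1.0]")


# Hypothetical values, chosen only to show two axes pulling in
# directions that a single left/right label could not express.
example = BiasFingerprint(
    atlanticist=-0.8,
    interventionist=-0.6,  # anti-interventionist (assumed sign convention)
    zionist=-0.4,
    statist=0.7,           # pro-state (assumed sign convention)
    financialized=0.1,
)

anti_interventionist_and_pro_state = (
    example.interventionist < 0 and example.statist > 0
)
```

Keeping the axes as separate fields, rather than collapsing them into one score, is what lets a combination such as the one above survive into downstream comparisons.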
What The Pipeline Does Today
The pipeline currently runs in two analytical stages, plus deployment plumbing. The full seven-stage decomposition described historically on this page is in progress — see the roadmap below.
- Stage 1: Ingest. global_ingestor.py pulls RSS feeds and full article text (via httpx + readability-lxml) from the 492-source ingest set, deduplicates by content hash, and writes to news_cache.jsonl. (book_arm/pai_modules/global_ingestor.py:46-55)
- Stage 2: Generate feed (monolith). generate_feed.py reads news_cache.jsonl + data/source_registry.json, clusters articles by multi-entity overlap, attaches per-source five-axis bias data via lookup, computes per-cluster bloc-coverage percentages (Western / Neutral / Adversarial), flags blindspots (Blackout / Spotlight thresholds), and serializes the result to api/latest.json, the JSON this site reads. Story-card PNGs are rendered by generate_cards.py for the top stories. After that, the staging Pages repo is committed and pushed. Discord and Telegram digests follow. (zombie760.github.io/scripts/generate_feed.py + scripts/bomba_pipeline.sh)
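The Stage 1 deduplication step can be sketched as follows. This is a minimal illustration, not the code in global_ingestor.py: the choice of SHA-256, the whitespace and case normalization, and the `text` and `content_hash` field names are all assumptions.

```python
import hashlib
import json


def content_hash(text: str) -> str:
    """Hash article text after collapsing whitespace and lowercasing (assumed normalization)."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(articles: list[dict]) -> list[dict]:
    """Keep the first article seen for each distinct content hash."""
    seen: set[str] = set()
    unique: list[dict] = []
    for article in articles:
        digest = content_hash(article["text"])  # "text" is a hypothetical field name
        if digest in seen:
            continue
        seen.add(digest)
        unique.append({**article, "content_hash": digest})
    return unique


def to_jsonl(records: list[dict]) -> str:
    """Serialize records one JSON object per line, as a .jsonl cache would store them."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Hashing normalized text rather than the URL means the same wire story republished under two URLs collapses to one record, at the cost of missing near-duplicates that differ by a few words.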
The monolith does inline what the seven-stage decomposition will do as
discrete modules. Behaviors present (entity clustering, per-source bias
lookup, bloc-coverage computation, blindspot flagging) are all in
generate_feed.py. Behaviors named on the historical version
of this page but not present anywhere in the live pipeline (per-article
propaganda-lexicon scoring, agency-verb analysis, sentence-level verb
choice diff) are roadmap, not current.
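To make the entity-clustering behavior concrete, here is a hedged sketch of entity-pair bucketing using the thresholds stated in the roadmap entry for event_clusterer.py (at least 3 articles from at least 2 sources). The function name, the `entities` and `source` field names, and the omission of the single-entity fallback are assumptions; the live code may also resolve articles that match several pairs, which this sketch does not.

```python
from collections import defaultdict
from itertools import combinations

MIN_ARTICLES = 3  # threshold from the roadmap entry
MIN_SOURCES = 2   # threshold from the roadmap entry


def cluster_by_entity_pairs(articles: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Bucket articles by every unordered pair of entities they mention,
    then keep only buckets that meet both thresholds."""
    buckets: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for article in articles:
        # Sorting makes ("Iran", "Hormuz") and ("Hormuz", "Iran") the same key.
        for pair in combinations(sorted(set(article["entities"])), 2):
            buckets[pair].append(article)
    return {
        pair: members
        for pair, members in buckets.items()
        if len(members) >= MIN_ARTICLES
        and len({m["source"] for m in members}) >= MIN_SOURCES
    }
```

Requiring a pair of shared entities, rather than one, keeps a frequently named country from pulling unrelated stories into a single cluster.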
Decomposition Roadmap
The plan: extract one analytical stage from generate_feed.py
per calendar week, with fixture-equivalence to the monolith as the
acceptance test. After six weeks the monolith is six discrete modules;
a seventh week replaces the glue with a proper orchestrator.
Full plan: PLAN_HYBRID.md. Live state: pipeline_state.json.
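A fixture-equivalence check of the kind used as the acceptance test can be sketched like this. It is an illustration under assumptions: the function name, the `id` field, and the idea of comparing a chosen list of fields are hypothetical, not taken from the project's test files.

```python
def diverging_story_ids(
    baseline: list[dict], candidate: list[dict], keys: list[str]
) -> list[str]:
    """Return ids of stories whose selected fields differ between two runs.

    A story present in only one of the two outputs also counts as diverged.
    """
    base_by_id = {story["id"]: story for story in baseline}
    cand_by_id = {story["id"]: story for story in candidate}
    diverged = []
    for story_id in sorted(base_by_id.keys() | cand_by_id.keys()):
        b = base_by_id.get(story_id)
        c = cand_by_id.get(story_id)
        if b is None or c is None or any(b.get(k) != c.get(k) for k in keys):
            diverged.append(story_id)
    return diverged
```

An extraction would pass when the monolith's output and the extracted module's output, run against the same cached fixture, yield an empty list for the fields that module owns.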
The roadmap entries below reflect pipeline_state.json at
page-render time. The status badges flip as each weekly extraction
lands. The status page shows the same data live, with auto-refresh.
- [discrete] global_ingestor.py: pulls RSS + article full-text from 492 sources, dedup by hash, writes news_cache.jsonl. Pre-existing; not part of the decomposition. (book_arm/pai_modules/global_ingestor.py)
- [extracted] event_clusterer.py: groups articles about the same event using entity-pair overlap (primary path, ≥3 articles ≥2 sources) and single-entity fallback. Extracted 2026-05-10 from generate_feed.py:538-623; fixture-equivalence verified (160/160 stories identical between pre- and post-extraction outputs). Module lives at book_arm/pai_modules/event_clusterer.py; schemas at schemas/event_clusterer_{input,output}.schema.json; fixture-equivalence test at tests/test_event_clusterer.py.
- [extracted] bias_scorer.py: attaches per-source bias enrichment to each cluster (bias_tier, bias_bucket, bloc, geo_cluster, atlanticist_norm) and computes per-cluster bias_variance (stdev of atlanticist scores) + five-axis averages (interventionist, zionist, atlanticist, statist, financialized). Per-source LOOKUP from data/source_registry.json, not per-article propaganda-lexicon scoring. Extracted 2026-05-10 from generate_feed.py:589-622 + :667-677; fixture-equivalence verified (0/160 stories diverged in bias data). Original single-axis -6..+6 propaganda-lexicon standalone reference preserved at book_arm/pai_modules/_reference/bias_scorer.py.original. Sentence-level lexicon scoring remains a methodology-roadmap item, separate from this extraction.
- [extracted] framing_differ.py: for each cluster, picks the consensus framing (article whose headline shares the most meaningful words with the rest of the cluster), builds per-article framing cards (headline + lede via get_snippet + source metadata), and selects the cluster's hero image and video. Extracted 2026-05-10 from generate_feed.py:474-513 + :29-44 + :644-657. Module lives at book_arm/pai_modules/framing_differ.py; tests at tests/test_framing_differ.py. Sentence-level verb-choice and named-entity diff (richer analysis described historically) remain a methodology-roadmap item and are not in this extraction. Fixture-equivalence verified (0/160 stories diverged).
- [extracted] blindspot_analyzer.py: applies the Ground News-style left/center/right coverage breakdown and the Ground News blindspot formula (one side <17% AND other side ≥33%) per cluster. Also runs the Western Mono-Frame / Blackout geo-frame detection over geopolitical watchlists (Middle East, cartel/intel, US foreign policy, Africa-suppressed). Extracted 2026-05-10 from generate_feed.py:291-401 + :51-101 + :606-614 + :681-689. Writes blindspots.jsonl. Fixture-equivalence verified (0/160 diff on is_blindspot/blindspot_score/coverage/geo_frame).
- [extracted] coverage_mapper.py: pure projection of the heatmap-distribution fields (left_pct, center_pct, right_pct, state_count, dominant_bucket, dominant_pct) from blindspot_analyzer's combined output into a standalone coverage.jsonl artifact. Sharing compute_coverage across the two modules avoids duplicate code paths; coverage_mapper depends on blindspot_analyzer's output, not its own independent computation. Extracted 2026-05-10 from generate_feed.py:702-709.
- [extracted] broadcast.py: serializes pipeline output to api/latest.json and api/blindspots.json. Extracted 2026-05-11 from generate_feed.py:770-778 (output dict build + JSON write). Thin wrapper: all analytical computation happens upstream; this module only writes. Fixture-equivalence verified: JSON serialization output identical between baseline and extracted module. (book_arm/pai_modules/broadcast.py)
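The blindspot rule named in the blindspot_analyzer.py entry (one side below 17% while the other side is at or above 33%) can be sketched as below. The thresholds come from that entry; the function signatures, the bucket labels as plain strings, and the reading of "side" as left versus right are assumptions, and the real compute_coverage may differ.

```python
from collections import Counter

LOW_PCT = 17.0   # "one side <17%"
HIGH_PCT = 33.0  # "other side >=33%"


def compute_coverage(buckets: list[str]) -> dict[str, float]:
    """Turn a list of per-article bucket labels into left/center/right percentages."""
    counts = Counter(b for b in buckets if b in ("left", "center", "right"))
    total = sum(counts.values())
    if total == 0:
        return {"left_pct": 0.0, "center_pct": 0.0, "right_pct": 0.0}
    return {f"{b}_pct": 100.0 * counts[b] / total for b in ("left", "center", "right")}


def is_blindspot(coverage: dict[str, float]) -> bool:
    """True when one side is under-covered and the opposite side is well covered."""
    left, right = coverage["left_pct"], coverage["right_pct"]
    return (left < LOW_PCT and right >= HIGH_PCT) or (
        right < LOW_PCT and left >= HIGH_PCT
    )
```

For example, a cluster with one left, four center, and five right articles has 10% left and 50% right coverage, so it is flagged; a 30/40/30 split is not.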
All six modules have flipped to extracted. Next: wire the modules behind a proper orchestrator (run_pipeline.py) that emits a per-cycle pipeline_run.json the status page reads for live per-stage health. The orchestrator schema is already drafted: schemas/pipeline_run.schema.json.
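A minimal sketch of what such an orchestrator could look like follows. It is not run_pipeline.py and does not follow schemas/pipeline_run.schema.json, whose contents are not shown on this page: the record fields (`started_at`, `stages`, `status`, `error`, `seconds`, `ok`) and the stop-on-first-failure policy are assumptions.

```python
import json
import time
from datetime import datetime, timezone
from typing import Any, Callable

Stage = tuple[str, Callable[[Any], Any]]


def run_pipeline(stages: list[Stage], payload: Any) -> tuple[Any, dict]:
    """Run stages in order, feeding each stage's output to the next.

    Stops at the first failing stage and returns the last good payload
    together with a per-stage health record.
    """
    record: dict = {
        "started_at": datetime.now(timezone.utc).isoformat(),
        "stages": [],
    }
    for name, stage in stages:
        t0 = time.perf_counter()
        try:
            payload = stage(payload)
            status, error = "ok", None
        except Exception as exc:  # record the failure instead of crashing the cycle
            status, error = "failed", str(exc)
        record["stages"].append(
            {
                "name": name,
                "status": status,
                "error": error,
                "seconds": round(time.perf_counter() - t0, 3),
            }
        )
        if status == "failed":
            break
    record["ok"] = all(s["status"] == "ok" for s in record["stages"])
    return payload, record


def record_as_json(record: dict) -> str:
    """Serialize the run record, e.g. for writing to a per-cycle JSON file."""
    return json.dumps(record, indent=2)
```

Recording a status per stage, rather than one pass/fail flag per cycle, is what would let a status page show which module a failed run stopped in.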
What We Do Not Do
- We do not editorialize. The bloc-coverage comparison shows the delta; you read it.
- We do not write summaries. Every summary field is the article's own description, truncated. (generate_feed.py:631: summary = headline_art.get('description', ''))
- We do not rate factuality of individual claims. Source-level factuality labels appear in source_registry.json under each source's factuality field (values: high / mixed / low). The labels draw on MBFC where available, supplemented by hand-curation. Per-source provenance trace is on the methodology-audit roadmap.
- We do not suppress adversarial sources. rt, rt_arabic, sputnik, tass, and other adversarial-bloc outlets are in source_registry.json with bloc: "adversarial" labels visible.
- We do not invent. Every claim traces to an article URL surfaced under sources[].url in api/latest.json.
The Journalism Connection
BOTWAVEBOMBA is the public surface of a private investigative research
substrate that supports long-form journalism. The same ingest layer
(global_ingestor.py → book_arm/memory/news_cache.jsonl)
feeds both the public site and the book-arm's primary-source discovery
for book-length investigation — the kind that requires knowing not just
what AP reported, but what IRNA, Lenta.ru, and SCMP said on the same day
about the same event, in their original framing, before any translation
layer added editorial distance. Both sides consume from the 492-source
ingest set.
The methodology is the same as the journalism: primary sources or nothing. Every source URL is live. Every score is computed or looked up from a versioned registry, not assigned post-hoc. Every blindspot is measured against the cache, not asserted.
BE UNDENIABLE. Every claim filed. Every source named. A single unanchored assertion is the lever a critic uses to dismiss the whole work. We do not write one.
Bias Scoring Baseline
The five-axis fingerprints in source_registry.json were
hand-curated, drawing on AllSides and MBFC bias ratings where available.
Per-source provenance (which axis value for which source came from which
input) is not yet machine-traceable as a single artifact — documenting
it is on the methodology-audit roadmap.
The current registry ships 496 deep-fingerprinted sources
with full 5-axis bias data, bloc classification, factuality ratings,
and primary-vs-launderer tags
(data/source_registry.json).
The original 244-source registry was introduced in commit c6a088bf;
the enriched 496-source version merges data from
source_fingerprints.json. Subsequent updates are logged in
pipeline_state.json.
Hand-curation methodology (editorial stance analysis, entity framing
patterns, named-state-alignment patterns, institutional-alignment
signals) is summarized in the methodology audit
(audit/about_audit_2026-05-10.md). It is not duplicated here, to
avoid drift between the page and the audit.