Testing
Source: docs/testing.md
Brief summary of the A/B test harness under tests/ab/. Detail is
intentionally not duplicated from the individual port test READMEs; read
tests/ab/run-all.sh and the per-port run.sh files for the source
of truth.
Two test modes
The harness can run in two modes:
- A/B mode (tests/ab/run-all.sh): runs each port's legacy PHP/Perl/Bash script and the new hzmetrics.py equivalent side by side, then diffs every output table. Requires tests/legacy/ to be present.
- Golden mode (tests/ab/run-all-golden.sh): runs only the new code and diffs against a frozen snapshot of legacy output captured at parity time (tests/ab/port_*/golden/*.tsv). Does not require tests/legacy/. Simulates the world where the legacy reference has been removed.
Both modes produce the same pass/fail outcome on a current codebase, and both pass in the current tree.
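The table comparison that both modes perform can be pictured with a minimal sketch. This is not the harness's code (the run.sh scripts remain the source of truth): the function name diff_tsv and the sample rows are hypothetical, and it assumes each output table has been dumped as tab-separated text with a stable row order.

```python
import csv
import io


def diff_tsv(expected_text, actual_text):
    """Return (line_number, expected_row, actual_row) for every differing row.

    Hypothetical helper: assumes both dumps are tab-separated and already
    sorted the same way, so rows can be compared by position.
    """
    expected = list(csv.reader(io.StringIO(expected_text), delimiter="\t"))
    actual = list(csv.reader(io.StringIO(actual_text), delimiter="\t"))
    mismatches = []
    for i in range(max(len(expected), len(actual))):
        exp_row = expected[i] if i < len(expected) else None
        act_row = actual[i] if i < len(actual) else None
        if exp_row != act_row:
            mismatches.append((i + 1, exp_row, act_row))
    return mismatches


# Made-up rows standing in for a legacy (or golden) dump and a new-code dump.
golden = "1\t2025-06-00\t488\n2\t2025-06-00\t17\n"
fresh = "1\t2025-06-00\t489\n2\t2025-06-00\t17\n"

for line_no, exp_row, act_row in diff_tsv(golden, fresh):
    print(f"row {line_no}: expected {exp_row}, got {act_row}")
```

In A/B mode the expected side would come from the legacy script run in the same session; in golden mode it would come from the frozen golden/*.tsv file.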
What's tested
26 test directories under tests/ab/port_*:
Per-port A/B (16): port_andmore_usage, port_clean_bots,
port_fill_domain, port_fill_ipcountry, port_fill_user_info,
port_gen_tool_stats, port_gen_tool_toplists,
port_gen_tool_tops, port_identify_bots, port_import_apache,
port_import_auth, port_import_hub_data, port_import_webhits,
port_logfix_session, port_middleware, port_whoisonline.
Integration (2): port_pipeline (full analyze + summarize chain
on synthetic data), port_realdata (same chain on a captured
production-data slice — gated by snapshot presence).
Coverage tests (3): port_summarize_month (the most
metric-dense single port), port_period_sweep (24 anchor-port
combinations exercising period boundary arithmetic),
port_invariants (cross-table rules like
summary_user_vals[rowid=1] = SUM([6,7,8])).
Defensive tests (5): port_fuzz (4 fuzz harnesses with 2000+
randomized cases each), port_idempotency (re-runs analyze+summarize
on the same DB), port_dryrun (every --dry-run writes zero rows),
port_empty_input (each port no-ops cleanly on empty input),
port_determinism (two fresh-DB runs are byte-identical).
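The cross-table rule quoted for port_invariants can be sketched as a single check. The {rowid: value} mapping below is an assumption made for illustration, standing in for one period and column of summary_user_vals; the function name and sample numbers are hypothetical, and the real test lives in the harness.

```python
def check_total_matches_parts(values_by_rowid, total_rowid=1, part_rowids=(6, 7, 8)):
    """Check that the total row equals the sum of its component rows.

    Returns (holds, total, parts_sum) so a failure can report both sides.
    """
    total = values_by_rowid[total_rowid]
    parts_sum = sum(values_by_rowid[r] for r in part_rowids)
    return total == parts_sum, total, parts_sum


# Made-up values for one period: rowid 1 should equal rowids 6 + 7 + 8.
sample = {1: 120, 6: 70, 7: 30, 8: 20}

holds, total, parts_sum = check_total_matches_parts(sample)
if holds:
    print("invariant holds")
else:
    print(f"violated: rowid 1 = {total}, rowids 6+7+8 = {parts_sum}")
```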
Running
# Bootstrap once per host (creates test DBs, loads reference data)
tests/ab/setup_test_dbs.sh --bootstrap
# Run the full A/B suite
tests/ab/run-all.sh
# Or the golden-mode round (no legacy needed)
tests/ab/run-all-golden.sh
# Run a single port
tests/ab/port_fill_domain/run.sh
tests/ab/port_fill_domain/run_golden.sh
setup_test_dbs.sh --reset truncates everything and reloads
reference data — used between tests.
When the harness catches things
Real bugs surfaced during the port, with their commit messages preserved in the log for reference:
- fill-domain day-before-month-start: legacy findWeeks() starts a week-chunk on the day BEFORE the month begins (so 2025-06-30 23:59:00 belongs to July 2025's first chunk). Caught by port_fill_domain; commit "A/B test: fill-domain — caught & fixed day-before-month-start divergence".
- xlogfix_middleware_cpu.pl: four real divergences in one commit ("A/B test: middleware-{wall,cpu} — caught three real divergences"):
  - MariaDB ROUND() is banker's rounding, Perl int($x + 0.5) is round-half-up → fixed to FLOOR(x + 0.5).
  - cpu.pl only UPDATEs existing toolstart rows, never INSERTs (the wall version does both).
  - cpu.pl's UPDATE check is <= 0 (includes cputime=0), not < 0.
  - cpu.pl does not filter joblog.event = '[waiting]'; wall does. Caught when both ports were initially symmetric.
- andmore-usage datetime suffix: legacy stores '-01', summarize uses '-00'; new port was using '-00' for both. Commit "A/B test: andmore-usage — caught datetime suffix divergence".
- logfix-session cross-week state: Perl declares session state vars at script scope, so an in-flight session persists across the 4 week-chunks of a month. The Python port initially reset state per chunk. Commit "A/B test: logfix-session — caught cross-week state divergence".
- summarize-month reg_users col=1 missing-JOIN: legacy queries userlogin_lite directly (no JOIN) for col=1; my port was unconditionally joining jos_xprofiles_metrics for every col, so when xprofiles_metrics is empty it under-counted. Commit "A/B test: summarize-month — caught reg_users col=1 missing-JOIN divergence".
- import-auth bracket-strip: [user[sub]] should produce user, not user[sub]. PHP ltrim($x, '[') + rtrim($x, ']') use charlist semantics (strip ALL leading [ and trailing ]); the port's regex was capturing the inner-bracketed content literally. Commit "A/B: deepen 5 fixtures — caught import-auth bracket-strip bug".
- gen-tool-stats float→int rounding: Python float bound as numeric literal hits MariaDB's banker's rounding; PHP stringifies first and hits half-away-from-zero. 488.5 → 488 vs 488.5 → 489. Fix: stringify floats before binding. Commit "A/B: deepen 5 more fixtures, caught gen-tool-stats float→int rounding bug".
- download_users rowid=4 vs rowid=8 filter mismatch: the two rowids use DIFFERENT WHERE filters in legacy (rowid=4 doesn't exclude login_ips or cap duration < 900), but my port was reusing dl_users_period_tmp built for rowid=8. Caught by deepening port_summarize_month's fixture with a registered-user downloader. Commit "A/B: deepen summarize-month, caught download_users rowid=4 filter mismatch".
- summary_misc_vals rowid=3 NULL handling: SUM(duration) returns NULL on an empty period; legacy db_fetch returns NULL → dbquote(NULL) writes empty string; the port coerced to 0. Caught by port_period_sweep at anchor months with no data. Commit "A/B: period sweep test + fix misc_usage NULL → empty-string parity".
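The two rounding divergences in the list come down to the same tie-breaking difference, which can be reproduced without a database. The sketch below uses Python built-ins as stand-ins and does not call MariaDB: round() sends ties to the even neighbour (banker's rounding), math.floor(x + 0.5) mirrors the Perl int($x + 0.5) idiom for non-negative values, and the Decimal variant mimics "stringify first, then round half away from zero". The helper names are made up for illustration.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def round_half_even(x):
    """Banker's rounding: Python's built-in round() sends ties to the even neighbour."""
    return round(x)


def round_half_up(x):
    """Equivalent of FLOOR(x + 0.5); matches Perl int($x + 0.5) for x >= 0."""
    return math.floor(x + 0.5)


def round_from_string(x):
    """Stringify first, then round the decimal text with ties going away from zero."""
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Columns: input, half-even, half-up, stringified half-up.
for value in (487.5, 488.5, 489.5):
    print(value, round_half_even(value), round_half_up(value), round_from_string(value))
```

Of the three inputs, only 488.5 separates the two rules, so a fixture has to contain a tie that sits just above an even integer to expose this kind of divergence.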
Plus the "A/B re-baseline" commit (Roll back dnload-at-import and action-filter from hzmetrics.py) — the most important harness
catch. An initial legacy snapshot included two post-aa245f7
behaviors that had been absorbed into the new port: import-apache
setting dnload=1 inline, and import-auth filtering
action IN ('login','simulation') at insert time. Re-baselining
the harness against the true pre-refactor snapshot revealed that
the port had unintentionally inherited those changes; both got
rolled back. This is the divergence the docs talk about under
"bug-for-bug parity is hard to verify when your baseline is wrong."
Documented in commit history under A/B test: <port> — caught …
and A/B: … messages.
What can't be tested locally
port_realdata requires a captured production-data snapshot
(tests/ab/port_realdata/snapshot/*.sql.gz). The snapshot directory
is gitignored because the raw data contains real usernames, emails,
and IPs; the test skips gracefully when the snapshot isn't present.
See tests/ab/port_realdata/capture.sh for how to capture one when
you have read access to a production database.
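The skip-when-absent behaviour can be sketched as follows. This is an illustration in Python rather than the harness's own run.sh logic; the function names and message wording are hypothetical, and only the snapshot/*.sql.gz pattern comes from the text above.

```python
import tempfile
from pathlib import Path


def find_snapshots(port_dir):
    """Return the snapshot dumps under <port_dir>/snapshot, or [] if none exist."""
    snap_dir = Path(port_dir) / "snapshot"
    if not snap_dir.is_dir():
        return []
    return sorted(snap_dir.glob("*.sql.gz"))


def run_realdata(port_dir):
    """Skip (rather than fail) when no snapshot has been captured."""
    snapshots = find_snapshots(port_dir)
    if not snapshots:
        return "SKIP: no snapshot found"
    return f"RUN: {len(snapshots)} snapshot file(s)"


# Demonstrate both outcomes in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    print(run_realdata(tmp))                      # snapshot directory missing
    snap_dir = Path(tmp, "snapshot")
    snap_dir.mkdir()
    (snap_dir / "slice.sql.gz").write_bytes(b"")  # empty placeholder file
    print(run_realdata(tmp))                      # snapshot present
```

Reporting a skip as a distinct, non-failing outcome keeps the full suite green on hosts that are not allowed to hold the production slice.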
Some tests touch network resources (fill-ipcountry hits
help.hubzero.org/ipinfo/v1, resolve-dns uses the local resolver
which forwards out). These work fine offline against the cached
results in tests/ab/fixtures/, but require network for fresh data.