Summary table cheat sheet
Source: docs/usage-tables.md
The five summary_*_vals tables in the metrics database are the
output of the pipeline and the input to the usage-overview UI. This
document is a decoder for what their fields mean.
It is closely adapted from J.M. Sperhac's
"Hub usage data overview and table translator" (Jan 2025), which
remains the most readable reference for the legacy xlogfix_summary.php
that the rewrite ports. The output values are identical — the
rowid/colid/period/datetime/value/valfmt shape is preserved bit-for-bit
by the rewrite.
Common shape
All summary tables have the same six columns:
( rowid TINYINT, -- metric index, table-specific meaning (see below)
colid TINYINT, -- breakdown index (1=total, others=continent/orgtype)
datetime DATETIME, -- 'YYYY-MM-00 00:00:00' for monthly anchors;
-- '0000-00-00 00:00:00' for all-time
period TINYINT, -- 0=year, 1=month, 3=quarter, 12=rolling-12mo,
-- 13=fiscal year (Oct-Sep), 14=all-time
value VARCHAR(200), -- the metric value (numeric-as-string)
valfmt TINYINT ) -- 1=count, 2=percent, 4=jobs, 5=duration in seconds
The denormalized shape is awkward — the same colid column means
"residence" for one breakdown and "organization type" for another —
but every reporting tool is built around it. The rewrite preserves
this exactly.
Period codes
| Period | Meaning |
|---|---|
0 |
Calendar year containing datetime |
1 |
Just that month |
3 |
Quarter containing datetime |
12 |
Rolling 12 months ending at datetime |
13 |
Fiscal year (Oct–Sep) containing datetime |
14 |
All time (since 1995-01-01) |
Period 14 is the historically-expensive one — see motivations.md for what the rewrite did about it.
colid (breakdown axis)
Same meaning across summary_user_vals and summary_simusage_vals:
| colid | Meaning |
|---|---|
1 |
Total (always — used by single-value metrics) |
2 |
Residence "identified" — total with known continent |
3 |
Residence — United States |
4 |
Residence — Asia |
5 |
Residence — Europe |
6 |
Residence — Other |
7 |
Organization "identified" — total with known org type |
8 |
Organization — Educational |
9 |
Organization — Industry |
10 |
Organization — Government |
11 |
Organization — Other |
summary_misc_vals rows are total-only and all use colid=1.
When valfmt=2 (percent) the value is the column count, and the UI
divides by the colid=2 or colid=7 "identified" total to render
percentages.
summary_user_vals rowids
User counts. Critical invariant: rowid=1 (total) equals
SUM(rowid IN (6, 7, 8)) for every (datetime, period, colid, valfmt) cell. This is asserted by tests/ab/port_invariants/.
| rowid | Metric | colid=1 means | Breakdowns |
|---|---|---|---|
1 |
Total users (= 6+7+8) | All user-IP-host visits | Residence, Org |
2 |
Simulation users | Distinct sim users | Residence, Org |
3 |
Unregistered user IPs (raw) | Pre-filter unregistered count | Total only |
4 |
Unique download user IPs | Pre-filter download count | Total only |
5 |
UNUSED | — | — |
6 |
Registered users | Distinct users in userlogin_lite |
Residence, Org |
7 |
Unregistered users (visitors) | Distinct visitors from websessions (duration ≥ 900s, jobs=0, ip ∉ login_ips) |
Residence, Org |
8 |
Download users | Distinct downloaders (filtered) | Residence, Org |
summary_simusage_vals rowids
Tool / simulation usage. Different meanings of rowid from
summary_user_vals.
| rowid | Metric | valfmt | Notes |
|---|---|---|---|
1 |
Total simulation users | 1 count | — |
2 |
Simulation jobs | 4 jobs | COUNT(*) over toolstart with success=1 |
3 |
CPU time | 5 sec | SUM(cputime) |
4 |
Wall time | 5 sec | SUM(walltime) |
5 |
View time | 5 sec | SUM(viewtime) |
6 |
Users with ≥10 min CPU | 1 count | — |
7 |
Average jobs per user | 1 count | Computed from #2/#1 |
8 |
Average wall per user | 5 sec | Computed from #4/#1 |
9 |
Repeat users with ≥10 sims | 1 count | — |
10 |
Repeat users >3 months | 1 count | — |
summary_misc_vals rowids
Miscellany. All colid=1.
| rowid | Metric | valfmt | Notes |
|---|---|---|---|
1 |
Domains served | 1 count | COUNT(DISTINCT domain) |
2 |
Cumulative user sessions | 1 count | — |
3 |
Cumulative session time | 5 sec | SUM(duration) — NULL is written as empty string (legacy quirk preserved by the port) |
4 |
Visitor count | 1 count | COUNT(DISTINCT ip, host) |
5 |
Visit count | 1 count | COUNT(datetime) |
6 |
New user accounts | 1 count | From jos_xprofiles_metrics.registerDate |
7 |
Max user logins on a day | 1 count | 'N users on YYYY-MM-DD' — value is a formatted string |
8 |
Web server hits | 1 count | SUM(hits) from webhits |
summary_andmore_vals
Per-resource user counts, written by andmore-usage (the only
summary writer that has its own table).
( resid INT, -- jos_resources.id (the resource being counted)
period TINYINT, -- 1, 12, 14 only — see below
datetime DATETIME, -- 'YYYY-MM-00 00:00:00'
users INT, -- count of distinct users
valfmt TINYINT )
andmore-usage only writes periods 1, 12, and 14. It doesn't
produce a full period grid like the other summary tables — that's
inherited from the legacy xlogfix_andmore_usage.php and preserved
by the port.
Hub-side: jos_stats_topvals and jos_resource_stats_tools_topvals
The per-tool ranked toplists, written into the hub DB (not the
metrics DB) by gen-tool-toplists / gen-tool-tops.
( id INT (auto),
top TINYINT, -- which tool metric (see table below)
datetime DATETIME, -- 'YYYY-MM-00 00:00:00'
period TINYINT, -- one of the six period codes
rank SMALLINT, -- 0 = total across all tools; 1, 2, 3 ... ranked tools
name VARCHAR(255),-- 'Total Simulation Jobs' at rank=0; '<resid> ~ <title>' otherwise
value BIGINT )
The rank=0 row is the special "total across all tools" row for
that (top, period, datetime) triple. Reporting UIs (e.g.,
hub.org/usage/tools/12) display the totals as the headline number
and the ranks 1+ as the per-tool list below.
top codes
top |
Tool metric |
|---|---|
2 |
Number of users |
5 |
Number of jobs |
6 |
Walltime |
7 |
Simulation CPU time |
8 |
Simulation interaction time |
10 |
Number of courses |
11 |
Course user count |
(From J.M. Sperhac's "Hub tool stats summarized" reference.)
Reference: domainclass table
The 6-bucket categorization that drives the orgtype breakdown in
summary_user_vals colid 8–11 and summary_simusage_vals same.
Stored in <hub>_metrics.domainclass:
| Class | Meaning |
|---|---|
0 |
Unknown (no domain information) |
1 |
Educational institution |
2 |
Industrial / corporate |
3 |
Governmental |
4 |
Internet service provider |
5 |
Search engine |
6 |
Press / media / publication |
The mapping is hand-maintained. Most of the entries date to 2015 (see Sperhac's "Organization type and location" reference). Periodic refresh is recommended but rarely happens — adding a new educational, government, or industry domain to this table is what makes that domain show up in the right org-type bucket on the usage page.
How registered vs guest users are classified
Two paths into the colid 2–11 breakdowns:
Registered users (reg_users, rowid=6). Identified by
appearing in userlogin_lite (= filtered view of userlogin for
login + simulation actions). Org type and residence taken from
their jos_xprofiles_metrics profile (orgtype and
countryresident columns). If they didn't fill out the profile,
they roll up into "unknown" (colid=2 and 7 — i.e., they don't
contribute to any of 3–6 or 8–11, only to the colid=1 total).
Guest / unregistered users (int_users, rowid=7;
download_users, rowid=8). Identified by (ip, host) in
websessions with no matching login row. Org type from
domainclass lookup on the resolved domain. Residence from
fill-ipcountry's GeoIP, mapped to continent via country_continent.
Some hub deployments are a special case: registered accounts in the single digits (just the maintenance staff), with almost all visitors anonymous. In those cases summary_user_vals is essentially all rowid=7 (unregistered) and rowid=8 (download users), with rowid=6 (registered) nearly zero. Most hubs have a richer registered-user population.
Example queries
These work against the live <hub>_metrics database and are exactly
what the usage-overview UI does behind the scenes.
Total user visits, monthly time series:
SELECT datetime, value
FROM summary_user_vals
WHERE rowid = 1 -- total users
AND colid = 1 -- total breakdown
AND period = 1 -- monthly
ORDER BY datetime;
All-time totals with continent breakdown for March 2025:
SELECT colid, value, valfmt
FROM summary_user_vals
WHERE rowid = 1
AND period = 14
AND datetime = '2025-03-00'
ORDER BY colid;
(colid=3 is the US count, colid=4..6 are Asia/Europe/Other. Any
of those that are valfmt=2 should be displayed as value / colid=2_total * 100%, see the legacy default.php view for the
exact rendering.)
Tool jobs (period 12 = rolling 12mo):
SELECT datetime, value
FROM summary_simusage_vals
WHERE rowid = 2 -- simulation jobs
AND colid = 1
AND period = 12
ORDER BY datetime;
Things that surprise people
colid=2andcolid=7are "identified" rolled-up counts — they represent the population for which residence or org type is known, and act as denominators for the percent breakdowns atcolid 3..6and8..11.- The same
colidcolumn number means different things depending on whether you're looking at3..6(residence) or8..11(organization). Both axes share the same column. summary_misc_vals.rowid=7stores a formatted string like'42 users on 2025-07-15'. The integer value is parseable from the first token; the date is parseable from the last.summary_misc_vals.rowid=3(cum session time) writes an empty string whenSUM(duration)returns NULL on an empty window — a legacy quirk preserved for bug-for-bug parity (seeport_period_sweepin the test harness for the case that surfaced it).userlogin_lite(input toint_usersand similar) has no date filter in legacy. Counts derived from it grow monotonically over time independent of the period being summarized. This is a known design issue preserved for parity.summary_andmore_valsonly has periods 1/12/14 (not the full 6).summary_andmore_vals.datetimeuses'YYYY-MM-01'(day 01); the other summary tables use'YYYY-MM-00'(zero day). Caught by an A/B test during the port — the convention difference comes from legacyxlogfix_andmore_usage.phpusing$processed_on = $dthis_.'-01'explicitly.
Deliberately-preserved legacy parity quirks
Edge cases the rewrite mirrors exactly to keep summary_*_vals
bug-for-bug compatible with the legacy pipeline. Each is verified
by an A/B test and would FAIL if the new code diverged.
| Quirk | Where | Why preserved |
|---|---|---|
userlogin_lite has no date filter on int_users / download_users queries |
summary_user_vals rowid=3, rowid=4 |
Counts inflate across runs; any reporting downstream is already calibrated to this |
jos_xprofiles_metrics reflects CURRENT profile state, not state-at-activity-time |
reg_users / sim_users org-type breakdowns |
Same reason — the table is fully rebuilt every run; any time-travel rewrite of historic months would shift old breakdowns |
summary_misc_vals.rowid=3 writes '' when SUM is NULL |
empty windows | Legacy db_fetch(NULL) → dbquote(NULL) → '' |
summary_misc_vals.rowid=7 is a formatted string, not a number |
max-logins-on-day | '42 users on 2025-07-15' |
summary_andmore_vals.datetime ends in -01, others in -00 |
andmore-usage |
Legacy $processed_on = $dthis_.'-01' |
| Week-chunks start the day BEFORE the month | fill-domain, clean-bots, gen-tool-stats |
Legacy findWeeks() returns [$mon-1d, $mon+1w) for chunk 0 |
Banker's rounding via FLOOR(x+0.5), not ROUND(x) |
middleware-wall, middleware-cpu |
Perl int($x+0.5) is round-half-up; MariaDB ROUND() is banker's |
cpu.pl UPDATE-only (never INSERT) |
middleware-cpu |
Legacy carries the comment "Do nothing as are just importing CPUtimes" |
cpu.pl includes event='[waiting]', wall excludes |
middleware-cpu |
Filters differ by metric; preserved per-metric |
gridstat is exact-match; hctest% is LIKE |
Multiple ports | So gridstatx is NOT excluded; hctester IS excluded |
download_users rowid=4 uses a DIFFERENT WHERE filter than rowid=8 |
summary_user_vals |
rowid=4 has no login_ips exclusion and no duration < 900 cap |
Last in-flight session of a logfix-session run is never emitted |
websessions |
Coalescer flushes only on session-end-detected; tail of run has none |
INSERT IGNORE on userlogin collapses duplicate (datetime, user, ip, action) tuples |
import-auth |
Legacy uses INSERT IGNORE; we preserve same semantics |
| Floats stringified before binding into SQL to hit half-away-from-zero rounding | gen-tool-stats propagate |
Otherwise pymysql's numeric-literal path hits banker's rounding |
All preserved quirks are documented in commit messages of the form
A/B test: <port> — caught … divergence and the test suite would
break if any port regressed against them.