Reference

Summary table cheat sheet

Source: docs/usage-tables.md

The five summary_*_vals tables in the metrics database are the output of the pipeline and the input to the usage-overview UI. This document is a decoder for what their fields mean.

It is closely adapted from J.M. Sperhac's "Hub usage data overview and table translator" (Jan 2025), which remains the most readable reference for the legacy xlogfix_summary.php that the rewrite ports. The output values are identical — the rowid/colid/period/datetime/value/valfmt shape is preserved bit-for-bit by the rewrite.

Common shape

All summary tables have the same six columns:

( rowid  TINYINT,    -- metric index, table-specific meaning (see below)
  colid  TINYINT,    -- breakdown index (1=total, others=continent/orgtype)
  datetime DATETIME, -- 'YYYY-MM-00 00:00:00' for monthly anchors;
                     -- '0000-00-00 00:00:00' for all-time
  period TINYINT,    -- 0=year, 1=month, 3=quarter, 12=rolling-12mo,
                     -- 13=fiscal year (Oct-Sep), 14=all-time
  value  VARCHAR(200), -- the metric value (numeric-as-string)
  valfmt TINYINT )   -- 1=count, 2=percent, 4=jobs, 5=duration in seconds

The denormalized shape is awkward — the same colid column means "residence" for one breakdown and "organization type" for another — but every reporting tool is built around it. The rewrite preserves this exactly.

Period codes

Period Meaning
0 Calendar year containing datetime
1 Just that month
3 Quarter containing datetime
12 Rolling 12 months ending at datetime
13 Fiscal year (Oct–Sep) containing datetime
14 All time (since 1995-01-01)

Period 14 is the historically-expensive one — see motivations.md for what the rewrite did about it.

colid (breakdown axis)

Same meaning across summary_user_vals and summary_simusage_vals:

colid Meaning
1 Total (always — used by single-value metrics)
2 Residence "identified" — total with known continent
3 Residence — United States
4 Residence — Asia
5 Residence — Europe
6 Residence — Other
7 Organization "identified" — total with known org type
8 Organization — Educational
9 Organization — Industry
10 Organization — Government
11 Organization — Other

summary_misc_vals rows are total-only and all use colid=1.

When valfmt=2 (percent) the value is the column count, and the UI divides by the colid=2 or colid=7 "identified" total to render percentages.

summary_user_vals rowids

User counts. Critical invariant: rowid=1 (total) equals SUM(rowid IN (6, 7, 8)) for every (datetime, period, colid, valfmt) cell. This is asserted by tests/ab/port_invariants/.

rowid Metric colid=1 means Breakdowns
1 Total users (= 6+7+8) All user-IP-host visits Residence, Org
2 Simulation users Distinct sim users Residence, Org
3 Unregistered user IPs (raw) Pre-filter unregistered count Total only
4 Unique download user IPs Pre-filter download count Total only
5 UNUSED
6 Registered users Distinct users in userlogin_lite Residence, Org
7 Unregistered users (visitors) Distinct visitors from websessions (duration ≥ 900s, jobs=0, ip ∉ login_ips) Residence, Org
8 Download users Distinct downloaders (filtered) Residence, Org

summary_simusage_vals rowids

Tool / simulation usage. Different meanings of rowid from summary_user_vals.

rowid Metric valfmt Notes
1 Total simulation users 1 count
2 Simulation jobs 4 jobs COUNT(*) over toolstart with success=1
3 CPU time 5 sec SUM(cputime)
4 Wall time 5 sec SUM(walltime)
5 View time 5 sec SUM(viewtime)
6 Users with ≥10 min CPU 1 count
7 Average jobs per user 1 count Computed from #2/#1
8 Average wall per user 5 sec Computed from #4/#1
9 Repeat users with ≥10 sims 1 count
10 Repeat users >3 months 1 count

summary_misc_vals rowids

Miscellany. All colid=1.

rowid Metric valfmt Notes
1 Domains served 1 count COUNT(DISTINCT domain)
2 Cumulative user sessions 1 count
3 Cumulative session time 5 sec SUM(duration) — NULL is written as empty string (legacy quirk preserved by the port)
4 Visitor count 1 count COUNT(DISTINCT ip, host)
5 Visit count 1 count COUNT(datetime)
6 New user accounts 1 count From jos_xprofiles_metrics.registerDate
7 Max user logins on a day 1 count 'N users on YYYY-MM-DD' — value is a formatted string
8 Web server hits 1 count SUM(hits) from webhits

summary_andmore_vals

Per-resource user counts, written by andmore-usage (the only summary writer that has its own table).

( resid  INT,        -- jos_resources.id (the resource being counted)
  period TINYINT,    -- 1, 12, 14 only — see below
  datetime DATETIME, -- 'YYYY-MM-00 00:00:00'
  users  INT,        -- count of distinct users
  valfmt TINYINT )

andmore-usage only writes periods 1, 12, and 14. It doesn't produce a full period grid like the other summary tables — that's inherited from the legacy xlogfix_andmore_usage.php and preserved by the port.

Hub-side: jos_stats_topvals and jos_resource_stats_tools_topvals

The per-tool ranked toplists, written into the hub DB (not the metrics DB) by gen-tool-toplists / gen-tool-tops.

( id        INT (auto),
  top       TINYINT,     -- which tool metric (see table below)
  datetime  DATETIME,    -- 'YYYY-MM-00 00:00:00'
  period    TINYINT,     -- one of the six period codes
  rank      SMALLINT,    -- 0 = total across all tools; 1, 2, 3 ... ranked tools
  name      VARCHAR(255),-- 'Total Simulation Jobs' at rank=0; '<resid> ~ <title>' otherwise
  value     BIGINT )

The rank=0 row is the special "total across all tools" row for that (top, period, datetime) triple. Reporting UIs (e.g., hub.org/usage/tools/12) display the totals as the headline number and the ranks 1+ as the per-tool list below.

top codes

top Tool metric
2 Number of users
5 Number of jobs
6 Walltime
7 Simulation CPU time
8 Simulation interaction time
10 Number of courses
11 Course user count

(From J.M. Sperhac's "Hub tool stats summarized" reference.)

Reference: domainclass table

The 6-bucket categorization that drives the orgtype breakdown in summary_user_vals colid 8–11 and summary_simusage_vals same. Stored in <hub>_metrics.domainclass:

Class Meaning
0 Unknown (no domain information)
1 Educational institution
2 Industrial / corporate
3 Governmental
4 Internet service provider
5 Search engine
6 Press / media / publication

The mapping is hand-maintained. Most of the entries date to 2015 (see Sperhac's "Organization type and location" reference). Periodic refresh is recommended but rarely happens — adding a new educational, government, or industry domain to this table is what makes that domain show up in the right org-type bucket on the usage page.

How registered vs guest users are classified

Two paths into the colid 2–11 breakdowns:

Registered users (reg_users, rowid=6). Identified by appearing in userlogin_lite (= filtered view of userlogin for login + simulation actions). Org type and residence taken from their jos_xprofiles_metrics profile (orgtype and countryresident columns). If they didn't fill out the profile, they roll up into "unknown" (colid=2 and 7 — i.e., they don't contribute to any of 3–6 or 8–11, only to the colid=1 total).

Guest / unregistered users (int_users, rowid=7; download_users, rowid=8). Identified by (ip, host) in websessions with no matching login row. Org type from domainclass lookup on the resolved domain. Residence from fill-ipcountry's GeoIP, mapped to continent via country_continent.

Some hub deployments are a special case: registered accounts in the single digits (just the maintenance staff), with almost all visitors anonymous. In those cases summary_user_vals is essentially all rowid=7 (unregistered) and rowid=8 (download users), with rowid=6 (registered) nearly zero. Most hubs have a richer registered-user population.

Example queries

These work against the live <hub>_metrics database and are exactly what the usage-overview UI does behind the scenes.

Total user visits, monthly time series:

SELECT datetime, value
FROM summary_user_vals
WHERE rowid = 1     -- total users
  AND colid = 1     -- total breakdown
  AND period = 1    -- monthly
ORDER BY datetime;

All-time totals with continent breakdown for March 2025:

SELECT colid, value, valfmt
FROM summary_user_vals
WHERE rowid = 1
  AND period = 14
  AND datetime = '2025-03-00'
ORDER BY colid;

(colid=3 is the US count, colid=4..6 are Asia/Europe/Other. Any of those that are valfmt=2 should be displayed as value / colid=2_total * 100%, see the legacy default.php view for the exact rendering.)

Tool jobs (period 12 = rolling 12mo):

SELECT datetime, value
FROM summary_simusage_vals
WHERE rowid = 2     -- simulation jobs
  AND colid = 1
  AND period = 12
ORDER BY datetime;

Things that surprise people

  • colid=2 and colid=7 are "identified" rolled-up counts — they represent the population for which residence or org type is known, and act as denominators for the percent breakdowns at colid 3..6 and 8..11.
  • The same colid column number means different things depending on whether you're looking at 3..6 (residence) or 8..11 (organization). Both axes share the same column.
  • summary_misc_vals.rowid=7 stores a formatted string like '42 users on 2025-07-15'. The integer value is parseable from the first token; the date is parseable from the last.
  • summary_misc_vals.rowid=3 (cum session time) writes an empty string when SUM(duration) returns NULL on an empty window — a legacy quirk preserved for bug-for-bug parity (see port_period_sweep in the test harness for the case that surfaced it).
  • userlogin_lite (input to int_users and similar) has no date filter in legacy. Counts derived from it grow monotonically over time independent of the period being summarized. This is a known design issue preserved for parity.
  • summary_andmore_vals only has periods 1/12/14 (not the full 6).
  • summary_andmore_vals.datetime uses 'YYYY-MM-01' (day 01); the other summary tables use 'YYYY-MM-00' (zero day). Caught by an A/B test during the port — the convention difference comes from legacy xlogfix_andmore_usage.php using $processed_on = $dthis_.'-01' explicitly.

Deliberately-preserved legacy parity quirks

Edge cases the rewrite mirrors exactly to keep summary_*_vals bug-for-bug compatible with the legacy pipeline. Each is verified by an A/B test and would FAIL if the new code diverged.

Quirk Where Why preserved
userlogin_lite has no date filter on int_users / download_users queries summary_user_vals rowid=3, rowid=4 Counts inflate across runs; any reporting downstream is already calibrated to this
jos_xprofiles_metrics reflects CURRENT profile state, not state-at-activity-time reg_users / sim_users org-type breakdowns Same reason — the table is fully rebuilt every run; any time-travel rewrite of historic months would shift old breakdowns
summary_misc_vals.rowid=3 writes '' when SUM is NULL empty windows Legacy db_fetch(NULL)dbquote(NULL) → ''
summary_misc_vals.rowid=7 is a formatted string, not a number max-logins-on-day '42 users on 2025-07-15'
summary_andmore_vals.datetime ends in -01, others in -00 andmore-usage Legacy $processed_on = $dthis_.'-01'
Week-chunks start the day BEFORE the month fill-domain, clean-bots, gen-tool-stats Legacy findWeeks() returns [$mon-1d, $mon+1w) for chunk 0
Banker's rounding via FLOOR(x+0.5), not ROUND(x) middleware-wall, middleware-cpu Perl int($x+0.5) is round-half-up; MariaDB ROUND() is banker's
cpu.pl UPDATE-only (never INSERT) middleware-cpu Legacy carries the comment "Do nothing as are just importing CPUtimes"
cpu.pl includes event='[waiting]', wall excludes middleware-cpu Filters differ by metric; preserved per-metric
gridstat is exact-match; hctest% is LIKE Multiple ports So gridstatx is NOT excluded; hctester IS excluded
download_users rowid=4 uses a DIFFERENT WHERE filter than rowid=8 summary_user_vals rowid=4 has no login_ips exclusion and no duration < 900 cap
Last in-flight session of a logfix-session run is never emitted websessions Coalescer flushes only on session-end-detected; tail of run has none
INSERT IGNORE on userlogin collapses duplicate (datetime, user, ip, action) tuples import-auth Legacy uses INSERT IGNORE; we preserve same semantics
Floats stringified before binding into SQL to hit half-away-from-zero rounding gen-tool-stats propagate Otherwise pymysql's numeric-literal path hits banker's rounding

All preserved quirks are documented in commit messages of the form A/B test: <port> — caught … divergence and the test suite would break if any port regressed against them.