Policy
Allow lists, rate limits, block paths, robots.txt, the trigger families, and flag triggers.
mod_botshield decides which requests deserve friction by composing six policy families on top of the score-driven tier ladder. This page covers each family's directive, predicate shape, side effects, and where it sits in the runtime walk.
The runtime order (one pass per request, first short-circuit wins):
- Cookie triggers — pre-handler state. Cookie family accumulates pass-with-credit across multiple matches.
- Env triggers — predicate on Apache environment variables (`SetEnvIfExpr`, mod_rewrite `[E=…]`). First-match wins; gated on `ap_is_initial_req` so internal-redirect legs don't double-apply.
- Load triggers — predicate on the global load_state sampled by the load watchdog.
- Path triggers — predicate on the request URI.
- Block-path — cohort + path-glob → 403.
- Robots.txt Disallow — RFC 9309 matcher → 403.
- Rate-limit — cohort + budget → 429 with Retry-After.
- Robots.txt Crawl-delay — per-group rate cap.
The policy walk runs before the built-in heuristics, so a
matching policy rule can short-circuit the request even when the
client also has a valid _bs_session cookie. Allow-list checks
and built-in heuristics (missing-UA, missing-Accept-Language,
scraper-pattern UA) run after the policy walk if the walk
returns OK; flag-trigger effects are applied last, against the
IP's accumulated flag bitmap.
Allow list — verified crawlers
BotShieldAllowBot registers a UA pattern + IP-range pair that
the allow-list classifier checks during the heuristics phase.
Verified crawlers (UA matches AND IP is in the published range)
get a hard pass — they bypass the score ladder entirely.
BotShieldAllowVerifiedBots on
BotShieldAllowBot googlebot "Googlebot/" /var/lib/botshield/bots/googlebot.txt
BotShieldAllowBot bingbot "bingbot/" /var/lib/botshield/bots/bingbot.txt
BotShieldAllowBot internal-monitor "MonitorBot/" 10.0.0.0/8,2001:db8::/32
The third arg is shape-inspected — no separate flag or sentinel:
- starts with `/` → absolute path to a CIDR file (one CIDR per line, `#` comments, blank lines OK, IPv4 + IPv6).
- contains `/` or `:` → comma-separated inline CIDRs.
- `*` alone → UA-match only; no IP check. Logged with reason `allow-bot-ua:<name>` instead of `allow-bot:<name>`.
- omitted → default file path `/var/lib/botshield/bots/<name>.txt`.
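The shape rules for the third argument can be sketched in a few lines of Python. This is an illustration only, not the module's parser: the function name `classify_ipspec` and the labels it returns are hypothetical, and only the rule order and the default directory come from the text here.

```python
from typing import Optional

# Default directory named in the text; everything else here is illustrative.
DEFAULT_DIR = "/var/lib/botshield/bots"


def classify_ipspec(name: str, spec: Optional[str]):
    """Return a (kind, value) pair for the third BotShieldAllowBot argument."""
    if spec is None:
        # Omitted: fall back to the per-bot default file.
        return ("file", f"{DEFAULT_DIR}/{name}.txt")
    if spec == "*":
        # UA-match only, no IP check.
        return ("ua-only", None)
    if spec.startswith("/"):
        # Absolute path to a CIDR file.
        return ("file", spec)
    if "/" in spec or ":" in spec:
        # Comma-separated inline CIDRs.
        return ("inline", [cidr.strip() for cidr in spec.split(",")])
    raise ValueError(f"unrecognized ipspec: {spec!r}")
```

Checking the leading `/` before the "contains `/` or `:`" rule matters: an absolute file path also contains a slash, so the order of the tests decides which branch it takes.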
The CIDR file is read once at config-parse time (size cap 1 MiB) and cached on the per-server config. Refresh with a reload.
A built-in seed list covers Googlebot, Bingbot, Applebot, Yandex,
DuckDuckBot, and a handful of others — all installed under
/var/lib/botshield/bots/. The tools/refresh-bot-ranges.sh script
fetches each provider's published JSON and rewrites the CIDR files
in place.
Verified vs fake
A request whose UA matches a registered bot pattern is in one of three states:
- verified — UA matches, IP is in the range. Hard pass with reason `verified-<name>`.
- fake — UA matches, IP is NOT in the range. Strong penalty with reason `fake-<name>`. Fake-bot detection is one of the most reliable signals — bot operators love claiming Googlebot.
- unverified — UA matches, classifier hit but ranges aren't loaded for this name. Logged with reason `bot-unverified` for visibility, no score effect.
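The three states map onto a small decision function. The sketch below is a hedged illustration using Python's standard `ipaddress` module; `classify_bot` and the shape of the `bots` mapping are assumptions made for the example, and the addresses in the usage note are documentation ranges, not real crawler ranges.

```python
import ipaddress


def classify_bot(ua, client_ip, bots):
    """Return the log reason for a request, or None if no bot pattern matches.

    bots maps a name to (ua_substring, cidrs), where cidrs is a list of CIDR
    strings, or None when no ranges are loaded for that name.
    """
    for name, (pattern, cidrs) in bots.items():
        if pattern not in ua:
            continue
        if cidrs is None:
            return "bot-unverified"        # UA hit, but nothing to check against
        ip = ipaddress.ip_address(client_ip)
        for cidr in cidrs:
            net = ipaddress.ip_network(cidr)
            if ip.version == net.version and ip in net:
                return f"verified-{name}"  # UA and IP both match
        return f"fake-{name}"              # UA claims the bot, IP disagrees
    return None
```

For example, with `bots = {"googlebot": ("Googlebot/", ["192.0.2.0/24"])}`, a request claiming Googlebot from `192.0.2.10` is verified, while the same UA from `198.51.100.7` is classified as fake.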
Rate limits and block paths
BotShieldRateLimit caps requests-per-window for a cohort. Hits
return 429 with Retry-After and add 50 to the score. Cohorts pair
a UA-substring matcher with an IP spec:
BotShieldRateLimit api-burst 60 min "" 10.0.0.0/8,2001:db8::/48
BotShieldRateLimit scrapers 10 min "wget|curl|python" *
Args: <name> <budget> <per> <ua-pattern> <ipspec>.
- `<budget>` requests are allowed per `<per>` (fixed-window counter, atomic CAS-updated SHM slot).
- `<per>` accepts `sec`/`min`/`hour` (or `s`/`m`/`h`) — never a bare integer; the parser rejects plain numbers.
- `<ua-pattern>` is a substring or `""` for "any UA".
- `<ipspec>` is the same shape as `BotShieldAllowBot` — a path to a CIDR file, comma-separated inline CIDRs, or `*` for "any IP".

The two axes cannot both be wildcards (`""` and `*`): that would rate-limit every
request, so the combination is rejected at config time.
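A fixed-window counter is simple enough to show in full. The sketch below is a single-process Python illustration of the idea, not the module's code: the real counter lives in a shared-memory slot updated with atomic compare-and-swap, while this version uses a plain dict. The class name and the choice to report Retry-After as "seconds until the next window" are assumptions.

```python
import math
import time

# Unit names accepted for <per>, as listed in the text.
PER_SECONDS = {"sec": 1, "s": 1, "min": 60, "m": 60, "hour": 3600, "h": 3600}


class FixedWindow:
    def __init__(self, budget, per):
        if per not in PER_SECONDS:
            # Mirrors the documented rule: bare integers are rejected.
            raise ValueError(f"<per> must be a unit name, got {per!r}")
        self.budget = budget
        self.window = PER_SECONDS[per]
        self.slots = {}  # key -> (window index, hits in that window)

    def hit(self, key, now=None):
        """Count one request. Return None if allowed, else Retry-After seconds."""
        now = time.time() if now is None else now
        index = int(now // self.window)
        slot_index, count = self.slots.get(key, (index, 0))
        if slot_index != index:
            count = 0                       # a new window starts from zero
        count += 1
        self.slots[key] = (index, count)
        if count <= self.budget:
            return None
        return math.ceil((index + 1) * self.window - now)
```

A fixed window is cheap (one integer per key) at the cost of allowing a burst of up to twice the budget across a window boundary.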
BotShieldBlockPath is the same cohort + a path glob → 403:
BotShieldBlockPath legacy-admin "/wp-admin/*" "" *
BotShieldBlockPath aggressive-scraper "/" "AhrefsBot|SEMrushBot" *
Args: <name> <path-glob> <ua-pattern> <ipspec>.
Repeated-429 escalation
BotShieldRateLimitEscalate upgrades a rule that's already been
firing — repeated 429s on the same IP escalate to 403 (or any
configurable status):
BotShieldRateLimitEscalate api-burst 5 min status=403 ttl=3600
Args: <rate-rule> <strikes> <per> [status=N] [ttl=N]. <per>
accepts sec/min/hour (same as BotShieldRateLimit). If a
rate-limited cohort triggers <strikes> 429s within the window,
the IP is upgraded to the configured status for ttl seconds
(lives in the strike SHM table). The original rate-limit rule
still runs; the escalation is a separate decision applied on top.
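The strike logic can be pictured as a second, smaller counter layered over the rate limiter. The Python sketch below is an assumption-laden illustration: the class and method names are hypothetical, and it treats the strike window as sliding, which the text does not specify.

```python
class Escalator:
    def __init__(self, strikes, window, status=403, ttl=3600):
        self.strikes = strikes
        self.window = window   # seconds
        self.status = status
        self.ttl = ttl         # seconds
        self.hits = {}         # ip -> timestamps of recent 429s
        self.until = {}        # ip -> time at which the escalation expires

    def record_429(self, ip, now):
        """Call each time the underlying rate-limit rule answers 429."""
        recent = [t for t in self.hits.get(ip, []) if now - t < self.window]
        recent.append(now)
        if len(recent) >= self.strikes:
            self.until[ip] = now + self.ttl
            recent = []        # start a fresh strike count after escalating
        self.hits[ip] = recent

    def status_for(self, ip, now):
        """Return the escalated status while it is active, else None."""
        return self.status if self.until.get(ip, 0) > now else None
```

With `Escalator(strikes=5, window=60, status=403, ttl=3600)`, matching the directive above, the fifth 429 inside a minute makes `status_for` return 403 for the next hour.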
Path-pattern semantics
Path globs use a single * wildcard at the trailing edge. A
non-trailing * (e.g. /api/*/v2/) emits a NOTICE at config-parse
time — the v1 matcher would have treated the inner * as a
literal byte; the current matcher follows RFC 9309's leftmost
greedy semantics. The NOTICE warns that the pattern's meaning may have
shifted; existing configs still load, but such patterns are worth re-checking.
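To make the wildcard behavior concrete, here is a small Python sketch of a prefix matcher in which `*` stands for any run of characters. It is an approximation for illustration, not the module's matcher: the function names are hypothetical, and the `$` end anchor that RFC 9309 also defines is left out.

```python
import re


def compile_glob(pattern):
    """Translate a path glob into a regex; each * matches zero or more characters."""
    literal_parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(literal_parts), re.DOTALL)


def path_matches(pattern, path):
    # re.match anchors at the start only, which gives prefix semantics.
    return compile_glob(pattern).match(path) is not None
```

Under this reading `/api/*/v2/` matches `/api/users/v2/list`. Under the older literal reading described above, the same rule would only have matched a path containing an actual `*` byte, which is why the NOTICE exists.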
Robots.txt enforcement
BotShieldRobotsTxt plugs in a parsed RFC 9309 robots.txt file as
a policy source. Disallow rules become block-path:robots:<group>
matches; Crawl-delay rules become per-group rate limits.
BotShieldRobotsTxt /etc/botshield/robots.txt
BotShieldRobotsRefreshInterval 60
BotShieldRobotsWildcardScope heuristic
Args:
- `BotShieldRobotsTxt <path>` — path to the robots.txt file. A background watchdog re-parses on mtime change.
- `BotShieldRobotsRefreshInterval <sec>` — how often the watchdog checks mtime. Default 60. Set to 0 to disable hot-reload (mtime change won't be picked up until next restart).
- `BotShieldRobotsWildcardScope <mode>` — how strict the matcher is on `User-agent: *` rules:
  - `heuristic` (default): wildcard rules apply only when no more specific group matches. Closest to your intent — a `*` block doesn't override a tighter Googlebot allow.
  - `strict`: RFC-9309 strict semantics. Wildcard rules participate in matching like any other group. May produce surprising overrides when wildcard and named groups conflict.
  - `off`: ignore wildcard groups entirely. Only named-group rules apply.
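The three wildcard-scope modes differ only in which groups are consulted for a given client. The Python sketch below follows the mode descriptions on this page; the function name, the substring test for matching a group token against the UA, and the data shape are all assumptions for illustration.

```python
def select_groups(ua, groups, mode="heuristic"):
    """Return the robots.txt group tokens whose rules apply to this UA.

    groups maps a User-agent token (or "*") to that group's rule list.
    """
    named = [g for g in groups if g != "*" and g.lower() in ua.lower()]
    wildcard = ["*"] if "*" in groups else []
    if mode == "off":
        return named                      # wildcard groups ignored entirely
    if mode == "heuristic":
        return named if named else wildcard
    if mode == "strict":
        return named + wildcard           # wildcard competes with named groups
    raise ValueError(f"unknown mode: {mode!r}")
```

With groups for both `Googlebot` and `*`, a Googlebot request sees only the `Googlebot` group under `heuristic`, but both groups under `strict`.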
Group iteration is exposed at <prefix>/policy-status for
inspection (see observability).
Triggers — predicate-action engine
Five trigger families share one config-time action engine and one
request-time executor. Each family differs only in its predicate;
they all funnel through the same bs_trigger_action struct and
the same shared action keys.
| Family | Directive | Predicate |
|---|---|---|
| Path | BotShieldPathTrigger | URI glob |
| Cookie | BotShieldCookieTrigger | Cookie name + value (or bulk shape) |
| Env | BotShieldEnvTrigger | Apache env var |
| Feedback | BotShieldFeedbackTrigger | App-emitted event name (response path) |
| Load | BotShieldLoadTrigger | Global load_state |
Shared action keys
Every family parses <predicate-args> <action-key>=<value>.... The
action keys are:
| Key | Effect |
|---|---|
| status=<code> | HTTP status to return. pass lets the request continue (cookie/env families accumulate; path family declines to real handler) |
| redirect=<url> | Send an HTTP redirect with the chosen status (default 302) |
| log=<tag> | Stash a tag in r->notes for the access log (%{BS-…}n) and the decision-log line |
| flag=<name> | Add a flag bit on the IP's flagged-IP entry (e.g. flag=honeypot_hit) |
| ttl=<sec> | TTL on the flag-IP entry. Required when flag= is set |
| penalty=N | Add N to the request score |
| credit=N | Subtract N from the request score (rejected on the path family — paths can't credit) |
| mode=observe | Per-rule observe mode: predicate evaluates, side-effects suppressed. See staging |
Path triggers
BotShieldPathTrigger admin-honeypot "/admin/.env" \
status=403 flag=honeypot_hit ttl=3600 log=admin-trap
BotShieldPathTrigger api-burst-trap "/api/*/burst" \
penalty=30 log=api-burst
First-match wins (declaration order). On match, the path family's
status=pass short-circuits to DECLINED (real handler runs); any
other status is the response code.
Cookie triggers
BotShieldCookieTrigger session-active cookie=sessionid \
status=pass credit=10
BotShieldCookieTrigger weak-session cookie=sessionid=guest \
penalty=15 log=guest-session
BotShieldCookieTrigger no-cookies cookies=none \
penalty=5 log=cookieless
Predicate shapes:
- `cookie=<name>` — named cookie present (any value).
- `!cookie=<name>` — named cookie absent.
- `cookie=<name>=<value>` — exact value match.
- `cookie=<name>!<value>` — value mismatch.
- `cookie=<name>~<substring>` — value contains substring.
- `cookies=none` / `cookies=any` — bulk: empty cookie map / any cookie set.
- `cookies=session` — bulk: any of the names declared via `BotShieldSessionCookieName` is set.
- `bs-cookie=verified` / `bs-cookie=missing` / `bs-cookie=invalid` — the BotShield-cookie-state note set by `bs_handler` (no double HMAC check). Predicates against the module's own `_bs_session` cookie name are rejected — use these instead.
Cookie family accumulates: status=pass keeps walking and
collecting credits/penalties from later cookie triggers. First
non-pass status short-circuits.
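The accumulate-then-short-circuit walk is easier to see in code. The following Python sketch is illustrative only: the trigger representation is invented for the example, and it assumes a trigger with no `status=` key behaves like `status=pass`.

```python
def walk_cookie_triggers(triggers, cookies):
    """Walk cookie triggers in declaration order.

    Each trigger is a dict with a 'match' callable taking the cookie map,
    plus optional 'status', 'credit' and 'penalty' keys.
    Returns (final_status, score_delta).
    """
    delta = 0
    for trigger in triggers:
        if not trigger["match"](cookies):
            continue
        delta += trigger.get("penalty", 0) - trigger.get("credit", 0)
        status = trigger.get("status", "pass")  # assumption: default is pass
        if status != "pass":
            return status, delta                # first non-pass status wins
    return "pass", delta
```

Modeling the three directives above, a request carrying `sessionid=guest` matches both `session-active` (credit 10) and `weak-session` (penalty 15), so the walk ends on pass with a net score change of +5.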
Env triggers
SetEnvIfExpr "%{HTTP:CF-Connecting-IP} =~ /:/" BS_IPV6=1
BotShieldEnvTrigger ipv6-hint env=BS_IPV6 status=pass credit=2
SetEnvIf User-Agent "(?i)\bcurl\b" BS_CLI=1
BotShieldEnvTrigger curl-hint env=BS_CLI penalty=10 log=cli
Predicate shapes:
- `env=<name>` — env var present (any value, including empty).
- `!env=<name>` — env var absent.
- `env=<name>=<value>` — exact value match.
Narrower than cookie by design — no substring/contains shape and
no bulk-state analog. If you need rich matching, set a
coarse bucket upstream (SetEnvIfExpr, ModSecurity rule, etc.)
and consume the bucket here. redirect= is not a valid action
key on env or load triggers.
Env triggers gate on ap_is_initial_req(r) to prevent double-application
on internal redirect legs (ErrorDocument, RewriteRule without R). The env
producer would otherwise fire a second time and double-count score/flag.
Feedback triggers
The app emits a response header X-BotShield-Feedback: event=<name>;sig=<hmac>.
The module verifies the HMAC and looks up the event name in the
configured feedback-trigger table:
BotShieldAppFeedback on
BotShieldAppIntegrationSecretFile /etc/botshield/app-integration-secret
BotShieldFeedbackTrigger scanner-hit flag=honeypot_hit ttl=3600 log=app-trap
BotShieldFeedbackTrigger human-pass flag=app_verified_human ttl=3600
The event-name → action indirection is the security property: a compromised app can emit any event name, but only configured mappings reach module memory. Wire format details and signing are covered in captcha.
Feedback runs on the response path but its side effect is
future-request state (the flagged-IP write). Both
BotShieldEnabled LogOnly and per-trigger mode=observe apply —
either gates the filter into logging feedback-trigger:<event>: observe and skipping the SHM mutation. See
staging.
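The event-name indirection can be sketched as "verify, then look up". The Python below is a loose illustration, not the wire format: it assumes the signature is a hex HMAC-SHA256 over the event name alone, which this page does not state (the actual signing scheme is documented under captcha), and the function name and table shape are hypothetical.

```python
import hashlib
import hmac


def handle_feedback(header_value, secret, table):
    """Return the configured action for a signed event, or None."""
    fields = {}
    for part in header_value.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    event, sig = fields.get("event"), fields.get("sig")
    if not event or not sig:
        return None
    # ASSUMPTION: hex HMAC-SHA256 over the event name only.
    expected = hmac.new(secret, event.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), sig.encode()):
        return None
    # Only event names present in the configured table map to an action.
    return table.get(event)
```

The final `table.get` is the point of the design: a correctly signed but unconfigured event name yields nothing, so the app cannot reach actions the operator never declared.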
Load triggers
BotShieldLoadStateFile /run/botshield/load-state
BotShieldLoadRefreshInterval 1
BotShieldLoadWarmThreshold 65
BotShieldLoadHotThreshold 85
BotShieldLoadTrigger be-strict state>=warm penalty=20 log=brownout
BotShieldLoadTrigger drop-noise state=hot status=503 log=hot-shed
Predicates: state=<name> (exact match) or state>=<name> (at
least). State names: normal, warm, hot. Hysteresis settles
the state machine over a few sample ticks before rules fire — a
single load spike doesn't flip the global state.
Load state is sampled from the Apache scoreboard plus an optional external state file (set by an out-of-band collector — e.g. collectd writing a single-word state every second). The external file lets you key load decisions on whatever metric makes sense for your deployment, not just Apache's busy-worker count.
Flag-trigger family
Flag triggers map flag bits → actions, applied after the policy
walk against the IP's accumulated flag bitmap (IP-side via
flagged-IP table + cookie-side via prior _bs_session). They
have a different action surface than the five trigger families
above:
BotShieldFlagTrigger honeypot_hit action=tier_floor min=captcha
BotShieldFlagTrigger honeypot_hit action=score add=60
BotShieldFlagTrigger app_verified_human action=score add=-80
Args: <flag> [reset] [action=<verb> args...]. The first arg names
a flag bit (honeypot_hit, scanner_probe, fake_bot,
pow_fail_streak, app_verified_human, app_verified_session,
app_trust_signal); each invocation appends one trigger entry for
that bit.
Two action verbs:
- `action=score add=N` — accumulate signed N into the request's score (positive penalty / negative credit). SUM accumulates across triggers.
- `action=tier_floor min=<tier>` — set a minimum tier; `<tier>` is `pass` / `silent` / `form` / `captcha`. MAX accumulates (strictest wins).
The reset keyword is directive-level (not an action verb): a
line of the form BotShieldFlagTrigger <flag> reset clears every
prior trigger (compiled-in default + earlier declarations)
for that flag at post-config time. reset may appear with or
without a trailing action=...:
# Wipe defaults, install only one tier-floor rule:
BotShieldFlagTrigger honeypot_hit reset action=tier_floor min=form
# Disarm a flag entirely:
BotShieldFlagTrigger pow_fail_streak reset
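The SUM and MAX accumulation rules for the two action verbs can be sketched as a single pass over the trigger list. This Python is illustrative: the tuple representation and function name are invented, and it assumes the tiers are ordered from least to most strict in the order the text lists them.

```python
# ASSUMPTION: listed order is ascending strictness.
TIERS = ["pass", "silent", "form", "captcha"]


def apply_flag_triggers(flags, triggers, score=0):
    """Apply (flag, verb, arg) triggers for the flags set on this request.

    Returns (score, tier_floor): scores are summed, tier floors take the max.
    """
    floor = "pass"
    for flag, verb, arg in triggers:
        if flag not in flags:
            continue
        if verb == "score":
            score += arg                              # SUM across triggers
        elif verb == "tier_floor":
            if TIERS.index(arg) > TIERS.index(floor):
                floor = arg                           # MAX: strictest wins
    return score, floor
```

Using the three example directives from this section, a request flagged with both `honeypot_hit` and `app_verified_human` ends with a score change of 60 - 80 = -20 but keeps a `captcha` floor, since credits never lower a tier floor.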
Compiled-in defaults
mod_botshield seeds the flag-trigger table at config-parse time with sensible defaults so a fresh install gets honeypot / fake-bot detection without any additional config. Each detection-signal flag is seeded as paired score + tier_floor rows:
| Flag | Default action |
|---|---|
| honeypot_hit | score add=+60, tier_floor min=captcha |
| fake_bot | score add=+80, tier_floor min=captcha |
| scanner_probe | score add=+50, tier_floor min=form |
| pow_fail_streak | score add=+30, tier_floor min=silent |
| app_verified_human | score add=-80 |
| app_verified_session | score add=-40 |
| app_trust_signal | score add=-20 |
Trust signals (credits) are score-only by design; no credit ever forces tier down. A verified-human flag can't unlock a request that already tripped a different tier_floor.
Configured BotShieldFlagTrigger directives override the
defaults for the matching flag bit + action verb pair (later
declarations win, same as every other trigger family).
Per-Apache-scope triggers — BotShieldTrigger
For everything else you want to do at a specific Apache scope — flag the IP, add a penalty, return a status, observe in log-only mode — there's a single per-scope directive:
<Location "/admin/.env">
BotShieldTrigger flag=honeypot_hit ttl=3600 log=admin-trap
</Location>
<LocationMatch "(?i)/wp-(login|admin)">
BotShieldTrigger flag=scanner_probe ttl=3600 penalty=20 log=wp-trap
</LocationMatch>
<Files "*.php">
<If "%{REQUEST_URI} =~ m#/uploads/#">
BotShieldTrigger status=403 log=php-in-uploads
</If>
</Files>
The Apache scope match IS the predicate — no separate path glob,
because Apache already evaluated the scope. Action keys mirror the
cookie family: status / redirect / log / flag / ttl /
penalty / credit / mode. Multiple BotShieldTrigger lines
in one scope each append a separate action; they all fire on a
pass, and the first non-pass status short-circuits.
BotShieldTrigger works in any Apache container the parser
accepts (server config, <VirtualHost>, <Directory>,
<Location>, <LocationMatch>, <Files>, <If>, etc.) — the
same set that Require and Header work in. This is the
recommended way to express anything <Location>-shaped.
Reset semantics — opting out of inherited triggers
By default a child scope inherits its parent's BotShieldTrigger
list and concatenates its own. To drop the inherited list,
declare BotShieldTrigger reset in the child:
<Location "/api">
BotShieldTrigger penalty=10 log=api-tax
</Location>
<Location "/api/health">
BotShieldTrigger reset
</Location>
<Location "/api/internal">
BotShieldTrigger reset
BotShieldTrigger status=pass log=internal-allow
</Location>
reset as the first arg drops triggers inherited from outer
scopes (and clears any earlier BotShieldTrigger entries
appended in the same scope before the reset).
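The inherit-and-concatenate rule, with `reset` as an escape hatch, amounts to a short list merge. The Python sketch below models it with plain strings; the function name and representation are hypothetical and stand in for whatever the module does when Apache merges per-directory configs.

```python
def merge_scope(parent, child):
    """Merge a child scope's BotShieldTrigger lines onto its parent's list.

    parent: list of inherited actions.
    child: this scope's directive arguments in declaration order, where the
    literal string "reset" stands for `BotShieldTrigger reset`.
    """
    merged = list(parent)
    for entry in child:
        if entry == "reset":
            merged = []          # drops inherited AND earlier same-scope entries
        else:
            merged.append(entry)
    return merged
```

Applied to the example above: `/api` yields one action, `/api/health` yields an empty list, and `/api/internal` yields only `status=pass log=internal-allow`.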
Safeguard
The safeguard suppresses a challenge loop: a client that has been
issued challenges repeatedly within the safeguard window without
ever returning a verified cookie gets a 302 redirect
(tier=safeguard outcome=redirect) to a configured
BotShieldSafeguardRedirectURL or to the built-in explainer at
<BotShieldEndpointPrefix>/safeguard-info. The original URI is
appended as ?return=<urlencoded path>. The per-IP counter clears
on redirect so a fresh failure cycle starts after the client
engages with the redirect target.
BotShieldSafeguard on
BotShieldSafeguardThreshold 5
BotShieldSafeguardWindow 600
BotShieldSafeguardTTL 900
# Optional. When unset, the redirect points at
# /botshield/safeguard-info (the module's built-in explainer).
BotShieldSafeguardRedirectURL /help/auto-check-failed
Defaults: 5 missed verifications in 600 seconds → 900-second pass-through window. The IP's flagged-IP entry is preserved so the suspicious behavior is still recorded for downstream signals; only the in-line challenge is suppressed.
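For reference, the redirect target with its `return` parameter can be built as follows. This is a small Python sketch under assumptions: the function name is hypothetical, `/botshield` is taken as the endpoint prefix from the config comment above, and percent-encoding every reserved character (including `/`) is a guess at what "urlencoded" means here.

```python
from urllib.parse import quote


def safeguard_location(original_uri, redirect_url=None, endpoint_prefix="/botshield"):
    """Build the Location value for a safeguard redirect."""
    # Fall back to the built-in explainer when no redirect URL is configured.
    base = redirect_url or f"{endpoint_prefix}/safeguard-info"
    # safe="" percent-encodes "/" and "?" too, so the original URI
    # travels as a single query value.
    return f"{base}?return={quote(original_uri, safe='')}"
```

A landing page that wants to send the visitor back can decode `return` with `urllib.parse.unquote`; it should only accept same-site paths to avoid becoming an open redirect.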
Sites staging a fresh deployment with aggressive thresholds
are the most likely to trip this. Watch the tier_pass_total
counter for an unusual climb, then filter the decision log on the
reason challenge-safeguard to confirm the cause (safeguard
outcomes roll into pass for metric binning, so the counter alone
cannot distinguish them).
Where to next
- Captcha and app-bridge protocols: captcha.
- Safe rule rollout: staging.
- Metrics and dashboards: observability.
- Full directive reference: directives.