Behaviour Intelligence from Web Analytics
Strategic Analytics: v1.0, March 2026
1. Orientation
What this is: An open-source framework and reference implementation for classifying website visitors into behavioural states using GA4 data. The full system is available on GitHub under the MIT licence.
This document defines a behavioural intelligence system for websites and digital products.
Its purpose is to answer a practical question:
What is this visitor trying to do, how confident are we, and what should we change as a result?
This system transforms analytics from passive reporting into an active decision framework.
This framework has been implemented as a production-ready classification engine: a JavaScript scoring and classification library, a BigQuery SQL pipeline, GTM client-side scripts, deployment automation, and a full test suite.
Why this is useful
Traditional analytics answers:
- what happened
- how many users
- where they clicked
This system answers:
- what state the user is in
- how that state evolves over time
- what action should be taken
[Figure: Core principle, with optional refinement layer]
Key terms used in this document
- Signal: A measurable indicator of user behaviour (for example, how many pages someone viewed, how long they spent reading, or whether they clicked a call-to-action button). The system uses four core signals: breadth, depth, progression, and clustering.
- State: A classification category that describes where a visitor currently sits on the journey from awareness to action. Examples: "Scanner" (browsing widely but shallowly) or "Evaluator" (reading deeply and moving toward a decision).
- Confidence: A score (0–10) that measures how certain the system is about a state classification. Low confidence means the evidence is thin; high confidence means the signals are strong and consistent. Confidence controls what actions the system is allowed to take.
- Cluster / Clustering: A group of related pages tagged with the same topic (e.g. "pricing", "case studies", "product A"). Clustering measures how concentrated a visitor's behaviour is within one topic group versus scattered across many.
- Taxonomy: A structured register that tags every page on the site with a page type (e.g. "service page", "blog post") and topic cluster (e.g. "pricing", "proof"). This tagging is what makes clustering measurable.
- GA4: Google Analytics 4, the current version of Google's web analytics platform, used here as the default data collection tool.
- GTM: Google Tag Manager, a tool that manages tracking code on a website without requiring direct code changes. Used to send events (like clicks and form submissions) to GA4.
- CTA: Call-to-action, a button or link designed to prompt a specific user action, such as "Get a quote" or "Book a demo".
- Motivation: An optional inference about what a visitor may be seeking (e.g. "risk-sensitive" or "value-driven"), based on their observed behaviour. Motivation is always secondary to state classification and is only assigned when confidence is medium or high.
2. System Overview
The system consists of six core layers, plus one optional refinement layer.
What each layer does
Layer 1: Data Collection (GA4)
The raw input. GA4 captures events from the website: page views, scroll depth, engagement time, clicks, form interactions, and more. If the right events are not collected here, everything downstream is guessing. It feeds raw event-level data into the next layer.
In the reference implementation: six client-side JavaScript modules deployed via GTM detect rage clicks, dead clicks, form errors, layout shifts, traffic source groups, and element-level intent signals, then push structured events and custom dimensions into the GA4 data layer.
Layer 2: Signal Construction
Takes the raw GA4 events and transforms them into four structured scores (each 0–10): Breadth (how widely the user explores), Depth (how deeply they engage), Progression (how far they move toward conversion actions), and Clustering (how focused their browsing is on a single topic). Raw events on their own are too noisy to compare. This layer gives every session a common shape so the classifier can tell visitors apart.
In the reference implementation: signals.js exposes a scorer for each signal (breadth, depth, progression, clustering) that converts raw session metrics into a 0–10 score. The same logic is mirrored in 01-signal-scores.sql for batch processing in BigQuery.
Layer 3: State Classification
Uses the four signal scores to assign the visitor to a named behavioural state (e.g. Scanner, Explorer, Evaluator, Engaged). Each state has defined signal thresholds. For example, high breadth combined with low depth and no progression produces a "Scanner" classification. Scores alone do not tell a team what to do. A named state gives everyone a shared word for the visitor's situation and makes the system actionable.
In the reference implementation: classifier.js walks a priority-ordered rule set from config.js and returns the first state whose signal thresholds are met, with a continuous fit-score fallback for ambiguous cases. The SQL equivalent is 02-state-classification.sql.
Layer 4: Confidence Scoring
Evaluates how reliable the classification is. A visitor with two page views and ten seconds of data gets a low confidence score; someone with fifteen pages, deep scroll, and multiple CTA clicks gets a high one. Confidence (0–10, bucketed into low, medium, and high) acts as a gate that controls what the system is allowed to do next. Without it, a two-page bounce would carry the same weight as a fifteen-page deep session. The system needs to know when it has enough evidence before it acts. Low confidence means observe only; high confidence means act.
In the reference implementation: confidence.js sums five factors (signal count, signal strength, state clarity, session depth, and temporal consistency), applies contradiction penalties, then buckets the result into low, medium, or high. A companion function gates which action types each band is permitted to trigger.
Layer 5: Action Layer
Maps each state-plus-confidence combination to a concrete response. A label on its own does not change anything. This layer turns the classification into a specific recommendation so someone (or something) can act on it. Depending on the state and confidence level, the action might be "do nothing" (low confidence), "surface a relevant CTA" (medium), or "trigger a personalised offer" (high). It is the bridge between classification and business outcome.
In the reference implementation: action.js looks up the visitor's state in an action-mapping table from config.js and returns the recommended action, success metric, owner, and a natural-language prescription with context-specific detail interpolated in.
Layer 6: Feedback Loop
Measures whether the actions taken actually worked. Did the personalised CTA increase conversions? Did Scanners who were shown navigation aids find what they needed? Without this, the system never learns whether its recommendations actually helped. The feedback loop sends outcome data back to refine signal weights, classification thresholds, and action rules over time.
In the reference implementation: temporal.js tracks recency, frequency, trend direction, and velocity across a visitor's session history, while 03-temporal-analysis.sql and supporting SQL queries handle the same at scale in BigQuery. A documented review cadence (weekly, monthly, quarterly) and defined rollback triggers govern when thresholds are recalibrated.
Layer 7 (Optional): Motivation Layer
Only applied when confidence is medium or high. Infers what the visitor's behaviour is most consistent with (e.g. "risk-sensitive", "value-driven", "urgency-driven") from the content they focus on, combined with their state and signal pattern. This adds a qualitative dimension to the state label, allowing even more targeted responses. It is deliberately optional because motivation inference is less reliable than behavioural classification.
In the reference implementation: refinements.js first detects a content sub-type based on where engagement time is concentrated, then infers one of six motivations by combining the sub-type, state, and signal values. The confidence gate in confidence.js controls whether the inferred motivation is suppressed, flagged for review, or allowed to drive automated action modifiers.
How the layers chain together
The flow is a pipeline: GA4 events → structured signals → state label → confidence gate → action → outcome measurement → refinement. Each layer depends on the one before it, and the feedback loop at the end circles back to improve layers 2–5. The confidence gate (layer 4) is the key safety mechanism. It prevents the system from acting on weak evidence, so low-data sessions are observed rather than acted upon prematurely.
In the reference implementation: pipeline.js orchestrates the entire flow through a single entry point, evaluateVisitor. It accepts the raw session data and user history, then calls each layer in sequence: scoreAllSignals → assessTemporalContext → classifyByPriority → calculateConfidence → applyRefinements → resolveAction. The returned object contains the full evaluation: signals, temporal context, classification, confidence, refinements, and action plan. The SQL pipeline mirrors this sequence across six numbered query files (01 through 06), each corresponding to a layer, designed to run in order inside BigQuery.
3. Signal Model
The system needs a way to measure behaviour that is consistent across every session and every site. Without a defined signal model, classification would depend on ad hoc metrics that shift from report to report. These four signals provide the common language that makes everything downstream possible.
3.1 Core signals (primary)
1. Breadth (Exploration Volume)
How much the user explores.
Breadth should be calculated from:
- unique pages viewed in the session
- unique page types viewed in the session
- unique topic clusters viewed in the session
Recommended calculation:
- Raw breadth inputs: unique_pages, unique_page_types, unique_topics
- Suggested breadth score (0–10):
  - 0–1 = 1 page only
  - 2–3 = 2–3 pages, low variety
  - 4–5 = moderate exploration
  - 6–7 = broad exploration across multiple page types
  - 8–10 = very broad exploration across many page types / topics
Breadth can be calibrated per site using percentiles once enough data exists.
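The breadth bands above can be turned into a fixed-rule scorer (Section 3.4, Option A). The following is a minimal JavaScript illustration, not the reference signals.js. The function name scoreBreadth is hypothetical. The variety cut-offs are assumptions chosen to fit the bands: 3 page types counts as "multiple", and 5 page types plus 4 topics counts as "many".

```javascript
// Fixed-rule breadth scorer (0–10). Cut-offs marked ASSUMPTION are
// illustrative, not taken from the reference implementation.
function scoreBreadth({ uniquePages, uniquePageTypes, uniqueTopics }) {
  // 0–1: no pages or a single page
  if (uniquePages <= 1) return Math.max(0, uniquePages);
  // 2–3: a few pages, low variety
  if (uniquePages <= 3) return uniquePages;

  // 4–5: ASSUMPTION, fewer than 3 page types counts as "moderate"
  if (uniquePageTypes < 3) return uniquePages >= 6 ? 5 : 4;

  // 6–7 vs 8–10: ASSUMPTION, 5+ page types and 4+ topics counts as "many"
  const veryBroad = uniquePageTypes >= 5 && uniqueTopics >= 4;
  if (!veryBroad) return uniquePages >= 8 ? 7 : 6;

  // 8–10: one extra point per 5 pages beyond 8, capped at 10
  return Math.min(10, 8 + Math.max(0, Math.floor((uniquePages - 8) / 5)));
}
```

Once data volume allows, the hard-coded cut-offs are the natural thing to replace with percentile ranks.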
Interpretation warning: High breadth does not always mean healthy exploration.
On a poorly structured site, high breadth often signals a lost user: someone clicking widely because they cannot find what they need, not because they are surveying options.
How to check: Cross-reference breadth with depth and progression. If breadth is high but depth is very low (≤ 2) and progression is zero, the user is more likely lost than exploring.
What to watch: The Scanner state captures this pattern. A spike in Scanner volume may indicate a site navigation problem rather than a traffic quality problem.
2. Depth (Engagement)
How deeply they engage.
Depth should be calculated from:
- engagement time
- active time on page
- scroll depth
- repeated long dwell on related content
- high-attention actions such as video plays or file downloads where relevant
Recommended inputs:
- engagement_time_seconds
- avg_time_on_key_pages
- avg_scroll_percent
- deep_engagement_events
Suggested depth score (0–10):
- 0–2 = glance / shallow interaction
- 3–5 = moderate reading / review
- 6–8 = sustained engagement
- 9–10 = very deep attention, repeat deep reading, long dwell on key content
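One way to combine the four depth inputs into a 0–10 score works in three steps:

1. Cap each input at a value that counts as "full" engagement.
2. Normalise each capped input to 0–1.
3. Average the results and scale to 0–10.

The caps, the equal weighting and the name scoreDepth are assumptions for illustration. The inputs are the recommended ones above, written in camelCase.

```javascript
// Depth scorer (0–10): each input is normalised to 0–1 against an assumed
// cap, then the four parts are averaged with equal weights (ASSUMPTION).
const DEPTH_CAPS = {
  engagementTimeSeconds: 300, // ASSUMPTION: 5 minutes counts as full depth
  avgTimeOnKeyPages: 120,     // ASSUMPTION: 2 minutes per key page
  avgScrollPercent: 100,      // scroll is already a percentage
  deepEngagementEvents: 3,    // ASSUMPTION: e.g. video plays, downloads
};

function scoreDepth(inputs) {
  const parts = Object.entries(DEPTH_CAPS).map(([key, cap]) => {
    const value = Math.max(0, inputs[key] ?? 0); // missing inputs count as 0
    return Math.min(value, cap) / cap;
  });
  const mean = parts.reduce((sum, p) => sum + p, 0) / parts.length;
  return Math.round(mean * 10);
}
```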
3. Progression (Intent Momentum)
How far they move toward meaningful action.
Progression should be calculated from:
- movement toward evaluation pages
- CTA clicks
- form starts
- form submits
- booking actions
- repeat sessions that move closer to conversion
Recommended inputs:
- high_intent_page_views
- cta_click_count
- form_start
- form_submit
- booking_click
- conversion_complete
Suggested progression score (0–10):
- 0 = no movement toward action
- 1–3 = weak progression, mostly orientation
- 4–6 = evaluation behaviour present
- 7–8 = strong action intent
- 9–10 = conversion or near-conversion completed
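The progression bands map naturally onto a tiered scorer in which the strongest observed action sets the band. The name scoreProgression is hypothetical, and the tier each input falls into is an assumption. In this sketch a form start scores 7, deliberately below the Engaged threshold of 8, so an unfinished form cannot be mistaken for a conversion. Repeat sessions are left to the temporal signals in Section 3.3.

```javascript
// Tiered progression scorer (0–10). The strongest action observed sets the
// band; which input lands in which tier is an ASSUMPTION.
function scoreProgression({
  highIntentPageViews = 0,
  ctaClickCount = 0,
  formStart = false,
  formSubmit = false,
  bookingClick = false,
  conversionComplete = false,
} = {}) {
  if (conversionComplete) return 10;                 // 9–10: conversion completed
  if (formSubmit) return 9;                          // 9–10: form submitted
  if (formStart || bookingClick) return 7;           // 7–8: strong action intent
  if (ctaClickCount > 0) return Math.min(6, 3 + ctaClickCount); // 4–6: evaluation
  if (highIntentPageViews > 0) return Math.min(3, highIntentPageViews); // 1–3
  return 0;                                          // no movement toward action
}
```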
3.2 Derived signal
4. Clustering (Behavioural Coherence)
How concentrated behaviour is within a topic, pathway, or offer cluster.
This is the main signal that distinguishes broad, scattered browsing from coherent evaluation.
Clustering should be calculated from three components:
A. Topic concentration
What proportion of views fall within the dominant topic cluster.
Example:
- if 7 of 10 page views are in one topic cluster, concentration is high
B. Topic switching
How often the visitor moves between unrelated topics.
Example:
- homepage → service A → article → service B → about → article = high switching
- homepage → service A → case study A → FAQ A → contact = low switching
C. Repeat cluster return
Whether the user repeatedly returns to the same cluster during the session or across sessions.
Example:
- repeated visits to the same offer, proof pages, or pricing path = stronger clustering
Recommended inputs:
- dominant_topic_share
- topic_switch_count
- repeat_cluster_visits
- same_cluster_sequence_length
Suggested clustering score (0–10):
- 0–2 = highly scattered
- 3–5 = partially coherent
- 6–8 = clearly clustered
- 9–10 = strongly concentrated around one topic / pathway
A simple practical starting formula:
clustering_score = (dominant_topic_share * 10) - topic_switch_penalty + repeat_cluster_bonus
Where:
- dominant_topic_share is expressed from 0 to 1 (e.g. 7 of 10 views in one cluster = 0.7)
- topic_switch_penalty = min(topic_switch_count, 5) (capped to avoid overwhelming the score; each switch between unrelated clusters adds 1 point of penalty)
- repeat_cluster_bonus = min(repeat_cluster_visits - 1, 3) (capped at 3; each return visit to the same cluster beyond the first adds 1 point of bonus)
Minimum signal floor: Do not apply topic_switch_penalty until the user has viewed 4 or more pages. Below this threshold, a single topic switch (e.g. 2 pages in Cluster A, then 1 in Cluster B) is normal exploratory behaviour and should not be penalised. For short sessions, set topic_switch_penalty = 0 and rely on dominant_topic_share alone.
Example 1, sufficient data: a visitor views 10 pages, 7 in one cluster, switches topics twice, and returns to the primary cluster 3 times:
(0.7 * 10) - 2 + 2 = 7.0 → clearly clustered
Example 2, below signal floor: a visitor views 3 pages, 2 in Cluster A and 1 in Cluster B, with 1 topic switch:
(0.67 * 10) - 0 + 0 = 6.7 → penalty suppressed; score reflects concentration only
The exact weightings should be calibrated to the site once sufficient data exists. Start with these defaults and adjust.
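The starting formula, its caps and the minimum signal floor translate directly into code. The name clusteringScore is hypothetical. Two behaviours here are assumptions the text does not state: the repeat bonus is floored at 0, so a visitor with no repeat visits is not penalised, and the result is clamped to 0–10.

```javascript
// Starting clustering formula with the minimum signal floor applied.
// Flooring the bonus at 0 and clamping to 0–10 are ASSUMPTIONS.
function clusteringScore({
  totalPages,
  dominantClusterViews,
  topicSwitchCount,
  repeatClusterVisits,
}) {
  if (totalPages <= 0) return 0;
  const dominantTopicShare = dominantClusterViews / totalPages; // 0–1

  // Signal floor: no switching penalty until 4+ pages have been viewed
  const topicSwitchPenalty = totalPages >= 4 ? Math.min(topicSwitchCount, 5) : 0;

  // Each return to the cluster beyond the first adds 1 point, capped at 3
  const repeatClusterBonus = Math.max(0, Math.min(repeatClusterVisits - 1, 3));

  const raw = dominantTopicShare * 10 - topicSwitchPenalty + repeatClusterBonus;
  const clamped = Math.min(10, Math.max(0, raw));
  return Math.round(clamped * 10) / 10; // one decimal place
}
```

Both worked examples reproduce:

- 10 pages (7 in one cluster, 2 switches, 3 cluster visits) gives 7.
- 3 pages (2 in one cluster) gives 6.7, because the switch penalty is suppressed.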
3.3 Temporal signals (integrated)
Temporal signals track how behaviour changes across visits: how recently someone returned, how often they visit, and how quickly they move toward action. These are not separate decoration; they should directly shape classification and confidence.
This section defines the raw inputs. Section 11 (later in this document) covers how these inputs are used to track state transitions over time: for example, detecting that a Scanner is becoming an Evaluator across three sessions.
Recency
Time since last session.
Use recency to distinguish:
- single exploratory visits
- active evaluation windows
- dormant / re-engaged prospects
Suggested recency bands:
- 0–2 days = highly recent
- 3–7 days = active consideration
- 8–30 days = delayed return
- 30+ days = dormant / re-engaged
Frequency
Number of sessions in a defined time period.
Suggested frequency bands:
- 1 session = single-session user
- 2–3 sessions in 7 days = active evaluator
- 4+ sessions in 14 days = high ongoing engagement
Velocity
How quickly a user moves toward action.
Examples:
- first visit → conversion page in one session = high velocity
- three sessions with increasing intent = medium velocity
- repeated evaluation without stronger action = low velocity
Recommended temporal inputs:
- session_count_7d
- session_count_30d
- days_since_last_session
- time_to_first_high_intent_event
- time_to_conversion
- state_change_over_time
These should explicitly affect the Returning Evaluator and Re-engaged Prospect classifications, and should increase or reduce confidence in other states.
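The recency bands can be applied as a simple lookup on days_since_last_session. The 8–30 and 30+ bands overlap at day 30, so this sketch assumes day 30 belongs to "delayed return". The name recencyBand is hypothetical.

```javascript
// Maps days since the last session to the recency bands above.
// ASSUMPTION: day 30 counts as "delayed return"; only 31+ is dormant.
function recencyBand(daysSinceLastSession) {
  if (daysSinceLastSession <= 2) return "highly recent";
  if (daysSinceLastSession <= 7) return "active consideration";
  if (daysSinceLastSession <= 30) return "delayed return";
  return "dormant / re-engaged";
}
```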
3.4 Score calibration
Scores should not remain arbitrary. They must be calibrated.
There are two valid approaches:
Option A: Fixed rules (best for early-stage / low data)
Use fixed score thresholds based on known business logic. This is easier to explain and debug.
Option B: Percentile-based normalisation (best once data volume grows)
Instead of fixed thresholds, compare each visitor's raw values against what is typical for your site. "Percentile-based" means ranking a value against historical data. For example, if a visitor's engagement time is higher than 80% of all sessions, they score in the 80th percentile.
Examples:
- Breadth score of 8 = top 20% of page variety for this site
- Depth score of 7 = above-average engagement for this content type
This avoids applying the same thresholds to very different sites.
Recommended approach:
- start with fixed rules
- migrate to percentile-based calibration once enough data exists
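Percentile-based normalisation (Option B) amounts to ranking a raw value against historical sessions and scaling the rank to 0–10. This minimal sketch assumes an in-memory array of historical values for one metric. The name percentileScore is hypothetical, and counting only strictly lower values is an assumption.

```javascript
// Percentile-based normalisation: the share of historical sessions with a
// strictly lower raw value, scaled to 0–10. Strict "<" is an ASSUMPTION;
// it stops ties from inflating the score.
function percentileScore(value, history) {
  if (history.length === 0) {
    throw new Error("percentileScore needs historical values to compare against");
  }
  const below = history.filter((h) => h < value).length;
  return Math.round((below / history.length) * 10);
}
```

For example, a session whose engagement time exceeds 80% of historical sessions scores 8, matching the "top 20%" reading above.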
4. Context Weighting
Not all pages and actions are equal. A visitor clicking a CTA on a pricing page signals stronger intent than a visitor scrolling a blog post. Without weighting, the system would treat every click and every page as equally important, and a five-page blog reader would look the same as a five-page pricing evaluator. Context weighting adjusts signal scores based on the page a visitor is on, the action they take, and the traffic source that brought them.
Page-type, action, and element weights are multipliers: they increase or decrease the signal value of an action. Source context works differently, adding or subtracting a point from a signal baseline. All of these are applied during the classification process described in Section 7. You do not need to read Section 7 first; the tables below define the weights themselves.
Page types
Each page type carries a weight that increases or decreases the signal value of actions taken on it.
| Page type | Intent weight | Effect |
|---|---|---|
| Homepage | 0.5 | Actions here are orientation; downweight toward progression |
| Blog / resource | 0.6 | Useful for depth, but low direct conversion signal |
| Service / product | 1.0 | Baseline evaluation behaviour |
| Case study | 1.2 | Proof-seeking; upweight depth and clustering |
| Pricing | 1.5 | Strong intent signal; upweight progression |
| Contact / booking | 2.0 | Conversion action; maximum progression weight |
Action strength
Each action type carries a signal weight reflecting how strongly it indicates intent.
| Action | Weight | Signal contribution |
|---|---|---|
| Page view | 0.2 | Breadth only |
| Scroll (≥75%) | 0.5 | Depth |
| CTA click | 1.0 | Progression |
| Form start | 1.5 | Strong progression |
| Form submit | 2.0 | Conversion / maximum progression |
Source context
Traffic source applies a bias to the initial state probability, not a hard override.
| Source | Bias | Rationale |
|---|---|---|
| Direct / bookmark | +1 to progression baseline | Returning with purpose suggests prior awareness |
| Organic search | Neutral | Intent varies; let behaviour determine state |
| Social media | -1 to progression baseline | Typically exploratory; higher Scanner probability |
| Referral | +1 to depth baseline | Trust transfer from referring source |
| Paid search | +1 to progression baseline | Keyword intent suggests evaluation |
These weights are starting defaults. Calibrate them against actual conversion data once enough volume exists.
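The source table reads as additive offsets to signal baselines. This minimal sketch makes two assumptions:

- The channel keys (direct, organic_search, social, referral, paid_search) are hypothetical; you would map them from your own GA4 channel groupings.
- Unknown sources are treated as neutral.

The name applySourceBias is also hypothetical.

```javascript
// Source bias as additive offsets to signal baselines, per the table above.
// Keys are hypothetical channel names.
const SOURCE_BIAS = {
  direct: { progression: 1, depth: 0 },
  organic_search: { progression: 0, depth: 0 },
  social: { progression: -1, depth: 0 },
  referral: { progression: 0, depth: 1 },
  paid_search: { progression: 1, depth: 0 },
};

// Applies the bias and clamps to the 0–10 score range.
// ASSUMPTION: unknown sources fall back to neutral.
function applySourceBias(baseline, source) {
  const bias = SOURCE_BIAS[source] ?? { progression: 0, depth: 0 };
  const clamp = (x) => Math.min(10, Math.max(0, x));
  return {
    ...baseline,
    progression: clamp(baseline.progression + bias.progression),
    depth: clamp(baseline.depth + bias.depth),
  };
}
```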
Element-level metadata (micro-signals)
Scalability is not just about the page; it is also about the elements on the page. Individual interactive elements can carry their own weight, independent of the page they sit on.
When an element carries an element_weight, it overrides the page’s intent weight for that specific interaction. When absent, the page weight applies as the default.
| Element role | Default weight | Example |
|---|---|---|
| Progression | 2.0 | “Get a Quote” button, “Book a Demo” |
| Depth | 0.5 | “Read More” link, “See Details” |
| Tool use | 0.8 | Calculators, configurators |
| Navigation | 0.3 | Menu links, breadcrumbs |
| Social | 0.3 | Share buttons, social links |
Implementation: add data-element-role and optionally data-element-weight to interactive HTML elements. GTM reads these attributes on click events and sends them to GA4 as custom parameters.
```html
<button data-element-role="progression" data-element-weight="2.0">Get a Quote</button>
<a href="/blog/..." data-element-role="depth" data-element-weight="0.5">Read More</a>
```
The effective weight for progression scoring becomes:
effective_weight = element_weight ?? page_intent_weight
progression_contribution = action_weight × effective_weight
This means a “Get a Quote” button (element_weight 2.0) on a blog page (page_weight 0.6) contributes 2.0 to progression, not 0.6. The element’s own significance wins.
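The effective-weight rule can be checked in a few lines of JavaScript, which has the same ?? (nullish coalescing) operator used in the formula above. The weight tables are copied from this section. The key names and the function progressionContribution are hypothetical. elementWeight is assumed to be already parsed to a number, since HTML attribute values arrive as strings.

```javascript
// Weights copied from the page-type and action-strength tables above.
const PAGE_INTENT_WEIGHT = {
  homepage: 0.5,
  blog: 0.6,
  service: 1.0,
  case_study: 1.2,
  pricing: 1.5,
  contact: 2.0,
};
const ACTION_WEIGHT = {
  page_view: 0.2,
  scroll_75: 0.5,
  cta_click: 1.0,
  form_start: 1.5,
  form_submit: 2.0,
};

// `??` falls back to the page weight only when elementWeight is null or
// undefined, so an explicit element weight always wins.
function progressionContribution(action, pageType, elementWeight) {
  const effectiveWeight = elementWeight ?? PAGE_INTENT_WEIGHT[pageType];
  return ACTION_WEIGHT[action] * effectiveWeight;
}
```

The blog-page example above reproduces. A CTA click with element weight 2.0 contributes 2.0, while the same click on a pricing page with no element weight contributes 1.5.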
5. State Model
States are the point of the system. A dashboard full of signal scores is useful to an analyst, but it does not tell a product manager what to fix or a CRM team who to follow up with. Named states translate numerical patterns into plain descriptions of visitor behaviour that any team can understand and act on.
The system uses 10 core states with explicit signal definitions.
Each state is determined using:
- Breadth score (0–10)
- Depth score (0–10)
- Progression score (0–10)
- Clustering score (0–10)
- Temporal context where relevant
State definitions
Note on score overlaps: The score ranges below intentionally overlap at boundaries. Real user behaviour does not fall neatly into boxes. When a visitor's scores could match two states, Section 7 provides a priority order to resolve the tie (e.g. "Engaged" always takes priority over "Focused Evaluator"). If no state's criteria are fully met, assign the closest match with low confidence and flag for review.
1. Mismatch
- Breadth ≤ 2
- Depth ≤ 2
- Progression = 0
- Clustering irrelevant
- Usually 1 session only
Immediate exit or no meaningful engagement.
2. Scanner
- Breadth ≥ 6
- Depth ≤ 3
- Clustering ≤ 3
- Progression ≤ 2
- Usually low velocity, low continuity
Wide but shallow exploration.
3. Explorer
- Breadth 4–7
- Depth 3–6
- Clustering 3–6
- Progression 2–4
- May show increasing structure within session or across 2 sessions
Exploration with emerging intent.
4. Comparator
- Breadth 4–7
- Depth 3–5
- Clustering ≥ 5 across competing options / pathways
- Progression 3–5
- Often includes repeat visits to proof, pricing, or alternative offers
Comparing multiple options.
5. Evaluator
- Breadth 3–6
- Depth ≥ 6
- Clustering ≥ 5
- Progression 4–6
- May occur in one strong session or repeated sessions within 7 days
Serious evaluation.
6. Focused Evaluator
- Breadth 2–4
- Depth ≥ 7
- Clustering ≥ 7
- Progression ≥ 6
- High velocity or repeated strong cluster return
Highly aligned, strong intent.
7. Hesitant
- Progression ≥ 6 (form start / CTA / booking step)
- No completion
- Depth ≥ 4
- Can occur in one session or repeated attempts over 7–14 days
Intent present but interrupted.
8. Stalled
- Breadth 3–6
- Depth 4–6
- Progression ≤ 3
- Repeated loops or repeated low-progress sessions
- Low improvement over time
Confusion, overload, or structural friction.
Important distinction: Stalled vs. Frustrated. A Stalled user has intent but lacks clarity (revisiting the same pages, looping between sections, failing to progress). A Frustrated user is blocked by technical or UX failures. To distinguish between them, monitor for friction signals:
- rage_click_count: rapid repeated clicks on the same element (3+ clicks within 2 seconds)
- dead_click_count: clicks on non-interactive elements that produce no response
- form_error_count: validation errors encountered during form completion
- high_layout_shift: significant content movement during page load (Cumulative Layout Shift > 0.25, a measure of how much visible page content shifts unexpectedly during loading)
If friction signals are present alongside Stalled criteria, classify as Stalled (Friction) sub-type. This changes the recommended action from "simplify navigation" to "fix the broken interaction", a UX engineering problem, not a content strategy problem.
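The friction signals can be derived from raw interaction data before the Stalled (Friction) check. The reference implementation detects these in its GTM client-side modules; the code below is not that code. It makes three assumptions:

- Clicks arrive as a hypothetical log of { target, t } records, with t in milliseconds.
- Rage clicks are counted as bursts rather than individual clicks.
- Any non-zero friction count, or CLS above 0.25, means friction is present.

```javascript
// Counts rage-click bursts: minClicks+ clicks on the same element within
// windowMs. Counting bursts (not individual clicks) is an ASSUMPTION.
function countRageClickBursts(clicks, minClicks = 3, windowMs = 2000) {
  const timesByTarget = new Map();
  for (const { target, t } of clicks) {
    if (!timesByTarget.has(target)) timesByTarget.set(target, []);
    timesByTarget.get(target).push(t);
  }

  let bursts = 0;
  for (const times of timesByTarget.values()) {
    times.sort((a, b) => a - b);
    let start = 0;
    for (let end = 0; end < times.length; end++) {
      // Slide the window start forward until it spans at most windowMs
      while (times[end] - times[start] > windowMs) start++;
      if (end - start + 1 >= minClicks) {
        bursts++;
        start = end + 1; // begin a fresh window so clicks are not reused
      }
    }
  }
  return bursts;
}

// Stalled (Friction) check: true when any friction signal is present.
function hasFrictionSignals({
  rageClickCount = 0,
  deadClickCount = 0,
  formErrorCount = 0,
  cumulativeLayoutShift = 0,
}) {
  return (
    rageClickCount > 0 ||
    deadClickCount > 0 ||
    formErrorCount > 0 ||
    cumulativeLayoutShift > 0.25
  );
}
```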
9. Engaged (Committed)
- Progression ≥ 8 (conversion)
- Continued activity post-conversion
- May include trust / process validation after action
User has acted and is validating or onboarding.
10. Returning Evaluator
- 2+ sessions within 7 days, or 3+ sessions within 30 days
- Increasing depth, clustering, or progression over time
- No completed conversion yet
Intent strengthening across sessions.
Special temporal state: Re-engaged Prospect
This may be treated as a sub-type of Explorer, Evaluator, or Returning Evaluator.
Typical pattern:
- gap of 30+ days
- then a renewed visit with medium-high clustering and progression
This matters because the visitor is not simply "new" or "returning"; they are reactivated.
6. Optional Refinement Layers
6.1 Content Sub-types (Optional)
Each core state can carry a sub-type label that describes the content orientation of the behaviour, not just its intensity. Sub-types are determined by which page types and content roles dominate the session.
| Sub-type | Determined by | Example |
|---|---|---|
| Proof-focused | Majority of depth on case studies, testimonials, or results pages | An Evaluator spending 70% of engagement time on case studies |
| Trust-focused | Concentration on about, team, credentials, or review pages | A Hesitant user who revisits the "About us" and "Our team" pages before returning to the form |
| Price-focused | Repeated or deep engagement with pricing, comparison, or plan pages | A Comparator returning to the pricing page across two sessions |
| Resource-seeking | Majority of actions on downloads, guides, tools, or documentation | An Explorer downloading three whitepapers but not visiting any service pages |
Sub-types are optional. Implement them only when the action layer needs to differentiate which kind of content to surface. For example, sending a proof-led follow-up to a proof-focused Evaluator vs. a pricing summary to a price-focused Comparator.
Sub-types do not affect state classification or priority. They inform the content of the response, not the type of response.
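Sub-type detection reduces to finding which content role holds the majority of engagement. This sketch simplifies the table above to a single rule: a sub-type applies when it holds more than 50% of engagement time. The page-type keys and their mapping to sub-types are hypothetical and would come from your taxonomy. The name detectSubType is also hypothetical.

```javascript
// Hypothetical mapping from taxonomy page-type tags to sub-types.
const SUBTYPE_BY_PAGE_TYPE = {
  case_study: "Proof-focused",
  testimonial: "Proof-focused",
  results: "Proof-focused",
  about: "Trust-focused",
  team: "Trust-focused",
  reviews: "Trust-focused",
  pricing: "Price-focused",
  comparison: "Price-focused",
  plans: "Price-focused",
  download: "Resource-seeking",
  guide: "Resource-seeking",
  tool: "Resource-seeking",
  documentation: "Resource-seeking",
};

// Returns the sub-type holding a majority (> 50%) of engagement time,
// or null when no content role dominates.
function detectSubType(engagementSecondsByPageType) {
  const totals = {};
  let grandTotal = 0;
  for (const [pageType, seconds] of Object.entries(engagementSecondsByPageType)) {
    grandTotal += seconds;
    const subType = SUBTYPE_BY_PAGE_TYPE[pageType];
    if (subType) totals[subType] = (totals[subType] ?? 0) + seconds;
  }
  if (grandTotal === 0) return null;
  for (const [subType, seconds] of Object.entries(totals)) {
    if (seconds / grandTotal > 0.5) return subType;
  }
  return null;
}
```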
6.2 Motivation Signals (Optional Layer)
The framework may apply a lightweight motivation signal after state assignment. This is an inference layer, not a replacement for state classification.
Use motivation only to refine action precision:
- State = what behaviour is doing now
- Motivation signal = what behaviour is most consistent with
Recommended motivation categories (keep small)
| Motivation signal | Typical behavioural pattern |
|---|---|
| Curiosity-driven | Broad exploration, low commitment, limited progression |
| Value-driven | Deep engagement with proof/outcome content and evaluation behaviour |
| Risk-sensitive | Strong intent signals with hesitation before completion |
| Confusion-driven | Repeated loops, switching, and low forward movement |
| Overload-sensitive | Deep dwell and repeated review without clear progression |
| Urgency-driven | Fast movement to high-intent actions with minimal exploration |
Guardrails (must follow)
- Behaviour first: assign state before motivation.
- Confidence gate: only assign motivation when state confidence is medium or high (4+).
- Minimal set: do not expand motivation categories unless a new category changes action.
- Action-linked only: if a motivation label does not alter response, remove it.
- No psychological overreach: describe behavioural consistency, not internal truth.
What should not be added
Do not add abstract, non-operational labels (for example, "status-seeking" or "identity-driven") unless there is a reliable behavioural proxy and a distinct action pathway.
Do not use motivation as a primary classifier, and do not assign motivation when data is sparse or confidence is low.
7. Classification Logic
The state model defines what each state looks like. This section defines how the system actually decides which state to assign: the priority order, the scoring method, and what happens when a visitor's signals are ambiguous. Without clear rules, two implementations of the same framework could classify the same visitor differently.
Important note on thresholds
All thresholds (e.g. page count, time, events) should be calibrated per site. The example rules below are starting points only. Different products, traffic types, and session lengths will require adjustment.
Signal scoring approach (recommended)
Instead of relying only on hard rules, assign scores:
- Breadth score (0–10)
- Depth score (0–10)
- Progression score (0–10)
- Clustering score (0–10)
Then map score ranges to states. This improves flexibility and reduces brittle classification.
Recommended scoring method
Step 1: calculate raw metrics
Examples:
- unique pages
- unique topics
- engagement time
- average scroll
- CTA clicks
- form starts / submits
- topic switch count
- return visits
Step 2: convert to normalised scores
Use either:
- fixed score mapping, or
- percentile-based normalisation against historical site behaviour
Step 3: apply context weighting
Adjust signals based on:
- page type importance
- action strength
- source bias
- device / session context where relevant
Step 4: assign likely state
Apply states in the following priority order (highest priority first). Evaluate each rule top-down; assign the first state whose criteria are met.
- Engaged: conversion completed (progression ≥ 8). Overrides all other states.
- Hesitant: high-intent action started but not completed (progression ≥ 6, no conversion). Overrides Focused Evaluator.
- Returning Evaluator: temporal criteria met (2+ sessions in 7 days or 3+ in 30 days) with increasing signals and no conversion. Overrides single-session states.
- Focused Evaluator: narrow, deep, high-progression behaviour (breadth 2–4, depth ≥ 7, clustering ≥ 7, progression ≥ 6).
- Evaluator: serious evaluation with depth and clustering (depth ≥ 6, clustering ≥ 5, progression 4–6).
- Comparator: evaluation across competing options (breadth 4–7, depth 3–5, clustering ≥ 5, progression 3–5).
- Stalled: moderate engagement with low progression and repeated loops (breadth 3–6, depth 4–6, progression ≤ 3).
- Scanner: wide but shallow (breadth ≥ 6, depth ≤ 3, clustering ≤ 3).
- Explorer: moderate exploration with emerging structure (breadth 4–7, depth 3–6).
- Mismatch: minimal engagement (breadth ≤ 2, depth ≤ 2, progression = 0).
If no state criteria are fully met, assign the closest match and flag confidence as low. Where scores fall between two adjacent states, assign a hybrid classification (see Hybrid States below).
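The Step 4 priority order can be expressed as an ordered rule list evaluated top-down. This is a minimal sketch, not the reference classifier.js, which uses a continuous fit score for ambiguous cases. The function classifyByRules and the fields formStarted, converted, sessions7d, sessions30d and trendIncreasing are hypothetical. The code labels two assumptions:

- Hesitant also requires a form start, as in the raw-metric example rules later in this section.
- The "closest match" is the rule with the highest share of passing checks.

```javascript
// Priority-ordered state rules from Step 4, evaluated top-down.
// Field names on the signal object `s` are hypothetical.
const within = (x, lo, hi) => x >= lo && x <= hi;

const RULES = [
  { state: "Engaged", checks: [(s) => s.progression >= 8] },
  {
    state: "Hesitant",
    // ASSUMPTION: require a form start, as in the raw-metric example rules
    checks: [(s) => s.formStarted, (s) => s.progression >= 6, (s) => !s.converted],
  },
  {
    state: "Returning Evaluator",
    checks: [
      (s) => s.sessions7d >= 2 || s.sessions30d >= 3,
      (s) => s.trendIncreasing,
      (s) => !s.converted,
    ],
  },
  {
    state: "Focused Evaluator",
    checks: [
      (s) => within(s.breadth, 2, 4),
      (s) => s.depth >= 7,
      (s) => s.clustering >= 7,
      (s) => s.progression >= 6,
    ],
  },
  {
    state: "Evaluator",
    checks: [(s) => s.depth >= 6, (s) => s.clustering >= 5, (s) => within(s.progression, 4, 6)],
  },
  {
    state: "Comparator",
    checks: [
      (s) => within(s.breadth, 4, 7),
      (s) => within(s.depth, 3, 5),
      (s) => s.clustering >= 5,
      (s) => within(s.progression, 3, 5),
    ],
  },
  {
    state: "Stalled",
    checks: [(s) => within(s.breadth, 3, 6), (s) => within(s.depth, 4, 6), (s) => s.progression <= 3],
  },
  {
    state: "Scanner",
    checks: [(s) => s.breadth >= 6, (s) => s.depth <= 3, (s) => s.clustering <= 3],
  },
  { state: "Explorer", checks: [(s) => within(s.breadth, 4, 7), (s) => within(s.depth, 3, 6)] },
  {
    state: "Mismatch",
    checks: [(s) => s.breadth <= 2, (s) => s.depth <= 2, (s) => s.progression === 0],
  },
];

// Returns the first fully matching state; otherwise the closest match,
// flagged for low confidence.
function classifyByRules(s) {
  let closest = null;
  let closestFit = -1;
  for (const { state, checks } of RULES) {
    const passed = checks.filter((check) => check(s)).length;
    if (passed === checks.length) return { state, matched: true, lowConfidence: false };
    // ASSUMPTION: "closest" = highest share of passing checks; ties keep the
    // higher-priority state because the comparison is strict.
    const fit = passed / checks.length;
    if (fit > closestFit) {
      closest = state;
      closestFit = fit;
    }
  }
  return { state: closest, matched: false, lowConfidence: true };
}
```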
Step 5: assign confidence score
Every state assignment must be paired with a confidence score.
Step 6 (optional): assign motivation signal
Only after state and confidence are assigned:
- if confidence is low (0–3): do not assign motivation
- if confidence is medium/high (4–10): assign one primary motivation signal
Motivation is secondary metadata that refines response content. It must not override state priority or confidence logic.
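The Step 6 gate is small enough to show in full. The sketch assumes the motivation has already been inferred elsewhere and only decides whether to keep it. The name gateMotivation is hypothetical, and the reference implementation's "flag for review" outcome is not modelled.

```javascript
// The six motivation categories from Section 6.2.
const MOTIVATIONS = new Set([
  "Curiosity-driven",
  "Value-driven",
  "Risk-sensitive",
  "Confusion-driven",
  "Overload-sensitive",
  "Urgency-driven",
]);

// Step 6 gate: keep a motivation only at confidence 4+ and only if it is in
// the agreed minimal set. Returns null when the motivation is suppressed.
function gateMotivation(confidenceScore, inferredMotivation) {
  if (confidenceScore < 4) return null;                  // low confidence (0–3)
  if (!MOTIVATIONS.has(inferredMotivation)) return null; // outside the minimal set
  return inferredMotivation;
}
```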
Example rules (raw metric thresholds)
The state definitions in Section 5 use scored thresholds (e.g. "Breadth ≥ 6"). The examples below show how raw metrics translate into those scores under fixed-rule scoring (Section 3.4, Option A). These are starting defaults; calibrate to your site.
Scanner (raw metrics → Breadth score ≥ 6, Depth score ≤ 3):
- unique pages ≥ 6
- avg engagement time < 20s
- clustering score ≤ 3
- CTA clicks = 0
Evaluator (raw metrics → Depth score ≥ 6, Clustering score ≥ 5):
- service or case study pages ≥ 3
- engagement time > 60s
- clustering score ≥ 5
- at least one CTA click or high-intent page view
Hesitant (raw metrics → Progression score ≥ 6, no conversion):
- form_start = true
- form_submit = false
- progression score ≥ 6
Returning Evaluator (raw metrics + temporal criteria):
- sessions_in_7_days ≥ 2 OR sessions_in_30_days ≥ 3
- depth or progression score trend is increasing across sessions
- conversion_complete = false
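The single-session example rules can be written as plain predicates for calibration work. This is a sketch only:

- Every field on the metrics object (`uniquePages`, `avgEngagementSec`, `serviceOrCaseStudyPages`, and so on) is a hypothetical name for a per-session aggregate.
- The thresholds are the starting defaults listed above.
- Returning Evaluator is omitted because it needs multi-session history.
- The helper reports every rule a session satisfies rather than picking a winner. Resolving overlaps is left to the state-priority logic.

```javascript
// Raw-metric example rules. `m` is a hypothetical per-session metrics object.
function matchesScanner(m) {
  return m.uniquePages >= 6
    && m.avgEngagementSec < 20
    && m.clusteringScore <= 3
    && m.ctaClicks === 0;
}

function matchesEvaluator(m) {
  return m.serviceOrCaseStudyPages >= 3
    && m.engagementSec > 60
    && m.clusteringScore >= 5
    && (m.ctaClicks > 0 || m.highIntentPageViews > 0);
}

function matchesHesitant(m) {
  return m.formStart === true
    && m.formSubmit === false
    && m.progressionScore >= 6;
}

// Returns every example rule the session satisfies, so overlaps stay visible.
function matchRawRules(m) {
  const rules = [["Hesitant", matchesHesitant], ["Evaluator", matchesEvaluator], ["Scanner", matchesScanner]];
  return rules.filter(([, test]) => test(m)).map(([state]) => state);
}
```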
Hybrid states
Users may exhibit signals consistent with multiple states simultaneously. When the primary state accounts for less than 70% of the signal weight, classify as a hybrid.
Example: a visitor with breadth 5, depth 5, clustering 5, progression 3 may score as:
- 60% Explorer (moderate breadth and depth, low progression)
- 40% Evaluator (clustering and depth suggest emerging evaluation)
When a hybrid classification occurs, the system should store:
- primary state (highest signal fit)
- secondary state (next closest fit)
- confidence score for each
- combined confidence (use the primary state's confidence, reduced by 1 point for ambiguity)
The action layer should respond to the primary state but avoid actions that would be counterproductive for the secondary state. For example, if a visitor is 60% Explorer / 40% Evaluator, guide them toward deeper evaluation content rather than immediately pushing for conversion.
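The hybrid rule can be sketched as follows. The sketch assumes the scoring stage already produces two inputs:

- a non-negative fit weight per candidate state;
- a confidence score per candidate state.

The framework does not specify how fit weights are computed. `resolveHybrid` and its field names are hypothetical. The 70% threshold and the one-point ambiguity reduction come from the rules above.

```javascript
// A primary state below this share of total signal weight makes the session a hybrid.
const HYBRID_THRESHOLD = 0.7;

// fitWeights: how well the session fits each candidate state (any non-negative scale).
// confidenceByState: confidence score (0–10) computed for each candidate state.
function resolveHybrid(fitWeights, confidenceByState) {
  const total = Object.values(fitWeights).reduce((sum, w) => sum + w, 0);
  if (total <= 0) throw new Error("fitWeights needs at least one positive weight");

  const ranked = Object.entries(fitWeights)
    .map(([state, weight]) => ({ state, share: weight / total }))
    .sort((a, b) => b.share - a.share);
  const [primary, secondary] = ranked;
  const isHybrid = primary.share < HYBRID_THRESHOLD && secondary !== undefined;
  const primaryConfidence = confidenceByState[primary.state];

  return {
    isHybrid,
    primary: { ...primary, confidence: primaryConfidence },
    secondary: isHybrid ? { ...secondary, confidence: confidenceByState[secondary.state] } : null,
    // Ambiguity costs one point of the primary state's confidence (floored at 0).
    combinedConfidence: isHybrid ? Math.max(0, primaryConfidence - 1) : primaryConfidence,
  };
}
```

With weights `{ Explorer: 60, Evaluator: 40 }` this returns an Explorer/Evaluator hybrid. The combined confidence is one point below Explorer's own score.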
8. Confidence Scoring
Each classification includes a confidence score. Confidence determines how strongly the system acts on a classification. It is not optional metadata. Acting on a weak classification wastes effort or, worse, annoys a visitor with the wrong intervention. Confidence is what separates a system that guesses from one that knows when to wait.
Confidence calculation
Confidence is calculated from five factors, each scored 0–2:
| Factor | 0 (weak) | 1 (moderate) | 2 (strong) |
|---|---|---|---|
| Signal count | ≤ 2 distinct signal types observed | 3–4 distinct signal types | 5+ distinct signal types |
| Signal strength | Only passive actions (views, scrolls) | Mix of passive and active (CTA clicks) | Active high-intent actions (form starts, bookings) |
| State clarity | Primary and secondary states within 20% of each other | Primary state clearly leads but secondary is plausible | Primary state dominant, no close competitor |
| Session depth | < 30 seconds or ≤ 2 pages | 30s–2min, 3–5 pages | > 2min, 5+ pages |
| Temporal consistency | Single session, no history | 2 sessions with consistent direction | 3+ sessions with reinforcing pattern |
"Signal types" means distinct categories of observed behaviour: page views, scrolls, CTA clicks, form starts, form submits, downloads, booking clicks, etc. Multiple page views count as one signal type; a page view plus a CTA click counts as two.
Confidence score = sum of all factors (0–10)
Confidence bands
- Low (0–3): short session, weak signals, multiple plausible states. Insufficient evidence for reliable classification.
- Medium (4–6): enough signal for a useful interpretation, but some ambiguity remains. Classification is directionally useful.
- High (7–10): strong, repeated, coherent signals with clear state fit. Classification can drive automated actions.
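The five-factor calculation and the bands can be sketched as below. This is one reading of the table, not the reference engine, and it makes several assumptions:

- Only `cta_click` counts as "active".
- Form, booking and conversion events count as "high-intent".
- "Within 20% of each other" means a gap below 0.2 between the top two states' fit shares.
- "Dominant" means a primary share of at least 0.7, borrowed from the hybrid rule.
- An inconsistent multi-session direction scores 0 on the temporal factor.

All function and field names are hypothetical.

```javascript
// Event names treated as "active" or "high-intent" for the signal-strength factor (assumed sets).
const ACTIVE_SIGNALS = new Set(["cta_click"]);
const HIGH_INTENT_SIGNALS = new Set(["form_start", "form_submit", "booking_click", "conversion_complete"]);

function signalCountFactor(signalTypes) {
  const distinct = new Set(signalTypes).size;
  if (distinct >= 5) return 2;
  return distinct >= 3 ? 1 : 0;
}

function signalStrengthFactor(signalTypes) {
  if (signalTypes.some(t => HIGH_INTENT_SIGNALS.has(t))) return 2;
  return signalTypes.some(t => ACTIVE_SIGNALS.has(t)) ? 1 : 0;
}

// primaryShare / secondaryShare: fit of the top two candidate states (0–1).
function stateClarityFactor(primaryShare, secondaryShare) {
  if (primaryShare - secondaryShare < 0.2) return 0;
  return primaryShare >= 0.7 ? 2 : 1;
}

function sessionDepthFactor(seconds, pages) {
  if (seconds < 30 || pages <= 2) return 0;
  return seconds > 120 && pages >= 5 ? 2 : 1;
}

function temporalFactor(sessionCount, consistentDirection) {
  if (sessionCount < 2 || !consistentDirection) return 0;
  return sessionCount >= 3 ? 2 : 1;
}

// Confidence score = sum of the five factors (0–10).
function confidenceScore(e) {
  return signalCountFactor(e.signalTypes)
    + signalStrengthFactor(e.signalTypes)
    + stateClarityFactor(e.primaryShare, e.secondaryShare)
    + sessionDepthFactor(e.seconds, e.pages)
    + temporalFactor(e.sessions, e.consistentDirection);
}

function confidenceBand(score) {
  if (score <= 3) return "Low";
  return score <= 6 ? "Medium" : "High";
}
```

The three worked examples in this section can be fed in, with clarity shares chosen to match their descriptions. This sketch then yields totals of 1, 5 and 10.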
How confidence governs action
| Confidence | Permitted actions | Examples |
|---|---|---|
| Low (0–3) | Reporting and aggregate analysis only | Include in state distribution dashboards; do not trigger individual interventions |
| Medium (4–6) | Lightweight nudges and analyst review | Adjust content recommendations; flag for manual review; add to nurture segments; optional motivation tag may be applied |
| High (7–10) | Direct automated action and personalisation | Trigger CRM workflows; personalise page content; alert sales team; motivation tag can drive targeted response variant |
Motivation assignment rule:
- Low confidence: no motivation tag
- Medium confidence: one motivation tag allowed with analyst review
- High confidence: one motivation tag can drive automated response variants
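The permitted-actions table can be enforced by a small gate that sits in front of any trigger. This is a sketch under two assumptions: the action-type identifiers are hypothetical labels, and each band inherits the actions of the bands below it, which the table implies but does not state.

```javascript
// Action types permitted per confidence band (hypothetical identifiers).
const PERMITTED_ACTIONS = {
  Low: ["dashboard_reporting"],
  Medium: ["dashboard_reporting", "content_recommendation", "analyst_review", "nurture_segment"],
  High: ["dashboard_reporting", "content_recommendation", "analyst_review", "nurture_segment",
    "crm_workflow", "page_personalisation", "sales_alert"],
};

function confidenceBand(score) {
  if (score <= 3) return "Low";
  return score <= 6 ? "Medium" : "High";
}

// True only when the classification is confident enough for this action type.
function canTrigger(actionType, confidence) {
  return PERMITTED_ACTIONS[confidenceBand(confidence)].includes(actionType);
}
```

Checking the gate at the point of triggering, rather than in each workflow, keeps one source of truth for the confidence rules.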
Examples
- Low confidence (score 1): A 2-page visit with one CTA click, 15 seconds on site. Signal types: page view + CTA click = 2 types (score 0). Strength: mix of passive and active (score 1). Clarity: ambiguous between Scanner and Explorer (score 0). Session depth: < 30s, 2 pages (score 0). Temporal: single session (score 0). Total = 1. Not enough evidence to act on.
- Medium confidence (score 5): A single session lasting 3 minutes across 5 pages, with deep scrolling on two service pages, one CTA click, and behaviour concentrated in one topic cluster. Signal types: page view + scroll + CTA click = 3 types (score 1). Strength: CTA click present (score 1). Clarity: likely Explorer, but Evaluator is plausible (score 1). Session depth: 3 minutes, 5 pages (score 2). Temporal: single session (score 0). Total = 5. Enough to adjust content recommendations, but not enough to trigger automated outreach.
- High confidence (score 10): 3 sessions over 5 days, repeated cluster visits, form starts, deep scrolling, and increasing progression. Signal types: page view + scroll + CTA click + form start + cluster return = 5 types (score 2). Strength: form starts present (score 2). Clarity: clearly Returning Evaluator (score 2). Session depth: sustained engagement across sessions (score 2). Temporal: 3 sessions with reinforcing pattern (score 2). Total = 10. High enough to trigger CRM workflows and personalisation.
9. Interactive State Classifier
Use this tool to test how the four signal scores map to states. Adjust the sliders to see the classification update in real time. Temporal states (Returning Evaluator, Re-engaged Prospect) require multi-session data and cannot be tested with single-session scores alone. The confidence score shown here is a simplified approximation. The full engine (Section 8) uses five factors, including session metadata and temporal consistency, that sliders alone cannot capture.
10. Action Layer
Each state must map to a specific, testable action.
Principle
Classification without response is only reporting.
The system becomes useful when each state changes what the business does.
Types of actions
UX / product actions
Change the experience itself.
Examples:
- improve top-level hierarchy
- add guided entry paths
- reduce form friction
- simplify navigation
- surface trust elements earlier
CRM / communication actions
Change what is said, when, and to whom.
Examples:
- send proof-led follow-up
- send reassurance after a form drop-off
- nurture low-risk converters toward stronger actions
Strategic actions
Change internal decision-making.
Examples:
- identify which traffic sources produce evaluators vs scanners
- identify which pages create hesitation or overload
- prioritise fixes based on which state blocks conversion most often
State + motivation refinement examples
Use these only when the motivation confidence gate is met:
- Hesitant + Risk-sensitive → add reassurance (guarantees, proof, process clarity)
- Hesitant + Confusion-driven → simplify UX and next-step guidance
- Hesitant + Overload-sensitive → reduce options and shorten decision path
- Scanner + Curiosity-driven → improve guided entry and value framing
- Evaluator + Value-driven → surface outcomes, case studies, and implementation detail
- Focused Evaluator + Urgency-driven → remove distractions and shorten conversion path
If motivation does not change the action, keep state-level action only.
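The refinement pairs above work as a lookup that sits behind the motivation gate. This is a sketch only:

- `MOTIVATION_REFINEMENTS` and `chooseAction` are hypothetical names.
- The action wording paraphrases the list above.
- Routing medium-confidence refinements to analyst review follows the Section 8 motivation rule.
- Any pair not in the table falls back to the state-level action.

```javascript
// State + motivation pairs that change the response (paraphrased from the list above).
const MOTIVATION_REFINEMENTS = new Map([
  ["Hesitant|Risk-sensitive", "Add reassurance: guarantees, proof, process clarity"],
  ["Hesitant|Confusion-driven", "Simplify UX and next-step guidance"],
  ["Hesitant|Overload-sensitive", "Reduce options and shorten the decision path"],
  ["Scanner|Curiosity-driven", "Improve guided entry and value framing"],
  ["Evaluator|Value-driven", "Surface outcomes, case studies, and implementation detail"],
  ["Focused Evaluator|Urgency-driven", "Remove distractions and shorten the conversion path"],
]);

function chooseAction(state, stateAction, confidence, motivation = null) {
  // Motivation gate: no motivation tag at low confidence (0–3).
  if (!motivation || confidence < 4) {
    return { action: stateAction, refined: false, needsReview: false };
  }
  const refined = MOTIVATION_REFINEMENTS.get(`${state}|${motivation}`);
  if (!refined) {
    // Motivation does not change the action: keep the state-level action.
    return { action: stateAction, refined: false, needsReview: false };
  }
  // Medium confidence (4–6) allows the tag but routes it through analyst review.
  return { action: refined, refined: true, needsReview: confidence <= 6 };
}
```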
Example mappings with metrics
- Mismatch
- Action: review traffic source quality and landing page relevance. If Mismatch volume is high from a specific source, the source may be poorly targeted. If Mismatch volume is high on a specific landing page, the page may be failing to communicate relevance.
- Metric: reduction in Mismatch share from targeted sources; increase in sessions progressing beyond the first page
- Scanner
- Action: add guided entry and clearer value proposition above the fold
- Metric: increase in deeper-session rate and evaluator-state share
- Explorer
- Action: strengthen pathways from broad discovery content into relevant offers
- Metric: increase in clustered navigation and progression score
- Comparator
- Action: clarify differentiation, side-by-side proof, and concise summaries
- Metric: increase in focused evaluator or conversion rate
- Evaluator
- Action: add case studies, implementation details, FAQs, and proof elements
- Metric: increase in high-intent page views and conversion starts
- Focused Evaluator
- Action: reduce distractions, shorten path to contact / purchase
- Metric: increase in conversion completion rate
- Hesitant
- Action: reduce fields, clarify next step, add reassurance around process and commitment
- Metric: form completion rate and drop-off reduction
- Stalled
- Action: simplify navigation, reduce loops, add stronger recommendation pathways
- Metric: increase in forward progression and reduction in repeated loops
- Stalled (Friction)
- Action: fix the broken interaction. Resolve rage-click targets, make dead-click elements interactive or remove misleading affordances, reduce form validation errors, address layout shift issues
- Metric: reduction in friction event counts; increase in progression from previously blocked pages
- Returning Evaluator
- Action: reinforce differentiation and provide stronger closing reassurance
- Metric: increase in conversion among repeat visitors
- Engaged (Committed)
- Action: support onboarding, confirmation, and post-conversion confidence
- Metric: faster onboarding completion and reduced post-conversion abandonment
11. Temporal Layer
The system should not treat sessions as isolated events. A single session rarely tells the full story. Someone who visits three times in a week with increasing depth is fundamentally different from someone who visits once and leaves. Without tracking behaviour over time, the system would miss returning evaluators, chronic hesitation, and re-engagement after a long gap.
Section 3.3 defines the temporal signal inputs (recency, frequency, velocity). This section defines how those inputs are used to track state transitions and evaluate trend direction across sessions.
What temporal analysis should answer
- is this person becoming more serious over time?
- are they stuck in repeated hesitation?
- did they disappear and return with stronger intent?
- is evaluation accelerating or decaying?
Transition significance
A state matters more when it changes predictably.
For example:
- many Scanners becoming Explorers suggests the top-level proposition is improving
- many Evaluators becoming Hesitant suggests the conversion path is the bottleneck
- many Returning Evaluators failing to convert suggests unresolved trust or pricing friction
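Transition significance depends on counting how often one state follows another across a visitor's sessions. This minimal sketch assumes each visitor's states are already available as an ordered array, oldest session first. That input shape and the function names are hypothetical. A rate is the share of all transitions out of a state, including staying put, that land in the target state.

```javascript
// Count state-to-state transitions across consecutive sessions.
// journeys: one array of states per visitor, oldest session first.
function countTransitions(journeys) {
  const counts = new Map(); // from -> Map(to -> count)
  for (const states of journeys) {
    for (let i = 1; i < states.length; i++) {
      const from = states[i - 1];
      const to = states[i];
      if (!counts.has(from)) counts.set(from, new Map());
      const row = counts.get(from);
      row.set(to, (row.get(to) || 0) + 1);
    }
  }
  return counts;
}

// Share of transitions out of `from` that land in `to` (self-transitions included).
function transitionRate(counts, from, to) {
  const row = counts.get(from);
  if (!row) return 0;
  let outgoing = 0;
  for (const n of row.values()) outgoing += n;
  return (row.get(to) || 0) / outgoing;
}
```

Tracking `transitionRate(counts, "Scanner", "Explorer")` period over period is one way to test the first pattern in the list above.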
Lifecycle phase mapping (optional overlay)
The states and temporal signals already described can be grouped into three broad lifecycle phases. This is not a separate classification. It is a lens over existing state data that clarifies which system should own the response (product, CRM, or sales) and what kind of action is appropriate.
| Lifecycle phase | Typical states | What it means |
|---|---|---|
| Acquisition | Mismatch, Scanner, Explorer | First contact. The visitor is orienting. The question is whether the proposition is relevant and whether the site helps them find what they need. |
| Evaluation | Comparator, Evaluator, Focused Evaluator, Hesitant, Stalled | Active consideration. The visitor has intent but has not yet acted. The question is whether the experience removes enough friction and builds enough confidence to convert. |
| Retention | Engaged, Returning Evaluator, Re-engaged Prospect | Post-conversion or repeat engagement. The question shifts from "will they act?" to "will they stay, deepen, or return?" |
Phase is determined by the visitor's current state, not by calendar time. A visitor may reach Evaluation in their first session or remain in Acquisition across several visits. The phase changes when the state changes.
This mapping is useful when the action layer needs to route responses to different teams or systems. For example, Acquisition-phase issues are typically product or content problems, Evaluation-phase issues are conversion path problems, and Retention-phase issues are CRM or onboarding problems.
Recommended temporal thresholds
- Returning Evaluator: 2+ sessions in 7 days, or 3+ in 30 days, with increasing depth or progression
- Re-engaged Prospect: 30+ day gap followed by medium-high clustering and progression
- Persistent Hesitation: 2+ interrupted conversion attempts within 14 days
- Chronic Stall: 3+ low-progression sessions with repeated loops and no improvement
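The four temporal thresholds can be sketched as predicates over a visitor's session history. The sketch makes these assumptions:

- The session summary shape `{ startMs, depth, progression, clustering, formStart, formSubmit, converted, loops }` is hypothetical, with sessions sorted oldest first.
- "Increasing" means the last session scores higher than the first.
- "Medium-high" means a score of 4 or more.
- Chronic Stall looks only at the three most recent sessions.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

function sessionsWithin(sessions, nowMs, days) {
  return sessions.filter(s => nowMs - s.startMs <= days * DAY_MS);
}

// Assumed trend test: last session scores higher than the first.
function isIncreasing(sessions, key) {
  return sessions.length >= 2 && sessions[sessions.length - 1][key] > sessions[0][key];
}

// 2+ sessions in 7 days, or 3+ in 30 days, with rising depth or progression and no conversion.
function isReturningEvaluator(sessions, nowMs) {
  const rising = group => isIncreasing(group, "depth") || isIncreasing(group, "progression");
  const last7 = sessionsWithin(sessions, nowMs, 7);
  const last30 = sessionsWithin(sessions, nowMs, 30);
  const qualifies = (last7.length >= 2 && rising(last7)) || (last30.length >= 3 && rising(last30));
  return qualifies && !last30.some(s => s.converted);
}

// 30+ day gap before the latest session, which then shows medium-high clustering and progression.
function isReengagedProspect(sessions, minScore = 4) {
  if (sessions.length < 2) return false;
  const latest = sessions[sessions.length - 1];
  const previous = sessions[sessions.length - 2];
  const gapDays = (latest.startMs - previous.startMs) / DAY_MS;
  return gapDays >= 30 && latest.clustering >= minScore && latest.progression >= minScore;
}

// 2+ interrupted conversion attempts (form started, not submitted) within 14 days.
function hasPersistentHesitation(sessions, nowMs) {
  return sessionsWithin(sessions, nowMs, 14).filter(s => s.formStart && !s.formSubmit).length >= 2;
}

// Three most recent sessions all low-progression (≤ 3) with loops, and no upward trend.
function isChronicStall(sessions) {
  const recent = sessions.slice(-3);
  return recent.length === 3
    && recent.every(s => s.progression <= 3 && s.loops)
    && !isIncreasing(recent, "progression");
}
```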
12. Implementation (GA4)
The framework is tool-agnostic in principle, but it needs a concrete data layer to work. GA4 with Google Tag Manager is the recommended default because most sites already have it, it supports custom events and dimensions natively, and its BigQuery export provides the raw event-level data that the scoring pipeline requires.
Required events
- page_view
- scroll
- cta_click
- form_start
- form_submit
- resource_download
- navigation_click
- section_view
- booking_click (where relevant)
- conversion_complete
Friction events (recommended)
These are not required for core state classification but are needed to distinguish Stalled from Stalled (Friction); see Section 5, State 8.
- rage_click: 3+ rapid clicks on the same element within 2 seconds
- dead_click: click on a non-interactive element with no system response
- form_error: validation error displayed to the user during form completion
- high_layout_shift: Cumulative Layout Shift exceeds 0.25 during a page view (significant unexpected content movement)
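Of these, the rage_click rule is the one that needs client-side state, because it depends on recent click history. This minimal sketch applies the 3-clicks-in-2-seconds definition with a per-element sliding window. It assumes the page supplies a stable key per element. `createRageClickDetector`, `element_key` and `click_count` are hypothetical names, not part of the reference GTM scripts.

```javascript
// Rage-click detection: 3+ clicks on the same element within 2 seconds.
const RAGE_CLICK_COUNT = 3;
const RAGE_CLICK_WINDOW_MS = 2000;

// onRageClick receives the event payload; in a browser it might push to window.dataLayer.
function createRageClickDetector(onRageClick) {
  const recentClicks = new Map(); // elementKey -> click timestamps inside the window
  return function recordClick(elementKey, timestampMs) {
    const inWindow = (recentClicks.get(elementKey) || [])
      .filter(t => timestampMs - t <= RAGE_CLICK_WINDOW_MS);
    inWindow.push(timestampMs);
    if (inWindow.length >= RAGE_CLICK_COUNT) {
      onRageClick({ event: "rage_click", element_key: elementKey, click_count: inWindow.length });
      recentClicks.set(elementKey, []); // reset so one burst produces one event
    } else {
      recentClicks.set(elementKey, inWindow);
    }
  };
}
```

In a browser, `recordClick` could be called from a document-level click listener using the event's `timeStamp`. The payload could then be sent with `window.dataLayer.push(...)` so a GTM custom-event trigger forwards it to GA4.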
Required parameters
- page_type
- page_topic
- conversion_stage
- content_role
- offer_id (where relevant)
- traffic_source_group
GA4 property limits: Standard GA4 properties allow up to 50 event-scoped custom dimensions and 25 user-scoped custom dimensions. This framework uses 6 required event parameters plus friction events. Plan your custom dimension budget early. If the site already uses 40+ custom dimensions for other needs, consolidate where possible or use BigQuery export (where these limits do not apply). Audit existing custom dimensions before rollout so you do not hit the limit halfway through implementation.
Source-integrated taxonomy (CMS-first model)
Strategic meaning is a property of the content, not a secondary layer. For the clustering signal to work, every page must carry its own taxonomy metadata, assigned within the CMS at the moment of creation.
When a page is published, the CMS assigns three required fields stored as hidden metadata in the HTML:
- page_type: the structural type (e.g. service, blog, pricing)
- page_topic: the topic cluster (e.g. strategy, proof, pricing)
- intent_weight: business significance (0.5–2.0)
GTM reads this metadata directly from the page and attaches it to every GA4 event. The system then uses these labels to calculate clustering and progression scores. No external lookup table is required.
Why CMS-first is the only scalable approach:
Zero blind spots: Every new page is classified the moment it goes live. There is no lag where a user visits a page that has not been added to a separate register.
Lower maintenance burden: There is no separate lookup table to keep in sync with the site. The taxonomy is part of the website’s structure.
Reliable clustering: The clustering signal depends on seeing consistent topic tags. CMS-embedded tags travel with every page, so the signal is not weakened by pages missing from a separate register, and with a default for untagged pages it is never null.
Infinite scalability: Whether you have 10 pages or 10,000, the system scales because the metadata is distributed across the site rather than trapped in a central spreadsheet.
Example topic clusters: strategy, proof, pricing, onboarding, product A, product B.
Example logic:
- if most page views belong to one topic cluster and switching is low, clustering is high
- if page views are spread across many unrelated clusters with frequent switches, clustering is low
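The clustering logic can be expressed as a per-session function that mirrors the clustering formula in Appendix A:

- dominant-topic share × 10;
- minus up to 5 points for topic switches, with no penalty below 4 page views;
- plus up to 3 points for repeat views in the dominant topic.

This is a sketch. It assumes `topics` is the session's ordered list of `page_topic` values from page_view events. The final clamp to 0–10 is an addition: without it, the formula can reach 13 for a fully concentrated session or go negative for a scattered one.

```javascript
// Clustering score (0–10) for one session from its ordered page_topic values.
function clusteringScore(topics) {
  if (topics.length === 0) return 0;
  const counts = new Map();
  for (const topic of topics) counts.set(topic, (counts.get(topic) || 0) + 1);
  const dominantViews = Math.max(...counts.values());

  let switches = 0;
  for (let i = 1; i < topics.length; i++) {
    if (topics[i] !== topics[i - 1]) switches++;
  }

  const shareComponent = (dominantViews * 10) / topics.length; // dominant share × 10
  const switchPenalty = topics.length < 4 ? 0 : Math.min(switches, 5); // signal floor below 4 views
  const repeatBonus = Math.min(Math.max(dominantViews - 1, 0), 3);
  const raw = shareComponent - switchPenalty + repeatBonus;
  return Math.round(Math.min(10, Math.max(0, raw)) * 10) / 10; // clamp, then 1 decimal
}
```

For example, `["pricing", "pricing", "proof", "pricing", "pricing"]` scores 9: a share component of 8, minus 2 switches, plus 3 repeat views. Four pages on four different topics score 0.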
Taxonomy maintenance
The clustering signal is only as reliable as the taxonomy behind it. In the CMS-first model, the CMS publishing workflow is the primary gate; pages should not go live without page_type and page_topic assigned.
Required process:
- Require metadata at publish: the CMS must include page_type, page_topic, and intent_weight as required fields. Untagged pages cannot exist if the publishing workflow enforces this.
- Audit monthly: run a report of pages viewed in the last 30 days that have a page_topic of “General” (the default). Any page receiving more than 100 views without a proper tag is a blind spot that must be classified.
- Track coverage: maintain a simple metric: tagged_pages / total_pages_with_traffic. Target ≥ 95% coverage. Below 90%, clustering scores should be treated as unreliable.
Default for untagged pages: Any page without CMS metadata should default to page_topic: General and intent_weight: 0.5. This prevents null values from breaking score calculations while making untagged pages visible in audits.
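The default, the monthly blind-spot audit and the coverage metric fit into one small routine. This is a sketch:

- It assumes a page export shaped like `{ url, views30d, page_topic }`; that shape and the function names are hypothetical.
- It treats the "General" default as untagged.
- It reads "below_target" as the band between 90% and 95%.

```javascript
const DEFAULT_TOPIC = "General";
const DEFAULT_INTENT_WEIGHT = 0.5;

// Fill missing CMS metadata with the documented defaults so scoring never sees nulls.
function withDefaults(meta = {}) {
  return {
    ...meta,
    page_topic: meta.page_topic || DEFAULT_TOPIC,
    intent_weight: meta.intent_weight ?? DEFAULT_INTENT_WEIGHT,
  };
}

function auditTaxonomy(pages) {
  const withTraffic = pages.filter(p => p.views30d > 0);
  const isUntagged = p => withDefaults(p).page_topic === DEFAULT_TOPIC;
  const tagged = withTraffic.filter(p => !isUntagged(p)).length;
  const coverage = withTraffic.length === 0 ? 1 : tagged / withTraffic.length;
  // Untagged pages with more than 100 views in 30 days are blind spots to classify first.
  const blindSpots = withTraffic.filter(p => isUntagged(p) && p.views30d > 100).map(p => p.url);
  let status = "unreliable"; // below 90%: treat clustering scores as unreliable
  if (coverage >= 0.95) status = "on_target";
  else if (coverage >= 0.9) status = "below_target";
  return { coverage, status, blindSpots };
}
```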
Common failure mode (without CMS-first): The marketing team publishes 10 blog posts without tagging them. Users who read those posts appear to have scattered, low-clustering behaviour, creating “false Scanners” or “false Mismatches.” The system then recommends navigation improvements for a problem that is actually a taxonomy gap. The CMS-first model prevents this entirely.
Fallback: external taxonomy register
For legacy sites that cannot yet embed CMS metadata, an external register provides a fallback. The register maps URL patterns to taxonomy values and is hosted in Google Sheets or Airtable for CSV export or BigQuery sync.
| URL pattern | Page type | Topic cluster | Journey role | Intent weight | Key progression event |
|---|---|---|---|---|---|
| / | Homepage | Brand | Orientation | 0.5 | nav_click_services |
| /services/consulting/* | Service | Strategy | Evaluation | 1.0 | cta_click_quote |
| /case-studies/* | Case study | Proof | Validation | 1.2 | resource_download |
| /pricing | Pricing | Commercial | High intent | 1.5 | form_start_trial |
| /blog/ai-trends/* | Blog | AI / Tech | Awareness | 0.6 | newsletter_signup |
| /contact-success | Confirmation | Admin | Post-action | 2.0 | conversion_complete |
How the taxonomy feeds the clustering signal: When a page loads, the GTM data layer reads the CMS-embedded metadata (or the register as fallback) and sends page_topic and page_type with the GA4 page_view event. In BigQuery, clustering is calculated from the sequence of page_topic values in each session. The SQL pipeline uses a COALESCE pattern: CMS-embedded values are used first; the external register provides values only when CMS metadata is absent.
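The CMS-first precedence can be sketched field by field: the CMS value is used first, then the register value, then the documented default. The sketch makes these assumptions:

- Register patterns follow the table's notation, so a trailing `*` means "any path under this prefix" and anything else must match exactly.
- When several patterns match, the longest one wins.
- Register rows use `page_topic` for the topic cluster.
- `resolveTaxonomy`, `matchRegister` and `patternMatches` are hypothetical names.

Appendix A matches `url_pattern` as a regular expression instead, so a real register should settle on one convention.

```javascript
// Documented defaults for pages with no CMS metadata and no register match.
const TAXONOMY_DEFAULTS = { page_type: "Unknown", page_topic: "General", intent_weight: 0.5 };

// Assumed glob notation: "/case-studies/*" matches any path under that prefix.
function patternMatches(pattern, path) {
  return pattern.endsWith("*") ? path.startsWith(pattern.slice(0, -1)) : path === pattern;
}

// When several patterns match, the longest (most specific) one wins.
function matchRegister(path, register) {
  let best = null;
  for (const row of register) {
    if (patternMatches(row.pattern, path) && (!best || row.pattern.length > best.pattern.length)) {
      best = row;
    }
  }
  return best || {};
}

// Field-by-field COALESCE: CMS value, then register value, then default.
function resolveTaxonomy(path, cmsMeta, register) {
  const fromRegister = matchRegister(path, register);
  const pick = key => cmsMeta?.[key] ?? fromRegister[key] ?? TAXONOMY_DEFAULTS[key];
  return {
    page_type: pick("page_type"),
    page_topic: pick("page_topic"),
    intent_weight: pick("intent_weight"),
  };
}
```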
Processing options
Basic setup
- GA4 event collection
- Looker Studio reporting
- manual or spreadsheet-based scoring
Intermediate setup
- GA4 + BigQuery export (BigQuery is Google's cloud data warehouse that stores raw GA4 event data for flexible querying)
- SQL-based score calculation
- dashboard state reporting
Advanced setup
- BigQuery + warehouse logic
- near-real-time classification
- automated CRM or UX triggers based on high-confidence states
Recommended implementation sequence
- Define taxonomy (page types, topics, offers)
- Implement events and parameters in GTM
- Validate data quality in GA4
- Build score calculations
- Test state assignment against real sessions
- Add confidence scoring
- Connect states to actions and metrics
13. Output Layer
Classification is only useful if the right people see the right information at the right time. The output layer exists because a well-built model that lives inside a database query and never reaches a decision-maker has zero business value.
Dashboards should show:
- distribution of states
- transition flows
- conversion by state
- drop-off by state
- confidence by state
- source mix by state
Problem-first reporting views
The system should not only be organised around states. It should also be organised around business problems.
Examples:
- High bounce / low engagement → inspect Mismatch and Scanner
- Traffic but weak progression → inspect Explorer and Stalled
- Strong evaluation but weak conversion → inspect Evaluator, Comparator, Hesitant
- Repeat visits without action → inspect Returning Evaluator and chronic hesitation
GA4 data thresholding caveat
GA4 can apply data thresholding to reports and explorations when user counts are small, withholding rows to protect user privacy. Thresholds are typically triggered when Google signals is enabled or demographic data is included. This means state distribution dashboards may show incomplete or misleading data for low-volume segments. For example, a "Focused Evaluator" segment with only 12 users in a reporting period may be hidden entirely.
Mitigations:
- Use wider date ranges to increase user counts per segment
- Use BigQuery export for unsampled, unthresholded data
- Do not draw conclusions from state segments with fewer than 30 users in the reporting period
Output principle
Every report should answer:
- what state is happening?
- how confident are we?
- what should we change?
- how will we know if it worked?
Prescriptive output
The final dashboard should not just report data; it should issue instructions. Each state classification, when combined with aggregate context, generates a natural-language prescription that tells the team exactly what to do.
Examples:
- Hesitant + High Confidence: “Reduce form friction on the pricing contact form. 47 users started but did not complete conversion in the last 7 days.”
- Scanner + Medium Confidence: “Add guided entry points on the homepage. 312 sessions showed wide browsing with no depth.”
- Stalled (Friction) + High Confidence: “Fix the broken CTA on the services page. 23 users were blocked by UX failures.”
Prescriptions are template-based, not AI-generated. Each state maps to an instruction template with placeholders (e.g. {sessionCount}, {topBlockedPage}) that are interpolated from aggregate data at query time.
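Template interpolation of this kind needs only a placeholder substitution step. In this sketch the template wording follows the three examples above, and the placeholder names other than `{sessionCount}` and `{topBlockedPage}` are hypothetical. Unknown placeholders are deliberately left visible, so a missing aggregate shows up in review instead of producing a silently blank instruction.

```javascript
// Hypothetical state-to-template map; {placeholders} are filled from aggregate query results.
const PRESCRIPTION_TEMPLATES = {
  "Hesitant": "Reduce form friction on {formName}. {sessionCount} users started but did not complete conversion in the last {days} days.",
  "Scanner": "Add guided entry points on {landingPage}. {sessionCount} sessions showed wide browsing with no depth.",
  "Stalled (Friction)": "Fix the broken CTA on {topBlockedPage}. {sessionCount} users were blocked by UX failures.",
};

// Replace each {placeholder} with the matching value; unknown placeholders are kept as-is.
function renderPrescription(state, data) {
  const template = PRESCRIPTION_TEMPLATES[state];
  if (!template) return null;
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : match);
}
```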
14. Feedback Loop
Any fixed set of thresholds will drift as the site, the traffic, and the market change. The feedback loop exists to make sure the system stays accurate over time rather than slowly becoming wrong in ways nobody notices.
The system improves through a continuous cycle:
- Observe: collect behavioural data through GA4 events and parameters.
- Classify: assign states and confidence scores using the signal model and classification logic.
- Act: trigger the appropriate response (UX change, CRM action, strategic decision) based on state and confidence.
- Measure: track the defined success metric for each action (Section 10). Did the intervention change behaviour in the expected direction?
- Refine: adjust thresholds, weights, and state definitions based on measured outcomes.
What refinement looks like in practice
- If a high proportion of Explorers convert without passing through Evaluator, the Explorer → Evaluator boundary may be set too high. Lower the clustering or depth threshold.
- If Hesitant users rarely convert even after intervention, investigate whether the form friction is structural (too many fields, unclear next step) rather than behavioural.
- If confidence scores cluster around medium with few high-confidence classifications, the signal model may need additional inputs or the score thresholds may be too conservative.
Recommended review cadence
- Weekly: review state distributions and conversion rates by state.
- Monthly: review confidence distributions, action effectiveness, and threshold accuracy.
- Quarterly: recalibrate score ranges using percentile analysis (Section 3.4) and reassess state definitions against actual user journeys.
15. Constraints and Limitations
Every model has boundaries. Being explicit about what this system cannot do is just as important as explaining what it can. It prevents overconfidence in the output and sets realistic expectations for anyone using the results to make decisions.
Analytical constraints
- Probabilistic, not deterministic. All classifications are probabilistic estimates. No behavioural signal guarantees intent. Treat states as the most likely interpretation, not a fact about the visitor.
- Multi-signal required. A single data point (one page view, one click) is insufficient for reliable classification. Require at least 3 signals before assigning any state above low confidence.
- Small data = low confidence. Sites with fewer than 500 sessions per month will have limited calibration data. Use fixed-rule scoring (Section 3.4, Option A) and avoid percentile-based normalisation until volume grows.
- Motivation is inferred, not observed. Motivation tags are optional secondary signals. They must never replace behavioural states as the primary classification layer.
- No low-confidence motivation. Do not assign motivation tags when confidence is low (0–3) or when signal evidence is sparse.
- Avoid psychological overreach. Do not claim internal mental truth; only describe behavioural patterns consistent with a motivation hypothesis.
Data quality constraints
- Taxonomy completeness. The clustering signal depends entirely on every important page being tagged with a page type and topic cluster. Untagged pages create blind spots that distort clustering scores. Audit taxonomy coverage before trusting clustering outputs.
- Event reliability. Custom events (CTA clicks, form starts, section views) require correct GTM implementation. Missing or double-firing events will corrupt progression and depth scores. Validate event accuracy in GA4 DebugView before using scores operationally.
- Cross-device and cross-session identity. GA4 identity relies on cookies and optional User-ID (which links sessions when a user logs in). If visitors switch devices, browse in private mode, or clear cookies, they appear as new users. This splits their behavioural history. It mainly affects temporal signals (frequency, velocity, and returning evaluator detection). Treat this as a known limitation and avoid over-interpreting single-session classifications on high cross-device sites.
- B2B-specific risk. In B2B journeys, people often research on one device, revisit on another, and return through a shared link. Without a logged-in User-ID, the "Returning Evaluator" pattern can fragment into separate "new Scanner" sessions. That weakens the temporal layer. For B2B sites, User-ID (via gated content, account login, or CRM integration) is a practical requirement for reliable multi-session tracking.
Privacy and consent
- Session-level and cross-session tracking requires user consent under GDPR (the EU's General Data Protection Regulation), ePrivacy, and similar data protection regulations. Ensure consent management is in place before collecting the events described in this framework.
- Do not store personally identifiable information (PII) in GA4 custom parameters. State classifications should be based on behavioural patterns, not individual identity.
- Where consent is not granted, the system should degrade gracefully to aggregate-only reporting with no individual state assignment.
Exclusions
- Bot and crawler traffic. Filter known bots before scoring. GA4 excludes known bots by default, but verify that automated traffic is not inflating Scanner or Mismatch counts.
- Internal traffic. Exclude staff and internal IP ranges to avoid contaminating state distributions.
16. Final Summary
This system transforms website analytics from passive observation into an active decision framework. It does this by:
- Classifying visitors into behavioural states using four core signals (breadth, depth, progression, clustering) and temporal context, replacing vague metrics with actionable categories.
- Optionally refining response precision with motivation signals (for medium/high-confidence classifications only), so actions can be tailored without overclaiming psychological certainty.
- Quantifying certainty through confidence scoring, so that the strength of response matches the strength of evidence.
- Connecting every state to a specific, testable action, ensuring that classification always leads to a concrete business response with a measurable outcome.
- Learning continuously through a feedback loop that refines thresholds, weights, and state definitions as data accumulates.
Key distinction
Most analytics systems describe what happened.
This system decides what to do next.
Appendix A: BigQuery Reference Implementation
This SQL provides a starting implementation for calculating the four core signal scores from GA4 BigQuery export data. It assumes the taxonomy register has been uploaded as a BigQuery table (manual_taxonomy_lookup) with columns: url_pattern, page_type, topic_cluster, intent_weight.
Prerequisites:
- GA4 BigQuery export enabled
- Taxonomy register uploaded as a lookup table
- Date range adjusted to match your reporting period
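WITH
-- 1. Extract raw events with session identity and any CMS-embedded taxonomy parameters
raw_events AS (
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
event_name,
TIMESTAMP_MICROS(event_timestamp) AS event_time,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS url,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_time_msec,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'percent_scrolled') AS scroll_percent,
-- CMS-embedded taxonomy sent by GTM (NULL when the page carries no metadata)
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_topic') AS cms_page_topic,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_type') AS cms_page_type,
(SELECT COALESCE(value.double_value, value.float_value, CAST(value.int_value AS FLOAT64), SAFE_CAST(value.string_value AS FLOAT64))
FROM UNNEST(event_params) WHERE key = 'intent_weight') AS cms_intent_weight
FROM `your-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
),
-- 2. Map events to taxonomy: CMS-embedded values first, then the register, then defaults.
-- REGEXP_CONTAINS matches URLs against patterns as regular expressions (e.g. /blog/.* matches any blog page),
-- so a broad pattern such as '/' matches every URL. To avoid duplicating events when several
-- patterns match, keep only the longest (treated as most specific) pattern per distinct URL.
-- NOTE: For large taxonomy tables, materialise url_taxonomy or use exact URL matching
-- to avoid expensive regex scans on every query.
url_taxonomy AS (
SELECT url, topic_cluster, page_type, intent_weight
FROM (
SELECT
u.url,
t.topic_cluster,
t.page_type,
t.intent_weight,
ROW_NUMBER() OVER (PARTITION BY u.url ORDER BY LENGTH(t.url_pattern) DESC, t.url_pattern) AS match_rank
FROM (SELECT DISTINCT url FROM raw_events WHERE url IS NOT NULL) u
JOIN `your-project.your_dataset.manual_taxonomy_lookup` t
ON REGEXP_CONTAINS(u.url, t.url_pattern)
)
WHERE match_rank = 1
),
mapped_events AS (
SELECT
e.* EXCEPT (cms_page_topic, cms_page_type, cms_intent_weight),
COALESCE(e.cms_page_topic, m.topic_cluster, 'General') AS topic_cluster,
COALESCE(e.cms_page_type, m.page_type, 'Unknown') AS page_type,
COALESCE(e.cms_intent_weight, m.intent_weight, 0.5) AS intent_weight
FROM raw_events e
LEFT JOIN url_taxonomy m ON e.url = m.url
),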
-- 3. Breadth score: unique pages, page types, and topic clusters per session
breadth_metrics AS (
SELECT
user_pseudo_id,
session_id,
COUNT(DISTINCT url) AS unique_pages,
COUNT(DISTINCT page_type) AS unique_page_types,
COUNT(DISTINCT topic_cluster) AS unique_topics
FROM mapped_events
WHERE event_name = 'page_view'
GROUP BY 1, 2
),
-- 4. Depth score: engagement time and scroll depth per session
depth_metrics AS (
SELECT
user_pseudo_id,
session_id,
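-- COALESCE so sessions without engagement_time_msec score 0s instead of NULL (NULL would fall through to the top depth band)
COALESCE(SUM(engagement_time_msec), 0) / 1000.0 AS engagement_time_seconds,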
AVG(CASE WHEN scroll_percent IS NOT NULL THEN scroll_percent END) AS avg_scroll_percent,
COUNTIF(event_name IN ('resource_download', 'video_start')) AS deep_engagement_events
FROM mapped_events
GROUP BY 1, 2
),
-- 5. Clustering: topic concentration, switching, and repeat returns
clustering_prep AS (
SELECT
user_pseudo_id,
session_id,
topic_cluster,
event_time,
COUNT(*) OVER(PARTITION BY user_pseudo_id, session_id) AS total_views,
COUNT(*) OVER(PARTITION BY user_pseudo_id, session_id, topic_cluster) AS cluster_views,
LAG(topic_cluster) OVER(PARTITION BY user_pseudo_id, session_id ORDER BY event_time) AS prev_topic
FROM mapped_events
WHERE event_name = 'page_view'
),
clustering_metrics AS (
SELECT
user_pseudo_id,
session_id,
MAX(SAFE_DIVIDE(cluster_views, total_views)) AS dominant_topic_share,
-- Count topic switches (where current topic differs from previous)
COUNTIF(topic_cluster != prev_topic AND prev_topic IS NOT NULL) AS topic_switch_count,
-- Total page views (for minimum signal floor check)
MAX(total_views) AS total_page_views,
-- Repeat cluster return: views in the dominant cluster beyond the first visit
MAX(cluster_views) - 1 AS repeat_cluster_visits
FROM clustering_prep
GROUP BY 1, 2
),
-- 6. Progression: weighted action scores using intent weights from taxonomy
progression_metrics AS (
SELECT
user_pseudo_id,
session_id,
-- NOTE: page_view is excluded here. It contributes to breadth only (Section 4).
-- Scroll contributes to depth, not progression, so it is also excluded.
SUM(CASE
WHEN event_name = 'cta_click' THEN 1.0 * intent_weight
WHEN event_name = 'form_start' THEN 1.5 * intent_weight
WHEN event_name = 'form_submit' THEN 2.0 * intent_weight
WHEN event_name = 'booking_click' THEN 1.5 * intent_weight
WHEN event_name = 'conversion_complete' THEN 2.0 * intent_weight
ELSE 0
END) AS raw_progression_sum,
COUNTIF(event_name = 'form_start') AS form_starts,
COUNTIF(event_name = 'form_submit') AS form_submits,
COUNTIF(event_name = 'conversion_complete') AS conversions
FROM mapped_events
GROUP BY 1, 2
)
-- 7. Final scoring: assemble all four signal scores (0–10)
SELECT
b.user_pseudo_id,
b.session_id,
-- Breadth score (0–10): based on unique pages and variety
LEAST(10, CASE
WHEN b.unique_pages = 1 THEN 1
WHEN b.unique_pages <= 3 AND b.unique_page_types <= 2 THEN 3
WHEN b.unique_pages <= 5 THEN 5
WHEN b.unique_pages <= 8 AND b.unique_page_types >= 3 THEN 7
ELSE 9
END) AS breadth_score,
-- Depth score (0–10): based on engagement time and scroll
LEAST(10, CASE
WHEN d.engagement_time_seconds < 10 THEN 1
WHEN d.engagement_time_seconds < 30 THEN 3
WHEN d.engagement_time_seconds < 90 THEN 5
WHEN d.engagement_time_seconds < 180 THEN 7
ELSE 9
END
+ CASE WHEN COALESCE(d.avg_scroll_percent, 0) >= 75 THEN 1 ELSE 0 END
+ CASE WHEN d.deep_engagement_events > 0 THEN 1 ELSE 0 END
) AS depth_score,
-- Progression score (0–10): capped weighted sum
LEAST(10, ROUND(p.raw_progression_sum, 1)) AS progression_score,
-- Clustering score (0–10): formula with minimum signal floor, clamped to 0–10
-- (without the clamp, the raw formula can fall below 0 or exceed 10)
LEAST(10, GREATEST(0, ROUND(
(c.dominant_topic_share * 10)
- CASE
WHEN c.total_page_views < 4 THEN 0 -- minimum signal floor: no penalty below 4 pages
ELSE LEAST(c.topic_switch_count, 5)
END
+ LEAST(GREATEST(c.repeat_cluster_visits, 0), 3)
, 1))) AS clustering_score,
-- Raw metrics for debugging and calibration
b.unique_pages,
b.unique_page_types,
d.engagement_time_seconds,
d.avg_scroll_percent,
c.dominant_topic_share,
c.topic_switch_count,
c.total_page_views,
p.form_starts,
p.form_submits,
p.conversions
FROM breadth_metrics b
JOIN depth_metrics d ON b.user_pseudo_id = d.user_pseudo_id AND b.session_id = d.session_id
JOIN clustering_metrics c ON b.user_pseudo_id = c.user_pseudo_id AND b.session_id = c.session_id
JOIN progression_metrics p ON b.user_pseudo_id = p.user_pseudo_id AND b.session_id = p.session_id
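To unit-test these thresholds outside BigQuery (for example, alongside the JavaScript scoring library), the final SELECT's rules can be mirrored in a few pure functions. This is a sketch under stated assumptions, not the repository's API: the function names are hypothetical, inputs are camelCase versions of the query's CTE columns, and progression takes a list of { eventName, intentWeight } events. It treats missing engagement time as zero seconds, defaults a missing intent weight to 0.5 (as the taxonomy COALESCE does), and clamps the clustering score to the stated 0–10 range.

```javascript
// Hypothetical JavaScript mirror of the query's scoring rules, for testing
// thresholds outside BigQuery. Not the framework's published API.

// Round to one decimal place (matches ROUND(x, 1) for non-negative x).
const round1 = (x) => Math.round(x * 10) / 10;
const clamp = (x, lo, hi) => Math.min(hi, Math.max(lo, x));

// Breadth (0–10): same tier order as the SQL CASE expression.
function breadthScore({ uniquePages, uniquePageTypes }) {
  if (uniquePages === 1) return 1;
  if (uniquePages <= 3 && uniquePageTypes <= 2) return 3;
  if (uniquePages <= 5) return 5;
  if (uniquePages <= 8 && uniquePageTypes >= 3) return 7;
  return 9;
}

// Depth (0–10): engagement-time tier plus scroll and deep-engagement bonuses.
function depthScore({ engagementTimeSeconds, avgScrollPercent, deepEngagementEvents }) {
  const t = engagementTimeSeconds ?? 0; // missing engagement time counts as 0 s
  let base;
  if (t < 10) base = 1;
  else if (t < 30) base = 3;
  else if (t < 90) base = 5;
  else if (t < 180) base = 7;
  else base = 9;
  const scrollBonus = (avgScrollPercent ?? 0) >= 75 ? 1 : 0;
  const deepBonus = (deepEngagementEvents ?? 0) > 0 ? 1 : 0;
  return Math.min(10, base + scrollBonus + deepBonus);
}

// Event multipliers from progression_metrics; any other event contributes 0.
const PROGRESSION_MULTIPLIERS = new Map([
  ["cta_click", 1.0],
  ["form_start", 1.5],
  ["form_submit", 2.0],
  ["booking_click", 1.5],
  ["conversion_complete", 2.0],
]);

// Progression (0–10): capped weighted sum over { eventName, intentWeight }.
function progressionScore(events) {
  const raw = events.reduce(
    (sum, e) =>
      sum + (PROGRESSION_MULTIPLIERS.get(e.eventName) ?? 0) * (e.intentWeight ?? 0.5),
    0,
  );
  return Math.min(10, round1(raw));
}

// Clustering (0–10): concentration minus switch penalty plus repeat bonus.
function clusteringScore({ dominantTopicShare, topicSwitchCount, totalPageViews, repeatClusterVisits }) {
  // Minimum signal floor: no switch penalty below 4 page views.
  const switchPenalty = totalPageViews < 4 ? 0 : Math.min(topicSwitchCount, 5);
  const repeatBonus = Math.min(Math.max(repeatClusterVisits, 0), 3);
  return round1(clamp(dominantTopicShare * 10 - switchPenalty + repeatBonus, 0, 10));
}
```

Keeping the thresholds in pure functions like these makes it easy to pin each tier boundary with a test before changing the SQL, so drift between the two implementations is caught early.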
Implementation notes
- Session identity: GA4's ga_session_id is a timestamp and is not unique across users. user_pseudo_id is GA4's anonymous identifier for a visitor (based on their browser cookie). Always partition by both user_pseudo_id and session_id to avoid mixing sessions from different visitors.
- Taxonomy join performance: REGEXP_CONTAINS joins are computationally expensive. For production use, materialise the taxonomy lookup as a pre-computed URL-to-metadata table (exact match on URL path) and run the regex matching only in a nightly batch job that refreshes that table. This can reduce query costs by 10–100x on large event tables.
- Null handling: The COALESCE wrappers on taxonomy fields ensure untagged pages default to topic_cluster: 'General' and intent_weight: 0.5 rather than producing null scores. Monitor the volume of 'General' classifications; a high share indicates taxonomy debt (pages that have not yet been tagged).
- Calibration: The breadth and depth score thresholds above (e.g. "< 30 seconds = 3") are fixed-rule defaults (Section 3.4, Option A). Once you have 3+ months of data, replace them with percentile-based scoring: compute PERCENT_RANK() over each raw metric and map the resulting percentile (0–1) to a 0–10 scale.
- Next step, state assignment: This query produces the four signal scores per session. To assign states, add a final CASE WHEN block that applies the priority order from Section 7, Step 4, or export the scores to a downstream transformation layer (e.g. dbt, a SQL-based data transformation tool) for state classification and confidence scoring.
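The calibration step can be prototyped on exported raw metrics before the SQL is changed. A minimal sketch, assuming BigQuery's PERCENT_RANK() semantics ((rank - 1) / (n - 1), tied values sharing the lowest rank, a single row returning 0) and a straight linear mapping of the percentile onto 0–10 at one decimal place; the linear mapping and the helper names are illustrative assumptions, not the framework's calibration method.

```javascript
// Sketch of percentile-based calibration mirroring BigQuery's PERCENT_RANK():
// (rank - 1) / (n - 1), where tied values share the lowest rank and a
// single-row partition returns 0. The linear 0–10 mapping is an assumption.

function percentRanks(values) {
  const n = values.length;
  if (n === 0) return [];
  if (n === 1) return [0];
  const sorted = [...values].sort((a, b) => a - b);
  // rank - 1 equals the number of values strictly smaller than v.
  const countBelow = (v) => {
    let lo = 0;
    let hi = n;
    while (lo < hi) {
      // Binary search for the first index whose value is >= v.
      const mid = (lo + hi) >> 1;
      if (sorted[mid] < v) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  };
  return values.map((v) => countBelow(v) / (n - 1));
}

// Map each raw metric value to a 0–10 score by its percentile (one decimal).
function percentileScores(values) {
  return percentRanks(values).map((p) => Math.round(p * 100) / 10);
}
```

In BigQuery the same mapping could be expressed as ROUND(PERCENT_RANK() OVER (ORDER BY engagement_time_seconds) * 10, 1). Because tied values share the lowest rank, a large block of identical values (for example, many zero-engagement sessions) all land at the bottom of the scale rather than being spread across it.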
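The taxonomy-join and null-handling notes can be illustrated together. A minimal sketch, assuming a hypothetical list of { pattern, topicCluster, intentWeight } rules in which the first matching pattern wins: regex matching runs only when the path-to-metadata table is rebuilt (the nightly batch), per-event reads are exact-match Map lookups, and untagged paths fall back to the same 'General' / 0.5 defaults as the COALESCE wrappers. The generalShare helper computes the 'General' share worth monitoring for taxonomy debt. All names here are illustrative.

```javascript
// Hypothetical sketch of a pre-computed taxonomy lookup. Regex matching runs
// only in buildLookup (e.g. a nightly batch); per-event reads are exact-match.

// Defaults mirror the COALESCE wrappers on taxonomy fields.
const DEFAULT_METADATA = Object.freeze({ topicCluster: "General", intentWeight: 0.5 });

// rules: [{ pattern: RegExp, topicCluster, intentWeight }], checked in order;
// the first matching rule wins (an assumption). Use patterns without the g or
// y flag, because RegExp.prototype.test advances lastIndex for those.
function buildLookup(rules, paths) {
  const lookup = new Map();
  for (const path of new Set(paths)) {
    const rule = rules.find((r) => r.pattern.test(path));
    if (rule) {
      lookup.set(path, { topicCluster: rule.topicCluster, intentWeight: rule.intentWeight });
    }
  }
  return lookup;
}

// Exact-match lookup on URL path, falling back to the defaults when untagged.
function metadataFor(lookup, path) {
  return lookup.get(path) ?? DEFAULT_METADATA;
}

// Share of page views classified as 'General': a taxonomy-debt indicator.
function generalShare(lookup, pageViewPaths) {
  if (pageViewPaths.length === 0) return 0;
  const general = pageViewPaths.filter(
    (path) => metadataFor(lookup, path).topicCluster === DEFAULT_METADATA.topicCluster,
  ).length;
  return general / pageViewPaths.length;
}
```

Splitting the work this way keeps the expensive pattern matching proportional to the number of distinct paths rather than the number of events.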