Behaviour Intelligence from Web Analytics
Strategic Analytics: v1.0, March 2026

1. Orientation

What this is: An open-source framework and reference implementation for classifying website visitors into behavioural states using GA4 data. The full system is available on GitHub under the MIT licence.

This document defines a behavioural intelligence system for websites and digital products.

Its purpose is to answer a practical question:

What is this visitor trying to do, how confident are we, and what should we change as a result?

This system transforms analytics from passive reporting into an active decision framework.

Open-source implementation available

This framework has been implemented as a production-ready classification engine: a JavaScript scoring and classification library, a BigQuery SQL pipeline, GTM client-side scripts, deployment automation, and a full test suite. The implementation is open source under the MIT licence.


Why this is useful

Traditional analytics answers:

what happened
how many users
where they clicked

This system answers:

what state the user is in
how that state evolves over time
what action should be taken

Core principle

Behaviour → State → Response → Outcome → Learning

Optional refinement layer:

Behaviour → State → Motivation (inferred, optional) → Response → Outcome → Learning

Key terms used in this document

Signal: A measurable indicator of user behaviour (for example, how many pages someone viewed, how long they spent reading, or whether they clicked a call-to-action button). The system uses four core signals: breadth, depth, progression, and clustering.
State: A classification category that describes where a visitor currently sits on the journey from awareness to action. Examples: "Scanner" (browsing widely but shallowly) or "Evaluator" (reading deeply and moving toward a decision).
Confidence: A score (0–10) that measures how certain the system is about a state classification. Low confidence means the evidence is thin; high confidence means the signals are strong and consistent. Confidence controls what actions the system is allowed to take.
Cluster / Clustering: A group of related pages tagged with the same topic (e.g. "pricing", "case studies", "product A"). Clustering measures how concentrated a visitor's behaviour is within one topic group versus scattered across many.
Taxonomy: A structured register that tags every page on the site with a page type (e.g. "service page", "blog post") and topic cluster (e.g. "pricing", "proof"). This tagging is what makes clustering measurable.
GA4: Google Analytics 4: the current version of Google's web analytics platform, used here as the default data collection tool.
GTM: Google Tag Manager: a tool that manages tracking code on a website without requiring direct code changes. Used to send events (like clicks and form submissions) to GA4.
CTA: Call-to-action: a button or link designed to prompt a specific user action, such as "Get a quote" or "Book a demo".
Motivation: An optional inference about what a visitor may be seeking (e.g. "risk-sensitive" or "value-driven"), based on their observed behaviour. Motivation is always secondary to state classification and is only assigned when confidence is medium or high.

2. System Overview

The system consists of six core layers (plus one optional refinement layer):

1. Data Collection (GA4)
2. Signal Construction
3. State Classification
4. Confidence Scoring
5. Action Layer
6. Feedback Loop
7. Optional Motivation Layer (applied only when confidence is medium/high)

What each layer does

Layer 1: Data Collection (GA4)

The raw input. GA4 captures events from the website: page views, scroll depth, engagement time, clicks, form interactions, and more. If the right events are not collected here, everything downstream is guessing. It feeds raw event-level data into the next layer.

In the reference implementation: six client-side JavaScript modules deployed via GTM detect rage clicks, dead clicks, form errors, layout shifts, traffic source groups, and element-level intent signals, then push structured events and custom dimensions into the GA4 data layer.

Layer 2: Signal Construction

Takes the raw GA4 events and transforms them into four structured scores (each 0–10): Breadth (how widely the user explores), Depth (how deeply they engage), Progression (how far they move toward conversion actions), and Clustering (how focused their browsing is on a single topic). Raw events on their own are too noisy to compare. This layer gives every session a common shape so the classifier can tell visitors apart.

In the reference implementation: signals.js exposes a scorer for each signal (breadth, depth, progression, clustering) that converts raw session metrics into a 0–10 score. The same logic is mirrored in 01-signal-scores.sql for batch processing in BigQuery.

Layer 3: State Classification

Uses the four signal scores to assign the visitor to a named behavioural state (e.g. Scanner, Explorer, Evaluator, Engaged). Each state has defined signal thresholds. For example, high breadth combined with low depth and no progression produces a "Scanner" classification. Scores alone do not tell a team what to do. A named state gives everyone a shared word for the visitor's situation and makes the system actionable.

In the reference implementation: classifier.js walks a priority-ordered rule set from config.js and returns the first state whose signal thresholds are met, with a continuous fit-score fallback for ambiguous cases. The SQL equivalent is 02-state-classification.sql.

Layer 4: Confidence Scoring

Evaluates how reliable the classification is. A visitor with two page views and ten seconds of data gets a low confidence score; someone with fifteen pages, deep scroll, and multiple CTA clicks gets a high one. Confidence (0–10, bucketed into low, medium, and high) acts as a gate that controls what the system is allowed to do next. Without it, a two-page bounce would carry the same weight as a fifteen-page deep session. The system needs to know when it has enough evidence before it acts. Low confidence means observe only; high confidence means act.

In the reference implementation: confidence.js sums five factors (signal count, signal strength, state clarity, session depth, and temporal consistency), applies contradiction penalties, then buckets the result into low, medium, or high. A companion function gates which action types each band is permitted to trigger.

Layer 5: Action Layer

Maps each state-plus-confidence combination to a concrete response. A label on its own does not change anything. This layer turns the classification into a specific recommendation so someone (or something) can act on it. Depending on the state and confidence level, the action might be "do nothing" (low confidence), "surface a relevant CTA" (medium), or "trigger a personalised offer" (high). It is the bridge between classification and business outcome.

In the reference implementation: action.js looks up the visitor's state in an action-mapping table from config.js and returns the recommended action, success metric, owner, and a natural-language prescription with context-specific detail interpolated in.

Layer 6: Feedback Loop

Measures whether the actions taken actually worked. Did the personalised CTA increase conversions? Did Scanners who were shown navigation aids find what they needed? Without this, the system never learns whether its recommendations actually helped. The feedback loop sends outcome data back to refine signal weights, classification thresholds, and action rules over time.

In the reference implementation: temporal.js tracks recency, frequency, trend direction, and velocity across a visitor's session history, while 03-temporal-analysis.sql and supporting SQL queries handle the same at scale in BigQuery. A documented review cadence (weekly, monthly, quarterly) and defined rollback triggers govern when thresholds are recalibrated.

Layer 7 (Optional): Motivation Layer

Only applied when confidence is medium or high. Infers why the visitor is behaving the way they are (e.g. "price-sensitive", "risk-averse", "comparison shopping") based on which content clusters they focus on. This adds a qualitative dimension to the state label, allowing even more targeted responses. It is deliberately optional because motivation inference is less reliable than behavioural classification.

In the reference implementation: refinements.js first detects a content sub-type based on where engagement time is concentrated, then infers one of six motivations by combining the sub-type, state, and signal values. The confidence gate in confidence.js controls whether the inferred motivation is suppressed, flagged for review, or allowed to drive automated action modifiers.

How the layers chain together

The flow is a pipeline: GA4 events → structured signals → state label → confidence gate → action → outcome measurement → refinement. Each layer depends on the one before it, and the feedback loop at the end circles back to improve layers 2–5. The confidence gate (layer 4) is the key safety mechanism. It prevents the system from acting on weak evidence, so low-data sessions are observed rather than acted upon prematurely.

In the reference implementation: pipeline.js orchestrates the entire flow through a single entry point, evaluateVisitor. It accepts the raw session data and user history, then calls each layer in sequence: scoreAllSignals → assessTemporalContext → classifyByPriority → calculateConfidence → applyRefinements → resolveAction. The returned object contains the full evaluation: signals, temporal context, classification, confidence, refinements, and action plan. The SQL pipeline mirrors this sequence across six numbered query files (01 through 06), each corresponding to a layer, designed to run in order inside BigQuery.

3. Signal Model

The system needs a way to measure behaviour that is consistent across every session and every site. Without a defined signal model, classification would depend on ad hoc metrics that shift from report to report. These four signals provide the common language that makes everything downstream possible.

3.1 Core signals (primary)

1. Breadth (Exploration Volume)

How much the user explores.

Breadth should be calculated from:

unique pages viewed in the session
unique page types viewed in the session
unique topic clusters viewed in the session

Recommended calculation:

Raw breadth inputs
- unique_pages
- unique_page_types
- unique_topics
Suggested breadth score (0–10)
- 0–1 = 1 page only
- 2–3 = 2–3 pages, low variety
- 4–5 = moderate exploration
- 6–7 = broad exploration across multiple page types
- 8–10 = very broad exploration across many page types / topics

Breadth can be calibrated per site using percentiles once enough data exists.

Interpretation warning: High breadth does not always mean healthy exploration.

On a poorly structured site, high breadth often signals a lost user, someone clicking widely because they cannot find what they need, not because they are surveying options.

How to check: Cross-reference breadth with depth and progression. If breadth is high but depth is very low (≤ 2) and progression is zero, the user is more likely lost than exploring.

What to watch: The Scanner state captures this pattern. A spike in Scanner volume may indicate a site navigation problem rather than a traffic quality problem.

2. Depth (Engagement)

How deeply they engage.

Depth should be calculated from:

engagement time
active time on page
scroll depth
repeated long dwell on related content
high-attention actions such as video plays or file downloads where relevant

Recommended inputs:

engagement_time_seconds
avg_time_on_key_pages
avg_scroll_percent
deep_engagement_events

Suggested depth score (0–10):

0–2 = glance / shallow interaction
3–5 = moderate reading / review
6–8 = sustained engagement
9–10 = very deep attention, repeat deep reading, long dwell on key content

3. Progression (Intent Momentum)

How far they move toward meaningful action.

Progression should be calculated from:

movement toward evaluation pages
CTA clicks
form starts
form submits
booking actions
repeat sessions that move closer to conversion

Recommended inputs:

high_intent_page_views
cta_click_count
form_start
form_submit
booking_click
conversion_complete

Suggested progression score (0–10):

0 = no movement toward action
1–3 = weak progression, mostly orientation
4–6 = evaluation behaviour present
7–8 = strong action intent
9–10 = conversion or near-conversion completed

3.2 Derived signal

4. Clustering (Behavioural Coherence)

How concentrated behaviour is within a topic, pathway, or offer cluster.

This is the main signal that distinguishes broad, scattered browsing from coherent evaluation.

Clustering should be calculated from three components:

A. Topic concentration

What proportion of views fall within the dominant topic cluster.

Example:

if 7 of 10 page views are in one topic cluster, concentration is high

B. Topic switching

How often the visitor moves between unrelated topics.

Example:

homepage → service A → article → service B → about → article = high switching
homepage → service A → case study A → FAQ A → contact = low switching

C. Repeat cluster return

Whether the user repeatedly returns to the same cluster during the session or across sessions.

Example:

repeated visits to the same offer, proof pages, or pricing path = stronger clustering

Recommended inputs:

dominant_topic_share
topic_switch_count
repeat_cluster_visits
same_cluster_sequence_length

Suggested clustering score (0–10):

0–2 = highly scattered
3–5 = partially coherent
6–8 = clearly clustered
9–10 = strongly concentrated around one topic / pathway

A simple practical starting formula:

clustering_score = (dominant_topic_share * 10) - topic_switch_penalty + repeat_cluster_bonus

Where:

dominant_topic_share is expressed from 0 to 1 (e.g. 7 of 10 views in one cluster = 0.7)
topic_switch_penalty = min(topic_switch_count, 5) (capped to avoid overwhelming the score; each switch between unrelated clusters adds 1 point of penalty)
repeat_cluster_bonus = min(repeat_cluster_visits - 1, 3) (capped at 3; each return visit to the same cluster beyond the first adds 1 point of bonus)

Minimum signal floor: Do not apply topic_switch_penalty until the user has viewed 4 or more pages.

Below this threshold, a single topic switch (e.g. 2 pages in Cluster A, then 1 in Cluster B) is normal exploratory behaviour and should not be penalised. For short sessions, set topic_switch_penalty = 0 and rely on dominant_topic_share alone.

Example 1, sufficient data: a visitor views 10 pages, 7 in one cluster, switches topics twice, and visits the primary cluster 3 times (repeat_cluster_visits = 3, so the bonus is 2):

(0.7 * 10) - 2 + 2 = 7.0 → clearly clustered

Example 2, below signal floor: a visitor views 3 pages, 2 in Cluster A and 1 in Cluster B, with 1 topic switch:

(0.67 * 10) - 0 + 0 = 6.7 → penalty suppressed; score reflects concentration only

The exact weightings should be calibrated to the site once sufficient data exists. Start with these defaults and adjust.
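The starting formula and its minimum signal floor can be sketched as a single function. This is a minimal illustration, not the reference implementation: the function name and parameter names are assumptions, and two behaviours are added that the formula does not state explicitly (the bonus is floored at 0, and the final score is clamped to 0–10).

```javascript
// Illustrative sketch of the starting clustering formula.
// Names are hypothetical; they are not taken from signals.js.
function clusteringScore({ pagesViewed, dominantTopicShare, topicSwitchCount, repeatClusterVisits }) {
  // Minimum signal floor: no switch penalty until 4 or more pages are viewed.
  const switchPenalty = pagesViewed >= 4 ? Math.min(topicSwitchCount, 5) : 0;

  // One bonus point per visit to the cluster beyond the first, capped at 3.
  // Assumption: floored at 0 so a visit count of 0 cannot yield a negative bonus.
  const repeatBonus = Math.max(0, Math.min(repeatClusterVisits - 1, 3));

  const raw = dominantTopicShare * 10 - switchPenalty + repeatBonus;

  // Assumption: clamp to the 0–10 range used by the other signals.
  return Math.min(10, Math.max(0, raw));
}

// Example 1: 10 pages, 7 in one cluster, 2 switches, 3 cluster visits.
const example1 = clusteringScore({ pagesViewed: 10, dominantTopicShare: 0.7, topicSwitchCount: 2, repeatClusterVisits: 3 });

// Example 2: 3 pages, 2 in Cluster A, 1 switch (penalty suppressed).
const example2 = clusteringScore({ pagesViewed: 3, dominantTopicShare: 2 / 3, topicSwitchCount: 1, repeatClusterVisits: 1 });

console.log(example1.toFixed(1), example2.toFixed(1)); // 7.0 6.7
```

Keeping the floor inside the scorer means callers cannot forget to suppress the penalty for short sessions.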

3.3 Temporal signals (integrated)

Temporal signals track how behaviour changes across visits: how recently someone returned, how often they visit, and how quickly they move toward action. These are not separate decoration; they should directly shape classification and confidence.

This section defines the raw inputs. Section 11 (later in this document) covers how these inputs are used to track state transitions over time, for example detecting that a Scanner is becoming an Evaluator across three sessions.

Recency

Time since last session.

Use recency to distinguish:

single exploratory visits
active evaluation windows
dormant / re-engaged prospects

Suggested recency bands:

0–2 days = highly recent
3–7 days = active consideration
8–29 days = delayed return
30+ days = dormant / re-engaged
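
The recency bands translate directly into a lookup. The sketch below is illustrative only: the function name is hypothetical, a gap of exactly 30 days is treated as dormant (an assumption about the boundary), and a visitor with no previous session gets a separate label that the bands themselves do not define.

```javascript
// Illustrative recency banding; the function name is hypothetical.
function recencyBand(daysSinceLastSession) {
  // Assumption: first-time visitors have no recency value at all.
  if (daysSinceLastSession === null || daysSinceLastSession === undefined) {
    return "no prior session";
  }
  if (daysSinceLastSession <= 2) return "highly recent";
  if (daysSinceLastSession <= 7) return "active consideration";
  // Assumption: exactly 30 days counts as dormant, not delayed return.
  if (daysSinceLastSession < 30) return "delayed return";
  return "dormant / re-engaged";
}

console.log(recencyBand(1), "|", recencyBand(5), "|", recencyBand(45));
```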

Frequency

Number of sessions in a defined time period.

Suggested frequency bands:

1 session = single-session user
2–3 sessions in 7 days = active evaluator
4+ sessions in 14 days = high ongoing engagement

Velocity

How quickly a user moves toward action.

Examples:

first visit → conversion page in one session = high velocity
three sessions with increasing intent = medium velocity
repeated evaluation without stronger action = low velocity

Recommended temporal inputs:

session_count_7d
session_count_30d
days_since_last_session
time_to_first_high_intent_event
time_to_conversion
state_change_over_time

These should explicitly affect the Returning Evaluator and Re-engaged Prospect classifications, and should increase or reduce confidence in other states.

3.4 Score calibration

Scores should not remain arbitrary. They must be calibrated.

There are two valid approaches:

Option A: Fixed rules (best for early-stage / low data)

Use fixed score thresholds based on known business logic. This is easier to explain and debug.

Option B: Percentile-based normalisation (best once data volume grows)

Instead of fixed thresholds, compare each visitor's raw values against what is typical for your site. "Percentile-based" means ranking a value against historical data. For example, if a visitor's engagement time is higher than 80% of all sessions, they score in the 80th percentile.

Examples:

Breadth score of 8 = top 20% of page variety for this site
Depth score of 7 = above-average engagement for this content type

This avoids applying the same thresholds to very different sites.

Recommended approach:

start with fixed rules
migrate to percentile-based calibration once enough data exists
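
As a rough illustration of Option B, a raw value can be ranked against historical values for the same metric and the rank mapped onto the 0–10 scale. This is a sketch under stated assumptions: the function name is hypothetical, the percentile is the share of historical sessions strictly below the visitor's value, and the score is that share multiplied by ten and rounded (one plausible reading of "Breadth score of 8 = top 20%").

```javascript
// Illustrative percentile-based scoring; names are hypothetical.
function percentileScore(value, historicalValues) {
  // With no baseline there is nothing to rank against; the caller
  // should fall back to fixed rules (Option A).
  if (historicalValues.length === 0) return null;

  // Share of historical sessions that this value exceeds (0 to 1).
  const below = historicalValues.filter((v) => v < value).length;
  const percentile = below / historicalValues.length;

  // Assumption: map the percentile linearly onto the 0–10 scale.
  return Math.round(percentile * 10);
}

// Engagement time (seconds) from ten past sessions; made-up sample data.
const history = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
console.log(percentileScore(85, history)); // 8
```

In practice the historical sample would be segmented by content type or traffic source so that like is compared with like.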

4. Context Weighting

Not all pages and actions are equal. A visitor clicking a CTA on a pricing page signals stronger intent than a visitor scrolling a blog post. Without weighting, the system would treat every click and every page as equally important, and a five-page blog reader would look the same as a five-page pricing evaluator. Context weighting adjusts signal scores based on where, how, and from where a visitor interacts.

These weights are multipliers. They increase or decrease the signal value of an action. They are applied during the classification process described in Section 7. You do not need to read Section 7 first; the tables below define the weights themselves.

Page types

Each page type carries a weight that increases or decreases the signal value of actions taken on it.

Page type	Intent weight	Effect
Homepage	0.5	Actions here are orientation; downweight toward progression
Blog / resource	0.6	Useful for depth, but low direct conversion signal
Service / product	1.0	Baseline evaluation behaviour
Case study	1.2	Proof-seeking; upweight depth and clustering
Pricing	1.5	Strong intent signal; upweight progression
Contact / booking	2.0	Conversion action; maximum progression weight

Action strength

Each action type carries a signal weight reflecting how strongly it indicates intent.

Action	Weight	Signal contribution
Page view	0.2	Breadth only
Scroll (≥75%)	0.5	Depth
CTA click	1.0	Progression
Form start	1.5	Strong progression
Form submit	2.0	Conversion / maximum progression

Source context

Traffic source applies a bias to the initial state probability, not a hard override.

Source	Bias	Rationale
Direct / bookmark	+1 to progression baseline	Returning with purpose suggests prior awareness
Organic search	Neutral	Intent varies; let behaviour determine state
Social media	-1 to progression baseline	Typically exploratory; higher Scanner probability
Referral	+1 to depth baseline	Trust transfer from referring source
Paid search	+1 to progression baseline	Keyword intent suggests evaluation

These weights are starting defaults. Calibrate them against actual conversion data once enough volume exists.
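
One way to apply the source bias is to adjust a visitor's starting scores before any behaviour is observed. The sketch below is an assumption-laden illustration: the source keys, the object shape, and the function name are all hypothetical, and results are clamped to 0–10 so a negative bias cannot push a baseline below zero.

```javascript
// Source bias table from above, expressed as per-signal deltas.
// Keys are hypothetical labels, not GA4 channel group names.
const SOURCE_BIAS = {
  direct: { progression: 1 },
  organic_search: {},
  social: { progression: -1 },
  referral: { depth: 1 },
  paid_search: { progression: 1 },
};

function applySourceBias(baseline, source) {
  // Unknown sources are treated as neutral (assumption).
  const bias = SOURCE_BIAS[source] ?? {};
  const clamp = (n) => Math.min(10, Math.max(0, n));

  const adjusted = { ...baseline };
  for (const [signal, delta] of Object.entries(bias)) {
    adjusted[signal] = clamp((adjusted[signal] ?? 0) + delta);
  }
  return adjusted;
}

const start = { breadth: 0, depth: 0, progression: 0, clustering: 0 };
console.log(applySourceBias(start, "referral")); // depth becomes 1
```

Because the bias only shifts the baseline, observed behaviour can still move the visitor into any state.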

Element-level metadata (micro-signals)

Scalability is not just about the page; it is also about the elements on the page. Individual interactive elements can carry their own weight, independent of the page they sit on.

When an element carries an element_weight, it overrides the page’s intent weight for that specific interaction. When absent, the page weight applies as the default.

Element role	Default weight	Example
Progression	2.0	“Get a Quote” button, “Book a Demo”
Depth	0.5	“Read More” link, “See Details”
Tool use	0.8	Calculators, configurators
Navigation	0.3	Menu links, breadcrumbs
Social	0.3	Share buttons, social links

Implementation: add data-element-role and optionally data-element-weight to interactive HTML elements. GTM reads these attributes on click events and sends them to GA4 as custom parameters.

<button data-element-role="progression" data-element-weight="2.0">Get a Quote</button>
<a href="/blog/..." data-element-role="depth" data-element-weight="0.5">Read More</a>

The effective weight for progression scoring becomes:

effective_weight = element_weight ?? page_intent_weight
progression_contribution = action_weight × effective_weight

This means a CTA click (action weight 1.0) on a “Get a Quote” button (element_weight 2.0) on a blog page (page_intent_weight 0.6) contributes 1.0 × 2.0 = 2.0 to progression, not 1.0 × 0.6 = 0.6. The element’s own significance wins.
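
The override rule can be expressed in a few lines. This is a sketch rather than the library's code: the lookup keys and function name are hypothetical, and the fallbacks for unknown page types (baseline 1.0) and unknown actions (0) are assumptions.

```javascript
// Weights copied from the page type and action strength tables.
// Keys are hypothetical identifiers chosen for this sketch.
const PAGE_INTENT_WEIGHT = { homepage: 0.5, blog: 0.6, service: 1.0, case_study: 1.2, pricing: 1.5, contact: 2.0 };
const ACTION_WEIGHT = { page_view: 0.2, scroll_75: 0.5, cta_click: 1.0, form_start: 1.5, form_submit: 2.0 };

function progressionContribution({ action, pageType, elementWeight }) {
  // Element weight wins when present; otherwise use the page weight.
  // Assumption: unknown page types fall back to the 1.0 baseline.
  const effectiveWeight = elementWeight ?? PAGE_INTENT_WEIGHT[pageType] ?? 1.0;
  // Assumption: unknown actions contribute nothing.
  return (ACTION_WEIGHT[action] ?? 0) * effectiveWeight;
}

// "Get a Quote" button on a blog page versus an untagged link there.
console.log(progressionContribution({ action: "cta_click", pageType: "blog", elementWeight: 2.0 })); // 2
console.log(progressionContribution({ action: "cta_click", pageType: "blog" })); // 0.6
```

A data-element-weight attribute is read from HTML as a string, so it would likely need converting with Number() before being passed in.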

5. State Model

States are the point of the system. A dashboard full of signal scores is useful to an analyst, but it does not tell a product manager what to fix or a CRM team who to follow up with. Named states translate numerical patterns into plain descriptions of visitor behaviour that any team can understand and act on.

The system uses 10 core states with explicit signal definitions.

Each state is determined using:

Breadth score (0–10)
Depth score (0–10)
Progression score (0–10)
Clustering score (0–10)
Temporal context where relevant

State definitions

Note on score overlaps: The score ranges below intentionally overlap at boundaries. Real user behaviour does not fall neatly into boxes. When a visitor's scores could match two states, Section 7 provides a priority order to resolve the tie (e.g. "Engaged" always takes priority over "Focused Evaluator"). If no state's criteria are fully met, assign the closest match with low confidence and flag for review.

1. Mismatch

Breadth ≤ 2
Depth ≤ 2
Progression = 0
Clustering irrelevant
Usually 1 session only

Immediate exit or no meaningful engagement.

2. Scanner

Breadth ≥ 6
Depth ≤ 3
Clustering ≤ 3
Progression ≤ 2
Usually low velocity, low continuity

Wide but shallow exploration.

3. Explorer

Breadth 4–7
Depth 3–6
Clustering 3–6
Progression 2–4
May show increasing structure within session or across 2 sessions

Exploration with emerging intent.

4. Comparator

Breadth 4–7
Depth 3–5
Clustering ≥ 5 across competing options / pathways
Progression 3–5
Often includes repeat visits to proof, pricing, or alternative offers

Comparing multiple options.

5. Evaluator

Breadth 3–6
Depth ≥ 6
Clustering ≥ 5
Progression 4–6
May occur in one strong session or repeated sessions within 7 days

Serious evaluation.

6. Focused Evaluator

Breadth 2–4
Depth ≥ 7
Clustering ≥ 7
Progression ≥ 6
High velocity or repeated strong cluster return

Highly aligned, strong intent.

7. Hesitant

Progression ≥ 6 (form start / CTA / booking step)
No completion
Depth ≥ 4
Can occur in one session or repeated attempts over 7–14 days

Intent present but interrupted.

8. Stalled

Breadth 3–6
Depth 4–6
Progression ≤ 3
Repeated loops or repeated low-progress sessions
Low improvement over time

Confusion, overload, or structural friction.

Important distinction: Stalled vs. Frustrated. A Stalled user has intent but lacks clarity (revisiting the same pages, looping between sections, failing to progress). A Frustrated user is blocked by technical or UX failures. To distinguish between them, monitor for friction signals:

rage_click_count: rapid repeated clicks on the same element (3+ clicks within 2 seconds)
dead_click_count: clicks on non-interactive elements that produce no response
form_error_count: validation errors encountered during form completion
high_layout_shift: significant content movement during page load (Cumulative Layout Shift > 0.25, a measure of how much visible page content shifts unexpectedly during loading)

If friction signals are present alongside Stalled criteria, classify the visitor as the Stalled (Friction) sub-type. This changes the recommended action from "simplify navigation" to "fix the broken interaction", which is a UX engineering problem, not a content strategy problem.
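
The rage-click definition and the sub-type rule can be sketched as follows. This is an illustration under stated assumptions, not the GTM module itself: the click record shape and function names are hypothetical, bursts are counted without overlap (six rapid clicks count as two bursts), and "friction signals are present" is read as any single signal being above zero or above the layout shift threshold.

```javascript
// Count rage-click bursts: 3+ clicks on the same element within 2 seconds.
// clicks: [{ elementId, timestampMs }] (hypothetical shape).
function countRageClicks(clicks, minClicks = 3, windowMs = 2000) {
  // Group click timestamps by element.
  const byElement = new Map();
  for (const { elementId, timestampMs } of clicks) {
    if (!byElement.has(elementId)) byElement.set(elementId, []);
    byElement.get(elementId).push(timestampMs);
  }

  let bursts = 0;
  for (const times of byElement.values()) {
    times.sort((a, b) => a - b);
    let i = 0;
    while (i + minClicks - 1 < times.length) {
      if (times[i + minClicks - 1] - times[i] <= windowMs) {
        bursts += 1;
        i += minClicks; // clicks in a counted burst are not reused
      } else {
        i += 1;
      }
    }
  }
  return bursts;
}

// Assumption: any one friction signal is enough to flag the sub-type.
function stalledSubType({ rageClickCount, deadClickCount, formErrorCount, cumulativeLayoutShift }) {
  const hasFriction =
    rageClickCount > 0 || deadClickCount > 0 || formErrorCount > 0 || cumulativeLayoutShift > 0.25;
  return hasFriction ? "Stalled (Friction)" : "Stalled";
}

const clicks = [
  { elementId: "submit", timestampMs: 0 },
  { elementId: "submit", timestampMs: 500 },
  { elementId: "submit", timestampMs: 1200 },
  { elementId: "menu", timestampMs: 0 },
  { elementId: "menu", timestampMs: 3000 },
  { elementId: "menu", timestampMs: 6000 },
];
console.log(countRageClicks(clicks)); // 1
```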

9. Engaged (Committed)

Progression ≥ 8 (conversion)
Continued activity post-conversion
May include trust / process validation after action

User has acted and is validating or onboarding.

10. Returning Evaluator

2+ sessions within 7 days, or 3+ sessions within 30 days
Increasing depth, clustering, or progression over time
No completed conversion yet

Intent strengthening across sessions.

Special temporal state: Re-engaged Prospect

This may be treated as a sub-type of Explorer, Evaluator, or Returning Evaluator.

Typical pattern:

gap of 30+ days
then a renewed visit with medium-high clustering and progression

This matters because the visitor is not simply "new" or "returning"; they are reactivated.

6. Optional Refinement Layers

6.1 Content Sub-types (Optional)

Each core state can carry a sub-type label that describes the content orientation of the behaviour, not just its intensity. Sub-types are determined by which page types and content roles dominate the session.

Sub-type	Determined by	Example
Proof-focused	Majority of depth on case studies, testimonials, or results pages	An Evaluator spending 70% of engagement time on case studies
Trust-focused	Concentration on about, team, credentials, or review pages	A Hesitant user who revisits the "About us" and "Our team" pages before returning to the form
Price-focused	Repeated or deep engagement with pricing, comparison, or plan pages	A Comparator returning to the pricing page across two sessions
Resource-seeking	Majority of actions on downloads, guides, tools, or documentation	An Explorer downloading three whitepapers but not visiting any service pages

Sub-types are optional. Implement them only when the action layer needs to differentiate which kind of content to surface, for example sending a proof-led follow-up to a proof-focused Evaluator and a pricing summary to a price-focused Comparator.

Sub-types do not affect state classification or priority. They inform the content of the response, not the type of response.

6.2 Motivation Signals (Optional Layer)

The framework may apply a lightweight motivation signal after state assignment. This is an inference layer, not a replacement for state classification.

Use motivation only to refine action precision:

State = what behaviour is doing now
Motivation signal = what behaviour is most consistent with

Recommended motivation categories (keep small)

Motivation signal	Typical behavioural pattern
Curiosity-driven	Broad exploration, low commitment, limited progression
Value-driven	Deep engagement with proof/outcome content and evaluation behaviour
Risk-sensitive	Strong intent signals with hesitation before completion
Confusion-driven	Repeated loops, switching, and low forward movement
Overload-sensitive	Deep dwell and repeated review without clear progression
Urgency-driven	Fast movement to high-intent actions with minimal exploration

Guardrails (must follow)

Behaviour first: assign state before motivation.
Confidence gate: only assign motivation when state confidence is medium or high (4+).
Minimal set: do not expand motivation categories unless a new category changes action.
Action-linked only: if a motivation label does not alter response, remove it.
No psychological overreach: describe behavioural consistency, not internal truth.

What should not be added

Do not add abstract, non-operational labels (for example, "status-seeking" or "identity-driven") unless there is a reliable behavioural proxy and a distinct action pathway.

Do not use motivation as a primary classifier, and do not assign motivation when data is sparse or confidence is low.

7. Classification Logic

The state model defines what each state looks like. This section defines how the system actually decides which state to assign: the priority order, the scoring method, and what happens when a visitor's signals are ambiguous. Without clear rules, two implementations of the same framework could classify the same visitor differently.

Important note on thresholds

All thresholds (e.g. page count, time, events) should be calibrated per site. The example rules below are starting points only. Different products, traffic types, and session lengths will require adjustment.

Signal scoring approach (recommended)

Instead of relying only on hard rules, assign scores:

Breadth score (0–10)
Depth score (0–10)
Progression score (0–10)
Clustering score (0–10)

Then map score ranges to states. This improves flexibility and reduces brittle classification.

Recommended scoring method

Step 1: calculate raw metrics

Examples:

unique pages
unique topics
engagement time
average scroll
CTA clicks
form starts / submits
topic switch count
return visits

Step 2: convert to normalised scores

Use either:

fixed score mapping, or
percentile-based normalisation against historical site behaviour

Step 3: apply context weighting

Adjust signals based on:

page type importance
action strength
source bias
device / session context where relevant

Step 4: assign likely state

Apply states in the following priority order (highest priority first). Evaluate each rule top-down; assign the first state whose criteria are met.

1. Engaged: conversion completed (progression ≥ 8). Overrides all other states.
2. Hesitant: high-intent action started but not completed (progression ≥ 6, no conversion). Overrides Focused Evaluator.
3. Returning Evaluator: temporal criteria met (2+ sessions in 7 days or 3+ in 30 days) with increasing signals and no conversion. Overrides single-session states.
4. Focused Evaluator: narrow, deep, high-progression behaviour (breadth 2–4, depth ≥ 7, clustering ≥ 7, progression ≥ 6).
5. Evaluator: serious evaluation with depth and clustering (depth ≥ 6, clustering ≥ 5, progression 4–6).
6. Comparator: evaluation across competing options (breadth 4–7, depth 3–5, clustering ≥ 5, progression 3–5).
7. Stalled: moderate engagement with low progression and repeated loops (breadth 3–6, depth 4–6, progression ≤ 3).
8. Scanner: wide but shallow (breadth ≥ 6, depth ≤ 3, clustering ≤ 3).
9. Explorer: moderate exploration with emerging structure (breadth 4–7, depth 3–6).
10. Mismatch: minimal engagement (breadth ≤ 2, depth ≤ 2, progression = 0).

If no state criteria are fully met, assign the closest match and flag confidence as low. Where scores fall between two adjacent states, assign a hybrid classification (see Hybrid States below).
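
A first-match walk over the priority order might look like the sketch below. It is not the code in classifier.js: the function name, the context flags (converted, highIntentStarted, trendIncreasing, repeatedLoops), and the session count fields are hypothetical. The sketch assumes Hesitant needs an explicit started-but-unfinished flag (as in form_start = true, form_submit = false), since a score-only test would also catch every unconverted Focused Evaluator. When nothing matches it returns no state and a low-confidence flag, leaving the closest-match step to the caller.

```javascript
const between = (v, lo, hi) => v >= lo && v <= hi;

// Priority-ordered rules, highest priority first.
// s = signal scores (0–10), c = context flags (hypothetical names).
const RULES = [
  { state: "Engaged", test: (s, c) => Boolean(c.converted) && s.progression >= 8 },
  { state: "Hesitant", test: (s, c) => Boolean(c.highIntentStarted) && !c.converted && s.progression >= 6 },
  { state: "Returning Evaluator", test: (s, c) => (c.sessions7d >= 2 || c.sessions30d >= 3) && Boolean(c.trendIncreasing) && !c.converted },
  { state: "Focused Evaluator", test: (s) => between(s.breadth, 2, 4) && s.depth >= 7 && s.clustering >= 7 && s.progression >= 6 },
  { state: "Evaluator", test: (s) => s.depth >= 6 && s.clustering >= 5 && between(s.progression, 4, 6) },
  { state: "Comparator", test: (s) => between(s.breadth, 4, 7) && between(s.depth, 3, 5) && s.clustering >= 5 && between(s.progression, 3, 5) },
  { state: "Stalled", test: (s, c) => between(s.breadth, 3, 6) && between(s.depth, 4, 6) && s.progression <= 3 && Boolean(c.repeatedLoops) },
  { state: "Scanner", test: (s) => s.breadth >= 6 && s.depth <= 3 && s.clustering <= 3 },
  { state: "Explorer", test: (s) => between(s.breadth, 4, 7) && between(s.depth, 3, 6) },
  { state: "Mismatch", test: (s) => s.breadth <= 2 && s.depth <= 2 && s.progression === 0 },
];

function classifyFirstMatch(signals, context = {}) {
  const match = RULES.find((rule) => rule.test(signals, context));
  // No rule met: leave closest-match selection to the caller and flag low confidence.
  return match
    ? { state: match.state, lowConfidence: false }
    : { state: null, lowConfidence: true };
}

console.log(classifyFirstMatch({ breadth: 7, depth: 2, clustering: 2, progression: 0 }).state); // Scanner
```

Holding the rules in an ordered array keeps the priority visible in one place, so reordering states is a one-line change.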

Step 5: assign confidence score

Every state assignment must be paired with a confidence score.

Step 6 (optional): assign motivation signal

Only after state and confidence are assigned:

if confidence is low (0–3): do not assign motivation
if confidence is medium/high (4–10): assign one primary motivation signal

Motivation is secondary metadata that refines response content. It must not override state priority or confidence logic.

Example rules (raw metric thresholds)

The state definitions in Section 5 use scored thresholds (e.g. "Breadth ≥ 6"). The examples below show how raw metrics translate into those scores under fixed-rule scoring (Section 3.4, Option A). These are starting defaults; calibrate to your site.

Scanner (raw metrics → Breadth score ≥ 6, Depth score ≤ 3):

unique pages ≥ 6
avg engagement time < 20s
clustering score ≤ 3
CTA clicks = 0

Evaluator (raw metrics → Depth score ≥ 6, Clustering score ≥ 5):

service or case study pages ≥ 3
engagement time > 60s
clustering score ≥ 5
at least one CTA click or high-intent page view

Hesitant (raw metrics → Progression score ≥ 6, no conversion):

form_start = true
form_submit = false
progression score ≥ 6

Returning Evaluator (raw metrics + temporal criteria):

sessions_in_7_days ≥ 2 OR sessions_in_30_days ≥ 3
depth or progression score trend is increasing across sessions
conversion_complete = false
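
The example rules above can be expressed as simple predicates over session-level metrics. This is a sketch under stated assumptions: the SessionMetrics field names are hypothetical and would need mapping to your own pipeline's columns, and only the Scanner and Hesitant rules are shown.

```python
from dataclasses import dataclass


@dataclass
class SessionMetrics:
    # Hypothetical field names; map them to your own pipeline's columns.
    unique_pages: int
    avg_engagement_seconds: float
    clustering_score: float
    progression_score: float
    cta_clicks: int
    form_start: bool
    form_submit: bool


def is_scanner(m: SessionMetrics) -> bool:
    """Scanner defaults: many pages, short engagement, low clustering, no CTA clicks."""
    return (
        m.unique_pages >= 6
        and m.avg_engagement_seconds < 20
        and m.clustering_score <= 3
        and m.cta_clicks == 0
    )


def is_hesitant(m: SessionMetrics) -> bool:
    """Hesitant defaults: a form was started but not submitted, with progression >= 6."""
    return m.form_start and not m.form_submit and m.progression_score >= 6
```

Keeping each rule as its own function makes the thresholds easy to recalibrate per site, which the defaults above are expected to need.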

Hybrid states

Users may exhibit signals consistent with multiple states simultaneously. When the primary state accounts for less than 70% of the signal weight, classify as a hybrid.

Example: a visitor with breadth 5, depth 5, clustering 5, progression 3 may score as:

60% Explorer (moderate breadth and depth, low progression)
40% Evaluator (clustering and depth suggest emerging evaluation)

When a hybrid classification occurs, the system should store:

primary state (highest signal fit)
secondary state (next closest fit)
confidence score for each
combined confidence (use the primary state's confidence, reduced by 1 point for ambiguity)

The action layer should respond to the primary state but avoid actions that would be counterproductive for the secondary state. For example, if a visitor is 60% Explorer / 40% Evaluator, guide them toward deeper evaluation content rather than immediately pushing for conversion.
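
The stored record for a hybrid classification can be sketched as follows. How the per-state signal-weight shares are computed is not specified here, so the function takes them as input. The name classify_hybrid, the record keys, and the floor of 0 on the combined confidence are assumptions for illustration.

```python
HYBRID_THRESHOLD = 0.70  # a primary share below this is treated as a hybrid


def classify_hybrid(fit_shares, confidences):
    """Build the record to store for one classification.

    fit_shares: state -> share of signal weight (assumed to sum to 1.0).
    confidences: state -> confidence score (0-10) for that state.
    """
    ranked = sorted(fit_shares.items(), key=lambda item: item[1], reverse=True)
    primary, primary_share = ranked[0]
    record = {
        "primary_state": primary,
        "primary_confidence": confidences[primary],
        "is_hybrid": False,
        "secondary_state": None,
        "secondary_confidence": None,
        "combined_confidence": confidences[primary],
    }
    if primary_share < HYBRID_THRESHOLD and len(ranked) > 1:
        secondary = ranked[1][0]
        record.update(
            is_hybrid=True,
            secondary_state=secondary,
            secondary_confidence=confidences[secondary],
            # Ambiguity penalty: primary confidence minus 1 (the floor at 0 is an assumption).
            combined_confidence=max(0, confidences[primary] - 1),
        )
    return record
```

For the 60% Explorer / 40% Evaluator example, with illustrative confidences of 5 and 4, the record is flagged as a hybrid with a combined confidence of 4.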

8. Confidence Scoring

Each classification includes a confidence score. Confidence determines how strongly the system acts on a classification. It is not optional metadata. Acting on a weak classification wastes effort or, worse, annoys a visitor with the wrong intervention. Confidence is what separates a system that guesses from one that knows when to wait.

Confidence calculation

Confidence is calculated from five factors, each scored 0–2:

Factor	0 (weak)	1 (moderate)	2 (strong)
Signal count	≤ 2 distinct signal types observed	3–4 distinct signal types	5+ distinct signal types
Signal strength	Only passive actions (views, scrolls)	Mix of passive and active (CTA clicks)	Active high-intent actions (form starts, bookings)
State clarity	Primary and secondary states within 20% of each other	Primary state clearly leads but secondary is plausible	Primary state dominant, no close competitor
Session depth	< 30 seconds or ≤ 2 pages	30s–2min, 3–4 pages	> 2min, 5+ pages
Temporal consistency	Single session, no history	2 sessions with consistent direction	3+ sessions with reinforcing pattern

"Signal types" means distinct categories of observed behaviour: page views, scrolls, CTA clicks, form starts, form submits, downloads, booking clicks, etc. Multiple page views count as one signal type; a page view plus a CTA click counts as two.

Confidence score = sum of all factors (0–10)
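
The calculation is a plain sum of the five factor scores. The sketch below shows that sum plus the signal-count factor, which has unambiguous numeric bounds. The function and key names are hypothetical, and the other four factors are assumed to be scored upstream.

```python
FACTORS = (
    "signal_count",
    "signal_strength",
    "state_clarity",
    "session_depth",
    "temporal_consistency",
)


def signal_count_factor(distinct_signal_types: int) -> int:
    """Score the signal-count factor: <= 2 types -> 0, 3-4 types -> 1, 5+ types -> 2."""
    if distinct_signal_types >= 5:
        return 2
    if distinct_signal_types >= 3:
        return 1
    return 0


def confidence_score(factors: dict) -> int:
    """Sum the five factor scores (each 0-2) into a 0-10 confidence score."""
    for name in FACTORS:
        if factors[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1 or 2")
    return sum(factors[name] for name in FACTORS)
```

Validating each factor before summing keeps a mis-scored input from silently pushing a classification into a higher confidence band.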

Confidence bands

Low (0–3): short session, weak signals, multiple plausible states. Insufficient evidence for reliable classification.
Medium (4–6): enough signal for a useful interpretation, but some ambiguity remains. Classification is directionally useful.
High (7–10): strong, repeated, coherent signals with clear state fit. Classification can drive automated actions.

How confidence governs action

Confidence	Permitted actions	Examples
Low (0–3)	Reporting and aggregate analysis only	Include in state distribution dashboards; do not trigger individual interventions
Medium (4–6)	Lightweight nudges and analyst review	Adjust content recommendations; flag for manual review; add to nurture segments; optional motivation tag may be applied
High (7–10)	Direct automated action and personalisation	Trigger CRM workflows; personalise page content; alert sales team; motivation tag can drive targeted response variant

Motivation assignment rule:

Low confidence: no motivation tag
Medium confidence: one motivation tag allowed with analyst review
High confidence: one motivation tag can drive automated response variants
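
The bands and the rules they gate can be held in one lookup, so that the action layer never has to re-derive them. This is a sketch only: POLICY, confidence_band, and policy_for are hypothetical names, and the policy strings paraphrase the table above.

```python
def confidence_band(score: int) -> str:
    """Map a 0-10 confidence score to its band."""
    if not 0 <= score <= 10:
        raise ValueError("confidence score must be between 0 and 10")
    if score <= 3:
        return "low"
    if score <= 6:
        return "medium"
    return "high"


# Hypothetical policy table paraphrasing the permitted actions per band.
POLICY = {
    "low": {"permitted": "reporting and aggregate analysis only", "motivation_tag": "none"},
    "medium": {"permitted": "lightweight nudges and analyst review", "motivation_tag": "with analyst review"},
    "high": {"permitted": "direct automated action and personalisation", "motivation_tag": "automated"},
}


def policy_for(score: int) -> dict:
    """Return the permitted-action policy for a confidence score."""
    return POLICY[confidence_band(score)]
```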

Examples

Low confidence (score 1): A 2-page visit with one CTA click, 15 seconds on site. Signal types: page view + CTA click = 2 types (score 0). Strength: mix of passive and active (score 1). Clarity: ambiguous between Scanner and Explorer (score 0). Session depth: < 30s, 2 pages (score 0). Temporal: single session (score 0). Total = 1. Not enough evidence to act on.
Medium confidence (score 5): A single session lasting 3 minutes across 5 pages, with deep scrolling on two service pages, one CTA click, and behaviour concentrated in one topic cluster. Signal types: page view + scroll + CTA click = 3 types (score 1). Strength: CTA click present (score 1). Clarity: likely Explorer, but Evaluator is plausible (score 1). Session depth: 3 minutes, 5 pages (score 2). Temporal: single session (score 0). Total = 5. Enough to adjust content recommendations, but not enough to trigger automated outreach.
High confidence (score 10): 3 sessions over 5 days, repeated cluster visits, form starts, deep scrolling, and increasing progression. Signal types: page view + scroll + CTA click + form start + cluster return = 5 types (score 2). Strength: form starts present (score 2). Clarity: clearly Returning Evaluator (score 2). Session depth: sustained engagement across sessions (score 2). Temporal: 3 sessions with reinforcing pattern (score 2). Total = 10. High enough to trigger CRM workflows and personalisation.

9. Interactive State Classifier

Use this tool to test how the four signal scores map to states. Adjust the sliders to see the classification update in real time. Temporal states (Returning Evaluator, Re-engaged Prospect) require multi-session data and cannot be tested with single-session scores alone. The confidence score shown here is a simplified approximation. The full engine (Section 8) uses five factors, including session metadata and temporal consistency, that sliders alone cannot capture.

Example (default slider positions): Breadth 5/10, Depth 5/10, Progression 3/10, Clustering 5/10 → Explorer, Confidence: Medium. Suggested action: strengthen pathways from broad discovery content into relevant offers.

10. Action Layer

Each state must map to a specific, testable action.

Principle

Classification without response is only reporting.
The system becomes useful when each state changes what the business does.

Types of actions

UX / product actions

Change the experience itself.
Examples:

improve top-level hierarchy
add guided entry paths
reduce form friction
simplify navigation
surface trust elements earlier

CRM / communication actions

Change what is said, when, and to whom.
Examples:

send proof-led follow-up
send reassurance after a form drop-off
nurture low-risk converters toward stronger actions

Strategic actions

Change internal decision-making.
Examples:

identify which traffic sources produce evaluators vs scanners
identify which pages create hesitation or overload
prioritise fixes based on which state blocks conversion most often

State + motivation refinement examples

Use these only when the motivation confidence gate is met:

Hesitant + Risk-sensitive → add reassurance (guarantees, proof, process clarity)
Hesitant + Confusion-driven → simplify UX and next-step guidance
Hesitant + Overload-sensitive → reduce options and shorten decision path
Scanner + Curiosity-driven → improve guided entry and value framing
Evaluator + Value-driven → surface outcomes, case studies, and implementation detail
Focused Evaluator + Urgency-driven → remove distractions and shorten conversion path

If motivation does not change the action, keep state-level action only.

Example mappings with metrics

Mismatch
- Action: review traffic source quality and landing page relevance. If Mismatch volume is high from a specific source, the source may be poorly targeted. If Mismatch volume is high on a specific landing page, the page may be failing to communicate relevance.
- Metric: reduction in Mismatch share from targeted sources; increase in sessions progressing beyond the first page
Scanner
- Action: add guided entry and clearer value proposition above the fold
- Metric: increase in deeper-session rate and evaluator-state share
Explorer
- Action: strengthen pathways from broad discovery content into relevant offers
- Metric: increase in clustered navigation and progression score
Comparator
- Action: clarify differentiation, side-by-side proof, and concise summaries
- Metric: increase in focused evaluator or conversion rate
Evaluator
- Action: add case studies, implementation details, FAQs, and proof elements
- Metric: increase in high-intent page views and conversion starts
Focused Evaluator
- Action: reduce distractions, shorten path to contact / purchase
- Metric: increase in conversion completion rate
Hesitant
- Action: reduce fields, clarify next step, add reassurance around process and commitment
- Metric: form completion rate and drop-off reduction
Stalled
- Action: simplify navigation, reduce loops, add stronger recommendation pathways
- Metric: increase in forward progression and reduction in repeated loops
Stalled (Friction)
- Action: fix the broken interaction. Resolve rage-click targets, make dead-click elements interactive or remove misleading affordances, reduce form validation errors, address layout shift issues
- Metric: reduction in friction event counts; increase in progression from previously blocked pages
Returning Evaluator
- Action: reinforce differentiation and provide stronger closing reassurance
- Metric: increase in conversion among repeat visitors
Engaged (Committed)
- Action: support onboarding, confirmation, and post-conversion confidence
- Metric: faster onboarding completion and reduced post-conversion abandonment

11. Temporal Layer

The system should not treat sessions as isolated events. A single session rarely tells the full story. Someone who visits three times in a week with increasing depth is fundamentally different from someone who visits once and leaves. Without tracking behaviour over time, the system would miss returning evaluators, chronic hesitation, and re-engagement after a long gap.

Section 3.3 defines the temporal signal inputs (recency, frequency, velocity). This section defines how those inputs are used to track state transitions and evaluate trend direction across sessions.

What temporal analysis should answer

is this person becoming more serious over time?
are they stuck in repeated hesitation?
did they disappear and return with stronger intent?
is evaluation accelerating or decaying?

Example transition patterns

Scanner → Explorer → Evaluator

Explorer → Comparator → Focused Evaluator

Evaluator → Hesitant → Returning Evaluator → Engaged

Explorer → dormant → Re-engaged Prospect

Transition significance

A transition is more informative when many visitors follow it consistently, because it then points at a specific part of the experience.
For example:

many Scanners becoming Explorers suggests the top-level proposition is improving
many Evaluators becoming Hesitant suggests the conversion path is the bottleneck
many Returning Evaluators failing to convert suggests unresolved trust or pricing friction

Lifecycle phase mapping (optional overlay)

The states and temporal signals already described can be grouped into three broad lifecycle phases. This is not a separate classification. It is a lens over existing state data that clarifies which system should own the response (product, CRM, or sales) and what kind of action is appropriate.

Lifecycle phase	Typical states	What it means
Acquisition	Mismatch, Scanner, Explorer	First contact. The visitor is orienting. The question is whether the proposition is relevant and whether the site helps them find what they need.
Evaluation	Comparator, Evaluator, Focused Evaluator, Hesitant, Stalled	Active consideration. The visitor has intent but has not yet acted. The question is whether the experience removes enough friction and builds enough confidence to convert.
Retention	Engaged, Returning Evaluator, Re-engaged Prospect	Post-conversion or repeat engagement. The question shifts from "will they act?" to "will they stay, deepen, or return?"

Phase is determined by the visitor's current state, not by calendar time. A visitor may reach Evaluation in their first session or remain in Acquisition across several visits. The phase changes when the state changes.

This mapping is useful when the action layer needs to route responses to different teams or systems. For example, Acquisition-phase issues are typically product or content problems, Evaluation-phase issues are conversion path problems, and Retention-phase issues are CRM or onboarding problems.

Recommended temporal thresholds

Returning Evaluator: 2+ sessions in 7 days, or 3+ in 30 days, with increasing depth or progression
Re-engaged Prospect: 30+ day gap followed by medium-high clustering and progression
Persistent Hesitation: 2+ interrupted conversion attempts within 14 days
Chronic Stall: 3+ low-progression sessions with repeated loops and no improvement
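
The frequency and gap checks behind the first two thresholds can be sketched from a list of session dates. The function names are hypothetical, and the trend test (latest score higher than the earliest) is an assumed simplification of "increasing depth or progression".

```python
from datetime import date, timedelta


def sessions_in_window(session_dates, as_of, days):
    """Count sessions in the trailing window of `days` days ending on `as_of` (inclusive)."""
    window_start = as_of - timedelta(days=days)
    return sum(1 for d in session_dates if window_start < d <= as_of)


def meets_returning_frequency(session_dates, as_of):
    """Returning Evaluator frequency: 2+ sessions in 7 days, or 3+ sessions in 30 days."""
    return (
        sessions_in_window(session_dates, as_of, 7) >= 2
        or sessions_in_window(session_dates, as_of, 30) >= 3
    )


def is_increasing(scores):
    """Assumed trend test: the latest score is higher than the earliest."""
    return len(scores) >= 2 and scores[-1] > scores[0]


def has_reengagement_gap(session_dates, min_gap_days=30):
    """True when the latest session follows a gap of at least `min_gap_days` days."""
    ordered = sorted(session_dates)
    return len(ordered) >= 2 and (ordered[-1] - ordered[-2]).days >= min_gap_days
```

A visitor would count as a Returning Evaluator candidate when meets_returning_frequency is true and is_increasing holds for their depth or progression scores. The gap check only covers the first half of the Re-engaged Prospect rule, since clustering and progression still need to be medium-high.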

12. Implementation (GA4)

The framework is tool-agnostic in principle, but it needs a concrete data layer to work. GA4 with Google Tag Manager is the recommended default because most sites already have it, it supports custom events and dimensions natively, and its BigQuery export provides the raw event-level data that the scoring pipeline requires.

Required events

page_view
scroll
cta_click
form_start
form_submit
resource_download
navigation_click
section_view
booking_click where relevant
conversion_complete

Friction events (recommended)

These are not required for core state classification but are needed to distinguish Stalled from Stalled (Friction); see Section 5, State 8.

rage_click: 3+ rapid clicks on the same element within 2 seconds
dead_click: click on a non-interactive element with no system response
form_error: validation error displayed to the user during form completion
high_layout_shift: Cumulative Layout Shift exceeds 0.25 during a page view (significant unexpected content movement)
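
The rage_click definition is a sliding-window check over click timestamps for a single element. The sketch below illustrates that check in Python, for example to validate logged events offline. It is not the GTM client-side script; the function name is hypothetical, and treating "within 2 seconds" as inclusive is an assumption.

```python
def is_rage_click(click_times, min_clicks=3, window_seconds=2.0):
    """Return True if `min_clicks` clicks on one element fall within `window_seconds`.

    click_times: timestamps in seconds for clicks on the same element.
    """
    times = sorted(click_times)
    # Compare each click with the one (min_clicks - 1) positions later.
    for i in range(len(times) - min_clicks + 1):
        if times[i + min_clicks - 1] - times[i] <= window_seconds:
            return True
    return False
```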

Required parameters

page_type
page_topic
conversion_stage
content_role
offer_id where relevant
traffic_source_group

GA4 property limits: Standard GA4 properties allow up to 50 event-scoped custom dimensions and 25 user-scoped custom dimensions. This framework uses 6 required event parameters plus friction events. Plan your custom dimension budget early. If the site already uses 40+ custom dimensions for other needs, consolidate where possible or use BigQuery export (where these limits do not apply). Audit existing custom dimensions before rollout so you do not hit the limit halfway through implementation.

Source-integrated taxonomy (CMS-first model)

Strategic meaning is a property of the content, not a secondary layer. For the clustering signal to work, every page must carry its own taxonomy metadata, assigned within the CMS at the moment of creation.

When a page is published, the CMS assigns three required fields stored as hidden metadata in the HTML:

page_type: the structural type (e.g. service, blog, pricing)
page_topic: the topic cluster (e.g. strategy, proof, pricing)
intent_weight: business significance (0.5–2.0)

GTM reads this metadata directly from the page and attaches it to every GA4 event. The system then uses these labels to calculate clustering and progression scores. No external lookup table is required.

Why CMS-first is the only scalable approach:

Zero blind spots: Every new page is classified the moment it goes live. There is no lag where a user visits a page that has not been added to a separate register.

No maintenance burden: No separate, massive lookup table to manage. The taxonomy is part of the website’s structure.

Reliable clustering: The clustering signal depends on seeing consistent topic tags. CMS-embedded tags keep those tags consistent and, as long as the publishing workflow enforces the required fields, avoid null values.

Infinite scalability: Whether you have 10 pages or 10,000, the system scales because the metadata is distributed across the site rather than trapped in a central spreadsheet.

Example topic clusters: strategy, proof, pricing, onboarding, product A, product B.

Example logic:

if most page views belong to one topic cluster and switching is low, clustering is high
if page views are spread across many unrelated clusters with frequent switches, clustering is low
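
The two inputs named in this logic, dominant topic share and topic switching, can be computed from the ordered page_topic values of one session. This is a minimal sketch with a hypothetical function name; the SQL in Appendix A combines the same inputs into the 0–10 clustering score.

```python
from collections import Counter


def clustering_inputs(topics):
    """Return (dominant_topic_share, topic_switch_count) for one session's page_topic sequence."""
    if not topics:
        return 0.0, 0
    # Views in the most frequent topic cluster.
    dominant_views = Counter(topics).most_common(1)[0][1]
    # A switch is any page view whose topic differs from the previous page view.
    switches = sum(1 for prev, cur in zip(topics, topics[1:]) if prev != cur)
    return dominant_views / len(topics), switches
```

A session of pricing, pricing, proof, pricing gives a share of 0.75 with 2 switches, while four views across four different clusters gives a share of 0.25 with 3 switches.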

Taxonomy maintenance

The clustering signal is only as reliable as the taxonomy behind it. In the CMS-first model, the CMS publishing workflow is the primary gate; pages should not go live without page_type and page_topic assigned.

Required process:

Require metadata at publish: the CMS must include page_type, page_topic, and intent_weight as required fields. Untagged pages cannot exist if the publishing workflow enforces this.
Audit monthly: run a report of pages viewed in the last 30 days that have a page_topic of “General” (the default). Any page receiving more than 100 views without a proper tag is a blind spot that must be classified.
Track coverage: maintain a simple metric: tagged_pages / total_pages_with_traffic. Target ≥ 95% coverage. Below 90%, clustering scores should be treated as unreliable.

Default for untagged pages: Any page without CMS metadata should default to page_topic: General and intent_weight: 0.5. This prevents null values from breaking score calculations while making untagged pages visible in audits.
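
The default values and the coverage metric can be sketched together, because a page that falls back to "General" is exactly what the audit counts as untagged. The function names and status labels are hypothetical, and returning 0.0 for an empty page list is an assumption.

```python
DEFAULTS = {"page_topic": "General", "intent_weight": 0.5}


def with_defaults(meta):
    """Fill missing or empty taxonomy fields with the documented defaults."""
    filled = dict(meta)
    for key, default in DEFAULTS.items():
        if filled.get(key) in (None, ""):
            filled[key] = default
    return filled


def taxonomy_coverage(pages):
    """tagged_pages / total_pages_with_traffic, where 'General' counts as untagged."""
    if not pages:
        return 0.0
    tagged = sum(1 for p in pages if with_defaults(p)["page_topic"] != "General")
    return tagged / len(pages)


def coverage_status(coverage):
    """Apply the 95% target and the 90% reliability floor."""
    if coverage >= 0.95:
        return "on target"
    if coverage >= 0.90:
        return "below target"
    return "clustering unreliable"
```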

Common failure mode (without CMS-first): The marketing team publishes 10 blog posts without tagging them. Users who read those posts appear to have scattered, low-clustering behaviour, creating “false Scanners” or “false Mismatches.” The system then recommends navigation improvements for a problem that is actually a taxonomy gap. The CMS-first model prevents this entirely.

Fallback: external taxonomy register

For legacy sites that cannot yet embed CMS metadata, an external register provides a fallback. The register maps URL patterns to taxonomy values and can be kept in Google Sheets or Airtable, then exported as CSV or synced to BigQuery.

URL pattern	Page type	Topic cluster	Journey role	Intent weight	Key progression event
`/`	Homepage	Brand	Orientation	0.5	`nav_click_services`
`/services/consulting/*`	Service	Strategy	Evaluation	1.0	`cta_click_quote`
`/case-studies/*`	Case study	Proof	Validation	1.2	`resource_download`
`/pricing`	Pricing	Commercial	High intent	1.5	`form_start_trial`
`/blog/ai-trends/*`	Blog	AI / Tech	Awareness	0.6	`newsletter_signup`
`/contact-success`	Confirmation	Admin	Post-action	2.0	`conversion_complete`

How the taxonomy feeds the clustering signal: When a page loads, the GTM data layer reads the CMS-embedded metadata (or the register as fallback) and sends page_topic and page_type with the GA4 page_view event. In BigQuery, clustering is calculated from the sequence of page_topic values in each session. The SQL pipeline uses a COALESCE pattern: CMS-embedded values are used first; the external register provides values only when CMS metadata is absent.
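
The COALESCE pattern described above can be illustrated outside SQL. The sketch below resolves taxonomy for one page path: CMS values first, then the register, then defaults. It makes several assumptions: the register rows are copied from the example table, patterns are matched as globs with first match winning, and the names resolve_taxonomy and first_present are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical register rows, taken from the example table above.
REGISTER = [
    ("/services/consulting/*", {"page_type": "Service", "page_topic": "Strategy", "intent_weight": 1.0}),
    ("/case-studies/*", {"page_type": "Case study", "page_topic": "Proof", "intent_weight": 1.2}),
    ("/pricing", {"page_type": "Pricing", "page_topic": "Commercial", "intent_weight": 1.5}),
]
DEFAULTS = {"page_type": "Unknown", "page_topic": "General", "intent_weight": 0.5}


def first_present(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


def resolve_taxonomy(path, cms_meta=None):
    """CMS values win; the register fills gaps; defaults fill the rest."""
    cms_meta = cms_meta or {}
    register_meta = {}
    for pattern, values in REGISTER:
        if fnmatchcase(path, pattern):
            register_meta = values
            break  # first matching pattern wins (an assumption)
    return {
        key: first_present(cms_meta.get(key), register_meta.get(key), default)
        for key, default in DEFAULTS.items()
    }
```

Resolving field by field, rather than choosing one source for the whole page, lets a partially tagged page keep its CMS values while the register supplies only what is missing.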

Processing options

Basic setup

GA4 event collection
Looker Studio reporting
manual or spreadsheet-based scoring

Intermediate setup

GA4 + BigQuery export (BigQuery is Google's cloud data warehouse that stores raw GA4 event data for flexible querying)
SQL-based score calculation
dashboard state reporting

Advanced setup

BigQuery + warehouse logic
near-real-time classification
automated CRM or UX triggers based on high-confidence states

Recommended implementation sequence

Define taxonomy (page types, topics, offers)
Implement events and parameters in GTM
Validate data quality in GA4
Build score calculations
Test state assignment against real sessions
Add confidence scoring
Connect states to actions and metrics

13. Output Layer

Classification is only useful if the right people see the right information at the right time. The output layer exists because a well-built model that lives inside a database query and never reaches a decision-maker has zero business value.

Dashboards should show:

distribution of states
transition flows
conversion by state
drop-off by state
confidence by state
source mix by state

Problem-first reporting views

The system should not only be organised around states. It should also be organised around business problems.

Examples:

High bounce / low engagement → inspect Mismatch and Scanner
Traffic but weak progression → inspect Explorer and Stalled
Strong evaluation but weak conversion → inspect Evaluator, Comparator, Hesitant
Repeat visits without action → inspect Returning Evaluator and chronic hesitation

GA4 data thresholding caveat

GA4 applies data thresholding to reports when user counts in a segment are small, suppressing rows to protect user privacy. This means state distribution dashboards may show incomplete or misleading data for low-volume segments. For example, a "Focused Evaluator" segment with only 12 users in a reporting period may be hidden entirely.

Mitigations:

Use wider date ranges to increase user counts per segment
Use BigQuery export for unsampled, unthresholded data
Do not draw conclusions from state segments with fewer than 30 users in the reporting period

Output principle

Every report should answer:

what state is happening?
how confident are we?
what should we change?
how will we know if it worked?

Prescriptive output

The final dashboard should not just report data; it should issue instructions. Each state classification, when combined with aggregate context, generates a natural-language prescription that tells the team exactly what to do.

Examples:

Hesitant + High Confidence: “Reduce form friction on the pricing contact form. 47 users started but did not complete conversion in the last 7 days.”
Scanner + Medium Confidence: “Add guided entry points on the homepage. 312 sessions showed wide browsing with no depth.”
Stalled (Friction) + High Confidence: “Fix the broken CTA on the services page. 23 users were blocked by UX failures.”

Prescriptions are template-based, not AI-generated. Each state maps to an instruction template with placeholders (e.g. {sessionCount}, {topBlockedPage}) that are interpolated from aggregate data at query time.
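
Template interpolation of this kind can be sketched with ordinary string formatting. The template wording below is adapted from the examples above, the placeholder names follow the ones mentioned in the text, and TEMPLATES and render_prescription are hypothetical names rather than the repository's API.

```python
# Hypothetical templates keyed by state; placeholders are filled from aggregate data.
TEMPLATES = {
    "Hesitant": (
        "Reduce form friction on {topBlockedPage}. "
        "{sessionCount} users started but did not complete conversion in the last 7 days."
    ),
    "Scanner": (
        "Add guided entry points on {topBlockedPage}. "
        "{sessionCount} sessions showed wide browsing with no depth."
    ),
}


def render_prescription(state, aggregates):
    """Fill the state's template from aggregate values; return None if no template exists."""
    template = TEMPLATES.get(state)
    if template is None:
        return None
    return template.format(**aggregates)
```

A missing placeholder value raises KeyError in this sketch, which surfaces an incomplete aggregate query instead of printing a half-filled instruction.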

14. Feedback Loop

Any fixed set of thresholds will drift as the site, the traffic, and the market change. The feedback loop exists to make sure the system stays accurate over time rather than slowly becoming wrong in ways nobody notices.

The system improves through a continuous cycle:

Observe → Classify → Act → Measure → Refine (↻ continuous cycle)

Observe: collect behavioural data through GA4 events and parameters.
Classify: assign states and confidence scores using the signal model and classification logic.
Act: trigger the appropriate response (UX change, CRM action, strategic decision) based on state and confidence.
Measure: track the defined success metric for each action (Section 10). Did the intervention change behaviour in the expected direction?
Refine: adjust thresholds, weights, and state definitions based on measured outcomes.

What refinement looks like in practice

If a high proportion of Explorers convert without passing through Evaluator, the Explorer → Evaluator boundary may be set too high. Lower the clustering or depth threshold.
If Hesitant users rarely convert even after intervention, investigate whether the form friction is structural (too many fields, unclear next step) rather than behavioural.
If confidence scores cluster around medium with few high-confidence classifications, the signal model may need additional inputs or the score thresholds may be too conservative.

Recommended review cadence

Weekly: review state distributions and conversion rates by state.
Monthly: review confidence distributions, action effectiveness, and threshold accuracy.
Quarterly: recalibrate score ranges using percentile analysis (Section 3.4) and reassess state definitions against actual user journeys.

15. Constraints and Limitations

Every model has boundaries. Being explicit about what this system cannot do is just as important as explaining what it can. It prevents overconfidence in the output and sets realistic expectations for anyone using the results to make decisions.

Analytical constraints

Probabilistic, not deterministic. All classifications are probabilistic estimates. No behavioural signal guarantees intent. Treat states as the most likely interpretation, not a fact about the visitor.
Multi-signal required. A single data point (one page view, one click) is insufficient for reliable classification. Require at least 3 signals before assigning any state above low confidence.
Small data = low confidence. Sites with fewer than 500 sessions per month will have limited calibration data. Use fixed-rule scoring (Section 3.4, Option A) and avoid percentile-based normalisation until volume grows.
Motivation is inferred, not observed. Motivation tags are optional secondary signals. They must never replace behavioural states as the primary classification layer.
No low-confidence motivation. Do not assign motivation tags when confidence is low (0–3) or when signal evidence is sparse.
Avoid psychological overreach. Do not claim internal mental truth; only describe behavioural patterns consistent with a motivation hypothesis.

Data quality constraints

Taxonomy completeness. The clustering signal depends entirely on every important page being tagged with a page type and topic cluster. Untagged pages create blind spots that distort clustering scores. Audit taxonomy coverage before trusting clustering outputs.
Event reliability. Custom events (CTA clicks, form starts, section views) require correct GTM implementation. Missing or double-firing events will corrupt progression and depth scores. Validate event accuracy in GA4 DebugView before using scores operationally.
Cross-device and cross-session identity. GA4 identity relies on cookies and optional User-ID (which links sessions when a user logs in). If visitors switch devices, browse in private mode, or clear cookies, they appear as new users. This splits their behavioural history. It mainly affects temporal signals (frequency, velocity, and returning evaluator detection). Treat this as a known limitation and avoid over-interpreting single-session classifications on high cross-device sites.
B2B-specific risk. In B2B journeys, people often research on one device, revisit on another, and return through a shared link. Without a logged-in User-ID, the "Returning Evaluator" pattern can fragment into separate "new Scanner" sessions. That weakens the temporal layer. For B2B sites, User-ID (via gated content, account login, or CRM integration) is a practical requirement for reliable multi-session tracking.

Privacy and consent

Session-level and cross-session tracking requires user consent under GDPR (the EU's General Data Protection Regulation), ePrivacy, and similar data protection regulations. Ensure consent management is in place before collecting the events described in this framework.
Do not store personally identifiable information (PII) in GA4 custom parameters. State classifications should be based on behavioural patterns, not individual identity.
Where consent is not granted, the system should degrade gracefully to aggregate-only reporting with no individual state assignment.

Exclusions

Bot and crawler traffic. Filter known bots before scoring. GA4 excludes known bots by default, but verify that automated traffic is not inflating Scanner or Mismatch counts.
Internal traffic. Exclude staff and internal IP ranges to avoid contaminating state distributions.

16. Final Summary

This system transforms website analytics from passive observation into an active decision framework. It does this by:

Classifying visitors into behavioural states using four core signals (breadth, depth, progression, clustering) and temporal context, replacing vague metrics with actionable categories.
Optionally refining response precision with motivation signals (for medium/high-confidence classifications only), so actions can be tailored without overclaiming psychological certainty.
Quantifying certainty through confidence scoring, so that the strength of response matches the strength of evidence.
Connecting every state to a specific, testable action, ensuring that classification always leads to a concrete business response with a measurable outcome.
Learning continuously through a feedback loop that refines thresholds, weights, and state definitions as data accumulates.

Key distinction

Most analytics systems describe what happened.

This system decides what to do next.

Appendix A: BigQuery Reference Implementation

This SQL provides a starting implementation for calculating the four core signal scores from GA4 BigQuery export data. It assumes the taxonomy register has been uploaded as a BigQuery table (manual_taxonomy_lookup) with columns: url_pattern, page_type, topic_cluster, intent_weight.

Prerequisites:

GA4 BigQuery export enabled
Taxonomy register uploaded as a lookup table
Date range adjusted to match your reporting period

WITH

-- 1. Extract raw events with session identity
raw_events AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
    event_name,
    TIMESTAMP_MICROS(event_timestamp) AS event_time,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS url,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_time_msec,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'percent_scrolled') AS scroll_percent
  FROM `your-project.analytics_123456.events_*`
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
),

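-- 2. Map events to taxonomy (REGEXP_CONTAINS matches URLs against patterns, e.g. /blog/.* matches any blog page)
-- NOTE: url_pattern values must be regular expressions (not spreadsheet-style globs) and should match
-- at most one row per URL; overlapping patterns duplicate events through this join and inflate
-- the engagement and progression sums.
-- For large taxonomy tables, pre-compute the join or use exact URL matching
-- with a materialised lookup to avoid expensive regex scans on every query.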
mapped_events AS (
  SELECT
    e.*,
    COALESCE(t.topic_cluster, 'General') AS topic_cluster,
    COALESCE(t.page_type, 'Unknown') AS page_type,
    COALESCE(t.intent_weight, 0.5) AS intent_weight
  FROM raw_events e
  LEFT JOIN `your-project.your_dataset.manual_taxonomy_lookup` t
    ON REGEXP_CONTAINS(e.url, t.url_pattern)
),

-- 3. Breadth score: unique pages, page types, and topic clusters per session
breadth_metrics AS (
  SELECT
    user_pseudo_id,
    session_id,
    COUNT(DISTINCT url) AS unique_pages,
    COUNT(DISTINCT page_type) AS unique_page_types,
    COUNT(DISTINCT topic_cluster) AS unique_topics
  FROM mapped_events
  WHERE event_name = 'page_view'
  GROUP BY 1, 2
),

-- 4. Depth score: engagement time and scroll depth per session
depth_metrics AS (
  SELECT
    user_pseudo_id,
    session_id,
    SUM(engagement_time_msec) / 1000.0 AS engagement_time_seconds,
    AVG(CASE WHEN scroll_percent IS NOT NULL THEN scroll_percent END) AS avg_scroll_percent,
    COUNTIF(event_name IN ('resource_download', 'video_start')) AS deep_engagement_events
  FROM mapped_events
  GROUP BY 1, 2
),

-- 5. Clustering: topic concentration, switching, and repeat returns
clustering_prep AS (
  SELECT
    user_pseudo_id,
    session_id,
    topic_cluster,
    event_time,
    COUNT(*) OVER(PARTITION BY user_pseudo_id, session_id) AS total_views,
    COUNT(*) OVER(PARTITION BY user_pseudo_id, session_id, topic_cluster) AS cluster_views,
    LAG(topic_cluster) OVER(PARTITION BY user_pseudo_id, session_id ORDER BY event_time) AS prev_topic
  FROM mapped_events
  WHERE event_name = 'page_view'
),

clustering_metrics AS (
  SELECT
    user_pseudo_id,
    session_id,
    MAX(SAFE_DIVIDE(cluster_views, total_views)) AS dominant_topic_share,
    -- Count topic switches (where current topic differs from previous)
    COUNTIF(topic_cluster != prev_topic AND prev_topic IS NOT NULL) AS topic_switch_count,
    -- Total page views (for minimum signal floor check)
    MAX(total_views) AS total_page_views,
    -- Repeat cluster return: views in the dominant cluster beyond the first visit
    MAX(cluster_views) - 1 AS repeat_cluster_visits
  FROM clustering_prep
  GROUP BY 1, 2
),

-- 6. Progression: weighted action scores using intent weights from taxonomy
progression_metrics AS (
  SELECT
    user_pseudo_id,
    session_id,
    -- NOTE: page_view is excluded here. It contributes to breadth only (Section 4).
    -- Scroll contributes to depth, not progression, so it is also excluded.
    SUM(CASE
      WHEN event_name = 'cta_click' THEN 1.0 * intent_weight
      WHEN event_name = 'form_start' THEN 1.5 * intent_weight
      WHEN event_name = 'form_submit' THEN 2.0 * intent_weight
      WHEN event_name = 'booking_click' THEN 1.5 * intent_weight
      WHEN event_name = 'conversion_complete' THEN 2.0 * intent_weight
      ELSE 0
    END) AS raw_progression_sum,
    COUNTIF(event_name = 'form_start') AS form_starts,
    COUNTIF(event_name = 'form_submit') AS form_submits,
    COUNTIF(event_name = 'conversion_complete') AS conversions
  FROM mapped_events
  GROUP BY 1, 2
)

-- 7. Final scoring: assemble all four signal scores (0–10)
SELECT
  b.user_pseudo_id,
  b.session_id,

  -- Breadth score (0–10): based on unique pages and variety
  LEAST(10, CASE
    WHEN b.unique_pages = 1 THEN 1
    WHEN b.unique_pages <= 3 AND b.unique_page_types <= 2 THEN 3
    WHEN b.unique_pages <= 5 THEN 5
    WHEN b.unique_pages <= 8 AND b.unique_page_types >= 3 THEN 7
    -- NOTE: sessions with 6–8 pages but fewer than 3 page types also fall through to 9
    ELSE 9
  END) AS breadth_score,

  -- Depth score (0–10): based on engagement time and scroll
  LEAST(10, CASE
    WHEN d.engagement_time_seconds < 10 THEN 1
    WHEN d.engagement_time_seconds < 30 THEN 3
    WHEN d.engagement_time_seconds < 90 THEN 5
    WHEN d.engagement_time_seconds < 180 THEN 7
    ELSE 9
  END
  + CASE WHEN COALESCE(d.avg_scroll_percent, 0) >= 75 THEN 1 ELSE 0 END
  + CASE WHEN d.deep_engagement_events > 0 THEN 1 ELSE 0 END
  ) AS depth_score,

  -- Progression score (0–10): capped weighted sum
  LEAST(10, ROUND(p.raw_progression_sum, 1)) AS progression_score,

  -- Clustering score (0–10): formula with minimum signal floor, clamped to the 0–10 range
  -- (unclamped, the repeat bonus can push the result above 10 and the switch penalty below 0)
  LEAST(10, GREATEST(0, ROUND(
    (c.dominant_topic_share * 10)
    - CASE
        WHEN c.total_page_views < 4 THEN 0  -- minimum signal floor: no penalty below 4 pages
        ELSE LEAST(c.topic_switch_count, 5)
      END
    + LEAST(GREATEST(c.repeat_cluster_visits, 0), 3)
  , 1))) AS clustering_score,

  -- Raw metrics for debugging and calibration
  b.unique_pages,
  b.unique_page_types,
  d.engagement_time_seconds,
  d.avg_scroll_percent,
  c.dominant_topic_share,
  c.topic_switch_count,
  c.total_page_views,
  p.form_starts,
  p.form_submits,
  p.conversions

FROM breadth_metrics b
JOIN depth_metrics d ON b.user_pseudo_id = d.user_pseudo_id AND b.session_id = d.session_id
JOIN clustering_metrics c ON b.user_pseudo_id = c.user_pseudo_id AND b.session_id = c.session_id
JOIN progression_metrics p ON b.user_pseudo_id = p.user_pseudo_id AND b.session_id = p.session_id
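
The clustering formula is the least obvious of the four scores, so a small worked sketch may help when sanity-checking query output. The JavaScript below mirrors the SQL arithmetic for a single session. The function name clusteringScore and its input shape are hypothetical and are not taken from the repository; the final clamp reflects the 0–10 range stated in the query comments.

```javascript
// Hypothetical helper mirroring the clustering_score arithmetic in the SQL above.
// `topics` is the ordered list of topic_cluster values for one session's page views.
function clusteringScore(topics) {
  const totalViews = topics.length;
  if (totalViews === 0) return null; // no page views, so nothing to score

  // Count views per topic to find the dominant cluster.
  const counts = new Map();
  for (const t of topics) counts.set(t, (counts.get(t) || 0) + 1);
  const dominantViews = Math.max(...counts.values());
  const dominantShare = dominantViews / totalViews;

  // A switch is any page view whose topic differs from the previous one.
  let switches = 0;
  for (let i = 1; i < totalViews; i++) {
    if (topics[i] !== topics[i - 1]) switches++;
  }

  // Minimum signal floor: no switching penalty below 4 page views.
  const penalty = totalViews < 4 ? 0 : Math.min(switches, 5);
  // Repeat views in the dominant cluster beyond the first, capped at 3.
  const repeatBonus = Math.min(Math.max(dominantViews - 1, 0), 3);

  const raw = dominantShare * 10 - penalty + repeatBonus;
  const rounded = Math.round(raw * 10) / 10;
  // Keep the result inside the documented 0–10 range.
  return Math.min(10, Math.max(0, rounded));
}
```

For instance, a session that views topics A, B, A, C has a dominant share of 0.5 (5 points), three switches (a penalty of 3), and one repeat view (a bonus of 1), giving a score of 3. A session of three pages on three different topics scores 3.3, because the minimum signal floor suppresses the switching penalty.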

Implementation notes

Session identity: GA4's ga_session_id is the session's start time as a Unix timestamp, so it is not unique across users. user_pseudo_id is GA4's anonymous identifier for a visitor (based on their browser cookie). Always partition and join on both user_pseudo_id and session_id so that sessions from different visitors are never merged.
Taxonomy join performance: REGEXP_CONTAINS joins are computationally expensive. For production use, materialise the taxonomy lookup as a pre-computed URL-to-metadata table (exact match on URL path) and reserve regex matching for a nightly batch update. This can reduce query costs by 10–100x on large event tables.
Null handling: The COALESCE wrappers on taxonomy fields ensure untagged pages default to topic_cluster: 'General' and intent_weight: 0.5 rather than producing null scores. Monitor the volume of 'General' classifications. High volume indicates taxonomy debt.
Calibration: The breadth and depth score thresholds above (e.g. "< 30 seconds = 3") are fixed-rule defaults (Section 3.4, Option A). Once you have 3+ months of data, replace them with percentile-based scoring by computing PERCENT_RANK() over the raw metrics and mapping the percentile to a 0–10 scale.
Next step, state assignment: This query produces the four signal scores per session. To assign states, add a final CASE WHEN block applying the priority order from Section 7, Step 4, or export the scores to a downstream transformation layer (e.g. dbt, a SQL-based data transformation tool) for state classification and confidence scoring.
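
The percentile-based calibration described in the notes above can be sketched outside SQL as well. The JavaScript below reproduces the PERCENT_RANK() definition, (rank - 1) / (n - 1) with ties sharing the lowest rank, and maps the result onto the 0–10 scale. The function name percentileScores and the linear mapping with one decimal place are assumptions made for illustration, not the repository's implementation.

```javascript
// Hypothetical helper: percentile-based scoring for one raw metric across sessions.
// Mirrors SQL PERCENT_RANK(): (rank - 1) / (n - 1), where tied values share the lowest rank.
function percentileScores(values) {
  const n = values.length;
  if (n === 0) return [];
  if (n === 1) return [0]; // PERCENT_RANK() returns 0 when there is only one row

  const sorted = [...values].sort((a, b) => a - b);
  return values.map((v) => {
    // rank - 1 equals the number of values strictly below v,
    // which is the index of v's first occurrence in the sorted copy.
    const below = sorted.indexOf(v);
    const percentRank = below / (n - 1);
    // Assumption: linear mapping of the percentile onto 0–10, one decimal place.
    return Math.round(percentRank * 100) / 10;
  });
}

// Example: engagement time in seconds for five sessions.
const scores = percentileScores([12, 45, 45, 200, 600]);
console.log(scores); // [0, 2.5, 2.5, 7.5, 10]
```

Because the scores are relative to the observed distribution, they shift as traffic changes, so the reference window (for example, the trailing three months) should be fixed and recorded alongside the scores.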