
Behaviour Intelligence from Web Analytics

Strategic Analytics: v1.0, March 2026

1. Orientation

What this is: An open-source framework and reference implementation for classifying website visitors into behavioural states using GA4 data. The full system is available on GitHub under MIT licence.

This document defines a behavioural intelligence system for websites and digital products.

Its purpose is to answer a practical question:

What is this visitor trying to do, how confident are we, and what should we change as a result?

This system transforms analytics from passive reporting into an active decision framework.

Open-source implementation available

This framework has been implemented as a production-ready classification engine: a JavaScript scoring and classification library, a BigQuery SQL pipeline, GTM client-side scripts, deployment automation, and a full test suite. The implementation is open source under the MIT licence.



Why this is useful

Traditional analytics answers:

This system answers:


Core principle

Behaviour → State → Response → Outcome → Learning

Optional refinement layer:

Behaviour → State → Motivation (inferred, optional) → Response → Outcome → Learning

Key terms used in this document

Signal
A measurable indicator of user behaviour (for example, how many pages someone viewed, how long they spent reading, or whether they clicked a call-to-action button). The system uses four core signals: breadth, depth, progression, and clustering.
State
A classification category that describes where a visitor currently sits on the journey from awareness to action. Examples: "Scanner" (browsing widely but shallowly) or "Evaluator" (reading deeply and moving toward a decision).
Confidence
A score (0–10) that measures how certain the system is about a state classification. Low confidence means the evidence is thin; high confidence means the signals are strong and consistent. Confidence controls what actions the system is allowed to take.
Cluster / Clustering
A group of related pages tagged with the same topic (e.g. "pricing", "case studies", "product A"). Clustering measures how concentrated a visitor's behaviour is within one topic group versus scattered across many.
Taxonomy
A structured register that tags every page on the site with a page type (e.g. "service page", "blog post") and topic cluster (e.g. "pricing", "proof"). This tagging is what makes clustering measurable.
GA4
Google Analytics 4: the current version of Google's web analytics platform, used here as the default data collection tool.
GTM
Google Tag Manager: a tool that manages tracking code on a website without requiring direct code changes. Used to send events (like clicks and form submissions) to GA4.
CTA
Call-to-action: a button or link designed to prompt a specific user action, such as "Get a quote" or "Book a demo".
Motivation
An optional inference about what a visitor may be seeking (e.g. "risk-sensitive" or "value-driven"), based on their observed behaviour. Motivation is always secondary to state classification and is only assigned when confidence is medium or high.

2. System Overview

The system consists of six core layers (plus one optional refinement layer):

1. Data Collection (GA4)
2. Signal Construction
3. State Classification
4. Confidence Scoring
5. Action Layer
6. Feedback Loop
7. Optional Motivation Layer (applied only when confidence is medium/high)

What each layer does

Layer 1: Data Collection (GA4)

The raw input. GA4 captures events from the website: page views, scroll depth, engagement time, clicks, form interactions, and more. This layer feeds raw event-level data into the next one; if the right events are not collected here, everything downstream is guessing.

In the reference implementation: six client-side JavaScript modules deployed via GTM detect rage clicks, dead clicks, form errors, layout shifts, traffic source groups, and element-level intent signals, then push structured events and custom dimensions into the GA4 data layer.

Layer 2: Signal Construction

Takes the raw GA4 events and transforms them into four structured scores (each 0–10): Breadth (how widely the user explores), Depth (how deeply they engage), Progression (how far they move toward conversion actions), and Clustering (how focused their browsing is on a single topic). Raw events on their own are too noisy to compare. This layer gives every session a common shape so the classifier can tell visitors apart.

In the reference implementation: signals.js exposes a scorer for each signal (breadth, depth, progression, clustering) that converts raw session metrics into a 0–10 score. The same logic is mirrored in 01-signal-scores.sql for batch processing in BigQuery.

Layer 3: State Classification

Uses the four signal scores to assign the visitor to a named behavioural state (e.g. Scanner, Explorer, Evaluator, Engaged). Each state has defined signal thresholds. For example, high breadth combined with low depth and no progression produces a "Scanner" classification. Scores alone do not tell a team what to do. A named state gives everyone a shared word for the visitor's situation and makes the system actionable.

In the reference implementation: classifier.js walks a priority-ordered rule set from config.js and returns the first state whose signal thresholds are met, with a continuous fit-score fallback for ambiguous cases. The SQL equivalent is 02-state-classification.sql.

Layer 4: Confidence Scoring

Evaluates how reliable the classification is. A visitor with two page views and ten seconds of data gets a low confidence score; someone with fifteen pages, deep scroll, and multiple CTA clicks gets a high one. Confidence (0–10, bucketed into low, medium, and high) acts as a gate that controls what the system is allowed to do next. Without it, a two-page bounce would carry the same weight as a fifteen-page deep session. The system needs to know when it has enough evidence before it acts. Low confidence means observe only; high confidence means act.

In the reference implementation: confidence.js sums five factors (signal count, signal strength, state clarity, session depth, and temporal consistency), applies contradiction penalties, then buckets the result into low, medium, or high. A companion function gates which action types each band is permitted to trigger.

Layer 5: Action Layer

Maps each state-plus-confidence combination to a concrete response. A label on its own does not change anything. This layer turns the classification into a specific recommendation so someone (or something) can act on it. Depending on the state and confidence level, the action might be "do nothing" (low confidence), "surface a relevant CTA" (medium), or "trigger a personalised offer" (high). It is the bridge between classification and business outcome.

In the reference implementation: action.js looks up the visitor's state in an action-mapping table from config.js and returns the recommended action, success metric, owner, and a natural-language prescription with context-specific detail interpolated in.

Layer 6: Feedback Loop

Measures whether the actions taken actually worked. Did the personalised CTA increase conversions? Did Scanners who were shown navigation aids find what they needed? Without this, the system never learns whether its recommendations actually helped. The feedback loop sends outcome data back to refine signal weights, classification thresholds, and action rules over time.

In the reference implementation: temporal.js tracks recency, frequency, trend direction, and velocity across a visitor's session history, while 03-temporal-analysis.sql and supporting SQL queries handle the same at scale in BigQuery. A documented review cadence (weekly, monthly, quarterly) and defined rollback triggers govern when thresholds are recalibrated.

Layer 7 (Optional): Motivation Layer

Only applied when confidence is medium or high. Infers why the visitor is behaving the way they are (e.g. "value-driven", "risk-sensitive", "urgency-driven") based on which content clusters they focus on. This adds a qualitative dimension to the state label, allowing more targeted responses. It is deliberately optional because motivation inference is less reliable than behavioural classification.

In the reference implementation: refinements.js first detects a content sub-type based on where engagement time is concentrated, then infers one of six motivations by combining the sub-type, state, and signal values. The confidence gate in confidence.js controls whether the inferred motivation is suppressed, flagged for review, or allowed to drive automated action modifiers.

How the layers chain together

The flow is a pipeline: GA4 events → structured signals → state label → confidence gate → action → outcome measurement → refinement. Each layer depends on the one before it, and the feedback loop at the end circles back to improve layers 2–5. The confidence gate (layer 4) is the key safety mechanism. It prevents the system from acting on weak evidence, so low-data sessions are observed rather than acted upon prematurely.

In the reference implementation: pipeline.js orchestrates the entire flow through a single entry point, evaluateVisitor. It accepts the raw session data and user history, then calls each layer in sequence: scoreAllSignals → assessTemporalContext → classifyByPriority → calculateConfidence → applyRefinements → resolveAction. The returned object contains the full evaluation: signals, temporal context, classification, confidence, refinements, and action plan. The SQL pipeline mirrors this sequence across six numbered query files (01 through 06), each corresponding to a layer, designed to run in order inside BigQuery.

3. Signal Model

The system needs a way to measure behaviour that is consistent across every session and every site. Without a defined signal model, classification would depend on ad hoc metrics that shift from report to report. These four signals provide the common language that makes everything downstream possible.

3.1 Core signals (primary)

1. Breadth (Exploration Volume)

How much the user explores.

Breadth should be calculated from:

Recommended calculation:

Breadth can be calibrated per site using percentiles once enough data exists.

Interpretation warning: High breadth does not always mean healthy exploration.

On a poorly structured site, high breadth often signals a lost user, someone clicking widely because they cannot find what they need, not because they are surveying options.

How to check: Cross-reference breadth with depth and progression. If breadth is high but depth is very low (≤ 2) and progression is zero, the user is more likely lost than exploring.

What to watch: The Scanner state captures this pattern. A spike in Scanner volume may indicate a site navigation problem rather than a traffic quality problem.

2. Depth (Engagement)

How deeply they engage.

Depth should be calculated from:

Recommended inputs:

Suggested depth score (0–10):

3. Progression (Intent Momentum)

How far they move toward meaningful action.

Progression should be calculated from:

Recommended inputs:

Suggested progression score (0–10):


3.2 Derived signal

4. Clustering (Behavioural Coherence)

How concentrated behaviour is within a topic, pathway, or offer cluster.

This is the main signal that distinguishes broad, scattered browsing from coherent evaluation.

Clustering should be calculated from three components:

A. Topic concentration

What proportion of views fall within the dominant topic cluster.

Example:

B. Topic switching

How often the visitor moves between unrelated topics.

Example:

C. Repeat cluster return

Whether the user repeatedly returns to the same cluster during the session or across sessions.

Example:

Recommended inputs:

Suggested clustering score (0–10):

A simple practical starting formula:

clustering_score = (dominant_topic_share * 10) - topic_switch_penalty + repeat_cluster_bonus

Where:

Minimum signal floor: Do not apply topic_switch_penalty until the user has viewed 4 or more pages.

Below this threshold, a single topic switch (e.g. 2 pages in Cluster A, then 1 in Cluster B) is normal exploratory behaviour and should not be penalised. For short sessions, set topic_switch_penalty = 0 and rely on dominant_topic_share alone.

Example 1, sufficient data: a visitor views 10 pages, 7 in one cluster, switches topics twice, and returns to the primary cluster 3 times:

Example 2, below signal floor: a visitor views 3 pages, 2 in Cluster A and 1 in Cluster B, with 1 topic switch:

The exact weightings should be calibrated to the site once sufficient data exists. Start with these defaults and adjust.
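
The starting formula and the minimum signal floor can be sketched as below. This is an illustration only: the per-switch penalty and per-return bonus values are placeholders chosen for the example, the input field names are hypothetical, and clamping the result to the 0–10 range is an assumption.

```javascript
// Placeholder defaults (assumptions, not calibrated values): tune per site.
const CLUSTERING_DEFAULTS = {
  penaltyPerSwitch: 0.5,   // hypothetical penalty per topic switch
  bonusPerReturn: 0.5,     // hypothetical bonus per return to the dominant cluster
  minPagesForPenalty: 4,   // signal floor stated in the text
};

function clusteringScore(session, opts = CLUSTERING_DEFAULTS) {
  const { pagesViewed, pagesInDominantCluster, topicSwitches, repeatReturns } = session;
  if (pagesViewed <= 0) return 0;

  const dominantTopicShare = pagesInDominantCluster / pagesViewed;

  // Below the signal floor, topic switches are treated as normal exploration.
  const topicSwitchPenalty =
    pagesViewed >= opts.minPagesForPenalty ? topicSwitches * opts.penaltyPerSwitch : 0;
  const repeatClusterBonus = repeatReturns * opts.bonusPerReturn;

  const raw = dominantTopicShare * 10 - topicSwitchPenalty + repeatClusterBonus;
  return Math.min(10, Math.max(0, raw)); // clamp to 0–10 (assumption)
}

// Example 1 inputs: 10 pages, 7 in one cluster, 2 switches, 3 returns.
const longSession = clusteringScore({
  pagesViewed: 10, pagesInDominantCluster: 7, topicSwitches: 2, repeatReturns: 3,
});

// Example 2 inputs: 3 pages, 2 in Cluster A, 1 switch (penalty suppressed).
const shortSession = clusteringScore({
  pagesViewed: 3, pagesInDominantCluster: 2, topicSwitches: 1, repeatReturns: 0,
});
```

With these placeholder weights the first session scores 7 − 1 + 1.5 = 7.5, and the second scores on dominant_topic_share alone (about 6.7). Different weights will give different numbers; only the structure of the calculation is taken from the formula.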


3.3 Temporal signals (integrated)

Temporal signals track how behaviour changes across visits: how recently someone returned, how often they visit, and how quickly they move toward action. These are not separate decoration; they should directly shape classification and confidence.

This section defines the raw inputs. Section 11 (later in this document) covers how these inputs are used to track state transitions over time, for example detecting that a Scanner is becoming an Evaluator across three sessions.

Recency

Time since last session.

Use recency to distinguish:

Suggested recency bands:

Frequency

Number of sessions in a defined time period.

Suggested frequency bands:

Velocity

How quickly a user moves toward action.

Examples:

Recommended temporal inputs:

These should explicitly affect the Returning Evaluator and Re-engaged Prospect classifications, and should increase or reduce confidence in other states.


3.4 Score calibration

Scores should not remain arbitrary. They must be calibrated.

There are two valid approaches:

Option A: Fixed rules (best for early-stage / low data)

Use fixed score thresholds based on known business logic. This is easier to explain and debug.

Option B: Percentile-based normalisation (best once data volume grows)

Instead of fixed thresholds, compare each visitor's raw values against what is typical for your site. "Percentile-based" means ranking a value against historical data. For example, if a visitor's engagement time is higher than 80% of all sessions, they score in the 80th percentile.

Examples:

This avoids applying the same thresholds to very different sites.
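
A minimal sketch of percentile-based scoring, assuming the site's historical values for one metric are available as a plain array. The function name and the choice to round to a whole number on the 0–10 scale are assumptions for illustration.

```javascript
// Scores `value` by the share of historical values that fall strictly below it.
// Returns null when there is no history, so the caller can fall back to fixed rules.
function percentileScore(value, history) {
  if (history.length === 0) return null;
  const below = history.filter((v) => v < value).length;
  return Math.round((below / history.length) * 10); // map 0–100% onto 0–10
}

// Hypothetical engagement times (seconds) from past sessions on this site.
const pastEngagementTimes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
const score = percentileScore(85, pastEngagementTimes);
```

Here 85 seconds is higher than 8 of the 10 historical sessions, so the visitor sits at the 80th percentile and scores 8. On a site where typical engagement is much longer, the same 85 seconds would score lower, which is the point of normalising per site.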

Recommended approach:

4. Context Weighting

Not all pages and actions are equal. A visitor clicking a CTA on a pricing page signals stronger intent than a visitor scrolling a blog post. Without weighting, the system would treat every click and every page as equally important, and a five-page blog reader would look the same as a five-page pricing evaluator. Context weighting adjusts signal scores based on where, how, and from where a visitor interacts.

These weights are multipliers. They increase or decrease the signal value of an action. They are applied during the classification process described in Section 7. You do not need to read Section 7 first; the tables below define the weights themselves.

Page types

Each page type carries a weight that increases or decreases the signal value of actions taken on it.

Page type | Intent weight | Effect
Homepage | 0.5 | Actions here are orientation; downweight toward progression
Blog / resource | 0.6 | Useful for depth, but low direct conversion signal
Service / product | 1.0 | Baseline evaluation behaviour
Case study | 1.2 | Proof-seeking; upweight depth and clustering
Pricing | 1.5 | Strong intent signal; upweight progression
Contact / booking | 2.0 | Conversion action; maximum progression weight

Action strength

Each action type carries a signal weight reflecting how strongly it indicates intent.

Action | Weight | Signal contribution
Page view | 0.2 | Breadth only
Scroll (≥75%) | 0.5 | Depth
CTA click | 1.0 | Progression
Form start | 1.5 | Strong progression
Form submit | 2.0 | Conversion / maximum progression

Source context

Traffic source applies a bias to the initial state probability, not a hard override.

Source | Bias | Rationale
Direct / bookmark | +1 to progression baseline | Returning with purpose suggests prior awareness
Organic search | Neutral | Intent varies; let behaviour determine state
Social media | -1 to progression baseline | Typically exploratory; higher Scanner probability
Referral | +1 to depth baseline | Trust transfer from referring source
Paid search | +1 to progression baseline | Keyword intent suggests evaluation

These weights are starting defaults. Calibrate them against actual conversion data once enough volume exists.

Element-level metadata (micro-signals)

Scalability is not just about the page; it is also about the elements on the page. Individual interactive elements can carry their own weight, independent of the page they sit on.

When an element carries an element_weight, it overrides the page’s intent weight for that specific interaction. When absent, the page weight applies as the default.

Element role | Default weight | Example
Progression | 2.0 | “Get a Quote” button, “Book a Demo”
Depth | 0.5 | “Read More” link, “See Details”
Tool use | 0.8 | Calculators, configurators
Navigation | 0.3 | Menu links, breadcrumbs
Social | 0.3 | Share buttons, social links

Implementation: add data-element-role and optionally data-element-weight to interactive HTML elements. GTM reads these attributes on click events and sends them to GA4 as custom parameters.

<button data-element-role="progression" data-element-weight="2.0">Get a Quote</button>
<a href="/blog/..." data-element-role="depth" data-element-weight="0.5">Read More</a>
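
One way to forward these attributes is a small click listener that pushes to the GTM data layer, sketched below. The event name `element_interaction`, the parameter names, and the helper `buildElementEvent` are assumptions for illustration; the same result can be achieved with GTM's own click triggers and variables instead of custom code.

```javascript
// Builds a data layer payload from an element's data-* attributes.
// Returns null for untagged elements, in which case the page weight applies.
function buildElementEvent(el) {
  const role = el.dataset.elementRole;                  // data-element-role
  if (!role) return null;
  const payload = { event: "element_interaction", element_role: role }; // hypothetical names
  const weight = parseFloat(el.dataset.elementWeight);  // data-element-weight (optional)
  if (!Number.isNaN(weight)) payload.element_weight = weight;
  return payload;
}

// Browser wiring: skipped when no DOM is available (for example under Node).
if (typeof document !== "undefined") {
  window.dataLayer = window.dataLayer || [];
  document.addEventListener("click", (e) => {
    if (!(e.target instanceof Element)) return;
    const el = e.target.closest("[data-element-role]");
    const payload = el && buildElementEvent(el);
    if (payload) window.dataLayer.push(payload);
  });
}
```

Using `closest` means a click on an icon or span nested inside a tagged button still resolves to the button's role and weight.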

The effective weight for progression scoring becomes:

effective_weight = element_weight ?? page_intent_weight
progression_contribution = action_weight × effective_weight

This means a CTA click (action_weight 1.0) on a “Get a Quote” button (element_weight 2.0) placed on a blog page (page_weight 0.6) contributes 1.0 × 2.0 = 2.0 to progression, not 1.0 × 0.6 = 0.6. The element’s own significance wins.
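
The override rule can be sketched as a small lookup using the default weights from the tables in this section. The object keys and the function name are hypothetical labels chosen for the example.

```javascript
// Default weights copied from the page type and action strength tables.
const PAGE_INTENT_WEIGHT = {
  homepage: 0.5, blog: 0.6, service: 1.0, case_study: 1.2, pricing: 1.5, contact: 2.0,
};
const ACTION_WEIGHT = {
  page_view: 0.2, scroll_75: 0.5, cta_click: 1.0, form_start: 1.5, form_submit: 2.0,
};

function progressionContribution({ action, pageType, elementWeight }) {
  // The element weight wins when present; otherwise fall back to the page weight.
  const effectiveWeight = elementWeight ?? PAGE_INTENT_WEIGHT[pageType];
  return ACTION_WEIGHT[action] * effectiveWeight;
}

// CTA click on a tagged "Get a Quote" button sitting on a blog page.
const tagged = progressionContribution({ action: "cta_click", pageType: "blog", elementWeight: 2.0 });

// The same click on an untagged element falls back to the blog page weight.
const untagged = progressionContribution({ action: "cta_click", pageType: "blog" });
```

The nullish coalescing operator is used rather than `||` so that an explicit element weight of 0 would still override the page weight.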

5. State Model

States are the point of the system. A dashboard full of signal scores is useful to an analyst, but it does not tell a product manager what to fix or a CRM team who to follow up with. Named states translate numerical patterns into plain descriptions of visitor behaviour that any team can understand and act on.

The system uses 10 core states with explicit signal definitions.

Each state is determined using:

State definitions

Note on score overlaps: The score ranges below intentionally overlap at boundaries. Real user behaviour does not fall neatly into boxes. When a visitor's scores could match two states, Section 7 provides a priority order to resolve the tie (e.g. "Engaged" always takes priority over "Focused Evaluator"). If no state's criteria are fully met, assign the closest match with low confidence and flag for review.

1. Mismatch

  • Breadth ≤ 2
  • Depth ≤ 2
  • Progression = 0
  • Clustering irrelevant
  • Usually 1 session only

Immediate exit or no meaningful engagement.

2. Scanner

  • Breadth ≥ 6
  • Depth ≤ 3
  • Clustering ≤ 3
  • Progression ≤ 2
  • Usually low velocity, low continuity

Wide but shallow exploration.

3. Explorer

  • Breadth 4–7
  • Depth 3–6
  • Clustering 3–6
  • Progression 2–4
  • May show increasing structure within session or across 2 sessions

Exploration with emerging intent.

4. Comparator

  • Breadth 4–7
  • Depth 3–5
  • Clustering ≥ 5 across competing options / pathways
  • Progression 3–5
  • Often includes repeat visits to proof, pricing, or alternative offers

Comparing multiple options.

5. Evaluator

  • Breadth 3–6
  • Depth ≥ 6
  • Clustering ≥ 5
  • Progression 4–6
  • May occur in one strong session or repeated sessions within 7 days

Serious evaluation.

6. Focused Evaluator

  • Breadth 2–4
  • Depth ≥ 7
  • Clustering ≥ 7
  • Progression ≥ 6
  • High velocity or repeated strong cluster return

Highly aligned, strong intent.

7. Hesitant

  • Progression ≥ 6 (form start / CTA / booking step)
  • No completion
  • Depth ≥ 4
  • Can occur in one session or repeated attempts over 7–14 days

Intent present but interrupted.

8. Stalled

  • Breadth 3–6
  • Depth 4–6
  • Progression ≤ 3
  • Repeated loops or repeated low-progress sessions
  • Low improvement over time

Confusion, overload, or structural friction.

Important distinction: Stalled vs. Frustrated. A Stalled user has intent but lacks clarity (revisiting the same pages, looping between sections, failing to progress). A Frustrated user is blocked by technical or UX failures. To distinguish between them, monitor for friction signals:

  • rage_click_count: rapid repeated clicks on the same element (3+ clicks within 2 seconds)
  • dead_click_count: clicks on non-interactive elements that produce no response
  • form_error_count: validation errors encountered during form completion
  • high_layout_shift: significant content movement during page load (Cumulative Layout Shift > 0.25, a measure of how much visible page content shifts unexpectedly during loading)

If friction signals are present alongside Stalled criteria, classify as Stalled (Friction) sub-type. This changes the recommended action from "simplify navigation" to "fix the broken interaction", a UX engineering problem, not a content strategy problem.
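
A minimal sketch of the rage-click rule (3+ clicks on the same element within 2 seconds) and the sub-type decision. The click record shape, the function names, and the reading of "friction signals are present" as "any count above zero" are assumptions for illustration.

```javascript
// clicks: array of { elementId, t } where t is a timestamp in milliseconds.
// Counts bursts of `minClicks` or more clicks on one element inside `windowMs`.
function countRageClicks(clicks, { minClicks = 3, windowMs = 2000 } = {}) {
  const timesByElement = new Map();
  for (const { elementId, t } of clicks) {
    if (!timesByElement.has(elementId)) timesByElement.set(elementId, []);
    timesByElement.get(elementId).push(t);
  }

  let bursts = 0;
  for (const times of timesByElement.values()) {
    times.sort((a, b) => a - b);
    let i = 0;
    while (i + minClicks - 1 < times.length) {
      if (times[i + minClicks - 1] - times[i] <= windowMs) {
        bursts += 1;
        // Skip the rest of this window so one burst is counted once.
        const windowEnd = times[i] + windowMs;
        while (i < times.length && times[i] <= windowEnd) i += 1;
      } else {
        i += 1;
      }
    }
  }
  return bursts;
}

// Applies the sub-type rule to a session already classified as Stalled.
function stalledLabel(friction) {
  const hasFriction =
    friction.rage_click_count > 0 ||
    friction.dead_click_count > 0 ||
    friction.form_error_count > 0 ||
    friction.high_layout_shift === true;
  return hasFriction ? "Stalled (Friction)" : "Stalled";
}
```

Grouping by element first keeps fast clicks spread across different buttons from being counted as a rage click.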

9. Engaged (Committed)

  • Progression ≥ 8 (conversion)
  • Continued activity post-conversion
  • May include trust / process validation after action

User has acted and is validating or onboarding.

10. Returning Evaluator

  • 2+ sessions within 7 days, or 3+ sessions within 30 days
  • Increasing depth, clustering, or progression over time
  • No completed conversion yet

Intent strengthening across sessions.
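
The session-count criteria for this state can be sketched as a window check over session start times. The function name and the decision to count every session inside the window, including the current one, are assumptions; the "increasing signals" and "no conversion" criteria would be checked separately.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// sessionStarts: array of session start timestamps in milliseconds.
// True when there are 2+ sessions in the last 7 days or 3+ in the last 30 days.
function meetsReturningCriteria(sessionStarts, now) {
  const countWithin = (days) =>
    sessionStarts.filter((t) => now - t >= 0 && now - t <= days * DAY_MS).length;
  return countWithin(7) >= 2 || countWithin(30) >= 3;
}

// Illustrative timestamps expressed as whole days for readability.
const now = 100 * DAY_MS;
const twoThisWeek = meetsReturningCriteria([95 * DAY_MS, 99 * DAY_MS], now);
const threeThisMonth = meetsReturningCriteria([72 * DAY_MS, 75 * DAY_MS, 80 * DAY_MS], now);
const tooSparse = meetsReturningCriteria([50 * DAY_MS, 60 * DAY_MS], now);
```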


Special temporal state: Re-engaged Prospect

This may be treated as a sub-type of Explorer, Evaluator, or Returning Evaluator.

Typical pattern:

This matters because the visitor is not simply "new" or "returning"; they are reactivated.

6. Optional Refinement Layers

6.1 Content Sub-types (Optional)

Each core state can carry a sub-type label that describes the content orientation of the behaviour, not just its intensity. Sub-types are determined by which page types and content roles dominate the session.

Sub-type | Determined by | Example
Proof-focused | Majority of depth on case studies, testimonials, or results pages | An Evaluator spending 70% of engagement time on case studies
Trust-focused | Concentration on about, team, credentials, or review pages | A Hesitant user who revisits the "About us" and "Our team" pages before returning to the form
Price-focused | Repeated or deep engagement with pricing, comparison, or plan pages | A Comparator returning to the pricing page across two sessions
Resource-seeking | Majority of actions on downloads, guides, tools, or documentation | An Explorer downloading three whitepapers but not visiting any service pages

Sub-types are optional. Implement them only when the action layer needs to differentiate which kind of content to surface. For example, sending a proof-led follow-up to a proof-focused Evaluator vs. a pricing summary to a price-focused Comparator.

Sub-types do not affect state classification or priority. They inform the content of the response, not the type of response.

6.2 Motivation Signals (Optional Layer)

The framework may apply a lightweight motivation signal after state assignment. This is an inference layer, not a replacement for state classification.

Use motivation only to refine action precision:

Recommended motivation categories (keep small)

Motivation signal | Typical behavioural pattern
Curiosity-driven | Broad exploration, low commitment, limited progression
Value-driven | Deep engagement with proof/outcome content and evaluation behaviour
Risk-sensitive | Strong intent signals with hesitation before completion
Confusion-driven | Repeated loops, switching, and low forward movement
Overload-sensitive | Deep dwell and repeated review without clear progression
Urgency-driven | Fast movement to high-intent actions with minimal exploration

Guardrails (must follow)

  1. Behaviour first: assign state before motivation.
  2. Confidence gate: only assign motivation when state confidence is medium or high (4+).
  3. Minimal set: do not expand motivation categories unless a new category changes action.
  4. Action-linked only: if a motivation label does not alter response, remove it.
  5. No psychological overreach: describe behavioural consistency, not internal truth.

What should not be added

Do not add abstract, non-operational labels (for example, "status-seeking" or "identity-driven") unless there is a reliable behavioural proxy and a distinct action pathway.

Do not use motivation as a primary classifier, and do not assign motivation when data is sparse or confidence is low.

7. Classification Logic

The state model defines what each state looks like. This section defines how the system actually decides which state to assign: the priority order, the scoring method, and what happens when a visitor's signals are ambiguous. Without clear rules, two implementations of the same framework could classify the same visitor differently.

Important note on thresholds

All thresholds (e.g. page count, time, events) should be calibrated per site. The example rules below are starting points only. Different products, traffic types, and session lengths will require adjustment.

Signal scoring approach (recommended)

Instead of relying only on hard rules, assign scores:

Then map score ranges to states. This improves flexibility and reduces brittle classification.

Recommended scoring method

Step 1: calculate raw metrics

Examples:

Step 2: convert to normalised scores

Use either:

Step 3: apply context weighting

Adjust signals based on:

Step 4: assign likely state

Apply states in the following priority order (highest priority first). Evaluate each rule top-down; assign the first state whose criteria are met.

  1. Engaged: conversion completed (progression ≥ 8). Overrides all other states.
  2. Hesitant: high-intent action started but not completed (progression ≥ 6, no conversion). Overrides Focused Evaluator.
  3. Returning Evaluator: temporal criteria met (2+ sessions in 7 days or 3+ in 30 days) with increasing signals and no conversion. Overrides single-session states.
  4. Focused Evaluator: narrow, deep, high-progression behaviour (breadth 2–4, depth ≥ 7, clustering ≥ 7, progression ≥ 6).
  5. Evaluator: serious evaluation with depth and clustering (depth ≥ 6, clustering ≥ 5, progression 4–6).
  6. Comparator: evaluation across competing options (breadth 4–7, depth 3–5, clustering ≥ 5, progression 3–5).
  7. Stalled: moderate engagement with low progression and repeated loops (breadth 3–6, depth 4–6, progression ≤ 3).
  8. Scanner: wide but shallow (breadth ≥ 6, depth ≤ 3, clustering ≤ 3).
  9. Explorer: moderate exploration with emerging structure (breadth 4–7, depth 3–6).
  10. Mismatch: minimal engagement (breadth ≤ 2, depth ≤ 2, progression = 0).

If no state criteria are fully met, assign the closest match and flag confidence as low. Where scores fall between two adjacent states, assign a hybrid classification (see Hybrid States below).
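
The priority order can be sketched as a first-match rule table. This is an illustration under stated assumptions, not the reference implementation: the context flags (`abandonedHighIntentAction`, `meetsReturnCriteria`, `signalsIncreasing`, `repeatedLoops`) and the function name are hypothetical, a progression score of 8 or more is taken to mean a completed conversion, and the closest-match fallback is left to the caller.

```javascript
const between = (v, lo, hi) => v >= lo && v <= hi;

// Rules in priority order. Each test receives signal scores `s` (0–10) and context `c`.
const RULES = [
  ["Engaged", (s) => s.progression >= 8],
  ["Hesitant", (s, c) => s.progression >= 6 && c.abandonedHighIntentAction],
  ["Returning Evaluator", (s, c) => c.meetsReturnCriteria && c.signalsIncreasing],
  ["Focused Evaluator", (s) =>
    between(s.breadth, 2, 4) && s.depth >= 7 && s.clustering >= 7 && s.progression >= 6],
  ["Evaluator", (s) => s.depth >= 6 && s.clustering >= 5 && between(s.progression, 4, 6)],
  ["Comparator", (s) =>
    between(s.breadth, 4, 7) && between(s.depth, 3, 5) && s.clustering >= 5 &&
    between(s.progression, 3, 5)],
  ["Stalled", (s, c) =>
    between(s.breadth, 3, 6) && between(s.depth, 4, 6) && s.progression <= 3 && c.repeatedLoops],
  ["Scanner", (s) => s.breadth >= 6 && s.depth <= 3 && s.clustering <= 3],
  ["Explorer", (s) => between(s.breadth, 4, 7) && between(s.depth, 3, 6)],
  ["Mismatch", (s) => s.breadth <= 2 && s.depth <= 2 && s.progression === 0],
];

// Returns the first state whose criteria are met, or null when none match.
function firstMatchingState(signals, context = {}) {
  for (const [state, test] of RULES) {
    if (test(signals, context)) return state;
  }
  return null; // caller assigns the closest match with low confidence
}

const scanner = firstMatchingState({ breadth: 8, depth: 2, clustering: 2, progression: 1 });
const focused = firstMatchingState({ breadth: 3, depth: 8, clustering: 8, progression: 7 });
const hesitant = firstMatchingState(
  { breadth: 3, depth: 8, clustering: 8, progression: 7 },
  { abandonedHighIntentAction: true },
);
```

The last two calls use identical scores: the abandoned high-intent action is what moves the visitor from Focused Evaluator to Hesitant, because Hesitant sits higher in the priority order.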

Step 5: assign confidence score

Every state assignment must be paired with a confidence score.

Step 6 (optional): assign motivation signal

Only after state and confidence are assigned:

Motivation is secondary metadata that refines response content. It must not override state priority or confidence logic.

Example rules (raw metric thresholds)

The state definitions in Section 5 use scored thresholds (e.g. "Breadth ≥ 6"). The examples below show how raw metrics translate into those scores under fixed-rule scoring (Section 3.4, Option A). These are starting defaults; calibrate to your site.

Scanner (raw metrics → Breadth score ≥ 6, Depth score ≤ 3):

Evaluator (raw metrics → Depth score ≥ 6, Clustering score ≥ 5):

Hesitant (raw metrics → Progression score ≥ 6, no conversion):

Returning Evaluator (raw metrics + temporal criteria):

Hybrid states

Users may exhibit signals consistent with multiple states simultaneously. When the primary state accounts for less than 70% of the signal weight, classify as a hybrid.

Example: a visitor with breadth 5, depth 5, clustering 5, progression 3 may score as:

When a hybrid classification occurs, the system should store:

The action layer should respond to the primary state but avoid actions that would be counterproductive for the secondary state. For example, if a visitor is 60% Explorer / 40% Evaluator, guide them toward deeper evaluation content rather than immediately pushing for conversion.
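
A minimal sketch of the 70% rule, assuming each candidate state already has a non-negative fit score. How those fit scores are produced is outside this sketch, and the function and field names are hypothetical.

```javascript
// fitScores: { stateName: nonNegativeNumber }
// Flags a hybrid when the top state holds less than `threshold` of the total weight.
function resolveHybrid(fitScores, threshold = 0.7) {
  const ranked = Object.entries(fitScores).sort((a, b) => b[1] - a[1]);
  const total = ranked.reduce((sum, [, score]) => sum + score, 0);
  if (ranked.length === 0 || total <= 0) return null;

  const [primary, secondary] = ranked;
  const primaryShare = primary[1] / total;
  const isHybrid = secondary !== undefined && primaryShare < threshold;

  return {
    primary: primary[0],
    primaryShare,
    secondary: isHybrid ? secondary[0] : null,
    isHybrid,
  };
}

// The 60% Explorer / 40% Evaluator case from the text.
const mixed = resolveHybrid({ Explorer: 6, Evaluator: 4 });
// A dominant primary state is not a hybrid.
const clear = resolveHybrid({ Evaluator: 9, Explorer: 1 });
```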

8. Confidence Scoring

Each classification includes a confidence score. Confidence determines how strongly the system acts on a classification. It is not optional metadata. Acting on a weak classification wastes effort or, worse, annoys a visitor with the wrong intervention. Confidence is what separates a system that guesses from one that knows when to wait.

Confidence calculation

Confidence is calculated from five factors, each scored 0–2:

Factor | 0 (weak) | 1 (moderate) | 2 (strong)
Signal count | ≤ 2 distinct signal types observed | 3–4 distinct signal types | 5+ distinct signal types
Signal strength | Only passive actions (views, scrolls) | Mix of passive and active (CTA clicks) | Active high-intent actions (form starts, bookings)
State clarity | Primary and secondary states within 20% of each other | Primary state clearly leads but secondary is plausible | Primary state dominant, no close competitor
Session depth | < 30 seconds or ≤ 2 pages | 30s–2min, 3–5 pages | > 2min, 5+ pages
Temporal consistency | Single session, no history | 2 sessions with consistent direction | 3+ sessions with reinforcing pattern

"Signal types" means distinct categories of observed behaviour: page views, scrolls, CTA clicks, form starts, form submits, downloads, booking clicks, etc. Multiple page views count as one signal type; a page view plus a CTA click counts as two.

Confidence score = sum of all factors (0–10)

Confidence bands

How confidence governs action

Confidence | Permitted actions | Examples
Low (0–3) | Reporting and aggregate analysis only | Include in state distribution dashboards; do not trigger individual interventions
Medium (4–6) | Lightweight nudges and analyst review | Adjust content recommendations; flag for manual review; add to nurture segments; optional motivation tag may be applied
High (7–10) | Direct automated action and personalisation | Trigger CRM workflows; personalise page content; alert sales team; motivation tag can drive targeted response variant
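
The five-factor sum, the bands, and the action gate can be sketched as below. The factor keys and action-type strings are hypothetical labels, treating higher bands as also permitting lower-band actions is an assumption, and contradiction penalties are left out of this sketch.

```javascript
const FACTORS = [
  "signalCount", "signalStrength", "stateClarity", "sessionDepth", "temporalConsistency",
];

// Sums the five factors (each 0, 1, or 2) into a 0–10 confidence score.
function confidenceScore(factors) {
  return FACTORS.reduce((sum, name) => {
    const value = factors[name];
    if (!Number.isInteger(value) || value < 0 || value > 2) {
      throw new RangeError(`${name} must be 0, 1, or 2`);
    }
    return sum + value;
  }, 0);
}

function confidenceBand(score) {
  if (score <= 3) return "low";
  if (score <= 6) return "medium";
  return "high";
}

// Cumulative permissions per band (assumption: higher bands include lower ones).
const PERMITTED_ACTIONS = {
  low: ["reporting"],
  medium: ["reporting", "nudge", "analyst_review"],
  high: ["reporting", "nudge", "analyst_review", "automated_action"],
};

function isPermitted(score, actionType) {
  return PERMITTED_ACTIONS[confidenceBand(score)].includes(actionType);
}

const score = confidenceScore({
  signalCount: 1, signalStrength: 1, stateClarity: 1, sessionDepth: 1, temporalConsistency: 0,
});
```

In this example the score is 4, which falls in the medium band: a nudge is allowed, but an automated action is not.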

Motivation assignment rule:

Examples

9. Interactive State Classifier

Use this tool to test how the four signal scores map to states. Adjust the sliders to see the classification update in real time. Temporal states (Returning Evaluator, Re-engaged Prospect) require multi-session data and cannot be tested with single-session scores alone. The confidence score shown here is a simplified approximation: the full engine (Section 8) uses five factors, including session metadata and temporal consistency, that sliders alone cannot capture.

Sample output:
State: Explorer
Confidence: Medium
Recommended action: Strengthen pathways from broad discovery content into relevant offers.

10. Action Layer

Each state must map to a specific, testable action.

Principle

Classification without response is only reporting.
The system becomes useful when each state changes what the business does.

Types of actions

UX / product actions

Change the experience itself.
Examples:

CRM / communication actions

Change what is said, when, and to whom.
Examples:

Strategic actions

Change internal decision-making.
Examples:

State + motivation refinement examples

Use these only when the motivation confidence gate is met:

If motivation does not change the action, keep state-level action only.

Example mappings with metrics

11. Temporal Layer

The system should not treat sessions as isolated events. A single session rarely tells the full story. Someone who visits three times in a week with increasing depth is fundamentally different from someone who visits once and leaves. Without tracking behaviour over time, the system would miss returning evaluators, chronic hesitation, and re-engagement after a long gap.

Section 3.3 defines the temporal signal inputs (recency, frequency, velocity). This section defines how those inputs are used to track state transitions and evaluate trend direction across sessions.

What temporal analysis should answer

Example transition patterns

Scanner → Explorer → Evaluator
Explorer → Comparator → Focused Evaluator
Evaluator → Hesitant → Returning Evaluator → Engaged
Explorer → dormant → Re-engaged Prospect
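
Patterns like these can be detected by collapsing a visitor's per-session states into a path and checking whether a known pattern appears in it. The sketch below is illustrative only: the function names are hypothetical, and it assumes the input is an array of state labels already ordered by session time.

```javascript
// Collapse consecutive repeats so only state changes remain.
function transitionPath(sessionStates) {
  const path = [];
  for (const state of sessionStates) {
    if (path[path.length - 1] !== state) path.push(state);
  }
  return path;
}

// True when `pattern` occurs as a contiguous run inside `path`.
function containsPattern(path, pattern) {
  if (pattern.length === 0) return true;
  for (let i = 0; i + pattern.length <= path.length; i++) {
    if (pattern.every((state, j) => path[i + j] === state)) return true;
  }
  return false;
}

// Five sessions for one visitor, oldest first (made-up example data).
const history = ['Scanner', 'Scanner', 'Explorer', 'Evaluator', 'Evaluator'];
const path = transitionPath(history);

console.log(path.join(' → ')); // Scanner → Explorer → Evaluator
console.log(containsPattern(path, ['Explorer', 'Evaluator'])); // true
```

Collapsing repeats first keeps the pattern check independent of how many sessions a visitor spends in each state.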

Transition significance

A state matters more when it changes predictably.
For example:

Lifecycle phase mapping (optional overlay)

The states and temporal signals already described can be grouped into three broad lifecycle phases. This is not a separate classification. It is a lens over existing state data that clarifies which system should own the response (product, CRM, or sales) and what kind of action is appropriate.

Lifecycle phase | Typical states | What it means
Acquisition | Mismatch, Scanner, Explorer | First contact. The visitor is orienting. The question is whether the proposition is relevant and whether the site helps them find what they need.
Evaluation | Comparator, Evaluator, Focused Evaluator, Hesitant, Stalled | Active consideration. The visitor has intent but has not yet acted. The question is whether the experience removes enough friction and builds enough confidence to convert.
Retention | Engaged, Returning Evaluator, Re-engaged Prospect | Post-conversion or repeat engagement. The question shifts from "will they act?" to "will they stay, deepen, or return?"

Phase is determined by the visitor's current state, not by calendar time. A visitor may reach Evaluation in their first session or remain in Acquisition across several visits. The phase changes when the state changes.

This mapping is useful when the action layer needs to route responses to different teams or systems. For example, Acquisition-phase issues are typically product or content problems, Evaluation-phase issues are conversion path problems, and Retention-phase issues are CRM or onboarding problems.
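
Because phase is derived purely from state, the overlay reduces to a lookup. The sketch below assumes the state names used in the lifecycle table; the function name `routeState` and the returned field names are hypothetical.

```javascript
// Lifecycle phase for each state, taken from the lifecycle table.
const PHASE_BY_STATE = new Map([
  ['Mismatch', 'Acquisition'],
  ['Scanner', 'Acquisition'],
  ['Explorer', 'Acquisition'],
  ['Comparator', 'Evaluation'],
  ['Evaluator', 'Evaluation'],
  ['Focused Evaluator', 'Evaluation'],
  ['Hesitant', 'Evaluation'],
  ['Stalled', 'Evaluation'],
  ['Engaged', 'Retention'],
  ['Returning Evaluator', 'Retention'],
  ['Re-engaged Prospect', 'Retention'],
]);

// Typical problem area per phase, as described in the text.
const PROBLEM_AREA_BY_PHASE = new Map([
  ['Acquisition', 'product or content'],
  ['Evaluation', 'conversion path'],
  ['Retention', 'CRM or onboarding'],
]);

// Resolve a state to its phase and the kind of problem it usually signals.
function routeState(state) {
  const phase = PHASE_BY_STATE.get(state) ?? null;
  const problemArea = phase ? PROBLEM_AREA_BY_PHASE.get(phase) : null;
  return { state, phase, problemArea };
}
```

For example, `routeState('Hesitant')` resolves to the Evaluation phase and a conversion path problem, while an unrecognised state returns `null` for both fields rather than guessing.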

Recommended temporal thresholds

12. Implementation (GA4)

The framework is tool-agnostic in principle, but it needs a concrete data layer to work. GA4 with Google Tag Manager is the recommended default because most sites already have it, it supports custom events and dimensions natively, and its BigQuery export provides the raw event-level data that the scoring pipeline requires.

Required events

Friction events (recommended)

These are not required for core state classification but are needed to distinguish Stalled from Stalled (Friction); see Section 5, State 8.

Required parameters

GA4 property limits: Standard GA4 properties allow up to 50 event-scoped custom dimensions and 25 user-scoped custom dimensions. This framework uses 6 required event parameters plus friction events. Plan your custom dimension budget early. If the site already uses 40+ custom dimensions for other needs, consolidate where possible or use BigQuery export (where these limits do not apply). Audit existing custom dimensions before rollout so you do not hit the limit halfway through implementation.

Source-integrated taxonomy (CMS-first model)

Strategic meaning is a property of the content, not a secondary layer. For the clustering signal to work, every page must carry its own taxonomy metadata, assigned within the CMS at the moment of creation.

When a page is published, the CMS assigns three required fields stored as hidden metadata in the HTML:

GTM reads this metadata directly from the page and attaches it to every GA4 event. The system then uses these labels to calculate clustering and progression scores. No external lookup table is required.
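
One way to read the metadata is a small helper that GTM (or a data layer script) calls on page load. This is a sketch under several assumptions: the three fields are exposed as `<meta name="…">` tags named page_type, page_topic, and intent_weight; the helper name `readTaxonomy` is hypothetical; and missing values fall back to the defaults this document uses elsewhere (General, 0.5, and Unknown for page type). GTM Custom JavaScript variables may need this rewritten in older syntax.

```javascript
// Read the CMS taxonomy fields from meta tags on a document-like object.
// `doc` only needs a querySelector method, so it can be stubbed in tests.
function readTaxonomy(doc) {
  const read = (name) => {
    const el = doc.querySelector('meta[name="' + name + '"]');
    const value = el ? el.getAttribute('content') : null;
    return value && value.trim() !== '' ? value.trim() : null;
  };

  // parseFloat(null) is NaN, so a missing weight falls through to the default.
  const weight = parseFloat(read('intent_weight'));

  return {
    page_type: read('page_type') || 'Unknown',
    page_topic: read('page_topic') || 'General',
    intent_weight: Number.isFinite(weight) ? weight : 0.5,
  };
}
```

In the browser the result would typically be pushed to the data layer, for example `window.dataLayer.push(readTaxonomy(document))`, so that GTM can attach the values to each GA4 event.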

Why CMS-first is the only scalable approach:

Zero blind spots: Every new page is classified the moment it goes live. There is no lag where a user visits a page that has not been added to a separate register.

No maintenance burden: No separate, massive lookup table to manage. The taxonomy is part of the website’s structure.

Reliable clustering: The clustering signal depends on seeing consistent topic tags. Tags embedded in the CMS are present on every page view, so the signal is never computed from null values.

Infinite scalability: Whether you have 10 pages or 10,000, the system scales because the metadata is distributed across the site rather than trapped in a central spreadsheet.

Example topic clusters: strategy, proof, pricing, onboarding, product A, product B.

Example logic:

Taxonomy maintenance

The clustering signal is only as reliable as the taxonomy behind it. In the CMS-first model, the CMS publishing workflow is the primary gate; pages should not go live without page_type, page_topic, and intent_weight assigned.

Required process:

  1. Require metadata at publish: the CMS must include page_type, page_topic, and intent_weight as required fields. Untagged pages cannot exist if the publishing workflow enforces this.
  2. Audit monthly: run a report of pages viewed in the last 30 days that have a page_topic of “General” (the default). Any page receiving more than 100 views without a proper tag is a blind spot that must be classified.
  3. Track coverage: maintain a simple metric: tagged_pages / total_pages_with_traffic. Target ≥ 95% coverage. Below 90%, clustering scores should be treated as unreliable.

Default for untagged pages: Any page without CMS metadata should default to page_topic: General and intent_weight: 0.5. This prevents null values from breaking score calculations while making untagged pages visible in audits.
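
The defaults, the coverage metric, and the monthly blind-spot check can be combined into one audit routine. The sketch below is illustrative: the input shape (`url`, `page_topic`, `intent_weight`, `views` over the last 30 days), the function names, and the status labels are assumptions, and it treats an empty site as fully covered.

```javascript
const DEFAULT_TOPIC = 'General';
const DEFAULT_INTENT_WEIGHT = 0.5;

// Apply the documented defaults so untagged pages never carry null values.
function withDefaults(page) {
  return {
    ...page,
    page_topic: page.page_topic || DEFAULT_TOPIC,
    intent_weight: typeof page.intent_weight === 'number' ? page.intent_weight : DEFAULT_INTENT_WEIGHT,
  };
}

// Coverage = tagged pages with traffic / all pages with traffic.
// Blind spots = pages still on the default topic with more than 100 views.
function auditTaxonomy(pages) {
  const withTraffic = pages.map(withDefaults).filter((p) => p.views > 0);
  const tagged = withTraffic.filter((p) => p.page_topic !== DEFAULT_TOPIC);
  const coverage = withTraffic.length === 0 ? 1 : tagged.length / withTraffic.length;

  let status = 'unreliable';              // below 90%
  if (coverage >= 0.95) status = 'on target';
  else if (coverage >= 0.9) status = 'below target';

  const blindSpots = withTraffic
    .filter((p) => p.page_topic === DEFAULT_TOPIC && p.views > 100)
    .map((p) => p.url);

  return { coverage, status, blindSpots };
}
```

A page counts as tagged only when its topic differs from the default, which is what makes untagged pages visible in the audit rather than silently scored.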

Common failure mode (without CMS-first): The marketing team publishes 10 blog posts without tagging them. Users who read those posts appear to have scattered, low-clustering behaviour, creating “false Scanners” or “false Mismatches.” The system then recommends navigation improvements for a problem that is actually a taxonomy gap. The CMS-first model prevents this entirely.

Fallback: external taxonomy register

For legacy sites that cannot yet embed CMS metadata, an external register provides a fallback. The register maps URL patterns to taxonomy values and is hosted in Google Sheets or Airtable for CSV export or BigQuery sync.

URL pattern | Page type | Topic cluster | Journey role | Intent weight | Key progression event
/ | Homepage | Brand | Orientation | 0.5 | nav_click_services
/services/consulting/* | Service | Strategy | Evaluation | 1.0 | cta_click_quote
/case-studies/* | Case study | Proof | Validation | 1.2 | resource_download
/pricing | Pricing | Commercial | High intent | 1.5 | form_start_trial
/blog/ai-trends/* | Blog | AI / Tech | Awareness | 0.6 | newsletter_signup
/contact-success | Confirmation | Admin | Post-action | 2.0 | conversion_complete

How the taxonomy feeds the clustering signal: When a page loads, the GTM data layer reads the CMS-embedded metadata (or the register as fallback) and sends page_topic and page_type with the GA4 page_view event. In BigQuery, clustering is calculated from the sequence of page_topic values in each session. The SQL pipeline uses a COALESCE pattern: CMS-embedded values are used first; the external register provides values only when CMS metadata is absent.
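
The same COALESCE order (CMS value, then register value, then default) can be sketched outside SQL. The example below is a simplified illustration, not the pipeline's code: the function names are hypothetical, the register holds three rows from the table above with the topic cluster stored as page_topic, and patterns are read as globs where a trailing /* means "any path under this prefix" and anything else must match exactly.

```javascript
// A few register rows from the fallback table (topic cluster stored as page_topic).
const REGISTER = [
  { pattern: '/', page_type: 'Homepage', page_topic: 'Brand', intent_weight: 0.5 },
  { pattern: '/services/consulting/*', page_type: 'Service', page_topic: 'Strategy', intent_weight: 1.0 },
  { pattern: '/pricing', page_type: 'Pricing', page_topic: 'Commercial', intent_weight: 1.5 },
];

// Glob-style match: "/prefix/*" matches paths under the prefix, otherwise exact match.
function pathMatches(path, pattern) {
  if (pattern.endsWith('/*')) return path.startsWith(pattern.slice(0, -1));
  return path === pattern;
}

function lookupRegister(path, register) {
  return register.find((row) => pathMatches(path, row.pattern)) || null;
}

// COALESCE-style resolution: CMS value, else register value, else default.
function resolveTaxonomy(cmsMeta, path, register = REGISTER) {
  const cms = cmsMeta || {};
  const reg = lookupRegister(path, register) || {};
  const pick = (field, fallback) => cms[field] ?? reg[field] ?? fallback;
  return {
    page_type: pick('page_type', 'Unknown'),
    page_topic: pick('page_topic', 'General'),
    intent_weight: pick('intent_weight', 0.5),
  };
}
```

Resolution is per field, so a page whose CMS metadata supplies only a topic still inherits its page type and intent weight from the register.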

Processing options

Basic setup

Intermediate setup

Advanced setup

Recommended implementation sequence

  1. Define taxonomy (page types, topics, offers)
  2. Implement events and parameters in GTM
  3. Validate data quality in GA4
  4. Build score calculations
  5. Test state assignment against real sessions
  6. Add confidence scoring
  7. Connect states to actions and metrics

13. Output Layer

Classification is only useful if the right people see the right information at the right time. The output layer exists because a well-built model that lives inside a database query and never reaches a decision-maker has zero business value.

Dashboards should show:

Problem-first reporting views

The system should not only be organised around states. It should also be organised around business problems.

Examples:

GA4 data thresholding caveat

GA4 applies data thresholding to reports when user counts in a segment are small, suppressing rows to protect user privacy. This means state distribution dashboards may show incomplete or misleading data for low-volume segments. For example, a "Focused Evaluator" segment with only 12 users in a reporting period may be hidden entirely.

Mitigations:

Output principle

Every report should answer:

  1. what state is happening?
  2. how confident are we?
  3. what should we change?
  4. how will we know if it worked?

Prescriptive output

The final dashboard should not just report data; it should issue instructions. Each state classification, when combined with aggregate context, generates a natural-language prescription that tells the team exactly what to do.

Examples:

Prescriptions are template-based, not AI-generated. Each state maps to an instruction template with placeholders (e.g. {sessionCount}, {topBlockedPage}) that are interpolated from aggregate data at query time.
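
Template interpolation of this kind needs only a placeholder-replacement function. The sketch below is a minimal illustration: the function name `renderPrescription`, the template wording, and the example values are all made up, and only the placeholder names {sessionCount} and {topBlockedPage} come from the text above.

```javascript
// Replace {placeholder} tokens with values from aggregate data.
// Throws if a placeholder has no value, so incomplete prescriptions never ship.
function renderPrescription(template, data) {
  return template.replace(/\{(\w+)\}/g, (match, key) => {
    if (!Object.prototype.hasOwnProperty.call(data, key)) {
      throw new Error('Missing value for placeholder ' + match);
    }
    return String(data[key]);
  });
}

// Hypothetical template for one state.
const TEMPLATES = {
  Stalled: '{sessionCount} sessions stalled this week, most often on {topBlockedPage}. Review that page for friction.',
};

const message = renderPrescription(TEMPLATES.Stalled, {
  sessionCount: 42,            // illustrative value
  topBlockedPage: '/pricing',  // illustrative value
});

console.log(message);
// 42 sessions stalled this week, most often on /pricing. Review that page for friction.
```

Failing loudly on a missing value is a deliberate choice here: a prescription with an unfilled placeholder is worse than no prescription.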

14. Feedback Loop

Any fixed set of thresholds will drift as the site, the traffic, and the market change. The feedback loop exists to make sure the system stays accurate over time rather than slowly becoming wrong in ways nobody notices.

The system improves through a continuous cycle:

  1. Observe: collect behavioural data through GA4 events and parameters.
  2. Classify: assign states and confidence scores using the signal model and classification logic.
  3. Act: trigger the appropriate response (UX change, CRM action, strategic decision) based on state and confidence.
  4. Measure: track the defined success metric for each action (Section 10). Did the intervention change behaviour in the expected direction?
  5. Refine: adjust thresholds, weights, and state definitions based on measured outcomes.

What refinement looks like in practice

Recommended review cadence

15. Constraints and Limitations

Every model has boundaries. Being explicit about what this system cannot do is just as important as explaining what it can. It prevents overconfidence in the output and sets realistic expectations for anyone using the results to make decisions.

Analytical constraints

Data quality constraints

Privacy and consent

Exclusions

16. Final Summary

This system transforms website analytics from passive observation into an active decision framework. It does this by:

Key distinction

Most analytics systems describe what happened.

This system decides what to do next.


Appendix A: BigQuery Reference Implementation

This SQL provides a starting implementation for calculating the four core signal scores from GA4 BigQuery export data. It assumes the taxonomy register has been uploaded as a BigQuery table (manual_taxonomy_lookup) with columns: url_pattern, page_type, topic_cluster, intent_weight.

Prerequisites:

WITH

-- 1. Extract raw events with session identity
raw_events AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
    event_name,
    TIMESTAMP_MICROS(event_timestamp) AS event_time,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS url,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_time_msec,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'percent_scrolled') AS scroll_percent
  FROM `your-project.analytics_123456.events_*`
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
),

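-- 2. Map events to taxonomy (REGEXP_CONTAINS matches URLs against patterns, e.g. /blog/.* matches any blog page)
-- NOTE: url_pattern must hold regular expressions, not globs, and each URL should match at most one pattern.
-- An event whose URL matches several patterns is duplicated by the join, which inflates every downstream count.
-- For large taxonomy tables, pre-compute the join or use exact URL matching
-- with a materialised lookup to avoid expensive regex scans on every query.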
mapped_events AS (
  SELECT
    e.*,
    COALESCE(t.topic_cluster, 'General') AS topic_cluster,
    COALESCE(t.page_type, 'Unknown') AS page_type,
    COALESCE(t.intent_weight, 0.5) AS intent_weight
  FROM raw_events e
  LEFT JOIN `your-project.your_dataset.manual_taxonomy_lookup` t
    ON REGEXP_CONTAINS(e.url, t.url_pattern)
),

-- 3. Breadth score: unique pages, page types, and topic clusters per session
breadth_metrics AS (
  SELECT
    user_pseudo_id,
    session_id,
    COUNT(DISTINCT url) AS unique_pages,
    COUNT(DISTINCT page_type) AS unique_page_types,
    COUNT(DISTINCT topic_cluster) AS unique_topics
  FROM mapped_events
  WHERE event_name = 'page_view'
  GROUP BY 1, 2
),

-- 4. Depth score: engagement time and scroll depth per session
depth_metrics AS (
  SELECT
    user_pseudo_id,
    session_id,
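    -- COALESCE: a session with no engagement_time_msec would otherwise be NULL here
    -- and fall through to the ELSE branch (score 9) of the depth CASE in the final SELECT.
    COALESCE(SUM(engagement_time_msec), 0) / 1000.0 AS engagement_time_seconds,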
    AVG(CASE WHEN scroll_percent IS NOT NULL THEN scroll_percent END) AS avg_scroll_percent,
    COUNTIF(event_name IN ('resource_download', 'video_start')) AS deep_engagement_events
  FROM mapped_events
  GROUP BY 1, 2
),

-- 5. Clustering: topic concentration, switching, and repeat returns
clustering_prep AS (
  SELECT
    user_pseudo_id,
    session_id,
    topic_cluster,
    event_time,
    COUNT(*) OVER(PARTITION BY user_pseudo_id, session_id) AS total_views,
    COUNT(*) OVER(PARTITION BY user_pseudo_id, session_id, topic_cluster) AS cluster_views,
    LAG(topic_cluster) OVER(PARTITION BY user_pseudo_id, session_id ORDER BY event_time) AS prev_topic
  FROM mapped_events
  WHERE event_name = 'page_view'
),

clustering_metrics AS (
  SELECT
    user_pseudo_id,
    session_id,
    MAX(SAFE_DIVIDE(cluster_views, total_views)) AS dominant_topic_share,
    -- Count topic switches (where current topic differs from previous)
    COUNTIF(topic_cluster != prev_topic AND prev_topic IS NOT NULL) AS topic_switch_count,
    -- Total page views (for minimum signal floor check)
    MAX(total_views) AS total_page_views,
    -- Repeat cluster return: views in the dominant cluster beyond the first visit
    MAX(cluster_views) - 1 AS repeat_cluster_visits
  FROM clustering_prep
  GROUP BY 1, 2
),

-- 6. Progression: weighted action scores using intent weights from taxonomy
progression_metrics AS (
  SELECT
    user_pseudo_id,
    session_id,
    -- NOTE: page_view is excluded here. It contributes to breadth only (Section 4).
    -- Scroll contributes to depth, not progression, so it is also excluded.
    SUM(CASE
      WHEN event_name = 'cta_click' THEN 1.0 * intent_weight
      WHEN event_name = 'form_start' THEN 1.5 * intent_weight
      WHEN event_name = 'form_submit' THEN 2.0 * intent_weight
      WHEN event_name = 'booking_click' THEN 1.5 * intent_weight
      WHEN event_name = 'conversion_complete' THEN 2.0 * intent_weight
      ELSE 0
    END) AS raw_progression_sum,
    COUNTIF(event_name = 'form_start') AS form_starts,
    COUNTIF(event_name = 'form_submit') AS form_submits,
    COUNTIF(event_name = 'conversion_complete') AS conversions
  FROM mapped_events
  GROUP BY 1, 2
)

-- 7. Final scoring: assemble all four signal scores (0–10)
SELECT
  b.user_pseudo_id,
  b.session_id,

  -- Breadth score (0–10): based on unique pages and variety
  LEAST(10, CASE
    WHEN b.unique_pages = 1 THEN 1
    WHEN b.unique_pages <= 3 AND b.unique_page_types <= 2 THEN 3
    WHEN b.unique_pages <= 5 THEN 5
    WHEN b.unique_pages <= 8 AND b.unique_page_types >= 3 THEN 7
    ELSE 9  -- also reached by 6–8 pages with fewer than 3 page types; add a branch if that is unintended
  END) AS breadth_score,

  -- Depth score (0–10): based on engagement time and scroll
  LEAST(10, CASE
    WHEN d.engagement_time_seconds < 10 THEN 1
    WHEN d.engagement_time_seconds < 30 THEN 3
    WHEN d.engagement_time_seconds < 90 THEN 5
    WHEN d.engagement_time_seconds < 180 THEN 7
    ELSE 9
  END
  + CASE WHEN COALESCE(d.avg_scroll_percent, 0) >= 75 THEN 1 ELSE 0 END
  + CASE WHEN d.deep_engagement_events > 0 THEN 1 ELSE 0 END
  ) AS depth_score,

  -- Progression score (0–10): capped weighted sum
  LEAST(10, ROUND(p.raw_progression_sum, 1)) AS progression_score,

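  -- Clustering score (0–10): formula with minimum signal floor, clamped to the 0–10 range
  -- (unclamped, a fully concentrated session can reach 13 and a scattered one can go negative)
  LEAST(10, GREATEST(0, ROUND(
    (c.dominant_topic_share * 10)
    - CASE
        WHEN c.total_page_views < 4 THEN 0  -- minimum signal floor: no switch penalty below 4 page views
        ELSE LEAST(c.topic_switch_count, 5)
      END
    + LEAST(GREATEST(c.repeat_cluster_visits, 0), 3)
  , 1))) AS clustering_score,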

  -- Raw metrics for debugging and calibration
  b.unique_pages,
  b.unique_page_types,
  d.engagement_time_seconds,
  d.avg_scroll_percent,
  c.dominant_topic_share,
  c.topic_switch_count,
  c.total_page_views,
  p.form_starts,
  p.form_submits,
  p.conversions

FROM breadth_metrics b
JOIN depth_metrics d ON b.user_pseudo_id = d.user_pseudo_id AND b.session_id = d.session_id
JOIN clustering_metrics c ON b.user_pseudo_id = c.user_pseudo_id AND b.session_id = c.session_id
JOIN progression_metrics p ON b.user_pseudo_id = p.user_pseudo_id AND b.session_id = p.session_id
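
To sanity-check the clustering arithmetic outside BigQuery, the formula in step 7 can be mirrored in a few lines of JavaScript. This is a sketch for verification, not the library's scoring function: the function and parameter names are hypothetical, and the result is clamped to 0–10 to honour the range stated in the SQL comment.

```javascript
// Mirror of the clustering formula: share * 10, minus a capped switch penalty
// (waived below 4 page views), plus a capped repeat-visit bonus.
function clusteringScore({ dominantTopicShare, topicSwitchCount, totalPageViews, repeatClusterVisits }) {
  const base = dominantTopicShare * 10;
  const switchPenalty = totalPageViews < 4 ? 0 : Math.min(topicSwitchCount, 5);
  const repeatBonus = Math.min(Math.max(repeatClusterVisits, 0), 3);
  const clamped = Math.min(10, Math.max(0, base - switchPenalty + repeatBonus));
  return Math.round(clamped * 10) / 10; // one decimal place
}

// Worked example: six page views with topics
// pricing, pricing, proof, pricing, pricing, strategy.
// Dominant share = 4/6, switches = 3, repeat visits = 4 - 1 = 3.
// Score = 6.67 - 3 + 3 = 6.67, rounded to 6.7.
console.log(clusteringScore({
  dominantTopicShare: 4 / 6,
  topicSwitchCount: 3,
  totalPageViews: 6,
  repeatClusterVisits: 3,
})); // 6.7
```

A two-page session on a single topic shows why the clamp matters: 10 − 0 + 1 gives 11 before clamping, which would otherwise fall outside the 0–10 scale.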

Implementation notes

  1. Session identity: GA4's ga_session_id is a timestamp and is not unique across users. user_pseudo_id is GA4's anonymous identifier for a visitor (based on their browser cookie). Always partition by both user_pseudo_id AND session_id to avoid mixing sessions from different visitors.
  2. Taxonomy join performance: REGEXP_CONTAINS joins are computationally expensive. For production use, materialise the taxonomy lookup as a pre-computed URL-to-metadata table (exact match on URL path) and reserve regex matching for a nightly batch update. This can reduce query costs by 10–100x on large event tables.
  3. Null handling: The COALESCE wrappers on taxonomy fields ensure untagged pages default to topic_cluster: 'General' and intent_weight: 0.5 rather than producing null scores. Monitor the volume of 'General' classifications. High volume indicates taxonomy debt.
  4. Calibration: The breadth and depth score thresholds above (e.g. "< 30 seconds = 3") are fixed-rule defaults (Section 3.4, Option A). Once you have 3+ months of data, replace them with percentile-based scoring by computing PERCENT_RANK() over the raw metrics and mapping the percentile to a 0–10 scale.
  5. Next step, state assignment: This query produces the four signal scores per session. To assign states, add a final CASE WHEN block applying the priority order from Section 7, Step 4, or export the scores to a downstream transformation layer (e.g. dbt, a SQL-based data transformation tool) for state classification and confidence scoring.
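
The percentile-based calibration described in note 4 can be prototyped before it is written in SQL. The sketch below reproduces the PERCENT_RANK definition, (rank − 1) / (rows − 1) with ties sharing the lowest rank, and maps the result to 0–10 by simple rounding. The function names and the sample values are illustrative assumptions, and the scored value is expected to be one of the values in the population.

```javascript
// PERCENT_RANK equivalent: share of the other rows that rank strictly below x.
function percentRank(values, x) {
  if (values.length <= 1) return 0; // a single row has rank 0 by definition
  const below = values.filter((v) => v < x).length; // equals rank - 1
  return below / (values.length - 1);
}

// Map the percentile (0–1) onto the 0–10 scoring scale.
function percentileScore(values, x) {
  return Math.round(percentRank(values, x) * 10);
}

// Made-up engagement times in seconds for six sessions.
const engagementSeconds = [5, 12, 30, 45, 90, 200];

console.log(percentileScore(engagementSeconds, 45));  // 6
console.log(percentileScore(engagementSeconds, 200)); // 10
console.log(percentileScore(engagementSeconds, 5));   // 0
```

Unlike the fixed thresholds, these scores move with the site's own traffic distribution, which is why the note recommends waiting until enough data has accumulated.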