
Designing analytics infrastructure that measures audience quality, not just traffic

1,114 words Filed in: analytics, content strategy, information architecture

Woodcut-style print of a single white water droplet suspended above a dark choppy ocean, with concentric ripples spreading out beneath it. Image made with FLUX.2-dev.
A single drop above dark water -- one precise signal amid the noise.

24+ websites, no existing benchmarks, and a fundamental question: how do you know if the right people are finding your content?

Clarity | Audience Quality | Strategic Alignment

Context

The United Nations Office for Disaster Risk Reduction (UNDRR) is the UN's focal point for disaster risk reduction, coordinating global policy and supporting Member States to reduce disaster risk and losses.

Traditional web analytics answers "how many people visited?" For a UN organization producing policy guidance, technical resources, and knowledge products, that's the wrong question. The right question: did the people who need this content actually find it?

I led the design and implementation of analytics infrastructure across UNDRR's web ecosystem — 24+ domains spanning Drupal sites, Java applications, and static microsites — to answer that. The challenge wasn't primarily technical. It was conceptual: what does success mean for content where impact happens outside your analytics?

What was the actual problem?

No analytics benchmarks exist for organizations like UNDRR. The widely cited "16% social traffic" figure comes from fundraising nonprofits running emotional campaigns. For a technical policy organization where audiences search for "Sendai Framework implementation guidance," that benchmark steers you wrong. I needed to figure out what actually applied — and write down what didn't.

The measurement model was broken in a different way. A 50% bounce rate on a terminology definition often means the user found their answer. The same rate on a landing page means something failed. Without classifying content by type, aggregate metrics obscure more than they reveal.

Then there's the volume problem. 10,000 visits from random browsers aren't worth more than 100 visits from government ministries using the content to shape national policy. Traditional analytics gives you geography and device types. It doesn't tell you whether the right people showed up.

Some of the most important impact never touches analytics at all. A researcher downloads a dataset and cites it in a paper six months later. A ministry official reads guidance and it shapes a national strategy. Neither event shows up as a conversion.

How did I approach this?

I started with benchmarks — or rather, with finding out they didn't exist for our context. I synthesized data from M+R Benchmarks, RKD Group studies (2,000+ nonprofits), First Page Sage, and Similarweb, then worked out which findings actually applied. One thing that came up: only 8% of nonprofit websites pass Core Web Vitals on mobile. I documented confidence levels for each benchmark category and noted the gaps — no UN-system analytics data exists anywhere.

Content-type classification underpins everything else. I established metatag conventions (vf:page-type) that tag every page by type. The tracking script reads these on page load and sends them to GA4 as a custom dimension, so you can apply different success criteria to different content. Reference pages get different expectations than landing pages, which get different expectations than event pages.
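The read path is simple enough to sketch. A minimal version, assuming the metatag name `vf:page-type` from above but with an invented dimension name (`content_type`) and fallback value:

```javascript
// Pure helper: pick the page type out of a name→content map of metatags.
// "unclassified" is an assumed fallback for pages missing the tag.
function pageTypeFrom(metaTags) {
  return metaTags["vf:page-type"] || "unclassified";
}

// In the browser, the shared tracking script would do roughly this
// (gtag event parameters shown here are illustrative):
//
//   const meta = document.querySelector('meta[name="vf:page-type"]');
//   gtag("event", "page_view", {
//     content_type: pageTypeFrom({ "vf:page-type": meta && meta.content }),
//   });
```

Once the parameter is registered as a custom dimension in GA4, every report can be sliced by content type without touching the sites themselves.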

For audience quality, I built an ASN lookup: a server-side endpoint converts IP addresses to network categories (government, academic, intergovernmental, NGO, commercial) and sends the category to GA4. Not the specific organization — just the type, which keeps it GDPR-friendly while still answering the real question: are the visitors we're getting the visitors we're trying to reach?
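The categorization step can be sketched as a keyword match against the ASN's registered organization name. The patterns below are illustrative, not UNDRR's actual rules:

```javascript
// Hypothetical server-side categorizer: maps an ASN organization name to a
// coarse network category. Only the category — never the org name — is
// forwarded to GA4.
const CATEGORY_PATTERNS = [
  ["government", /ministry|govern|federal|municipal/i],
  ["academic", /universit|polytechnic|college/i],
  ["intergovernmental", /united nations|world bank|european union/i],
  ["ngo", /foundation|red cross|\bngo\b/i],
];

function networkCategory(asnOrgName) {
  for (const [category, pattern] of CATEGORY_PATTERNS) {
    if (pattern.test(asnOrgName)) return category;
  }
  return "commercial"; // assumed default bucket
}
```

A real implementation would match on ASN numbers rather than name patterns alone, but the shape is the same: IP → ASN → category, with the specific network discarded.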

I also classified 200+ referrer domain patterns into 14 strategic buckets — UN System, Government, Academic, DRR Community, Traditional Media, AI Chatbots, and others — each carrying a weight reflecting how much that traffic matters to the mission. An "impact score" multiplies sessions by weight. 100 government visits (5×) ranks higher than 400 unclassified visits (0.5×). Raw session counts stop being the main signal.
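The scoring itself is a weighted sum. A minimal sketch, where the government (5×) and unclassified (0.5×) weights come from the text and the others are placeholders:

```javascript
// Referrer-bucket weights: how much one session from that source matters.
const BUCKET_WEIGHTS = {
  Government: 5,      // from the text
  Unclassified: 0.5,  // from the text
  "UN System": 4,     // placeholder
  Academic: 3,        // placeholder
};

// Impact score = sum over buckets of (sessions × weight).
// Unknown buckets fall back to the unclassified weight.
function impactScore(sessionsByBucket) {
  return Object.entries(sessionsByBucket).reduce(
    (score, [bucket, sessions]) =>
      score + sessions * (BUCKET_WEIGHTS[bucket] ?? 0.5),
    0
  );
}
```

With these weights, 100 government sessions score 500 while 400 unclassified sessions score 200 — which is exactly the inversion of raw counts the model is after.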

Topic-level analysis came from taxonomy metatags (vf:page-terms) covering hazards, themes, SDGs, and Sendai Framework priorities. The backend aggregates by topic, so you can ask "which hazards drive the most search demand?" rather than "which pages were popular last month?" A lightweight "Was this page helpful?" widget sends clicks to GA4 with page container context — most useful for low-traffic content where engagement metrics are too thin to read.
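The backend aggregation step reduces to rolling per-page sessions up by the terms each page carries. A sketch, assuming a simplified record shape (`terms`, `sessions`) rather than the actual GA4 export schema:

```javascript
// Roll per-page session counts up to taxonomy terms. A page tagged with
// several terms contributes its sessions to each of them.
function sessionsByTerm(pages) {
  const totals = {};
  for (const page of pages) {
    for (const term of page.terms) {
      totals[term] = (totals[term] || 0) + page.sessions;
    }
  }
  return totals;
}
```

"Which hazards drive the most search demand?" then becomes a sort on the aggregate instead of a page-by-page reading exercise.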

The architecture follows a three-layer model. Content layer: Drupal, Java apps, and static microsites all emit the same metatag conventions. Tracking layer: one shared script, one GA4 property, per-domain data streams. Consumption layer: dashboard, browser extension, Looker Studio — all reading the same source. Adding a new site means implementing metatags. The infrastructure doesn't change.

The single-property decision took some defending. Separate GA4 properties per site would have been simpler to manage, but would have fragmented the data. With one property and hostname filtering, the same query answers "how's PreventionWeb doing?" or "how's the whole ecosystem doing?" Cross-site user journeys become visible for the first time.

What can we now answer?

58% of traffic comes from identifiable institutions — government, academic, UN system, research networks. That answers "are the right people finding policy guidance?" in a way raw page views can't.

Topic aggregation surfaces patterns that page-level data hides. Flood content dominates hazard interest; Early Warning leads on themes. That feeds content planning decisions, not just retrospective reporting.

A terminology page with 40% engagement isn't "underperforming" — it's what quick-reference content looks like. Content-type benchmarks mean the dashboard compares each page against the right expectations, not a generic industry average.
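The comparison logic is a per-type lookup rather than a single threshold. A sketch with invented benchmark values (not UNDRR's actual figures):

```javascript
// Hypothetical expected engagement rates by content type. Quick-reference
// content (terminology) is held to a lower bar by design.
const EXPECTED_ENGAGEMENT = {
  terminology: 0.35,
  publication: 0.5,
  landing: 0.6,
};

function assessPage(pageType, engagementRate) {
  const expected = EXPECTED_ENGAGEMENT[pageType];
  if (expected === undefined) return "no-benchmark";
  return engagementRate >= expected ? "on-track" : "below-expectation";
}
```

Under this scheme a 40% engagement rate is "on-track" for a terminology page and "below-expectation" for a landing page — the same number, two different verdicts.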

On direct feedback: terminology pages run 78% helpful, publications 65%. Pages that drop below threshold get flagged. Not a perfect signal, but a real one for content where traffic is too low for engagement rates to tell you much.

What did I learn?

I built 18 dashboard screens before asking which ones stakeholders would actually use. I should have done the user research first. That sequence mistake cost real time.

We also set targets before documenting baselines. Now we're establishing them retroactively. The order should have been: observe, then set expectations.

Some impact just isn't measurable through analytics. Download clicks don't tell you whether anyone read the PDF. Policy influence doesn't generate a GA4 event. I documented these gaps explicitly — better to set that expectation early than have people treat the dashboard as a universal answer.

I underestimated the research. Deciding which benchmarks apply, which don't, and writing down the reasoning — that became as useful as the tracking implementation itself. Without it, you're measuring against whatever number was easiest to find. That number is usually wrong for your context.