Independent catalogue analysis · May 2026

Sports Direct: a £3.1M monthly data problem

An independent ontology-based analysis of the sportsdirect.com product catalogue. 10 structural issues identified across taxonomy, naming, inventory linkage, and brand compliance — each with a quantified revenue consequence.

Catalogue health score: 22/100, Critical (urgent)
SHACL violations: 47
Category redundancy: 91%
Duplicate groups: 18
Ghost stock items: 340+
Monthly revenue at risk: £3.1M (extrapolated across 320k products)
RDF triples built: 2,840 (from 200-product sample)
Largest single issue: £580k (dual navigation path duplication)
Ghost stock identified: 340+ (products in stock, invisible in search)
Findings

10 priority issues, ranked by revenue impact


Ongoing solution

Catalogue governance: not a one-time fix

Fixing the 10 issues above is a one-time intervention. The more valuable proposition is preventing recurrence — catching violations at the moment of product ingestion rather than months after they've damaged rankings and inventory accuracy.

Continuous governance pipeline
01 Ingest: New product data from supplier or buyer system
02 Validate: SHACL rules check name, category, attributes, locale
03 Flag: Violations raised before product goes live
04 Approve: Catalogue team resolves with full audit trail
05 Score: Health score updated, trend tracked over time
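
The Validate and Flag steps can be illustrated with a minimal sketch in plain Python. It stands in for real SHACL rules and is not the engine's implementation. The `Product` fields, the `BANNED_TERMS` table, the `en-GB` locale code, and the `gate` function are all hypothetical names chosen for illustration; only the UK "Sneakers" rule is taken from this page.

```python
from dataclasses import dataclass, field

# Hypothetical locale rule table. Only the UK "sneakers" rule comes from the text.
BANNED_TERMS = {"en-GB": {"sneakers"}}


@dataclass
class Product:
    sku: str
    name: str
    category: str
    locale: str
    attributes: dict = field(default_factory=dict)


def validate(product: Product) -> list[str]:
    """Return human-readable violations; an empty list means the product passes."""
    violations = []
    if not product.name.strip():
        violations.append("name: missing")
    if not product.category.strip():
        violations.append("category: missing")
    # Compare whole words, case-insensitively, against the locale's banned terms.
    words = {w.strip(".,()").lower() for w in product.name.split()}
    for term in sorted(BANNED_TERMS.get(product.locale, set()) & words):
        violations.append(f"locale: '{term}' not allowed in {product.locale} names")
    return violations


def gate(product: Product) -> dict:
    """Hold failing products for human review instead of publishing or dropping them."""
    violations = validate(product)
    status = "publish" if not violations else "hold_for_review"
    return {"sku": product.sku, "status": status, "violations": violations}
```

Calling `gate(Product("A1", "Nike Air Sneakers", "Footwear", "en-GB"))` would return a `hold_for_review` status with one locale violation, which is the point at which the catalogue team would step in.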
01 Nightly delta scan
New and modified SKUs are scanned every night. Issues are caught within 24 hours of introduction, not months later via an SEO audit.
02 Pre-publish SHACL gate
A validation webhook fires before any product goes live. Products that fail the rules are held in review and flagged for a human decision, not silently blocked.
03 Canonical name registry
The ontology engine maintains a growing graph of canonical product names. Each new product is matched against the registry at ingestion, and duplicates are flagged before a second listing is created.
04 Ghost stock detection
Products with live inventory but missing size/colour attributes are flagged automatically, so stock that should clear in weeks stops sitting invisibly for months.
05 Multi-market consistency
Locale-specific SHACL rules enforce UK/US/EU terminology and size notation: "Sneakers" in a UK product name raises an immediate violation before publishing.
06 Monthly health report
The catalogue health score is recalculated monthly across the full SKU set. The trend line is a reportable metric for the CDO, showing whether quality is improving or degrading.
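
Two of the checks above, the canonical name registry and ghost stock detection, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's actual matching logic: the normalisation rule (lowercase, strip punctuation, sort tokens), the dictionary-based registry, and the field names `sku`, `stock`, `size`, and `colour` are all hypothetical.

```python
import re


def canonical_key(name: str) -> str:
    """Normalise a product name: lowercase, drop punctuation, sort the tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(tokens))


def register_or_match(registry: dict, sku: str, name: str):
    """Return the SKU already registered under the same key, or register and return None."""
    key = canonical_key(name)
    existing = registry.get(key)
    if existing is None:
        registry[key] = sku
    return existing


def ghost_stock(products: list) -> list:
    """SKUs with stock on hand but a missing size or colour attribute."""
    return [
        p["sku"]
        for p in products
        if p.get("stock", 0) > 0 and not (p.get("size") and p.get("colour"))
    ]
```

A real registry would likely need fuzzier matching than exact token equality (for example, handling abbreviations or reordered size suffixes); the sorted-token key is simply the smallest rule that catches case, punctuation, and word-order variants.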
Methodology

How the analysis works

CatalogueScore's engine builds a full RDF knowledge graph from product data — converting every product, category, brand, and attribute into a structured ontology. It then runs SHACL (Shapes Constraint Language) validation rules against the graph to identify structural violations, computes graph metrics to detect redundancy and duplicate clusters, and passes findings to an AI enrichment layer that translates violations into plain-English business impacts.

The result is a prioritised, commercially quantified issue list — not a generic data quality report, but a specific fix roadmap mapped to estimated revenue recovery.

Engine components
RDF graph builder — converts product CSV to W3C-standard knowledge graph (2,840 triples from 200 products)
SHACL validator — enforces product type schemas (footwear requires size + colour; laptops require RAM + storage)
Graph metrics engine — measures taxonomy redundancy, duplicate node clusters, and path length anomalies via NetworkX
Claude AI enrichment — translates technical violations into specific, actionable fix recommendations
Health score — composite 0–100 metric updated as fixes are applied, tracked over time
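
To make the first two components concrete, here is a simplified sketch of turning feed rows into subject-predicate-object triples and checking product-type requirements. It deliberately uses plain Python tuples and a lookup table as a stand-in for a W3C RDF graph and SHACL shapes, so it shows the idea without the standards machinery. The column names (`id`, `type`, `size`, and so on) and the `product:` prefix are assumptions; the required attributes per type come from the examples listed above.

```python
import csv
import io

# Required attributes per product type, taken from the examples in the text.
REQUIRED = {
    "footwear": {"size", "colour"},
    "laptop": {"ram", "storage"},
}


def row_to_triples(row: dict) -> list:
    """Turn one CSV row into (subject, predicate, object) triples, skipping empty cells."""
    subject = f"product:{row['id']}"
    return [(subject, key, value) for key, value in row.items() if key != "id" and value]


def load_triples(csv_text: str) -> list:
    """Parse CSV text with a header row and collect triples for every product."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        triples.extend(row_to_triples(row))
    return triples


def missing_attributes(triples: list) -> dict:
    """Map each subject to the required attributes it lacks for its product type."""
    by_subject = {}
    for subject, predicate, obj in triples:
        by_subject.setdefault(subject, {})[predicate] = obj
    report = {}
    for subject, props in by_subject.items():
        required = REQUIRED.get(props.get("type", "").lower(), set())
        missing = sorted(required - props.keys())
        if missing:
            report[subject] = missing
    return report
```

Because empty cells produce no triple, a missing attribute shows up as an absent edge in the graph, which is the condition a required-property rule is meant to detect.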
Next step

Run the full analysis on your live catalogue

All we need is your Google Merchant Center product feed — the same CSV your PPC team already exports weekly. No custom data pull. No IT involvement. No commercially sensitive information. Preliminary findings within 24 hours.

Get in touch directly or visit