dSide
dSide is the UK's first AI-powered price and product comparison platform for DIY, hardware, and power tools. It monitors 304 UK e-commerce traders in real time through a distributed data collection cluster, processes 50M+ price events per month through a Kafka/Flink streaming pipeline, and applies transformer-based AI to demand forecasting, buyer behaviour modelling, and nationwide trade intelligence across every product category.
dSide operates a distributed data collection cluster that continuously harvests live trade data from 304 verified UK e-commerce retailers, marketplaces, and distributors. The collection layer is built on a horizontally scalable Playwright + Scrapy cluster running on Kubernetes (GKE Autopilot), with browser-automation workers for JavaScript-heavy storefronts and lightweight HTTP workers for structured product feeds.
Each monitored source yields structured product events (SKU, price, availability, seller, delivery option, promotional status, review count), which are published to Apache Kafka topics partitioned by retailer and category. Peak ingestion is 2M+ product listing updates per day and 50M+ price point events per month. Kafka Streams handles exactly-once deduplication and event schema validation before downstream processing.
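As a rough illustration of the event shape and partitioning described above, the following minimal Python sketch models one product event, derives a retailer-and-category partition key, and drops repeated deliveries of the same observation. The field names, the `observed_at` timestamp, and the content-hash approach are assumptions made for illustration; they are not dSide's actual schema, and the in-memory set stands in for the exactly-once handling that Kafka Streams provides in the real pipeline.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class ProductEvent:
    # Hypothetical field names; the source lists the attributes, not a schema.
    retailer: str
    category: str
    sku: str
    price_pence: int
    available: bool
    seller: str
    delivery_option: str
    on_promotion: bool
    review_count: int
    observed_at: str  # ISO 8601 timestamp of the observation (assumed field)


def partition_key(event: ProductEvent) -> str:
    """Key that keeps one retailer/category pair on the same partition."""
    return f"{event.retailer}|{event.category}"


def fingerprint(event: ProductEvent) -> str:
    """Stable content hash used to recognise a re-delivered observation."""
    payload = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def deduplicate(events):
    """Yield each distinct event once, preserving arrival order."""
    seen = set()
    for event in events:
        fp = fingerprint(event)
        if fp not in seen:
            seen.add(fp)
            yield event
```

Because the fingerprint covers the timestamp, a price that returns to an earlier value at a later time is treated as a new event, and only a literal re-delivery of the same observation is discarded.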
Apache Flink stateful stream processing jobs consume the Kafka topics and perform:
• Price change detection: stateful comparison against last-known price per (retailer, SKU) tuple, emitting price drop / price rise events with delta magnitude.
• Availability monitoring: real-time out-of-stock and restock detection across all 304 monitored retailers.
• Competitive pricing analytics: rolling min/max/mean price windows per product across all retailers, updated on every price event.
• Category trend signals: Flink sliding-window aggregations compute category-level demand velocity — products with accelerating price increases (supply constraint signal) vs. falling prices (oversupply signal) — in real time.
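The first and third jobs in the list above can be sketched in plain Python. This is a single-process illustration of the logic only, not Flink code: the dictionary stands in for Flink keyed state, the class and field names are hypothetical, and the count-based window is an assumption, since the source does not say how the rolling windows are bounded.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PriceChange:
    retailer: str
    sku: str
    kind: str         # "drop" or "rise"
    delta_pence: int  # absolute size of the change


class PriceChangeDetector:
    """Compares each price against the last-known price per (retailer, SKU)."""

    def __init__(self) -> None:
        self._last_price = {}  # stands in for keyed state

    def observe(self, retailer: str, sku: str, price_pence: int) -> Optional[PriceChange]:
        key = (retailer, sku)
        previous = self._last_price.get(key)
        self._last_price[key] = price_pence
        if previous is None or previous == price_pence:
            return None  # first sighting or unchanged price: nothing to emit
        kind = "drop" if price_pence < previous else "rise"
        return PriceChange(retailer, sku, kind, abs(price_pence - previous))


class RollingPriceStats:
    """Min/max/mean over the most recent `window` prices seen per product."""

    def __init__(self, window: int = 100) -> None:
        self._prices = defaultdict(lambda: deque(maxlen=window))

    def update(self, product_id: str, price_pence: int):
        prices = self._prices[product_id]
        prices.append(price_pence)  # the oldest price falls out once full
        return min(prices), max(prices), sum(prices) / len(prices)
```

Keying the state by (retailer, SKU) mirrors the partitioning of the upstream topics, so each key's history can be handled by one worker without cross-worker coordination.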
All Flink output lands in Apache Iceberg tables on Google Cloud Storage (columnar, time-partitioned), queryable via BigQuery for ad-hoc analysis, and replicated to PostgreSQL + TimescaleDB for the live serving layer.
- Product Matching & Classification: A BERT-based model (custom fine-tuned on 15M UK product titles across DIY categories) normalises heterogeneous product titles from 304 retailers into a canonical product taxonomy (4,200 leaf categories), enabling accurate cross-retailer price comparison at the SKU level. Embedding similarity (cosine over 768-dim vectors stored in pgvector) resolves ambiguous title matches.
- Demand Forecasting: A Temporal Fusion Transformer (TFT), selected for its interpretable variable importance weights, forecasts weekly demand per (product, region) pair. Input covariates include historical sales velocity, price elasticity estimates, seasonality encodings, macroeconomic indicators (UK CPI, construction sector PMI), and weather (for garden tool categories). Regional forecasts cover 12 UK statistical regions at postcode-district granularity.
- Buyer Behaviour Modelling: A neural collaborative filtering model jointly embeds buyers and products in a 256-dimensional latent space. Cross-product purchase patterns reveal habitual basket compositions (e.g., drill + bit set + anchors co-purchase), enabling targeted cross-sell recommendations and market basket analysis for brand intelligence clients.
- Trend & Sentiment Intelligence: GraphRAG deployed over a product–brand–seller–category knowledge graph enables multi-hop queries across the market, such as "identify brands gaining search share in cordless tools in Yorkshire, Q1 vs. Q4". Retrieval then augments LLM-generated natural-language trend summaries with live price and availability data from the pgvector store.
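The embedding-similarity step used in product matching can be illustrated with a small pure-Python sketch. It computes cosine similarity between a query embedding and candidate canonical products and accepts the best candidate only above a cut-off. The function names, the 0.8 threshold, and the three-dimensional toy vectors are assumptions for illustration; the production system is described as using 768-dimensional vectors stored in pgvector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_match(query, candidates, threshold=0.8):
    """Return (candidate_id, score) for the closest candidate, or None.

    `candidates` maps a canonical product id to its embedding.
    `threshold` is a hypothetical cut-off below which a title stays unmatched.
    """
    best = None
    for candidate_id, vector in candidates.items():
        score = cosine_similarity(query, vector)
        if score >= threshold and (best is None or score > best[1]):
            best = (candidate_id, score)
    return best
```

Returning `None` below the threshold keeps ambiguous titles out of the cross-retailer comparison rather than forcing them onto the nearest product.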
Model serving runs on NVIDIA H100 SXM5 clusters via NVIDIA Triton Inference Server. The BERT product matcher runs as a TensorRT-optimised engine with INT8 quantisation, achieving 4,200 classifications/second per GPU. TFT demand forecasts are generated in batch overnight via vLLM with continuous batching, refreshed for 50,000 (product, region) pairs in under 20 minutes. Flash Attention 3 is applied to long-context product description encoding, reducing memory footprint by 3× vs. standard attention.
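The quoted serving figures imply some simple throughput arithmetic, worked through below. The helper names are hypothetical, and the second calculation assumes, purely for illustration, that every one of the 2M daily listing updates would need one classification on a single GPU.

```python
def required_rate(items: int, budget_seconds: float) -> float:
    """Minimum items per second needed to finish within the time budget."""
    return items / budget_seconds


def seconds_needed(items: int, rate_per_second: float) -> float:
    """Time to process `items` at a fixed rate."""
    return items / rate_per_second


# 50,000 (product, region) forecasts in under 20 minutes
forecast_rate = required_rate(50_000, 20 * 60)       # about 41.7 pairs per second

# 2M listing updates at 4,200 classifications/second on one GPU (assumed workload)
classify_seconds = seconds_needed(2_000_000, 4_200)  # about 476 seconds
```

On these assumptions, the overnight forecast refresh needs a sustained rate of roughly 42 pairs per second, and a full day of listing updates would take about eight minutes of classification time on one GPU.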
dSide surfaces this intelligence via a B2B analytics dashboard for brands, distributors, and retailers:
• Nationwide and regional price positioning heatmaps.
• Seller performance benchmarking (share of lowest price, in-stock rate, delivery SLA compliance).
• Brand velocity scores — a composite metric combining price share, availability, and review velocity.
• Category demand calendars — predicted demand peaks by product type and region for inventory planning.
• Buyer habit reports — aggregated, anonymised purchasing pattern analysis for market research.
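Two of the dashboard metrics above lend themselves to a short sketch: share of lowest price for a seller, and a composite brand velocity score. The weights, the requirement that inputs are already normalised to the range 0 to 1, and the function names are all assumptions for illustration; the source does not state how the composite is weighted.

```python
def share_of_lowest_price(offers, seller):
    """Fraction of the seller's listed products on which it has the (joint) lowest price.

    `offers` maps product id -> {seller name: price in pence}.
    """
    listed = [prices for prices in offers.values() if seller in prices]
    if not listed:
        return 0.0
    wins = sum(1 for prices in listed if prices[seller] == min(prices.values()))
    return wins / len(listed)


def brand_velocity_score(price_share, availability, review_velocity,
                         weights=(0.4, 0.3, 0.3)):
    """Weighted sum of three components, each assumed normalised to [0, 1].

    The default weights are hypothetical placeholders.
    """
    components = (price_share, availability, review_velocity)
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("components must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * c for w, c in zip(weights, components))
```

With weights that sum to 1 and inputs in the range 0 to 1, the score also stays in that range, which keeps brands comparable on a heatmap or league table.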
The combined intelligence layer is used to improve trade across the UK by connecting supply with demand more efficiently at every level of the market.