E-Commerce

dSide

The UK's first AI-powered DIY, hardware and power tools price and product comparison platform — simultaneously monitoring 304 UK e-commerce traders in real time via a distributed data collection cluster, processing 50M+ price events per month through a Kafka/Flink streaming pipeline, and deploying transformer-based AI for demand forecasting, buyer behaviour modelling, and nationwide trade intelligence across every product category.

Completed 2024-09-01
Sector E-Commerce
Tech Stack BERT, TFT, Collaborative Filtering, GraphRAG, Kafka, Flink, Triton, pgvector
Overview
dSide is the United Kingdom's first dedicated price and product comparison platform for DIY tools, power tools, hand tools, hardware, garden tools, construction materials, electricals, and consumables. Its public-facing comparison layer sits on top of a comprehensive real-time trade intelligence engine monitoring the entire UK DIY and hardware e-commerce market.
In Depth
01 Data Collection Infrastructure

dSide operates a distributed data collection cluster that continuously harvests live trade data from 304 verified UK e-commerce retailers, marketplaces, and distributors. The collection layer is built on a horizontally scalable Playwright + Scrapy cluster running on Kubernetes (GKE Autopilot), with browser-automation workers for JavaScript-heavy storefronts and lightweight HTTP workers for structured product feeds.

Each monitored source yields structured product events (SKU, price, availability, seller, delivery option, promotional status, review count), which are published to Apache Kafka topics partitioned by retailer and category. Peak ingestion: 2M+ product listing updates per day, 50M+ price point events per month. Kafka Streams handles exactly-once deduplication and event schema validation before downstream processing.
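
As a rough illustration of the event shape and keying described above, here is a minimal Python sketch. The field names, the `partition_key` format, and the hash-based `deduplicate` helper are all assumptions made for illustration; the in-memory set stands in for the Kafka Streams deduplication step and does not provide exactly-once guarantees.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class ProductEvent:
    # Hypothetical field names; the source only lists the attributes carried.
    retailer: str
    category: str
    sku: str
    price_pence: int
    available: bool
    seller: str
    delivery_option: str
    on_promotion: bool
    review_count: int


def partition_key(event: ProductEvent) -> str:
    """Key that groups events by retailer and category (assumed format)."""
    return f"{event.retailer}|{event.category}"


def fingerprint(event: ProductEvent) -> str:
    """Stable content hash used to recognise exact duplicate events."""
    payload = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def deduplicate(events):
    """Yield each distinct event once, preserving first-seen order."""
    seen = set()
    for event in events:
        digest = fingerprint(event)
        if digest not in seen:
            seen.add(digest)
            yield event
```

Keying by retailer and category keeps all events for one retailer's category on the same partition, so their relative order is preserved for downstream consumers.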

02 Real-Time Processing Pipeline

Apache Flink stateful stream processing jobs consume the Kafka topics and perform:
• Price change detection: stateful comparison against last-known price per (retailer, SKU) tuple, emitting price drop / price rise events with delta magnitude.
• Availability monitoring: real-time out-of-stock and restock detection across all 304 monitored retailers.
• Competitive pricing analytics: rolling min/max/mean price windows per product across all retailers, updated on every price event.
• Category trend signals: Flink sliding-window aggregations compute category-level demand velocity in real time, separating products with accelerating price increases (a supply-constraint signal) from products with falling prices (an oversupply signal).
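
The price change detection and competitive pricing steps above can be sketched in plain Python. This is not Flink code: the `PriceChangeDetector` class, the `PriceChange` record, and `price_summary` are hypothetical names, a dictionary stands in for Flink keyed state, and the summary is taken over the latest price per retailer rather than over a time window.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PriceChange:
    retailer: str
    sku: str
    kind: str          # "price_drop" or "price_rise"
    delta_pence: int   # new price minus previous price


class PriceChangeDetector:
    """Keeps the last-known price per (retailer, SKU) and reports changes."""

    def __init__(self) -> None:
        self._last: Dict[Tuple[str, str], int] = {}

    def observe(self, retailer: str, sku: str, price_pence: int) -> Optional[PriceChange]:
        key = (retailer, sku)
        previous = self._last.get(key)
        self._last[key] = price_pence
        # First sighting or unchanged price: nothing to emit.
        if previous is None or previous == price_pence:
            return None
        kind = "price_drop" if price_pence < previous else "price_rise"
        return PriceChange(retailer, sku, kind, price_pence - previous)


def price_summary(latest_prices: Dict[str, int]) -> Dict[str, float]:
    """Min, max and mean of the latest price per retailer for one product."""
    values = list(latest_prices.values())
    return {"min": min(values), "max": max(values), "mean": sum(values) / len(values)}
```

For example, observing 5000 and then 4500 pence for the same (retailer, SKU) pair returns a `price_drop` with a delta of -500, while the first observation for any pair returns `None`.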

All Flink output lands in Apache Iceberg tables on Google Cloud Storage (columnar, time-partitioned), queryable via BigQuery for ad-hoc analysis, and replicated to PostgreSQL + TimescaleDB for the live serving layer.

03 AI Models
  • Product Matching & Classification: A BERT-based model (custom fine-tuned on 15M UK product titles across DIY categories) normalises heterogeneous product titles from 304 retailers into a canonical product taxonomy (4,200 leaf categories), enabling accurate cross-retailer price comparison at the SKU level. Embedding similarity (cosine over 768-dim vectors stored in pgvector) resolves ambiguous title matches.
  • Demand Forecasting: A Temporal Fusion Transformer (TFT) — selected for its interpretable variable importance weights — forecasts weekly demand per (product, region) pair. Input covariates include historical sales velocity, price elasticity estimates, seasonality encodings, macroeconomic indicators (UK CPI, construction sector PMI), and weather (for garden tools categories). Regional forecasts cover 12 UK statistical regions at postcode-district granularity.
  • Buyer Behaviour Modelling: A neural collaborative filtering model jointly embeds buyers and products in a 256-dimensional latent space. Cross-product purchase patterns reveal habitual basket compositions (e.g., drill + bit set + anchors co-purchase), enabling targeted cross-sell recommendations and market basket analysis for brand intelligence clients.
  • Trend & Sentiment Intelligence: GraphRAG deployed over a product–brand–seller–category knowledge graph enables multi-hop queries across the market — "identify brands gaining search share in cordless tools in Yorkshire Q1 vs. Q4." RAG retrieval augments LLM-generated natural language trend summaries with live price and availability data from the pgvector store.
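
To make the embedding-similarity step in the product matching item concrete, the sketch below resolves a title embedding against candidate products by cosine similarity. It is an in-memory stand-in for the pgvector lookup: the `best_match` helper, the 0.8 threshold, and the tiny 3-dimensional vectors are assumptions for illustration (the source describes 768-dimensional vectors).

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    query: Sequence[float],
    catalogue: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed cut-off, not taken from the source
) -> Optional[Tuple[str, float]]:
    """Return (product_id, score) for the closest product, or None if too dissimilar."""
    best: Optional[Tuple[str, float]] = None
    for product_id, vector in catalogue.items():
        score = cosine_similarity(query, vector)
        if best is None or score > best[1]:
            best = (product_id, score)
    if best is None or best[1] < threshold:
        return None
    return best
```

Returning `None` below the threshold leaves an ambiguous title unmatched rather than forcing it onto the nearest canonical product.
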
04 Inference Infrastructure

Model serving runs on NVIDIA H100 SXM5 clusters via NVIDIA Triton Inference Server. The BERT product matcher runs as a TensorRT-optimised engine with INT8 quantisation, achieving 4,200 classifications/second per GPU. TFT demand forecasts are generated in batch overnight via vLLM with continuous batching, refreshed for 50,000 (product, region) pairs in under 20 minutes. Flash Attention 3 is applied to long-context product description encoding, reducing memory footprint by 3× vs. standard attention.
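
The throughput figures quoted here and in the data collection section can be cross-checked with some back-of-envelope arithmetic. The sketch below assumes every daily listing update is classified exactly once and ignores peaks, batching overhead, and retries; the variable names are illustrative.

```python
# Figures taken from the text.
forecast_pairs = 50_000          # (product, region) pairs refreshed overnight
batch_window_s = 20 * 60         # "under 20 minutes", in seconds
daily_updates = 2_000_000        # peak product listing updates per day
classifier_rate = 4_200          # BERT classifications per second per GPU

# Minimum sustained forecast rate needed to finish inside the window.
min_pairs_per_s = forecast_pairs / batch_window_s          # about 41.7 pairs/s

# Single-GPU time to classify one day's updates (assumption: one pass each).
gpu_seconds_per_day = daily_updates / classifier_rate      # about 476 s, under 8 minutes

# Average ingestion rate if updates were spread evenly over 24 hours.
avg_updates_per_s = daily_updates / 86_400                 # about 23.1 updates/s
```

On these assumptions, the average ingestion rate sits well below the stated per-GPU classification throughput.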

05 Market Intelligence Platform

dSide surfaces this intelligence via a B2B analytics dashboard for brands, distributors, and retailers:
• Nationwide and regional price positioning heatmaps.
• Seller performance benchmarking (share of lowest price, in-stock rate, delivery SLA compliance).
• Brand velocity scores — a composite metric combining price share, availability, and review velocity.
• Category demand calendars — predicted demand peaks by product type and region for inventory planning.
• Buyer habit reports — aggregated, anonymised purchasing pattern analysis for market research.
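
Two of the seller benchmarking metrics listed above can be illustrated with a short sketch. The source does not define them, so the definitions here are assumptions: share of lowest price is the fraction of a seller's listed SKUs on which it holds the lowest (or joint lowest) price, and in-stock rate is the fraction of its listings currently in stock. The `Offer` record assumes one offer per (seller, SKU).

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Offer(NamedTuple):
    seller: str
    sku: str
    price_pence: int
    in_stock: bool


def share_of_lowest_price(offers: Iterable[Offer]) -> Dict[str, float]:
    """Per seller: fraction of its listed SKUs where it has the (joint) lowest price."""
    offers = list(offers)
    lowest: Dict[str, int] = {}
    for offer in offers:
        lowest[offer.sku] = min(offer.price_pence, lowest.get(offer.sku, offer.price_pence))
    listed: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    for offer in offers:
        listed[offer.seller] += 1
        if offer.price_pence == lowest[offer.sku]:
            wins[offer.seller] += 1
    return {seller: wins[seller] / listed[seller] for seller in listed}


def in_stock_rate(offers: Iterable[Offer]) -> Dict[str, float]:
    """Per seller: fraction of its listings that are currently in stock."""
    listed: Dict[str, int] = defaultdict(int)
    stocked: Dict[str, int] = defaultdict(int)
    for offer in offers:
        listed[offer.seller] += 1
        if offer.in_stock:
            stocked[offer.seller] += 1
    return {seller: stocked[seller] / listed[seller] for seller in listed}
```

A composite such as the brand velocity score would combine several ratios like these, but the weighting is not stated in the source and is not guessed at here.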

The combined intelligence layer is designed to make trade across the UK market more efficient by connecting supply with demand at every level, from brands and distributors through to retailers and buyers.

Technology Stack
• Playwright + Scrapy (distributed scraping cluster)
• Apache Kafka
• Apache Flink (stateful streaming)
• Apache Iceberg + BigQuery
• PostgreSQL + TimescaleDB
• pgvector + pgvectorscale
• BERT (product classification)
• Temporal Fusion Transformer (demand forecasting)
• Neural Collaborative Filtering
• GraphRAG
• Neo4j (knowledge graph)
• NVIDIA H100 SXM5
• Triton Inference Server
• TensorRT INT8
• vLLM
• Flash Attention 3
• GKE Autopilot
• Google Vertex AI