Guide

Price Monitoring from Ecommerce Sites with Hourly Updates

Track price changes at scale, normalize product data, and push fresh snapshots to BigQuery every hour for reliable analytics, alerting, and experimentation.

Why hourly matters

Prices change fast—promotions, flash sales, stock replenishments, and competitor moves can shift within hours. Hourly monitoring captures these micro‑windows so your team can measure lift, react to competitors, and avoid stale insights.

Architecture overview

  • Discovery: Seed with product listing pages (PLPs) and sitemaps; expand via pagination and related items.
  • Extraction: Parse product cards and detail pages for title, brand, price, availability, variant attributes, and rich metadata (ratings, badges, promo labels).
  • Normalization: Standardize currency, price types (list, sale, compare‑at), and availability states; map categories into a unified taxonomy.
  • Snapshotting: Write immutable hourly snapshots to BigQuery with monotonic timestamps.
  • Indexing: Partition by ingestion date; cluster by domain and product id for fast filters.
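The normalization step can be expressed directly in SQL. As a minimal sketch, assuming raw fields land in a hypothetical staging table `project.dataset.raw_extractions` with illustrative column names (`currency_raw`, `availability_raw`):

```sql
-- Map raw extracted values to the canonical fields used downstream.
-- Table and column names here are placeholders, not a fixed contract.
SELECT
  url,
  UPPER(currency_raw) AS currency,
  CASE
    WHEN LOWER(availability_raw) IN ('instock', 'in stock', 'available') THEN 'in_stock'
    WHEN LOWER(availability_raw) IN ('outofstock', 'sold out') THEN 'out_of_stock'
    WHEN LOWER(availability_raw) = 'preorder' THEN 'preorder'
    ELSE 'unknown'
  END AS availability
FROM `project.dataset.raw_extractions`;
```

Keeping this mapping in one place makes the canonical states auditable and easy to extend when a new site uses different raw labels.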

BigQuery schema

We recommend a wide, analytics‑friendly schema:

project.dataset.hourly_prices
• observed_at (TIMESTAMP)
• domain (STRING)
• product_id (STRING)
• url (STRING)
• title (STRING)
• brand (STRING)
• category_path (STRING)
• list_price (NUMERIC)
• sale_price (NUMERIC)
• currency (STRING)
• availability (STRING) -- in_stock, out_of_stock, preorder
• rating (FLOAT64)
• reviews_count (INT64)
• promo_badges (ARRAY<STRING>)
• metadata (JSON)
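The field list above translates to a straightforward BigQuery DDL statement. This is a sketch; `project.dataset` is a placeholder for your own project and dataset:

```sql
CREATE TABLE IF NOT EXISTS `project.dataset.hourly_prices` (
  observed_at   TIMESTAMP NOT NULL,
  domain        STRING NOT NULL,
  product_id    STRING NOT NULL,
  url           STRING,
  title         STRING,
  brand         STRING,
  category_path STRING,
  list_price    NUMERIC,
  sale_price    NUMERIC,
  currency      STRING,
  availability  STRING,  -- in_stock, out_of_stock, preorder
  rating        FLOAT64,
  reviews_count INT64,
  promo_badges  ARRAY<STRING>,
  metadata      JSON
)
PARTITION BY DATE(observed_at)      -- hourly snapshots bucketed by day
CLUSTER BY domain, product_id;      -- fast filters on the usual predicates
```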

For deduplication, use (domain, product_id, observed_at) as the unique key. Because observed_at is part of the key, each hourly run appends a fresh snapshot, while a re-run of the same hour overwrites in place instead of duplicating rows, so history is preserved.
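The upsert itself maps naturally to a BigQuery MERGE. A minimal sketch, assuming each hourly batch first lands in a staging table (`hourly_prices_staging` is a hypothetical name):

```sql
-- Upsert an hourly batch keyed on (domain, product_id, observed_at).
MERGE `project.dataset.hourly_prices` AS t
USING `project.dataset.hourly_prices_staging` AS s
ON  t.domain = s.domain
AND t.product_id = s.product_id
AND t.observed_at = s.observed_at
WHEN MATCHED THEN
  -- Re-runs of the same hour refresh the row in place.
  UPDATE SET list_price   = s.list_price,
             sale_price   = s.sale_price,
             availability = s.availability,
             metadata     = s.metadata
WHEN NOT MATCHED THEN
  INSERT ROW;
```

Because the join key is deterministic, replaying a batch is idempotent: repeated runs converge to the same table state.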

Analytics and alerting

  • Price delta tracking: Window functions to compute hour‑over‑hour or day‑over‑day changes.
  • Promotion effectiveness: Join with campaign calendars to estimate impact on conversion.
  • Stockout watch: Identify trending OOS items and back‑in‑stock events quickly.
  • Competitive parity: Compare own SKUs vs. competitors across list and sale prices.
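As one concrete example, the stockout watch reduces to comparing each observation's availability with the previous hour's. A sketch against the schema above, flagging back-in-stock transitions over the last three days:

```sql
WITH transitions AS (
  SELECT domain, product_id, observed_at, availability,
         LAG(availability) OVER (
           PARTITION BY domain, product_id ORDER BY observed_at
         ) AS prev_availability
  FROM `project.dataset.hourly_prices`
  WHERE DATE(observed_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
)
SELECT domain, product_id, observed_at
FROM transitions
WHERE prev_availability = 'out_of_stock'
  AND availability = 'in_stock'
ORDER BY observed_at DESC;
```

Swapping the WHERE predicate gives the opposite alert (newly out-of-stock items) from the same window.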

Performance tips

  • Partition + cluster: Partition by DATE(observed_at), cluster by (domain, product_id) to cut scan costs.
  • Field pruning: Store verbose HTML only in cold storage; extract atomic fields for queries.
  • Backfills: Keep backfill tables separate, then merge into the main fact table.
  • Idempotency: Deterministic keys ensure repeats never duplicate rows.
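To illustrate partition pruning: filtering on the partitioning column lets BigQuery scan only the matching partitions rather than the whole fact table (a sketch):

```sql
-- The DATE(observed_at) predicate prunes the scan to today's partition.
SELECT domain, COUNT(DISTINCT product_id) AS tracked_skus
FROM `project.dataset.hourly_prices`
WHERE DATE(observed_at) = CURRENT_DATE()
GROUP BY domain;
```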

Example query

SELECT domain, product_id, observed_at, list_price, sale_price,
       LAG(sale_price) OVER (PARTITION BY domain, product_id ORDER BY observed_at) AS prev_sale_price
FROM `project.dataset.hourly_prices`
WHERE domain = 'nordstrom.com'
  AND observed_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY observed_at DESC
LIMIT 100;

Implementation with BQPipeline

Provide a set of product or category URLs and a short description of the fields you need. We handle crawl, extraction, normalization, and hourly delivery to your BigQuery project—no CSV uploads, no pipelines to maintain.