Playbook

Market Analysis for Fashion from 20 Domains (Nordstrom, Sephora, Macy's, and more)

A blueprint for multi‑domain fashion analytics—catalog unification, attribute extraction, pricing and availability monitoring, and competitive insights in BigQuery.

Objectives

  • Aggregate product catalogs from 20+ fashion and beauty retailers.
  • Normalize product taxonomy and attributes to enable cross‑domain comparisons.
  • Track pricing, promotions, and availability over time.
  • Build dashboards for assortment gaps, pricing parity, and trend detection.

Target domains

Typical dataset includes these retailers (extend as needed):

  • nordstrom.com
  • sephora.com
  • macys.com
  • zara.com
  • hm.com
  • gap.com
  • uniqlo.com
  • bananarepublic.com
  • oldnavy.com
  • footlocker.com
  • finishline.com
  • dickssportinggoods.com
  • backcountry.com
  • farfetch.com
  • ssense.com
  • revolve.com
  • asos.com
  • urbanoutfitters.com
  • bloomingdales.com
  • shop.nike.com

Data model

Use a star schema with slowly changing facts:

  • products (dimension): global_id, domain, url, title, brand, gender, category_path, colorways, materials, size_system, images, description, canonical_sku
  • prices_hourly (fact): observed_at, global_id, domain, list_price, sale_price, currency, badges, promo_text
  • availability_hourly (fact): observed_at, global_id, domain, availability, store_pickup, shipping_speed
  • reviews_daily (fact): observed_on, global_id, rating, reviews_count

Catalog unification

  • Entity resolution: Match products across domains using brand + normalized title + key attributes (material, silhouette, SKU hints) with fuzzy matching.
  • Taxonomy mapping: Map each retailer’s categories to a unified fashion taxonomy (e.g., Men » Shoes » Sneakers).
  • Variant handling: Collapse color/size variants into a parent product with variant attributes stored in arrays for flexible rollups.

KPIs and queries

  • Assortment overlap: Percentage of overlapping SKUs across retailers by brand/category.
  • Pricing parity: Median price difference by global_id and domain over time.
  • Promo cadence: Frequency and depth of markdowns by brand and season.
  • Newness velocity: Time from first seen to first sale across domains.
-- Example: brand-level pricing parity (last 30 days)
WITH latest AS (
  SELECT global_id, domain, brand, DATE(observed_at) AS day,
         ANY_VALUE(sale_price) KEEP (DENSE_RANK LAST ORDER BY observed_at) AS price
  FROM project.dataset.prices_hourly
  WHERE observed_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  GROUP BY global_id, domain, brand, day
)
SELECT brand,
       APPROX_QUANTILES(price, 100)[OFFSET(50)] AS median_price,
       COUNT(*) AS samples
FROM latest
GROUP BY brand
ORDER BY samples DESC
LIMIT 100;

Delivery and refresh

BQPipeline delivers normalized tables to your BigQuery project and refreshes them on an agreed cadence (hourly for prices/availability; daily for reviews). Access is provisioned via IAM—no CSV uploads or custom pipelines required.

Getting started

  1. Share the retailer list and initial categories or URLs you care about.
  2. Confirm the attributes and KPIs you want in the first iteration.
  3. We set up extraction, normalization, and BigQuery delivery with monitoring.