Playbook

Market Analysis for Fashion from 20 Domains (Nordstrom, Sephora, Macy's, and more)

A blueprint for multi‑domain fashion analytics—catalog unification, attribute extraction, pricing and availability monitoring, and competitive insights in BigQuery.

Objectives

Aggregate product catalogs from 20+ fashion and beauty retailers.
Normalize product taxonomy and attributes to enable cross‑domain comparisons.
Track pricing, promotions, and availability over time.
Build dashboards for assortment gaps, pricing parity, and trend detection.

Target domains

Typical dataset includes these retailers (extend as needed):

nordstrom.com
sephora.com
macys.com
zara.com
hm.com
gap.com
uniqlo.com
bananarepublic.com
oldnavy.com
footlocker.com
finishline.com
dickssportinggoods.com
backcountry.com
farfetch.com
ssense.com
revolve.com
asos.com
urbanoutfitters.com
bloomingdales.com
shop.nike.com

Data model

Use a star schema with slowly changing facts:

products (dimension): global_id, domain, url, title, brand, gender, category_path, colorways, materials, size_system, images, description, canonical_sku
prices_hourly (fact): observed_at, global_id, domain, list_price, sale_price, currency, badges, promo_text
availability_hourly (fact): observed_at, global_id, domain, availability, store_pickup, shipping_speed
reviews_daily (fact): observed_on, global_id, rating, reviews_count

Catalog unification

Entity resolution: Match products across domains using brand + normalized title + key attributes (material, silhouette, SKU hints) with fuzzy matching.
Taxonomy mapping: Map each retailer’s categories to a unified fashion taxonomy (e.g., Men » Shoes » Sneakers).
Variant handling: Collapse color/size variants into a parent product with variant attributes stored in arrays for flexible rollups.

KPIs and queries

Assortment overlap: Percentage of overlapping SKUs across retailers by brand/category.
Pricing parity: Median price difference by global_id and domain over time.
Promo cadence: Frequency and depth of markdowns by brand and season.
Newness velocity: Time from first seen to first sale across domains.

-- Example: brand-level pricing parity (last 30 days)
WITH latest AS (
  SELECT global_id, domain, brand, DATE(observed_at) AS day,
         ANY_VALUE(sale_price) KEEP (DENSE_RANK LAST ORDER BY observed_at) AS price
  FROM project.dataset.prices_hourly
  WHERE observed_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  GROUP BY global_id, domain, brand, day
)
SELECT brand,
       APPROX_QUANTILES(price, 100)[OFFSET(50)] AS median_price,
       COUNT(*) AS samples
FROM latest
GROUP BY brand
ORDER BY samples DESC
LIMIT 100;

Delivery and refresh

BQPipeline delivers normalized tables to your BigQuery project and refreshes them on an agreed cadence (hourly for prices/availability; daily for reviews). Access is provisioned via IAM—no CSV uploads or custom pipelines required.

Getting started

Share the retailer list and initial categories or URLs you care about.
Confirm the attributes and KPIs you want in the first iteration.
We set up extraction, normalization, and BigQuery delivery with monitoring.