Market Analysis for Fashion from 20 Domains (Nordstrom, Sephora, Macy's, and more)
A blueprint for multi‑domain fashion analytics—catalog unification, attribute extraction, pricing and availability monitoring, and competitive insights in BigQuery.
Objectives
- Aggregate product catalogs from 20+ fashion and beauty retailers.
- Normalize product taxonomy and attributes to enable cross‑domain comparisons.
- Track pricing, promotions, and availability over time.
- Build dashboards for assortment gaps, pricing parity, and trend detection.
Target domains
Typical dataset includes these retailers (extend as needed):
- nordstrom.com
- sephora.com
- macys.com
- zara.com
- hm.com
- gap.com
- uniqlo.com
- bananarepublic.com
- oldnavy.com
- footlocker.com
- finishline.com
- dickssportinggoods.com
- backcountry.com
- farfetch.com
- ssense.com
- revolve.com
- asos.com
- urbanoutfitters.com
- bloomingdales.com
- shop.nike.com
Data model
Use a star schema with slowly changing facts:
- products (dimension): global_id, domain, url, title, brand, gender, category_path, colorways, materials, size_system, images, description, canonical_sku
- prices_hourly (fact): observed_at, global_id, domain, list_price, sale_price, currency, badges, promo_text
- availability_hourly (fact): observed_at, global_id, domain, availability, store_pickup, shipping_speed
- reviews_daily (fact): observed_on, global_id, rating, reviews_count
Catalog unification
- Entity resolution: Match products across domains using brand + normalized title + key attributes (material, silhouette, SKU hints) with fuzzy matching.
- Taxonomy mapping: Map each retailer’s categories to a unified fashion taxonomy (e.g., Men » Shoes » Sneakers).
- Variant handling: Collapse color/size variants into a parent product with variant attributes stored in arrays for flexible rollups.
KPIs and queries
- Assortment overlap: Percentage of overlapping SKUs across retailers by brand/category.
- Pricing parity: Median price difference by global_id and domain over time.
- Promo cadence: Frequency and depth of markdowns by brand and season.
- Newness velocity: Time from first seen to first sale across domains.
-- Example: brand-level pricing parity (last 30 days)
WITH latest AS (
SELECT global_id, domain, brand, DATE(observed_at) AS day,
ANY_VALUE(sale_price) KEEP (DENSE_RANK LAST ORDER BY observed_at) AS price
FROM project.dataset.prices_hourly
WHERE observed_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY global_id, domain, brand, day
)
SELECT brand,
APPROX_QUANTILES(price, 100)[OFFSET(50)] AS median_price,
COUNT(*) AS samples
FROM latest
GROUP BY brand
ORDER BY samples DESC
LIMIT 100;Delivery and refresh
BQPipeline delivers normalized tables to your BigQuery project and refreshes them on an agreed cadence (hourly for prices/availability; daily for reviews). Access is provisioned via IAM—no CSV uploads or custom pipelines required.
Getting started
- Share the retailer list and initial categories or URLs you care about.
- Confirm the attributes and KPIs you want in the first iteration.
- We set up extraction, normalization, and BigQuery delivery with monitoring.