Case Study

Aggregating Beauty Products + Skin-Type Reviews to Power LLM Recommendations

A startup unifies beauty catalogs from Sephora, Estée Lauder, Amazon, Nordstrom, Macy's, and Ulta, then extracts skin‑type and condition signals from reviews to train an LLM for precise product matching.

Objective

  • Aggregate beauty products and variants across leading marketplaces and brand sites.
  • Mine user reviews for skin type (oily, dry, combination, sensitive, normal) and conditions (acne, redness, dark spots, fine lines).
  • Build a high‑quality training dataset and LLM prompts to recommend the right product per user profile.

Sources covered

  • sephora.com
  • esteelauder.com
  • amazon.com (Beauty)
  • nordstrom.com (Beauty)
  • macys.com (Beauty)
  • ulta.com

Each domain has unique taxonomies, attribute naming, and review structures. BQPipeline normalizes these into a consistent schema with strong keys and clear lineage.

Unified data model

Recommended BigQuery tables for analytics + ML:

project.dataset.products
• global_id (STRING) -- cross-domain product id
• domain (STRING)
• url (STRING)
• title (STRING)
• brand (STRING)
• category_path (STRING)
• ingredients (STRING)
• attributes (JSON) -- spf, finish, coverage, fragrance_free, etc.
• variants (ARRAY<STRUCT<sku STRING, shade STRING, size STRING>>)

project.dataset.prices_hourly
• observed_at (TIMESTAMP)
• global_id (STRING)
• domain (STRING)
• list_price (NUMERIC)
• sale_price (NUMERIC)
• currency (STRING)

project.dataset.reviews
• observed_on (DATE)
• global_id (STRING)
• rating (FLOAT64)
• text (STRING)
• skin_type (STRING) -- extracted
• conditions (ARRAY<STRING>) -- extracted, e.g., ['acne','redness']
• age_bracket (STRING)
• helpful_votes (INT64)

Partition prices_hourly by DATE(observed_at) and reviews by observed_on; cluster both by (domain, global_id) for efficient filters.
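The partitioning and clustering above can be sketched as BigQuery DDL; the `project.dataset` qualifier is a placeholder, and the columns follow the schema listed earlier:

-- Sketch of the partitioning/clustering described above (BigQuery DDL).
-- `project.dataset` is a placeholder; adjust to your environment.
CREATE TABLE `project.dataset.prices_hourly` (
  observed_at TIMESTAMP,
  global_id STRING,
  domain STRING,
  list_price NUMERIC,
  sale_price NUMERIC,
  currency STRING
)
PARTITION BY DATE(observed_at)
CLUSTER BY domain, global_id;

The same pattern applies to reviews, with PARTITION BY observed_on (already a DATE, so no DATE() wrapper is needed).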

Review enrichment: skin type and conditions

  • Extraction: Parse on‑site structured fields (when present) and complement with NLP to detect mentions like “oily T‑zone”, “sensitive skin”, “post‑acne marks”.
  • Normalization: Map synonyms into canonical values (e.g., “combo” → “combination”).
  • Quality: Confidence scores and rule‑based overrides reduce false positives.
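The normalization step can be sketched in SQL as a simple synonym map; the raw_skin_type column and the reviews_staging table are illustrative assumptions, not part of the schema above:

-- Illustrative synonym normalization for extracted skin types.
-- raw_skin_type and reviews_staging are hypothetical; extend the CASE as needed.
SELECT
  CASE LOWER(TRIM(raw_skin_type))
    WHEN 'combo' THEN 'combination'
    WHEN 'combination' THEN 'combination'
    WHEN 'oily' THEN 'oily'
    WHEN 'dry' THEN 'dry'
    WHEN 'sensitive' THEN 'sensitive'
    WHEN 'normal' THEN 'normal'
    ELSE NULL  -- unmapped values are left for review rather than guessed
  END AS skin_type
FROM `project.dataset.reviews_staging`;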

-- Example: skin-type sentiment by product
SELECT global_id,
       skin_type,
       AVG(rating) AS avg_rating,
       COUNT(*) AS reviews
FROM `project.dataset.reviews`
WHERE skin_type IS NOT NULL
GROUP BY global_id, skin_type
ORDER BY reviews DESC;

LLM training dataset

Construct supervised pairs that connect a user profile to product outcomes:

project.dataset.training_examples
• example_id (STRING)
• user_profile (JSON) -- {skin_type:'oily', conditions:['acne']}
• context (JSON) -- top attributes, ingredients, price range
• candidate_products (ARRAY<STRING>) -- top N by retrieval
• preferred_product (STRING) -- label from review outcomes
• rationale (STRING) -- optional explanation text

  • Retrieval: Use vector or keyword retrieval over products + attributes to form candidate sets.
  • Labeling: Derive soft labels from reviews and rating uplift for matching skin profiles.
  • Prompting: Provide user profile + candidate attributes; ask the LLM to select and justify.
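One way to derive the label column described above is to rank products by average rating within each skin profile; this is a sketch, and the 20-review floor is an illustrative threshold, not a fixed rule:

-- Sketch: pick a preferred product per skin profile from review outcomes.
SELECT skin_type,
       ARRAY_AGG(global_id ORDER BY avg_rating DESC LIMIT 1)[OFFSET(0)]
         AS preferred_product
FROM (
  SELECT skin_type,
         global_id,
         AVG(rating) AS avg_rating
  FROM `project.dataset.reviews`
  WHERE skin_type IS NOT NULL
  GROUP BY skin_type, global_id
  HAVING COUNT(*) >= 20  -- require enough reviews for a stable label
)
GROUP BY skin_type;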

Recommendation flow

  1. Collect user inputs: skin type, conditions, budget, preferences (fragrance‑free, vegan, SPF).
  2. Retrieve candidate products by filtered attributes + embeddings.
  3. LLM re‑ranks candidates conditioned on profile + historic outcomes.
  4. Return top picks with explanations and ingredient callouts.
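Step 2 could begin with a plain attribute pre-filter before any embedding retrieval, along these lines; the JSON attribute keys and the category prefix are assumptions for illustration:

-- Sketch of step 2: attribute pre-filter for candidate retrieval.
-- Keys such as 'fragrance_free' are illustrative, not a fixed schema.
SELECT global_id, title, brand
FROM `project.dataset.products`
WHERE JSON_VALUE(attributes, '$.fragrance_free') = 'true'
  AND category_path LIKE 'Skincare%'
LIMIT 50;

The surviving candidates can then be re-ranked by the LLM in step 3.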

Results

  • Catalog coverage: 1.2M+ product variants unified across 6 domains.
  • Enriched reviews: 68% of reviews mapped to skin type; 42% with condition tags.
  • LLM uplift: +18% CTR and +11% conversion vs. baseline popularity ranking.
  • Latency: p95 recommendation under 300 ms with cached retrieval.

Privacy and compliance

  • Respect site robots and terms; use allowed access patterns.
  • Store only public review text and anonymous aggregates; no PII.
  • Apply domain‑level throttling and audit logging for all crawls.

Implement with BQPipeline

Share the domains and categories you need. We set up ingestion, normalization, review enrichment, and deliver ready‑to‑query BigQuery tables—no pipelines to maintain. From there, you can plug retrieval + an LLM to power skin‑type‑aware recommendations in days, not weeks.