Commit Graph

5 Commits

Author SHA1 Message Date
mariosemes
0e2e8d1766 Pre-launch Chromium on server startup to avoid cold-start blocking
Chromium cold launch takes several seconds and blocks the event
loop, preventing SSE events from flushing. Now the browser is
warmed up during server startup if any store uses render_js,
so the first search doesn't pay the launch penalty.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:42:15 +01:00
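A minimal sketch of the warm-up described in this commit, not the project's actual code. `StoreConfig`, `createBrowserPool`, and the injected `launch` function are hypothetical names; in the real server `launch` would presumably wrap Puppeteer's browser launch.

```typescript
// Hypothetical store shape; only the render_js flag comes from the commit log.
interface StoreConfig {
  name: string;
  render_js: boolean;
}

// Caches the launch promise so the startup warm-up and the first search
// share a single browser instance.
function createBrowserPool<B>(launch: () => Promise<B>) {
  let pending: Promise<B> | null = null;

  const get = (): Promise<B> => {
    if (pending === null) {
      pending = launch().catch((err) => {
        pending = null; // allow a retry after a failed launch
        throw err;
      });
    }
    return pending;
  };

  // Call once during server startup; launches only if some store needs it.
  const warmUp = (stores: StoreConfig[]): Promise<B> | null =>
    stores.some((store) => store.render_js) ? get() : null;

  return { get, warmUp };
}
```

Because the launch promise is cached, a search that arrives while the warm-up is still in progress awaits the same launch rather than starting a second browser.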
mariosemes
80335d213c Add step-by-step logging to browser scraper and skip HTML capture
Logs each phase (launch, navigate, wait selector, extract, close)
so we can diagnose where Puppeteer gets stuck. Also skips the
expensive page.content() call since full HTML is only needed
for the test endpoint, not search.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:40:23 +01:00
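The per-phase logging could look roughly like the helper below. This is an illustrative sketch: `logPhase`, `LogFn`, and the `[browser-scraper]` prefix are assumptions, not names taken from the repository.

```typescript
type LogFn = (message: string) => void;

// Runs one named phase and logs its start, completion, or failure with
// the elapsed time, so a hang shows up as a "start" without a "done".
async function logPhase<T>(
  name: string,
  run: () => Promise<T>,
  log: LogFn = console.log,
): Promise<T> {
  const started = Date.now();
  log(`[browser-scraper] ${name}: start`);
  try {
    const result = await run();
    log(`[browser-scraper] ${name}: done in ${Date.now() - started}ms`);
    return result;
  } catch (err) {
    log(`[browser-scraper] ${name}: failed after ${Date.now() - started}ms`);
    throw err;
  }
}
```

Each step (launch, navigate, wait selector, extract, close) would be wrapped in its own `logPhase` call, so the last "start" line in the log identifies the phase that stalled.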
mariosemes
75b8759805 Fix Puppeteer hanging by using domcontentloaded instead of networkidle2
networkidle2 waits until there are no more than 2 open network
connections for at least 500 ms, which can hang on sites with
analytics, trackers, and websockets. domcontentloaded fires much
earlier, and waitForSelector then handles the dynamic content.
HG Spot now completes in ~2.5s instead of timing out.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:26:16 +01:00
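The navigation pattern from this commit, sketched under assumptions. `PageLike` is a hypothetical subset of Puppeteer's `Page` (its `goto` and `waitForSelector` methods), declared locally so the example type-checks without the package; `openAndWait` and the default timeout are illustrative.

```typescript
// Minimal subset of Puppeteer's Page used by this sketch.
interface PageLike {
  goto(
    url: string,
    options: { waitUntil: "domcontentloaded"; timeout: number },
  ): Promise<unknown>;
  waitForSelector(
    selector: string,
    options: { timeout: number },
  ): Promise<unknown>;
}

async function openAndWait(
  page: PageLike,
  url: string,
  selector: string,
  timeoutMs = 15000, // assumed value, not taken from the commit
): Promise<void> {
  // Resolve once the HTML is parsed instead of waiting for the network
  // to go quiet, which may never happen on pages with long-lived connections.
  await page.goto(url, { waitUntil: "domcontentloaded", timeout: timeoutMs });
  // Then wait for the element that the page's own scripts render.
  await page.waitForSelector(selector, { timeout: timeoutMs });
}
```

The explicit timeout on `waitForSelector` keeps a missing selector from stalling the search indefinitely.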
mariosemes
fe56c3b17e Lazy-load puppeteer to fix tsx watch hanging on startup
Puppeteer import at top level was blocking tsx watch mode,
preventing the server from starting. Now imported dynamically
only when a JS-rendered store is actually scraped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:08:09 +01:00
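One way to express the lazy loading, as a sketch rather than the committed code. The `lazy` helper is hypothetical, and a stand-in loader replaces the real dynamic import so the example runs without Puppeteer installed.

```typescript
// Wraps a loader so the module is loaded on first use and cached afterwards.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load();
    }
    return cached;
  };
}

// In the scraper the loader would be a dynamic import, for example
// lazy(() => import("puppeteer")). A stand-in object is used here.
const getBrowserModule = lazy(async () => ({ name: "stand-in module" }));
```

Nothing is loaded when the file itself is imported; the loader runs only on the first call to `getBrowserModule()`, which is what keeps the heavy dependency out of server startup.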
mariosemes
130ab30fcc Add Puppeteer browser scraping and HG Spot store config
- Add browser-scraper.ts using Puppeteer for JS-heavy stores
- Add render_js flag to store model, migration, YAML sync, and UI
- Scraper engine auto-selects cheerio vs Puppeteer based on flag
- Store forms include JS rendering toggle in Advanced section
- Create first store config: HG Spot (Croatian electronics retailer)
- Update Dockerfile with Chromium for production Puppeteer support

Tested: HG Spot returns 15 products per page with correct names,
prices (EUR), links, and images using headless browser rendering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 21:36:20 +01:00
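The engine selection on the render_js flag might be sketched as follows. `Store`, `Product`, `Scraper`, and `selectScraper` are hypothetical names; the real scraper engine and its types are not shown in this log.

```typescript
// Assumed shapes for illustration only.
interface Store {
  name: string;
  url: string;
  render_js: boolean;
}

interface Product {
  name: string;
  price: number;
  currency: string;
  link: string;
  image: string;
}

type Scraper = (store: Store, query: string) => Promise<Product[]>;

// Picks the headless-browser scraper only for stores flagged render_js;
// everything else goes through the cheaper static-HTML path.
function selectScraper(
  store: Store,
  scrapers: { html: Scraper; browser: Scraper },
): Scraper {
  return store.render_js ? scrapers.browser : scrapers.html;
}
```

Keeping the choice in one function means the rest of the engine can call the returned scraper without knowing which backend is in use.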