JobScraper

Playwright browser automation with route optimization

About

JobScraper is a Python scraping engine that uses Playwright to automate browser sessions and extract fiber installation job data. It orchestrates 8 to 16 concurrent browser workers with async task management, parses job text into structured data, and feeds the results into a VRPTW (vehicle routing problem with time windows) route optimizer built on a VROOM solver, with OSRM as the routing backend and Nominatim as the geocoding backend.
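The concurrent worker orchestration can be illustrated with a minimal asyncio sketch. This is not the project's actual code: `run_pool`, `scrape_one`, and `fake_scrape` are hypothetical names, and `scrape_one` stands in for whatever coroutine drives a single Playwright browser session. The sketch only shows one common way to cap the number of simultaneously active sessions.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def run_pool(
    job_ids: Iterable[str],
    scrape_one: Callable[[str], Awaitable[dict]],
    workers: int = 8,
) -> list:
    """Process every job ID with at most `workers` scrapes in flight."""
    queue: asyncio.Queue = asyncio.Queue()
    for job_id in job_ids:
        queue.put_nowait(job_id)

    results: list = []

    async def worker() -> None:
        # Each worker pulls IDs until the queue is drained, then exits.
        while True:
            try:
                job_id = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            try:
                results.append(await scrape_one(job_id))
            except Exception as exc:
                # Record the failure so one bad page does not stop the pool.
                results.append({"job_id": job_id, "error": repr(exc)})

    await asyncio.gather(*(worker() for _ in range(workers)))
    return results


async def fake_scrape(job_id: str) -> dict:
    """Hypothetical stand-in for a Playwright-driven page scrape."""
    await asyncio.sleep(0)  # yield control, as real browser I/O would
    return {"job_id": job_id, "status": "ok"}
```

Calling `asyncio.run(run_pool(ids, fake_scrape, workers=16))` would run the same loop with the upper worker count. Results arrive in completion order, not input order, so each record carries its own `job_id`.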

Features

  • Playwright browser automation with 8-16 concurrent worker orchestration
  • Job text parsing and structured data extraction pipeline
  • VRPTW route optimization via self-hosted VROOM solver
  • Self-hosted geocoding (Nominatim) and routing (OSRM) backends
  • Strategy pattern for swappable geocoding/routing backends with cloud fallbacks
  • 795+ automated tests with comprehensive coverage
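
The swappable-backend feature can be sketched as a strategy pattern with an ordered fallback chain. Everything below is an illustrative assumption, not the project's real interface: `Geocoder`, `StaticGeocoder`, and `FallbackGeocoder` are hypothetical names, and the static lookup table stands in for real Nominatim or cloud clients.

```python
from typing import Dict, Optional, Protocol, Sequence, Tuple

Coordinate = Tuple[float, float]  # (longitude, latitude)


class Geocoder(Protocol):
    """Strategy interface: any backend that maps an address to coordinates."""

    def geocode(self, address: str) -> Optional[Coordinate]: ...


class StaticGeocoder:
    """Hypothetical backend that answers from a fixed lookup table."""

    def __init__(self, table: Dict[str, Coordinate]) -> None:
        self._table = table

    def geocode(self, address: str) -> Optional[Coordinate]:
        return self._table.get(address)


class FallbackGeocoder:
    """Tries each backend in order and returns the first hit."""

    def __init__(self, backends: Sequence[Geocoder]) -> None:
        self._backends = list(backends)

    def geocode(self, address: str) -> Optional[Coordinate]:
        for backend in self._backends:
            try:
                hit = backend.geocode(address)
            except Exception:
                continue  # backend unavailable: move on to the next one
            if hit is not None:
                return hit
        return None  # no backend could resolve the address
```

Because `FallbackGeocoder` itself satisfies the `Geocoder` protocol, calling code does not need to know whether it holds a single self-hosted backend or a chain ending in a cloud service. The same shape would apply to routing backends.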

Tech Stack

Python, Playwright, Pandas, VROOM, OSRM, Nominatim