Problem
Some company career pages expose complete job data in server-rendered HTML or stable public payloads, but they do not have a scanner-supported ATS API. Today those companies fall back to agent/browser workflows, which are more expensive than a local structured parser.
Proposed Change
Add a `local_parser` scan source that lets `scan.mjs` execute an explicitly configured local parser command for a tracked company. The parser prints normalized jobs JSON to stdout, and `scan.mjs` applies the existing title filtering, deduplication, dry-run behavior, pipeline append, and scan history logic.
The local parser script performs the HTTP request to the career page/API and parses the response locally. That keeps the agent out of the scrape-deciphering loop, saving LLM tokens that would otherwise be spent reading page snapshots or deciding what to extract.
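Configuration for such a company could look like the fragment below. The `parser_command` field name, the argv-list layout, and the Cohere board URL are illustrative assumptions, not the implemented schema:

```yaml
# Hypothetical portals.yml entry (field names are illustrative)
cohere:
  careers_url: https://jobs.ashbyhq.com/cohere
  scan_method: local_parser
  # argv-style list so scan.mjs can spawn the parser without shell interpolation
  parser_command: ["python3", "scripts/parsers/cohere_jobs.py"]
```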
Example
Use Cohere as the example parser:
- `templates/portals.example.yml` configures Cohere with `scan_method: local_parser`.
- `scripts/parsers/cohere_jobs.py` reads Cohere jobs from the Ashby public board API and emits `jobs-json-v1`-compatible stdout.
- Generated JSON artifacts are kept under `data/parser-output/{company}/` and ignored except for `.gitkeep` placeholders.
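The Cohere parser could be sketched as below. The Ashby endpoint URL, the `jobUrl` field name, and the exact `jobs-json-v1` record shape are assumptions here rather than confirmed details, and the entry-point call is left commented so the sketch imports without making a network request:

```python
#!/usr/bin/env python3
"""Sketch of a local parser: fetch an Ashby job board and print jobs JSON to stdout."""
import json
import sys
from urllib.request import urlopen

# Assumed Ashby public posting API endpoint -- verify before relying on it.
ASHBY_URL = "https://api.ashbyhq.com/posting-api/job-board/cohere"


def to_jobs_json(payload: dict) -> list:
    """Map an Ashby payload to jobs-json-v1-style records (field names assumed)."""
    return [
        {"title": job.get("title", ""), "url": job.get("jobUrl", "")}
        for job in payload.get("jobs", [])
    ]


def main() -> None:
    # The parser itself performs the HTTP request, keeping the agent out of the loop.
    with urlopen(ASHBY_URL) as resp:
        payload = json.load(resp)
    json.dump(to_jobs_json(payload), sys.stdout, indent=2)


# Entry point, uncommented in the real script:
# if __name__ == "__main__":
#     main()
```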
Why This Helps
- Keeps SSR/static career page scanning zero-token.
- Avoids Playwright/browser scraping when a deterministic parser exists.
- Keeps parsers explicit in `portals.yml` rather than auto-discovering executable files.
- Preserves existing scanner filtering and dedup behavior.
Acceptance Criteria
- `scan.mjs` can run a configured local parser without shell interpolation.
- Parser stdout can be a JSON array, `{ jobs: [] }`, or `{ results: [] }`.
- Relative URLs resolve against `careers_url`.
- Parser failures are reported without stopping the whole scan.
- Docs explain the parser contract and output artifact location.
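The stdout-shape and URL-resolution criteria above can be sketched as follows (in Python for brevity; the real logic lives in `scan.mjs`, and these helper names are hypothetical):

```python
from urllib.parse import urljoin


def extract_jobs(parsed):
    """Accept a bare JSON array, {"jobs": [...]}, or {"results": [...]}."""
    if isinstance(parsed, list):
        return parsed
    if isinstance(parsed, dict):
        for key in ("jobs", "results"):
            if isinstance(parsed.get(key), list):
                return parsed[key]
    raise ValueError("parser output is not a recognized jobs payload")


def resolve_url(job_url, careers_url):
    """Resolve a possibly relative parser URL against the company's careers_url."""
    return urljoin(careers_url, job_url)
```

Note that `urljoin` leaves absolute URLs untouched, so parsers that already emit full URLs are unaffected.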