Download and parse the historical data of 繁星推薦, a college admissions program in Taiwan.
usage: starcatcher.py [-h] [-s YEAR] [-w WORKERS] [-d DIRNAME]
options:
  -h, --help            show this help message and exit
  -s, --start YEAR      the year since which to download data (default: 105)
  -w, --workers WORKERS
                        number of worker processes (default: CPU count)
  -d, --dir DIRNAME     directory to store output (default: out/)

Download and parse the data from ROC year 109 (2020) with all available workers and save them to the default output directory out/:
$ python starcatcher.py -s 109

out/
|-- pdf/
|   |-- <year>_<school_id>.pdf
|-- json/
|   |-- <year>_<school_id>.json
|   |-- schools_<year>.json
<year>_<school_id>.pdf: the original PDF file from CAC.
<year>_<school_id>.json: the parsed data from the PDF file.
schools_<year>.json: a list of all schools in the given year.
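
The file naming above makes it straightforward to index the parsed output after a run. The following is a minimal sketch, not part of starcatcher.py: the helper names `roc_to_gregorian` and `index_parsed_files` are hypothetical, and the regular expression assumes a numeric ROC year followed by an underscore and an arbitrary school ID, since the exact ID format is not specified here. It relies only on the fact that ROC year N corresponds to Gregorian year N + 1911.

```python
import re
from pathlib import Path

# Assumption: parsed files are named <year>_<school_id>.json with a numeric ROC year.
# The school ID format is not documented here, so any non-empty text is accepted.
DEPARTMENT_FILE = re.compile(r"^(?P<year>\d+)_(?P<school_id>.+)\.json$")


def roc_to_gregorian(roc_year):
    """Convert an ROC (Minguo) year to a Gregorian year, e.g. 109 -> 2020."""
    return roc_year + 1911


def index_parsed_files(json_dir):
    """Group the school IDs of parsed per-school JSON files by Gregorian year."""
    by_year = {}
    for path in sorted(Path(json_dir).glob("*.json")):
        match = DEPARTMENT_FILE.match(path.name)
        if match is None:
            continue  # skips schools_<year>.json and any unrelated files
        year = roc_to_gregorian(int(match["year"]))
        by_year.setdefault(year, []).append(match["school_id"])
    return by_year
```

With the default output directory, this could be called as `index_parsed_files("out/json")`.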
docs/schema/department.json: schema for <year>_<school_id>.json
docs/schema/schools.json: schema for schools_<year>.json
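
As an illustration of the mapping above, the sketch below picks the schema path for a given output file name. It only selects a schema and does not validate anything; validation remains the job of scripts/check-jsonschema.sh. The function name `schema_for` and the file name patterns are assumptions made for this example.

```python
import re

# Schema paths come from the list above; the matching rules are an assumption
# based on the documented file naming. The schools pattern is checked first.
SCHEMAS = [
    (re.compile(r"^schools_\d+\.json$"), "docs/schema/schools.json"),
    (re.compile(r"^\d+_.+\.json$"), "docs/schema/department.json"),
]


def schema_for(filename):
    """Return the schema path for an output file name, or None if no pattern matches."""
    for pattern, schema_path in SCHEMAS:
        if pattern.match(filename):
            return schema_path
    return None
```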
Verify the JSON schema with:
$ scripts/check-jsonschema.sh

Pull requests are welcome! But if you want to change the data format, please open an issue first to discuss the changes.
This project is not affiliated with or endorsed by the College Admissions Committee (CAC) of Taiwan. This software is provided for educational and research purposes only. Please refer to the CAC's official website for the most accurate and up-to-date information regarding college admissions in Taiwan.