NatsuCamellia/starcatcher

starcatcher

Download and parse the historical data of 繁星推薦, a college admission program in Taiwan.

Usage

usage: starcatcher.py [-h] [-s YEAR] [-w WORKERS] [-d DIRNAME]

options:
  -h, --help            show this help message and exit
  -s, --start YEAR      the year since which to download data (default: 105)
  -w, --workers WORKERS
                        number of worker processes (default: CPU count)
  -d, --dir DIRNAME     directory to store output (default: out/)
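
For readers who want to see how a command-line interface with these options could be declared, here is a minimal sketch using Python's standard `argparse` module. It is not the project's actual source: the function name `build_parser` and the integer types are assumptions, and only the option names, metavars, and defaults are taken from the help text above.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real source)."""
    parser = argparse.ArgumentParser(prog="starcatcher.py")
    parser.add_argument(
        "-s", "--start", metavar="YEAR", type=int, default=105,
        help="the year since which to download data (default: 105)",
    )
    parser.add_argument(
        "-w", "--workers", metavar="WORKERS", type=int, default=os.cpu_count(),
        help="number of worker processes (default: CPU count)",
    )
    parser.add_argument(
        "-d", "--dir", metavar="DIRNAME", default="out/",
        help="directory to store output (default: out/)",
    )
    return parser


# Parse the same arguments as the documented example invocation.
args = build_parser().parse_args(["-s", "109"])
print(args.start, args.workers, args.dir)
```

Options that are not given on the command line fall back to the defaults shown in the help text.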

Example

Download and parse the data starting from ROC year 109 (2020), using all available workers, and save the output to the default directory out/:

$ python starcatcher.py -s 109
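
The `-s` option takes a year in the ROC (Minguo) calendar, which equals the Gregorian year minus 1911, so 109 corresponds to 2020. The small helpers below illustrate the conversion; their names are hypothetical and are not part of starcatcher.

```python
ROC_OFFSET = 1911  # ROC year 1 corresponds to Gregorian year 1912


def roc_to_gregorian(roc_year: int) -> int:
    """Convert an ROC (Minguo) year to a Gregorian year."""
    if roc_year < 1:
        raise ValueError("ROC years start at 1")
    return roc_year + ROC_OFFSET


def gregorian_to_roc(year: int) -> int:
    """Convert a Gregorian year to an ROC (Minguo) year."""
    if year <= ROC_OFFSET:
        raise ValueError("Gregorian year must be 1912 or later")
    return year - ROC_OFFSET


print(roc_to_gregorian(109))   # 2020
print(gregorian_to_roc(2016))  # 105, the documented default start year
```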

Data Format

out/
|-- pdf/
|   |-- <year>_<school_id>.pdf
|-- json/
|   |-- <year>_<school_id>.json
|   |-- schools_<year>.json
  • <year>_<school_id>.pdf: the original PDF file from the College Admissions Committee (CAC).
  • <year>_<school_id>.json: the parsed data from the PDF file.
  • schools_<year>.json: a list of all schools in the given year.
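
Based on the layout above, the per-school JSON files can be located from their file names alone. The sketch below is one possible way to do that and is not part of the project: `DataFile` and `list_department_files` are hypothetical names, and the code assumes the year is the numeric part before the first underscore. It does not read the file contents, whose structure is defined by the schemas in docs/schema/.

```python
from pathlib import Path
from typing import List, NamedTuple


class DataFile(NamedTuple):
    year: int        # ROC year taken from the file name
    school_id: str   # kept as text so leading zeros are preserved
    path: Path


def list_department_files(out_dir: str = "out") -> List[DataFile]:
    """Collect <year>_<school_id>.json files under <out_dir>/json/."""
    json_dir = Path(out_dir) / "json"
    files = []
    for path in sorted(json_dir.glob("*_*.json")):
        # schools_<year>.json is a per-year index, not a per-school file.
        if path.name.startswith("schools_"):
            continue
        year_text, _, school_id = path.stem.partition("_")
        if not year_text.isdigit():
            continue  # skip anything that does not follow the naming pattern
        files.append(DataFile(int(year_text), school_id, path))
    return files


for item in list_department_files("out"):
    print(item.year, item.school_id, item.path)
```

If the output directory does not exist yet, the function returns an empty list.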

JSON Schema

  • docs/schema/department.json: schema for <year>_<school_id>.json
  • docs/schema/schools.json: schema for schools_<year>.json

Verify the JSON schema with:

$ scripts/check-jsonschema.sh

Contributing

Pull requests are welcome! If you want to change the data format, please open an issue first to discuss the changes.

Disclaimer

This project is not affiliated with or endorsed by the College Admissions Committee (CAC) of Taiwan. This software is provided for educational and research purposes only. Please refer to the CAC's official website for the most accurate and up-to-date information regarding college admissions in Taiwan.
