starcatcher

Download and parse the historical data of 繁星推薦 (Stars Program), a recommendation-based college admission program in Taiwan.

Usage

usage: starcatcher.py [-h] [-s YEAR] [-w WORKERS] [-d DIRNAME]

options:
  -h, --help            show this help message and exit
  -s, --start YEAR      the year since which to download data (default: 105)
  -w, --workers WORKERS
                        number of worker processes (default: CPU count)
  -d, --dir DIRNAME     directory to store output (default: out/)
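The options above correspond to a standard Python `argparse` command line. The following is a minimal sketch of a parser that accepts the same flags and defaults. It is an illustration only, not the actual code in `starcatcher.py`, and the function name `build_parser` is hypothetical.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented starcatcher.py options."""
    parser = argparse.ArgumentParser(prog="starcatcher.py")
    # -s/--start: first ROC year to download (inclusive)
    parser.add_argument(
        "-s", "--start", metavar="YEAR", type=int, default=105,
        help="the year since which to download data (default: 105)",
    )
    # -w/--workers: size of the worker process pool
    # (os.cpu_count() may return None on some platforms)
    parser.add_argument(
        "-w", "--workers", type=int, default=os.cpu_count(),
        help="number of worker processes (default: CPU count)",
    )
    # -d/--dir: root of the output tree (pdf/ and json/ live under it)
    parser.add_argument(
        "-d", "--dir", metavar="DIRNAME", default="out/",
        help="directory to store output (default: out/)",
    )
    return parser
```

For example, `build_parser().parse_args(["-s", "109"])` yields `start=109` with the other options left at their defaults.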

Example

Download and parse the data for ROC year 109 (2020) onward, using all available workers, and save it to the default output directory out/:

$ python starcatcher.py -s 109
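Years are given in the ROC (Minguo) calendar, which is offset from the Gregorian calendar by 1911, so ROC 109 is 2020. A small sketch of the conversion, plus the list of years a given `-s` value would cover. The helper names and the `latest` parameter are hypothetical; the README does not say how the script determines the most recent year to fetch.

```python
def roc_to_gregorian(roc_year: int) -> int:
    """Convert a Republic of China (Minguo) year to a Gregorian year."""
    return roc_year + 1911


def years_since(start: int, latest: int) -> list:
    """ROC years from `start` through `latest`, inclusive (hypothetical helper)."""
    return list(range(start, latest + 1))
```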

Data Format

out/
|-- pdf/
|   |-- <year>_<school_id>.pdf
|-- json/
|   |-- <year>_<school_id>.json
|   |-- schools_<year>.json

<year>_<school_id>.pdf: the original PDF file downloaded from the College Admissions Committee (CAC).
<year>_<school_id>.json: the parsed data from the PDF file.
schools_<year>.json: a list of all schools in the given year.
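A minimal sketch for reading the parsed files back, assuming the default out/ layout above. It relies only on the file-name patterns; each file is returned as plain decoded JSON, because its structure is defined by the JSON schemas rather than assumed here. The function name `load_output` is hypothetical, and school IDs are kept as strings in case they carry leading zeros.

```python
import json
import re
from pathlib import Path

# <year>_<school_id>.json (year is numeric)
DEPT_RE = re.compile(r"^(\d+)_(.+)\.json$")
# schools_<year>.json
SCHOOLS_RE = re.compile(r"^schools_(\d+)\.json$")


def load_output(out_dir: str = "out/") -> tuple:
    """Return (departments, schools): departments keyed by (year, school_id),
    school lists keyed by year. Files matching neither pattern are skipped."""
    departments = {}
    schools = {}
    for path in sorted(Path(out_dir, "json").glob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        if m := SCHOOLS_RE.match(path.name):
            schools[int(m.group(1))] = data
        elif m := DEPT_RE.match(path.name):
            departments[(int(m.group(1)), m.group(2))] = data
    return departments, schools
```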

JSON Schema

docs/schemas/department.json: schema for <year>_<school_id>.json
docs/schemas/schools.json: schema for schools_<year>.json

Verify the JSON schema with:

$ scripts/check-jsonschema.sh
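The two schemas apply to different files, so any validator has to pick the right schema for each file name. The following Python sketch shows that pairing using the third-party `jsonschema` package, whose `validate` function raises `jsonschema.ValidationError` on a mismatch. It is not what scripts/check-jsonschema.sh does internally (the README does not describe that); the helper names are hypothetical, and the schema directory is passed in explicitly.

```python
import json
import re
from pathlib import Path
from typing import Optional

SCHOOLS_RE = re.compile(r"^schools_\d+\.json$")
DEPT_RE = re.compile(r"^\d+_.+\.json$")


def schema_for(filename: str) -> Optional[str]:
    """Name of the schema file that applies to an output file, if any."""
    if SCHOOLS_RE.match(filename):
        return "schools.json"
    if DEPT_RE.match(filename):
        return "department.json"
    return None


def validate_all(json_dir: str, schema_dir: str) -> None:
    """Validate every output file; raises jsonschema.ValidationError on failure."""
    import jsonschema  # third-party: pip install jsonschema

    for path in sorted(Path(json_dir).glob("*.json")):
        name = schema_for(path.name)
        if name is None:
            continue  # not one of the documented output files
        schema = json.loads(Path(schema_dir, name).read_text(encoding="utf-8"))
        instance = json.loads(path.read_text(encoding="utf-8"))
        jsonschema.validate(instance=instance, schema=schema)
```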

Contributing

Pull requests are welcome! But if you want to change the data format, please open an issue first to discuss the changes.

Disclaimer

This project is not affiliated with or endorsed by the College Admissions Committee (CAC) of Taiwan. This software is provided for educational and research purposes only. Please refer to the CAC's official website for the most accurate and up-to-date information regarding college admissions in Taiwan.
