Closed
Description
Currently, annotation mode is separate from the basic PageProcessor, but its parsing on its own is useful in some cases.
I will try to write an independent PageMapper that only does parsing, so that it can be used inside PageProcessor.process. For example:
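import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.Spider;
import us.codecraft.webmagic.model.PageMapper; // the proposed parsing-only mapper (package path assumed)
import us.codecraft.webmagic.processor.PageProcessor;

// GithubRepo is the annotated model class (@ExtractBy etc.), assumed to be on the classpath.
public class GithubRepoPageProcessor implements PageProcessor {
    private Site site = Site.me().setRetryTimes(3);
    // Reuses annotation-based parsing without switching the whole crawler to annotation mode.
    private PageMapper<GithubRepo> githubRepoPageMapper = new PageMapper<GithubRepo>(GithubRepo.class);
    @Override
    public void process(Page page) {
        // Follow repository pages and user pages; [\w\-] also matches hyphenated names.
        page.addTargetRequests(page.getHtml().links().regex("(https://github\\.com/[\\w\\-]+/[\\w\\-]+)").all());
        page.addTargetRequests(page.getHtml().links().regex("(https://github\\.com/[\\w\\-]+)").all());
        // Parse the current page into a GithubRepo; may be null when this is not a repo page.
        GithubRepo githubRepo = githubRepoPageMapper.get(page);
        if (githubRepo == null) {
            page.setSkip(true); // nothing to hand to the pipelines
            return;
        }
        page.putField("repo", githubRepo);
    }
    @Override
    public Site getSite() {
        return site;
    }
    public static void main(String[] args) {
        Spider.create(new GithubRepoPageProcessor()).addUrl("https://github.com/code4craft").thread(5).run();
    }
}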
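The proposal relies on PageMapper but does not show how such a mapper might work. As a rough, self-contained illustration of the idea (an annotation-driven parser that is independent of the crawl loop), the sketch below uses only the JDK. `ExtractRegex`, `SimplePageMapper` and `RepoInfo` are hypothetical names invented for this sketch; it is not WebMagic's API, and it uses regex capture groups in place of WebMagic's XPath/CSS extraction.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageMapperSketch {

    /** Hypothetical annotation: capture group 1 of the regex is written into the field. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface ExtractRegex {
        String value();
    }

    /** Hypothetical mapper: turns raw page text into a model object, with no crawler involved. */
    public static class SimplePageMapper<T> {
        private final Class<T> clazz;

        public SimplePageMapper(Class<T> clazz) {
            this.clazz = clazz;
        }

        /** Returns a populated instance, or null if any annotated field finds no match. */
        public T get(String pageText) {
            try {
                T instance = clazz.getDeclaredConstructor().newInstance();
                for (Field field : clazz.getDeclaredFields()) {
                    ExtractRegex rule = field.getAnnotation(ExtractRegex.class);
                    if (rule == null) {
                        continue; // not an extraction target
                    }
                    Matcher m = Pattern.compile(rule.value()).matcher(pageText);
                    if (!m.find()) {
                        return null; // this page does not describe the model
                    }
                    field.setAccessible(true);
                    field.set(instance, m.group(1));
                }
                return instance;
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot map " + clazz.getName(), e);
            }
        }
    }

    /** Hypothetical model, loosely modelled on the GithubRepo example. */
    public static class RepoInfo {
        @ExtractRegex("github\\.com/([\\w\\-]+)/")
        String author;
        @ExtractRegex("github\\.com/[\\w\\-]+/([\\w\\-]+)")
        String name;
    }

    public static void main(String[] args) {
        SimplePageMapper<RepoInfo> mapper = new SimplePageMapper<>(RepoInfo.class);
        RepoInfo repo = mapper.get("<a href=\"https://github.com/code4craft/webmagic\">");
        System.out.println(repo.author + "/" + repo.name); // code4craft/webmagic
        // A user page has no repository segment, so mapping yields null.
        System.out.println(mapper.get("<a href=\"https://github.com/code4craft\">")); // null
    }
}
```

Returning null when a required field is missing mirrors the null check in the processor above: the caller decides whether to skip the page, while the mapper only parses.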