Suggestion: built-in Downloader with POST request support #108

@usenrong

Hi Huang, I've recently been using webmagic in a project, and many of the pages I need to crawl require POST requests. I extended webmagic's HttpClientDownloader as follows:

    // Override in a subclass of HttpClientDownloader.
    protected HttpUriRequest getHttpUriRequest(Request request, Site site, Map<String, String> headers) {
        RequestBuilder requestBuilder;
        if (request.getExtra("isPost") != null) {
            // POST request: form parameters are passed through the request extras
            requestBuilder = RequestBuilder.post().setUri(request.getUrl());
            NameValuePair[] nameValuePair = (NameValuePair[]) request.getExtra("nameValuePair");
            // Guard against a missing "nameValuePair" extra as well as an empty array
            if (nameValuePair != null && nameValuePair.length > 0) {
                requestBuilder.addParameters(nameValuePair);
            }
        } else {
            // GET request
            requestBuilder = RequestBuilder.get().setUri(request.getUrl());
        }

        if (headers != null) {
            for (Map.Entry<String, String> headerEntry : headers.entrySet()) {
                requestBuilder.addHeader(headerEntry.getKey(), headerEntry.getValue());
            }
        }
        RequestConfig.Builder requestConfigBuilder = RequestConfig.custom()
                .setCookieSpec(CookieSpecs.BEST_MATCH);
        // Check site for null before reading the timeout or proxy from it
        if (site != null) {
            requestConfigBuilder.setConnectionRequestTimeout(site.getTimeOut())
                    .setSocketTimeout(site.getTimeOut())
                    .setConnectTimeout(site.getTimeOut());
            if (site.getHttpProxy() != null) {
                requestConfigBuilder.setProxy(site.getHttpProxy());
            }
        }
        requestBuilder.setConfig(requestConfigBuilder.build());
        return requestBuilder.build();
    }

It might be more elegant to store isPost and the request parameter array as proper fields of Request rather than in its extras.
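
To illustrate that last point, here is a rough sketch of what a request object with first-class method and parameter fields could look like. It is plain Java with no webmagic or HttpClient dependency. `SketchRequest`, `post`, `addFormParam` and the example URL are all hypothetical names made up for this illustration, not webmagic's actual API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a request class; NOT webmagic's Request.
class SketchRequest {
    private final String url;
    private String method = "GET"; // default to GET, as the downloader above does
    private final Map<String, String> formParams = new LinkedHashMap<>();

    SketchRequest(String url) {
        this.url = url;
    }

    // Factory for a POST request, replacing the "isPost" extra.
    static SketchRequest post(String url) {
        SketchRequest request = new SketchRequest(url);
        request.method = "POST";
        return request;
    }

    // Typed form parameters, replacing the "nameValuePair" extra.
    // A Map keeps the sketch short; repeated parameter names would need a List.
    SketchRequest addFormParam(String name, String value) {
        formParams.put(name, value);
        return this;
    }

    String getUrl() {
        return url;
    }

    String getMethod() {
        return method;
    }

    boolean isPost() {
        return "POST".equalsIgnoreCase(method);
    }

    Map<String, String> getFormParams() {
        return Collections.unmodifiableMap(formParams);
    }
}

class SketchRequestDemo {
    public static void main(String[] args) {
        SketchRequest request = SketchRequest.post("http://example.com/search")
                .addFormParam("keyword", "webmagic");
        System.out.println(request.getMethod() + " " + request.getUrl()
                + " " + request.getFormParams());
    }
}
```

With fields like these, the downloader could branch on `request.isPost()` and read the parameters without casting values out of an untyped extras map.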
