1. Simulating a GET request to fetch HTML
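    import org.apache.http.HttpEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    CloseableHttpClient httpClient = HttpClients.createDefault();
    HttpGet get = new HttpGet("http://192.168.100.2:8080");
    CloseableHttpResponse response = httpClient.execute(get);
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        // Read the whole response body as a string and print it.
        System.out.println(EntityUtils.toString(entity));
    }
    response.close();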
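2. Simulating a POST request to log in
2.1. How login works
First, it helps to understand how a web application recognizes that a user is logged in. Typically, after a successful login the application stores the user's login information in a session and checks that session on later requests. But how does the application match requests from different users and different browsers to the sessions stored on the server?
The answer is cookies.
When a browser first visits the web server, the server by default writes a cookie named JSESSIONID to the browser (this is the default for Java servlet containers). The browser sends the cookie back with each subsequent request, and the server reads its value to find the matching session.
So the most important part of simulating a login in a crawler is preserving cookies.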
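The lookup described above can be sketched in plain Java. This is only an illustration of the idea, not how any real servlet container is implemented; the class name `SessionLookupSketch`, its methods, and the sample session id are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of session lookup by cookie; not a real servlet container.
final class SessionLookupSketch {

    // Server-side session storage: session id -> session attributes.
    private final Map<String, Map<String, Object>> sessions = new HashMap<>();

    // Creates a session and returns the Set-Cookie header value the server would send.
    String login(String sessionId, String username) {
        Map<String, Object> session = new HashMap<>();
        session.put("user", username);
        sessions.put(sessionId, session);
        return "JSESSIONID=" + sessionId + "; Path=/; HttpOnly";
    }

    // Extracts the JSESSIONID value from a request's Cookie header, or null if absent.
    static String parseSessionId(String cookieHeader) {
        if (cookieHeader == null) {
            return null;
        }
        for (String pair : cookieHeader.split(";")) {
            String[] kv = pair.trim().split("=", 2);
            if (kv.length == 2 && kv[0].equals("JSESSIONID")) {
                return kv[1];
            }
        }
        return null;
    }

    // Returns the logged-in user for a request, or null if the cookie matches no session.
    String currentUser(String cookieHeader) {
        Map<String, Object> session = sessions.get(parseSessionId(cookieHeader));
        return session == null ? null : (String) session.get("user");
    }

    public static void main(String[] args) {
        SessionLookupSketch server = new SessionLookupSketch();
        server.login("ABC123", "alice");
        // A request that carries the session cookie is recognized as logged in.
        System.out.println(server.currentUser("theme=dark; JSESSIONID=ABC123"));
        // A request without the cookie is treated as anonymous.
        System.out.println(server.currentUser("theme=dark"));
    }
}
```

A crawler that drops the cookie between requests looks like the second call: the server finds no session and treats it as not logged in.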
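2.2. HttpClient manages cookies by default
Fortunately, HttpClient 4.x stores and sends cookies automatically by default, so the developer rarely needs to manage them by hand. The only requirement is to reuse the same client instance for the login request and the requests that follow.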
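    import java.util.ArrayList;
    import java.util.List;
    import org.apache.http.HttpEntity;
    import org.apache.http.NameValuePair;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.util.EntityUtils;

    CloseableHttpClient httpClient = HttpClients.createDefault();

    // Log in with a form POST; the client stores the session cookie from the response.
    HttpPost post = new HttpPost("http://192.168.100.2:8080");
    List<NameValuePair> params = new ArrayList<NameValuePair>();
    params.add(new BasicNameValuePair("username", ""));
    params.add(new BasicNameValuePair("password", ""));
    params.add(new BasicNameValuePair("roleId", "3"));
    post.setEntity(new UrlEncodedFormEntity(params, "UTF-8"));
    // Close the login response to release its connection.
    httpClient.execute(post).close();

    // Request a page behind the login with the same client; the cookie is sent automatically.
    HttpGet get = new HttpGet("http://192.168.100.2:8080/index");
    CloseableHttpResponse response = httpClient.execute(get);
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        System.out.println(EntityUtils.toString(entity));
    }
    response.close();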
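To make "automatic cookie management" more concrete, here is a minimal sketch of what a client-side cookie jar does between two requests. It uses the JDK's own `java.net.CookieManager` purely as a stand-in; it is not HttpClient's implementation, and the class name `CookieJarSketch` and the sample cookie value are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a cookie jar remembers Set-Cookie and replays it as Cookie.
final class CookieJarSketch {

    // Stores the cookie from a login response, then returns the Cookie header
    // values the jar would attach to a later request to nextUrl.
    static List<String> cookiesForNextRequest(String loginUrl, String setCookie, String nextUrl) {
        CookieManager jar = new CookieManager();
        try {
            // Step 1: the login response arrives with a Set-Cookie header.
            Map<String, List<String>> responseHeaders = new HashMap<>();
            responseHeaders.put("Set-Cookie", Arrays.asList(setCookie));
            jar.put(URI.create(loginUrl), responseHeaders);

            // Step 2: before the next request, ask the jar which cookies apply.
            Map<String, List<String>> requestHeaders =
                    jar.get(URI.create(nextUrl), new HashMap<String, List<String>>());
            List<String> cookies = requestHeaders.get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cookiesForNextRequest(
                "http://192.168.100.2:8080",
                "JSESSIONID=ABC123; Path=/",
                "http://192.168.100.2:8080/index"));
    }
}
```

HttpClient performs the equivalent of both steps internally on every `execute` call, which is why the login example needs no cookie-handling code of its own.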
2.3. Managing cookies with HttpClient's CookieStore
    // Supply your own CookieStore so the stored cookies can be inspected or pre-populated.
    CookieStore cookieStore = new BasicCookieStore();
    CloseableHttpClient httpClient = HttpClientBuilder.create()
            .setDefaultCookieStore(cookieStore)
            .build();
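3. Simulating a file download
Downloading works the same way as fetching HTML with a GET request, except that the response body is read as a stream.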
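    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;

    // httpClient is a CloseableHttpClient, e.g. from HttpClients.createDefault().
    HttpGet httpGet = new HttpGet("http://192.168.100.2:8080/file/download?file_id=24");
    File file = new File("D:/test.doc");
    // try-with-resources closes the streams and the response even if copying fails.
    try (CloseableHttpResponse response = httpClient.execute(httpGet);
         InputStream in = response.getEntity().getContent();
         OutputStream out = new FileOutputStream(file)) {
        byte[] tmp = new byte[1024];
        int len;
        while ((len = in.read(tmp)) != -1) {
            out.write(tmp, 0, len);
        }
    }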
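The copy loop in the download example does not depend on HttpClient at all, so it can be pulled into a small helper and tried out on in-memory streams. The sketch below uses only the JDK; the class name `StreamCopySketch` and the sample payload are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the same read/write loop used for the download, in isolation.
final class StreamCopySketch {

    // Copies everything from in to out in 1 KB chunks and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int len;
        // read() returns -1 once the stream is exhausted.
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);
            total += len;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "pretend this is the downloaded file".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(payload);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copy(in, out);
            System.out.println(copied + " bytes copied");
        }
    }
}
```

In the download case, `in` would be `response.getEntity().getContent()` and `out` a `FileOutputStream`; because the body is processed chunk by chunk, the whole file never has to fit in memory.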
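4. Simulating a file upload
Since there is no ready-made system to upload files to, anyone who wants to test this can create a small web project of their own that implements an upload endpoint.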
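    import java.io.File;
    import java.nio.charset.StandardCharsets;
    import org.apache.http.HttpEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.entity.mime.HttpMultipartMode;
    import org.apache.http.entity.mime.MultipartEntityBuilder;
    import org.apache.http.entity.mime.content.FileBody;
    import org.apache.http.util.EntityUtils;

    // httpClient is a CloseableHttpClient; MultipartEntityBuilder comes from the httpmime module.
    HttpPost httpPost = new HttpPost("<upload URL>");
    FileBody bin = new FileBody(new File("<path of the file to upload>"));
    HttpEntity reqEntity = MultipartEntityBuilder.create()
            .setMode(HttpMultipartMode.BROWSER_COMPATIBLE)
            .addPart("uploadFile", bin)
            .setCharset(StandardCharsets.UTF_8)
            .build();
    httpPost.setEntity(reqEntity);
    CloseableHttpResponse response = httpClient.execute(httpPost);
    String html = EntityUtils.toString(response.getEntity());
    response.close();
    System.out.println(html);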
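For readers building their own upload endpoint to test against, it can help to see roughly what a multipart/form-data request body looks like on the wire. The following is a simplified, hand-rolled sketch with a single text file part; it is not the exact output of `MultipartEntityBuilder`, and the class name `MultipartBodySketch`, the boundary string, and the file name are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the multipart/form-data wire format for one text file part.
final class MultipartBodySketch {

    private static final String CRLF = "\r\n";

    // Builds a multipart/form-data body containing a single text file part.
    static String buildBody(String boundary, String fieldName, String fileName, String fileText) {
        StringBuilder body = new StringBuilder();
        // Each part starts with "--" followed by the boundary.
        body.append("--").append(boundary).append(CRLF);
        body.append("Content-Disposition: form-data; name=\"").append(fieldName)
            .append("\"; filename=\"").append(fileName).append("\"").append(CRLF);
        body.append("Content-Type: text/plain; charset=UTF-8").append(CRLF);
        // A blank line separates the part headers from the part content.
        body.append(CRLF);
        body.append(fileText).append(CRLF);
        // The closing delimiter is the boundary with "--" on both sides.
        body.append("--").append(boundary).append("--").append(CRLF);
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical boundary; real clients generate a random one per request.
        String boundary = "----sketchBoundary42";
        String body = buildBody(boundary, "uploadFile", "hello.txt", "hello upload");
        System.out.println("Content-Type: multipart/form-data; boundary=" + boundary);
        System.out.println("Content-Length: " + body.getBytes(StandardCharsets.UTF_8).length);
        System.out.println();
        System.out.print(body);
    }
}
```

The field name `uploadFile` matches the part name used in the upload example, which is the name a server-side handler would look up.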
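    // Plain form POST; url and content are supplied by the caller.
    // MediaType appears to be Spring's org.springframework.http.MediaType (the original gives no imports).
    HttpClient httpClient = HttpClients.createDefault();
    HttpPost post = new HttpPost(url);
    post.setHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_FORM_URLENCODED_VALUE);
    List<NameValuePair> form = new LinkedList<>();
    form.add(new BasicNameValuePair("message", content));
    // Pass the charset explicitly; without it HttpClient 4.x encodes the form as ISO-8859-1.
    UrlEncodedFormEntity urlEncodedFormEntity = new UrlEncodedFormEntity(form, "UTF-8");
    post.setEntity(urlEncodedFormEntity);
    HttpResponse response = httpClient.execute(post);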
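A form POST like the one above sends its fields as `name=value` pairs joined by `&`, with each name and value percent-encoded. The sketch below reproduces that encoding with the JDK's `URLEncoder` to show what ends up in the request body; it is an approximation for illustration, not HttpClient's own encoder, and the class name `FormEncodingSketch` and the sample field values are made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of application/x-www-form-urlencoded body encoding.
final class FormEncodingSketch {

    // Encodes the fields as name=value pairs joined by '&', using UTF-8.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "alice");
        fields.put("message", "hello world!");
        System.out.println(encode(fields));
    }
}
```

The charset matters as soon as a value contains non-ASCII characters, since the same character encodes to different byte sequences under UTF-8 and ISO-8859-1.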