
Using JSoup to simulate logging in to the new Zhengfang academic system (intranet, then academic system) and crawl information: a detailed walkthrough

2022-02-21 06:08:58

The login screen of the new Zhengfang academic system.


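I. Requirements analysis

  We need to access the academic system, crawl data such as class schedules and grades, and display it in an app we wrote. The academic system can only be reached from the campus network, so this crawl uses a two-stage "intranet, then academic system" strategy: first simulate logging in to the campus intranet, then log in to the academic system using the intranet cookies, and finally crawl the relevant information.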
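The two-stage idea can be sketched without any network access. The following is only an illustration of how cookies from the first stage might be carried into the second; the class name CookieChain, its methods, and the cookie names are hypothetical and do not come from the original code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CookieChain {
    /** Merges cookies from an earlier and a later stage; later values win on name clashes. */
    public static Map<String, String> merge(Map<String, String> earlier, Map<String, String> later) {
        Map<String, String> merged = new LinkedHashMap<>(earlier);
        merged.putAll(later);
        return merged;
    }

    /** Renders a cookie map as the value of an HTTP Cookie request header. */
    public static String toHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        cookies.forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Hypothetical cookie names and values, for illustration only.
        Map<String, String> intranet = new LinkedHashMap<>();
        intranet.put("vpn_session", "abc");
        Map<String, String> academic = new LinkedHashMap<>();
        academic.put("JSESSIONID", "xyz");
        // Stage 2 requests carry the cookies of both stages.
        System.out.println(toHeader(merge(intranet, academic)));
    }
}
```

With JSoup, the same effect comes from passing each cookie map to the connection's cookies(...) method before executing the request.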

II. Simulating login to the intranet

The intranet login screen.

Its URL is:

https://webvpn.ncepu.edu.cn/users/sign_in

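Main steps

  1. Enter the user name and password, press F12, and find the login form's action in the Elements panel.

    The form data you entered is ultimately submitted to "/users/sign_in".

  2. Click Log in and find sign_in in the Network panel. It shows the pieces of information needed to simulate the login.

  3. Start writing the code.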

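import java.util.List;
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Connection connection = Jsoup.connect(URL);
connection.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36");
Response res = connection.execute(); // keep res.cookies() for later use
Document d = Jsoup.parse(res.body());
List<Element> elements = d.select("form");
Map<String, String> datas ... // the form's field names and values; the rest of this statement is missing

Information such as the User-Agent value can be read from the sign_in request found in step 2.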


We can print datas (the login name and password are masked here):

{user[dymatice_code]=unknown, utf8=?, commit=login Login, user[login]=(masked), user[password]=(masked), authenticity_token=+BD3FgRXj+LsvgUpS81EKyU7SOF1B6eshSzfo3aMOSHD3LoMsx8ZP85vWNbm1PbPJGbgJqHVbFkTvHuSzDwI8A==}

The second step is to submit the form data together with the cookies to simulate the login:

Connection connection2 = Jsoup.connect("https://webvpn.ncepu.edu.cn/users/sign_in");
connection2.header(USER_AGENT, USER_AGENT_VALUE);
Response response = connection2.ignoreContentType(true).followRedirects(true).method(Method.POST).data(datas).cookies(res.cookies()).execute();

The last step is to print the returned HTML and the returned cookies:
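System.out.println(response.body());
Map<String, String> cookies_innet = response.cookies(); // the intranet cookies, reused in section III
System.out.println(cookies_innet);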
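To make concrete what the POST in the second step sends, here is a minimal sketch of how a map of form fields turns into an application/x-www-form-urlencoded body. It uses only the JDK; the class name FormBody and the sample values are placeholders of mine. JSoup performs this encoding itself when data(...) is used, so the sketch is for understanding rather than something the login code needs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormBody {
    /** Encodes name/value pairs as an application/x-www-form-urlencoded body. */
    public static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> datas = new LinkedHashMap<>();
        datas.put("user[login]", "alice");        // placeholder value
        datas.put("user[password]", "s3cret+1");  // placeholder value
        // Brackets and '+' are percent-encoded in the request body.
        System.out.println(encode(datas));
    }
}
```

This also shows why the authenticity_token must be passed through unchanged as a form value: characters such as "+", "/" and "=" only survive the trip because they are percent-encoded on the wire.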
III. Simulating login to the academic system

We have now simulated logging in to the intranet.

Next we want to simulate logging in to the new academic system through its login page, which is the screen shown at the beginning of the article.

The main steps are as follows. As with the campus intranet login, inspect the page to see which form data needs to be submitted. That is not demonstrated again here; we go straight to the code.

// Log in
public boolean beginLogin() throws Exception {
    connection = Jsoup.connect(url + "/jwglxt/xtgl/login_slogin.html").cookies(cookies_innet);
    connection.header("Content-Type", "application/x-www-form-urlencoded;charset=utf-8");
    connection.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36");
    connection.data("csrftoken", csrftoken);
    connection.data("yhm", stuNum);
    connection.data("mm", password);
    connection.data("mm", password); // the original code adds the mm field twice
    // The POST is sent once here and once more below; only the second response is kept.
    connection.cookies(cookies).ignoreContentType(true)
            .method(Connection.Method.POST).execute();
    response = connection.followRedirects(true).execute();
    document = Jsoup.parse(response.body());
    // System.out.println(document);
    // The login succeeded if the page has no element with id "tips".
    if (document.getElementById("tips") == null) {
        System.out.println("Welcome to login");
        System.out.println(response.cookies());
        return true;
    } else {
        System.out.println(document.getElementById("tips").text());
        System.out.println(response.cookies());
        return false;
    }
}

The cookies_innet in the code is the cookie map obtained by simulating the intranet login. The csrftoken has to be fetched separately, and the password is submitted in encrypted form, so we also need to encrypt the entered password. Both methods below must run before beginLogin(), because it uses the csrftoken, the cookies, and the encrypted password.

// Get the csrftoken and the cookies
private void getCsrftoken() {
    try {
        connection = Jsoup.connect(url + "/jwglxt/xtgl/login_slogin.html?language=zh_CN&_t=" + new Date().getTime()).cookies(cookies_innet);
        connection.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36");
        response = connection.followRedirects(true).execute();
        cookies = response.cookies();
        // Save the csrftoken
        document = Jsoup.parse(response.body());
        csrftoken = document.getElementById("csrftoken").val();
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}

// Get the public key and encrypt the password
public void getRSApublickey() throws Exception {
    connection = Jsoup.connect(url + "/jwglxt/xtgl/login_getPublicKey.html?" + "time=" + new Date().getTime()).cookies(cookies_innet);
    connection.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36");
    response = connection.cookies(cookies).ignoreContentType(true).followRedirects(true).execute();
    JSONObject jsonObject = JSON.parseObject(response.body());
    modulus = jsonObject.getString("modulus");
    exponent = jsonObject.getString("exponent");
    // The server returns the modulus and exponent as Base64; RSAEncoder expects hex.
    password = RSAEncoder.RSAEncrypt(password, B64.b64tohex(modulus), B64.b64tohex(exponent));
    password = B64.hex2b64(password);
}

The additional code for B64.java and RSAEncoder.java is given below.
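As a point of comparison, the same kind of encryption can be sketched with the JDK's own classes. The pkcs1pad2 routine in RSAEncoder applies PKCS#1 v1.5 type 2 padding, which is what the standard "RSA/ECB/PKCS1Padding" transformation provides. The class name JdkRsaSketch is hypothetical, the key pair in main is generated locally only to check the round trip, and I have not verified this variant against the live academic system, so treat it as an assumption rather than a drop-in replacement.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.interfaces.RSAPublicKey;
import java.security.spec.RSAPublicKeySpec;
import java.util.Base64;
import javax.crypto.Cipher;

public class JdkRsaSketch {
    /** Encrypts the password with a Base64-encoded modulus and exponent; returns Base64 ciphertext. */
    public static String encrypt(String password, String modulusB64, String exponentB64)
            throws Exception {
        // new BigInteger(1, bytes) reads the decoded bytes as an unsigned number.
        BigInteger n = new BigInteger(1, Base64.getDecoder().decode(modulusB64));
        BigInteger e = new BigInteger(1, Base64.getDecoder().decode(exponentB64));
        PublicKey key = KeyFactory.getInstance("RSA").generatePublic(new RSAPublicKeySpec(n, e));
        Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] encrypted = cipher.doFinal(password.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(encrypted);
    }

    public static void main(String[] args) throws Exception {
        // A locally generated key pair stands in for the server's public key.
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair pair = generator.generateKeyPair();
        RSAPublicKey pub = (RSAPublicKey) pair.getPublic();
        String modulus = Base64.getEncoder().encodeToString(pub.getModulus().toByteArray());
        String exponent = Base64.getEncoder().encodeToString(pub.getPublicExponent().toByteArray());

        String cipherText = encrypt("secret", modulus, exponent);

        // Decrypt with the private key to confirm the round trip.
        Cipher decrypt = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        decrypt.init(Cipher.DECRYPT_MODE, pair.getPrivate());
        byte[] plain = decrypt.doFinal(Base64.getDecoder().decode(cipherText));
        System.out.println(new String(plain, StandardCharsets.UTF_8));
    }
}
```

The padding is randomized, so the ciphertext differs on every call; only the decrypted result is stable.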
B64.java:

import static java.lang.Integer.parseInt;

public class B64 {
    public static String b64map = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    private static char b64pad = '=';
    private static String hexCode = "0123456789abcdef";

    // Return the hexadecimal character for a value in 0..15
    public static char int2char(int a) {
        return hexCode.charAt(a);
    }

    // Base64 to hex
    public static String b64tohex(String s) {
        String ret = "";
        int k = 0;    // position (0..3) within the current group of Base64 characters
        int slop = 0; // bits left over from the previous character
        for (int i = 0; i < s.length(); ++i) {
            if (s.charAt(i) == b64pad) break;
            int v = b64map.indexOf(s.charAt(i));
            if (v < 0) continue;
            if (k == 0) {
                ret += int2char(v >> 2);
                slop = v & 3;
                k = 1;
            } else if (k == 1) {
                ret += int2char((slop << 2) | (v >> 4));
                slop = v & 0xf;
                k = 2;
            } else if (k == 2) {
                ret += int2char(slop);
                ret += int2char(v >> 2);
                slop = v & 3;
                k = 3;
            } else {
                // The source text is truncated in this branch; the lines up to "return ret"
                // are reconstructed to mirror the k == 1 branch.
                ret += int2char((slop << 2) | (v >> 4));
                ret += int2char(v & 0xf);
                k = 0;
            }
        }
        if (k == 1) ret += int2char(slop << 2);
        return ret;
    }

    // Hex to Base64
    public static String hex2b64(String h) {
        int i, c;
        StringBuilder ret = new StringBuilder();
        for (i = 0; i + 3 <= h.length(); i += 3) {
            c = parseInt(h.substring(i, i + 3), 16);
            ret.append(b64map.charAt(c >> 6));
            ret.append(b64map.charAt(c & 63));
        }
        if (i + 1 == h.length()) {
            c = parseInt(h.substring(i, i + 1), 16);
            ret.append(b64map.charAt(c << 2));
        } else if (i + 2 == h.length()) {
            c = parseInt(h.substring(i, i + 2), 16);
            ret.append(b64map.charAt(c >> 2));
            ret.append(b64map.charAt((c & 3) << 4));
        }
        while ((ret.length() & 3) > 0) ret.append(b64pad);
        return ret.toString();
    }
}

RSAEncoder.java:

import java.math.BigInteger;
import java.util.Random;

public class RSAEncoder {
    private static BigInteger n = null;
    private static BigInteger e = null;

    public static String RSAEncrypt(String pwd, String nStr, String eStr) {
        n = new BigInteger(nStr, 16);
        e = new BigInteger(eStr, 16);
        BigInteger r = RSADoPublic(pkcs1pad2(pwd, (n.bitLength() + 7) >> 3));
        String sp = r.toString(16);
        // Pad to an even number of hex digits
        if ((sp.length() & 1) != 0) sp = "0" + sp;
        return sp;
    }

    private static BigInteger RSADoPublic(BigInteger x) {
        return x.modPow(e, n);
    }

    // PKCS#1 type 2 padding: 0x00 0x02 <random non-zero bytes> 0x00 <message>
    private static BigInteger pkcs1pad2(String s, int n) {
        if (n < s.length() + 11) { // TODO: fix for utf-8
            System.err.println("Message too long for RSAEncoder");
            return null;
        }
        byte[] ba = new byte[n];
        int i = s.length() - 1;
        while (i >= 0 && n > 0) {
            int c = s.codePointAt(i--);
            // Encode using UTF-8. Casting to byte replaces new Byte(String.valueOf(...)),
            // which throws NumberFormatException for values above 127.
            if (c < 128) {
                ba[--n] = (byte) c;
            } else if ((c > 127) && (c < 2048)) {
                ba[--n] = (byte) ((c & 63) | 128);
                ba[--n] = (byte) ((c >> 6) | 192);
            } else {
                ba[--n] = (byte) ((c & 63) | 128);
                ba[--n] = (byte) (((c >> 6) & 63) | 128);
                ba[--n] = (byte) ((c >> 12) | 224);
            }
        }
        ba[--n] = 0;
        byte[] temp = new byte[1];
        // Fixed seed as in the original; java.security.SecureRandom would make the padding unpredictable.
        Random rdm = new Random(47L);
        while (n > 2) { // random non-zero pad
            temp[0] = 0;
            while (temp[0] == 0) rdm.nextBytes(temp);
            ba[--n] = temp[0];
        }
        ba[--n] = 2;
        ba[--n] = 0;
        return new BigInteger(ba);
    }
}

IV. Crawling grade and class schedule information

  Finally, we are inside the academic system. The next step is to crawl the grade and class schedule information and display it in the app we wrote, which works as follows.

But we still have to take it one step at a time. First, get the grade information. As before, we need to submit form data, and the process is exactly the same. To see which data to submit, refer to this blog post: How do I see what form data needs to be submitted when crawling?

Here is the code directly.

// Get the grade information
public void getStudentGrade(int year, int term) throws Exception {
    Map ...

One thing to note: set showCount in the submitted parameters to a large value. By default only the first page of data is returned, so showing all results on the first page lets us crawl everything at once.

Displaying the grade information: see Detailed tutorial on using Android ExpandableListView (with code parsing process).

Reference articles

These articles were all written by me and summarize knowledge that was previously scattered.

JSoup mock login site (using the campus intranet as an example)
JSoup uses the obtained cookies to access other links in this page
How do I see what form data needs to be submitted when crawling?
JSoup carries cookies to jump to multiple screens in a row
Java crawler to simply determine if a simulated login is successful (using JSoup as an example)
Detailed tutorial on using Android ExpandableListView (with code parsing process)

You are welcome to follow my WeChat official account, KI's Algorithm Miscellany, and to message me directly if you have any questions.