Suppose you have a website that is up and running. You can build a native Android application for it by parsing the HTML content of your web pages inside the app. This technique is generally called Android web scraping. On Android, one handy library for web scraping is the jsoup library.
jsoup is an efficient HTML parser library. It provides a class called Elements that represents a list of Element nodes. Elements implements Iterable, so you can iterate over it with a for-each loop. This is one reason jsoup is a popular choice for Android web scraping.
The first thing you need is the Gradle dependency for jsoup. Add it to your app's build.gradle (with Android Gradle plugin 3.0 and later, use implementation in place of compile):
compile 'org.jsoup:jsoup:1.10.1'
Some useful classes that jsoup provides for handling HTML responses are:
- Document - holds the entire parsed web page, which can then be queried with select()
- Elements - holds the list of elements that match a particular tag (or CSS query)
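As a rough illustration of how these two classes fit together, here is a minimal sketch that parses HTML from a string instead of the network, so it can be tried without a device. The class name JsoupBasics and the helper textsOf are made up for this example; Jsoup.parse, select() and text() are regular jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

class JsoupBasics {

    // Hypothetical helper: returns the text of every element matching cssQuery.
    static List<String> textsOf(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);         // parse from a string, no network call
        Elements matches = doc.select(cssQuery);  // all elements matching the CSS query
        List<String> texts = new ArrayList<>();
        for (Element match : matches) {           // Elements can be used in a for-each loop
            texts.add(match.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p>first</p><p>second</p></body></html>";
        System.out.println(Jsoup.parse(html).title()); // prints "Demo"
        System.out.println(textsOf(html, "p"));        // prints "[first, second]"
    }
}
```

The same select() call works on a Document fetched with Jsoup.connect(), so a query can be tried out on a saved copy of the page first.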
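Before it gets boring, let us dive into the app code. Android does not allow network calls on the main thread (it throws NetworkOnMainThreadException), so first create a new thread to perform the network call. The app also needs the android.permission.INTERNET permission in its manifest. Then request the web page using Jsoup.connect() as shown below: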
    new Thread(new Runnable() {
        @Override
        public void run() {
            final StringBuilder builder = new StringBuilder();
            try {
                // Blocking network call: fetch and parse the page
                Document doc = Jsoup.connect("add_webpage_url_here").get();
                String title = doc.title();
                Elements links = doc.select("a[href]");
                builder.append(title).append("\n");
                Element table = doc.select("table").get(0);
                Elements rows = table.select("tr");
            } catch (IOException e) {
                builder.append("Error : ").append(e.getMessage()).append("\n");
            }
            runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    // Views may only be touched on the UI thread, so the progress
                    // view (bus_progress, from this app's layout) is hidden here
                    // rather than inside the catch block on the worker thread.
                    bus_progress.setVisibility(View.GONE);
                    // Show builder.toString() in your views here.
                }
            });
        }
    }).start();
Here,
- doc.title() gives the title of the requested web page
- doc.select("a[href]") gives a list of all the links in the web page
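To make the link query concrete, the sketch below lists each link's text together with its absolute URL. It parses an in-memory string with an assumed base URI of https://example.com/; the class name LinkLister and the method listLinks are hypothetical. The abs:href attribute prefix is jsoup's way of resolving a relative href against the document's base URI.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class LinkLister {

    // Hypothetical helper: returns "text -> absolute URL" for every <a> with an href.
    static List<String> listLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            // "abs:href" resolves relative links against baseUri
            links.add(link.text() + " -> " + link.attr("abs:href"));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/about\">About</a> <a>no href here</a>";
        // prints "[About -> https://example.com/about]"
        System.out.println(listLinks(html, "https://example.com/"));
    }
}
```

When the page is fetched with Jsoup.connect(url).get(), the base URI is set from the request URL, so abs:href should work there without passing it explicitly.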
Suppose the requested page contains a table like the following:

    <table id="bus-timing-chart" class="table table-responsive table-striped bus-timing-chart">
      <thead>
        <tr>
          <th>From</th>
          <th>Via</th>
          <th>To</th>
          <th>Arrival</th>
          <th>Departure</th>
          <th>Bay</th>
          <th>Bus Name</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td align="left" valign="top">test 1</td>
          <td align="left" valign="top">test 2</td>
          <td align="left" valign="top">test 3</td>
          <td align="left" valign="top" style="width:9%; ">test 4</td>
          <td align="left" valign="top" style="width:9%; ">test 5</td>
          <td align="left" valign="top">test 6</td>
          <td align="left" valign="top">test 7</td>
        </tr>
        <tr>
          <td align="left" valign="top">test 1</td>
          <td align="left" valign="top">test 2</td>
          <td align="left" valign="top">test 3</td>
          <td align="left" valign="top" style="width:9%; ">test 4</td>
          <td align="left" valign="top" style="width:9%; ">test 5</td>
          <td align="left" valign="top">test 6</td>
          <td align="left" valign="top">test 7</td>
        </tr>
        <tr>
. . .
Now we can parse the table rows from the doc object like this:

    Element table = doc.select("table").get(0);  // first table on the page
    Elements rows = table.select("tr");          // every row in that table
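Now rows contains the list of all rows within the selected table element. You can loop through each row and read the value of each column like this:

    for (int i = 1; i < rows.size() - 1; i++) {
        // Row 0 is the header row (th cells only), so the loop starts at 1.
        // Note: the bound rows.size() - 1 also skips the last row; use
        // rows.size() if the last row of your table holds data.
        // getElementsByTag("td").get(0) is the first cell of the row, and
        // text() returns its text (e.g. "test 1"); toString() would return
        // the cell's full HTML markup instead.
        String from_place = rows.get(i).getElementsByTag("td").get(0).text();
    }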
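As an alternative to index arithmetic, the rows can be walked with a for-each loop that skips any row without td cells. The sketch below is one possible way to do it, not the only one: TableReader and readRows are hypothetical names, and the HTML is a shortened stand-in for the bus timing table.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

class TableReader {

    // Hypothetical helper: returns the cell texts of every data row in the first table.
    static List<List<String>> readRows(String html) {
        Document doc = Jsoup.parse(html);
        List<List<String>> result = new ArrayList<>();
        Element table = doc.select("table").first(); // null when the page has no table
        if (table == null) {
            return result;
        }
        for (Element row : table.select("tr")) {
            Elements cells = row.select("td");
            if (cells.isEmpty()) {
                continue; // header rows contain only th cells
            }
            List<String> values = new ArrayList<>();
            for (Element cell : cells) {
                values.add(cell.text());
            }
            result.add(values);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<table><thead><tr><th>From</th><th>To</th></tr></thead>"
                + "<tbody><tr><td>test 1</td><td>test 2</td></tr></tbody></table>";
        System.out.println(readRows(html)); // prints "[[test 1, test 2]]"
    }
}
```

Using first() instead of get(0) avoids an IndexOutOfBoundsException when the page layout changes and no table is found.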
Congrats! You have now learned the basics of Android web scraping using the jsoup library.