android - get Text out of webview

The only thing that comes to my mind in this case is to use javascript. Doing a quick search I found android.webkit.WebView.addJavascriptInterface.

You want to study the "addJavascriptInterface" which in the end will help you solve the problem


The solution provided above provides the text using innerText property which will return you all the text in the webView. The solution that I propose below will help you extract the text from visible part of the webView on the screen.

Step 1: It requires the help of javaScript, hence first enable the javascript.

webView.addJavascriptInterface(new IJavascriptHandler(getActivity().getApplicationContext()),     "Android"); //if your class extends a Fragment class

or

view.addJavascriptInterface(new IJavascriptHandler(this), "Android"); //if your class extends Activity.

Step 2: Create a javaInterface inner class.

final class IJavascriptHandler {

    Context mContext;
    IJavascriptHandler(Context c) {
    mContext = c;
}

//API 17 and higher required you to add @JavascriptInterface as mandatory before your method.   
@JavascriptInterface 
public void processContent(String aContent) 
{ 
   //this method will be called from within the javascript method that you will write.
   final String content = aContent;
   Log.e("The content of the current page is ",content);
} 
}

Step 3: Now you have to add the javascript method. You'll write the method as a string and then load it. The method returns the text based on the parameter provided to it. So, you would need 2 strings. One will load the javascript method and the other will call it.

Method to load the javascript method.

String javaScriptToExtractText = "function getAllTextInColumn(left,top,width,height){"
                +   "if(document.caretRangeFromPoint){"
                +   "var caretRangeStart = document.caretRangeFromPoint(left, top);"
                +   "var caretRangeEnd = document.caretRangeFromPoint(left+width-1, top+height-1);"
                +   "} else {"
                +   "return null;"
                +   "}"
                +   "if(caretRangeStart == null || caretRangeEnd == null) return null;"
                +   "var range = document.createRange();"
                +   "range.setStart(caretRangeStart.startContainer, caretRangeStart.startOffset);"
                +   "range.setEnd(caretRangeEnd.endContainer, caretRangeEnd.endOffset);"
                +   "return range.toString();};";

Method to call the above function.

String javaScriptFunctionCall = "getAllTextInColumn(0,0,100,100)";

//I've provided the parameter here as 0,0 i.e the left and top offset and then 100, 100 as width and height. So, it'll extract the text present in that area.

Step 4: Now, you need to load the above 2 javascripts.

webView.loadUrl("javascript:"+ javaScriptToExtractText);
//this will load the method.


view.loadUrl("javascript:window.Android.processContent("+javaScriptFunctionCall+");");
//this will call the loaded javascript method.

Enjoy.


Java:

    wvbrowser.evaluateJavascript(
        "(function() { return ('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>'); })();",
         new ValueCallback<String>() {
            @Override
            public void onReceiveValue(String html) {
                Log.d("HTML", html); 
                // code here
            }
    });

Kotlin:

web_browser.evaluateJavascript("(function() { return ('<html>'+document.getElementsByTagName('span')[0].innerText+'</html>'); })();")
 { html ->
   Toast.makeText(this@Your_activity, html, Toast.LENGTH_SHORT).show()
   // code here
                }

Getting the plain text content from a webview is rather hard. Basically, the android classes don't offer it, but javascript does, and Android offers a way for javascript to pass the information back to your code.

Before I go into the details, do note that if your html structure is simple, you might be better off just parsing the data manually.

That said, here is what you do:

  1. Enable javascript
  2. Add your own javascript interface class, to allow the javascript to communicate with your Android code
  3. Register your own webviewClient, overriding the onPageFinished to insert a bit of javascript
  4. In the javascript, acquire the element.innerText of the tag, and pass it to your javascript interface.

To clarify, I'll post a working (but very rough) code example below. It displays a webview on the top, and a textview with the text-based contents on the bottom.

package test.android.webview;

import android.app.Activity;
import android.os.Bundle;
import android.webkit.WebView;
import android.webkit.WebViewClient;
import android.widget.TextView;

public class WebviewTest2Activity extends Activity {
    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);

        WebView webView = (WebView) findViewById(R.id.webView);
        TextView contentView = (TextView) findViewById(R.id.contentView);

        /* An instance of this class will be registered as a JavaScript interface */ 
        class MyJavaScriptInterface 
        { 
            private TextView contentView;

            public MyJavaScriptInterface(TextView aContentView)
            {
                contentView = aContentView;
            }

            @SuppressWarnings("unused") 

            public void processContent(String aContent) 
            { 
                final String content = aContent;
                contentView.post(new Runnable() 
                {    
                    public void run() 
                    {          
                        contentView.setText(content);        
                    }     
                });
            } 
        } 

        webView.getSettings().setJavaScriptEnabled(true); 
        webView.addJavascriptInterface(new MyJavaScriptInterface(contentView), "INTERFACE"); 
        webView.setWebViewClient(new WebViewClient() { 
            @Override 
            public void onPageFinished(WebView view, String url) 
            { 
                view.loadUrl("javascript:window.INTERFACE.processContent(document.getElementsByTagName('body')[0].innerText);"); 
            } 
        }); 

        webView.loadUrl("http://shinyhammer.blogspot.com");
    }
}

Using the following main.xml:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="fill_parent"
    android:layout_height="fill_parent"
    android:orientation="vertical" >

    <WebView
        android:id="@+id/webView"
        android:layout_width="match_parent"
        android:layout_height="fill_parent"
        android:layout_weight="0.5" />

    <TextView
        android:id="@+id/contentView"
        android:layout_width="match_parent"
        android:layout_height="fill_parent"
        android:layout_weight="0.5" />


</LinearLayout>