How to download search results on Google Scholar using R?

I can't speak to the legalities of your task, but there are a few ways you can go about this. I am not strong in XPath, but it might be the best approach: I believe you can use the XML package to retrieve the page contents and then use XPath expressions to extract the elements you need.

For instance, I use Chrome as my browser, and when I inspected the page with Developer Tools, there did appear to be a consistent structure to the page, with the data "hidden" inside various tags that you should be able to exploit fairly easily using XPath.
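As a rough sketch of that idea (not tested against the live site), the snippet below parses a small, made-up HTML fragment with the XML package and pulls out titles, links, and author lines with XPath. The markup and the class names "gs_r", "gs_rt", and "gs_a" are assumptions for illustration only; check the real tags and classes in Developer Tools before relying on them.

```r
library(XML)

# Made-up markup standing in for a saved results page. The class names
# ("gs_r", "gs_rt", "gs_a") are assumptions, not confirmed page structure.
html <- '
<html><body>
  <div class="gs_r">
    <h3 class="gs_rt"><a href="http://example.org/a">First paper</a></h3>
    <div class="gs_a">A. Author - Journal One, 2010</div>
  </div>
  <div class="gs_r">
    <h3 class="gs_rt"><a href="http://example.org/b">Second paper</a></h3>
    <div class="gs_a">B. Author - Journal Two, 2011</div>
  </div>
</body></html>'

# Parse the HTML string into a document that XPath can query
doc <- htmlParse(html, asText = TRUE)

# Each XPath expression selects one kind of element from every result
titles  <- xpathSApply(doc, "//h3[@class='gs_rt']/a", xmlValue)
links   <- xpathSApply(doc, "//h3[@class='gs_rt']/a", xmlGetAttr, "href")
authors <- xpathSApply(doc, "//div[@class='gs_a']", xmlValue)

# Collect the pieces into one row per search result
results <- data.frame(title = titles, link = links, authors = authors,
                      stringsAsFactors = FALSE)
print(results)
```

To work from a page you have saved locally, you could pass the file path to htmlParse() instead of the string and drop asText = TRUE; the XPath expressions would then need adjusting to whatever structure the real page has.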

Check out this link for an example of using XPath.

HTH and Good Luck


Please consider the updated biobucket post:

http://thebiobucket.blogspot.com/2011/11/r-function-google-scholar-webscraper.html


There are some Python and Perl scrapers out there that you might be able to adapt; they are linked at http://bmb-common.blogspot.com/2011/02/does-google-scholar-suck-or-am-i-just.html