Screen-scrape Salesforce with a REST GET call from Apex

Please don't screen-scrape. It's just about the most fragile integration you can imagine. With the release of the Analytics API, it's also now largely unnecessary.
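As a rough sketch of the supported route, a report can be fetched through the Analytics API with an ordinary API callout. The report Id below is a made-up placeholder and v29.0 is just the API version I'm assuming here; you would also need a Remote Site Setting for your own instance before the callout is allowed.

```apex
// Assumption: placeholder report Id, substitute one from your own org.
String reportId = '00Oxx0000000000';

HttpRequest req = new HttpRequest();
req.setEndpoint(URL.getSalesforceBaseUrl().toExternalForm()
    + '/services/data/v29.0/analytics/reports/' + reportId);
req.setMethod('GET');
// API endpoints accept the session Id as a bearer token.
req.setHeader('Authorization', 'Bearer ' + UserInfo.getSessionId());

HttpResponse res = new Http().send(req);
if (res.getStatusCode() == 200) {
    // The body is JSON, so it can be parsed rather than scraped.
    Map<String, Object> report =
        (Map<String, Object>) JSON.deserializeUntyped(res.getBody());
    System.debug(report.keySet());
} else {
    System.debug('Request failed: ' + res.getStatus());
}
```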

Having said that, the Authorization HTTP header only works with API requests. For web pages like /001/o or /home/home.jsp you need to set the sid cookie instead. For example,

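// Relative URL of the web page to fetch (here, the Accounts tab).
String requestUrl = '/001/o';
Http http = new Http();
HttpRequest req = new HttpRequest();
req.setEndpoint(URL.getSalesforceBaseUrl().toExternalForm() + requestUrl);
req.setMethod('GET');
// Web pages ignore the Authorization header; pass the session Id as the sid cookie.
req.setHeader('Cookie', 'sid=' + UserInfo.getSessionId());

HttpResponse res = http.send(req);
// The body is the raw HTML of the page.
String output = res.getBody();
System.debug(output);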

Yields:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html class="ext-strict"><head><script type="text/javascript" src="/jslibrary/1351189248000/sfdc/JiffyStubs.js"></script>
<title>Accounts: Home ~ salesforce.com - Developer Edition</title>
<!-- LOTS MORE VALID PAGE DATA -->

Amazing help, Pat (@metadaddy)! This has allowed me to create some marvelous personal developer toys, like a recursive Show All Dependencies.

However, as Simon (@superfell) said, this is unsupported and very hard to maintain. The implementation will most likely come down to heart-sinking Pattern and Matcher use, or unwieldy String search methods, since the page most likely won't load into a Dom.Document out of the box.
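To give a feel for the Pattern and Matcher approach, here is a minimal sketch that pulls the page title out of the returned HTML. The html string is a hard-coded stand-in for res.getBody(), and the regex is only an illustration; anything beyond a simple tag like this gets brittle quickly.

```apex
// Stand-in for res.getBody() from the callout.
String html = '<html class="ext-strict"><head>'
    + '<title>Accounts: Home ~ salesforce.com - Developer Edition</title>'
    + '</head></html>';

// Case-insensitive match; capture everything up to the next tag.
Pattern titlePattern = Pattern.compile('(?i)<title>([^<]*)</title>');
Matcher m = titlePattern.matcher(html);

// find() returns false if the markup changes, so guard against no match.
String pageTitle = m.find() ? m.group(1).trim() : null;
System.debug(pageTitle);
```

If Salesforce changes the page markup, find() simply returns false and pageTitle stays null, which is exactly the kind of silent breakage that makes this fragile.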

However, if you can live without this being a headless process (and you're really a glutton for punishment), may I suggest approaching this inside a Visualforce page using JavaScript? DOM access will make whatever you're going to do so much easier, and once you've cleared the hurdle of learning Salesforce's page-rendering patterns and naming conventions, you'll be in much better shape than with server-side code.

Tags: Apex, REST API