Get images from Rich Text Area via Apex

You can, but it takes quite a lot of work.

First you will need to parse the text to find the <img> elements and, in particular, their src attributes (e.g. using the DOM classes).
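A minimal sketch of that parsing step with the Dom.Document classes. The class and method names (RichTextImageFinder, findImageSources) are hypothetical and made up for illustration. It assumes the rich text is well-formed XML once wrapped in a single root element; markup such as an unclosed <br> or an entity like &nbsp; will likely make load() throw, in which case a regex or string-based approach may be the safer route.

```apex
// Hypothetical helper, not part of the original answer
public class RichTextImageFinder {

    // Returns the src value of every <img> element found in the markup.
    // Assumption: html is well-formed XML once wrapped in one root element.
    public static List<String> findImageSources(String html) {
        List<String> sources = new List<String>();
        Dom.Document doc = new Dom.Document();
        // Rich text usually has several top-level nodes, so add a single root
        doc.load('<root>' + html + '</root>');
        collect(doc.getRootElement(), sources);
        return sources;
    }

    // Depth-first walk over the element tree
    private static void collect(Dom.XmlNode node, List<String> sources) {
        if (node.getName() == 'img') {
            String src = node.getAttributeValue('src', null);
            if (src != null) {
                sources.add(src);
            }
        }
        for (Dom.XmlNode child : node.getChildElements()) {
            collect(child, sources);
        }
    }
}
```

Usage would be something like List<String> urls = RichTextImageFinder.findImageSources(record.richTextField__c); where richTextField__c stands in for your own field.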

Then use some code that looks like this for each src attribute (I've hard coded and truncated one here):

// This location is an example from my org; will need to be different for your org
String location = 'https://c.na15.content.force.com/servlet/rtaImage?eid=500...';
do {
    HttpRequest req = new HttpRequest();
    req.setEndpoint(location);
    req.setMethod('GET');
    HttpResponse res = new Http().send(req);
    if (res.getStatusCode() == 302) {
        location = res.getHeader('Location');
    } else if (res.getStatusCode() == 200) {
        location = null;
        Blob b = res.getBodyAsBlob();
        // Return the Blob
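    } else {
        // Error: stop looping, otherwise the same failing request
        // is repeated until the callout limit is hit
        location = null;
    }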
} while (location != null);

You will also have to add your equivalent of https://c.na15.content.force.com to the "Remote Site Settings" to allow the HTTP call to be made.


As of the Spring '17 release, I could not get Keith's solution to work using HTTP callouts. The callout went through, but after following the 302 redirect, the response was a web page asking me to log in. Setting the request's Authorization header, as discussed in the comments, did not help either.

However, I found a similar question whose solution did get me the Blob I wanted: create a PageReference for the URL, then call Blob b = page.getContent().

Get Rich Text Image URLs and Blob Data

My first step was to parse all of the <img> tags out of the rich text field so I knew the URLs. This question and solution inspired my logic:

// use reluctant regex to match each image tag individually
// https://docs.oracle.com/javase/tutorial/essential/regex/quant.html
Matcher imgMatcher = Pattern.compile( '<img(.+?)>' ).matcher( record.richTextField__c );

// iterate each image tag found
while ( imgMatcher.find() ) {

    // get the image tag html
    String imageTag = imgMatcher.group();
    System.debug( 'imageTag=' + imageTag );

    // get the value of the src attribute
    // the leading space is significant to avoid other attributes like data-cke-saved-src
    String imageURL = imageTag.substringBetween( ' src="', '"' );
    System.debug( 'imageURL=' + imageURL );

    // skip tags that have no double-quoted src attribute
    if ( imageURL == null ) {
        continue;
    }

    // if url contained parameters they might be html escaped, unescape them
    // or, more conservatively, replace '&amp;' with '&'
    String decodedURL = imageURL.unescapeHtml4();
    System.debug( 'decodedURL=' + decodedURL );

    // note, in API 34.0 and later, getContent() is treated as an http callout
    // so take that into consideration for your unit tests and governor limits
    // https://developer.salesforce.com/docs/atlas.en-us.pages.meta/pages/apex_System_PageReference_getContent.htm
    PageReference page = new PageReference( decodedURL );
    Blob b = page.getContent();
    System.debug( 'blob=' + b );

    System.debug( 'Enjoy your Blob, save it as a Document, ContentVersion, whatever!' );

    System.debug(''); // I like blank lines in my logs, easier to scan/read =)

}
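
For the "save it" part, here is a rough sketch of turning the Blob into a ContentVersion. The class and method names (RichTextImageSaver, buildFile) and the file name are hypothetical; Title, PathOnClient and VersionData are the standard ContentVersion fields. The helper only builds the record, so you can collect the records in a list inside the loop and insert them once after the last getContent() call, rather than mixing DML and callouts in each iteration.

```apex
// Hypothetical helper, not part of the original answer
public class RichTextImageSaver {

    // Builds (but does not insert) a ContentVersion holding the image bytes
    public static ContentVersion buildFile( Blob imageData, String fileName ) {
        ContentVersion cv = new ContentVersion();
        cv.Title = fileName;
        // the extension in PathOnClient is used to work out the file type
        cv.PathOnClient = fileName;
        cv.VersionData = imageData;
        return cv;
    }
}
```

Inside the loop you could then call files.add( RichTextImageSaver.buildFile( b, 'image.png' ) ); with files declared as a List<ContentVersion> before the loop, and run insert files; after it.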