stripHtmlTags removes new line \n characters. Bug or feature?

This seems to be the intended behavior.

stripHtmlTags() removes all HTML formatting, and \n also produces white space, so that may be why it gets removed as well.

One workaround would be to replace the newline (or any other character you want to keep) with a placeholder, call stripHtmlTags(), and then restore the original characters.

String newline = '\n';
String newlinePlaceholder = '---n';

String test = '--&nbsp;\n\n<b>test</b>\n\n--';

// Swap each newline for a placeholder before stripping.
// replace() treats both arguments as literal text, not as a regex.
test = test.replace(newline, newlinePlaceholder);

System.debug( test );

test = test.stripHtmlTags();

// Put the newlines back.
test = test.replace(newlinePlaceholder, newline);

System.debug( test );

I believe this is expected functionality. Since you are using \r, \n and \t in code, they are converted to the corresponding HTML on output, so when you call String.stripHtmlTags() they are stripped along with all the other HTML tags.

If you're displaying this text, you could use <apex:outputText escape="true"> instead; the formatting will stay the same and you won't have to call stripHtmlTags().


It seems to me that you have a couple of options here. The easiest would seem to be to use a regex with the Pattern and Matcher classes, where the pattern would look something like (?i)(<br>|\n|\t|\r).
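As a rough sketch of the regex option: the class and method names below (HtmlBreakPreserver, stripTagsKeepBreaks) are hypothetical, not platform methods. The sketch assumes that no tag is split across a line break and that a <br> should become \n. It finds each break with Pattern and Matcher, strips only the text between matches, and puts the break character back.

```apex
// Hypothetical helper class for illustration; not part of the platform API.
public class HtmlBreakPreserver {
    public static String stripTagsKeepBreaks(String html) {
        // Case-insensitive match on <br> (optionally self-closed) or on \r, \n, \t.
        Matcher m = Pattern.compile('(?i)<br\\s*/?>|[\\r\\n\\t]').matcher(html);
        String result = '';
        Integer last = 0;
        while (m.find()) {
            // Strip only the text between the previous match and this one.
            result += html.substring(last, m.start()).stripHtmlTags();
            // Assumption: a <br> tag becomes a newline; \r, \n and \t are kept as-is.
            String token = m.group();
            result += token.startsWith('<') ? '\n' : token;
            last = m.end();
        }
        // Strip whatever follows the last match.
        return result + html.substring(last).stripHtmlTags();
    }
}
```

It could then be called as System.debug(HtmlBreakPreserver.stripTagsKeepBreaks('--&nbsp;\n\n<b>test</b>\n\n--'));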

You could also look for the indexes of those characters and/or groups, then find strings in between them.

Assuming the string s contains the HTML to be stripped, a more direct, if somewhat messy, solution would be along the lines of the following:

// Example input; s holds the string to operate on.
String s = 'line one<br><b>line two</b>\nline three';

// Turn <br> tags into carriage returns, upper or lower case.
s = s.replaceAll('(?i)<br>', '\r');

String result = '';
Integer segmentStart = 0;

// Walk the string looking for the index of each \r, \n or \t.
// Assumes no tag is split across one of these characters.
for (Integer i = 0; i < s.length(); i++) {
    String c = s.substring(i, i + 1);

    if (c == '\r' || c == '\n' || c == '\t') {
        // Strip the text between the previous match and this one,
        // then re-append the character that would otherwise be lost.
        result += s.substring(segmentStart, i).stripHtmlTags() + c;
        segmentStart = i + 1;
    }
}

// Strip whatever follows the last match.
result += s.substring(segmentStart).stripHtmlTags();

result is the string stripped of HTML, with the other characters preserved.

Tags:

Apex