Python and Selenium - get text excluding child node's text

You can remove the child node's text from the element's full text:

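from selenium.webdriver.common.by import By

# find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...)
all_text = driver.find_element(By.XPATH, "//whatever").text
child_text = driver.find_element(By.XPATH, "//subchild").text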

parent_text = all_text.replace(child_text, '')

Bear in mind that the replacement approach mentioned by @Guy fails whenever the child's text also occurs in the parent's own text.

For instance, having this structure:

<div>
    Hello World
    <b>e</b>
</div>

The parent text would be Hello World e, the child text would be e, and the replacement would remove every e, resulting in Hllo World instead of Hello World.
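
The failure can be reproduced without a browser. This is a minimal sketch, assuming the two strings below are what .text returns for the parent and the child in the structure above:

```python
# Assumed .text values for the <div> and its <b> child (not fetched from a real page).
all_text = "Hello World e"
child_text = "e"

# str.replace removes every occurrence of the substring, not only the child's text.
parent_text = all_text.replace(child_text, "")

print(repr(parent_text))  # 'Hllo World '
```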

A safe solution

To get an element's own text safely, you have to iterate over the node's children and concatenate the text nodes. Since you can't do that in pure Selenium, you have to execute JS code.

OWN_TEXT_SCRIPT = "if(arguments[0].hasChildNodes()){var r='';var C=arguments[0].childNodes;for(var n=0;n<C.length;n++){if(C[n].nodeType==Node.TEXT_NODE){r+=' '+C[n].nodeValue}}return r.trim()}else{return arguments[0].innerText}"
parent_text = driver.execute_script(OWN_TEXT_SCRIPT, elem)

The script is a minified version of this simple snippet, which Selenium runs as a function body:

if (arguments[0].hasChildNodes()) {
    var res = '';
    var children = arguments[0].childNodes;
    for (var n = 0; n < children.length; n++) {
        if (children[n].nodeType == Node.TEXT_NODE) {
            res += ' ' + children[n].nodeValue;
        }
    }
    return res.trim()
}
else {
    return arguments[0].innerText
}
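
If you need this in several places, the script can be wrapped in a small helper. The sketch below is one possible way to do it: get_own_text is a hypothetical name, not part of Selenium, and the only Selenium call it relies on is driver.execute_script(script, *args).

```python
# Readable variant of the script above. Selenium runs it as a function body,
# so arguments[0] is the element passed to execute_script.
OWN_TEXT_SCRIPT = """
var node = arguments[0];
if (!node.hasChildNodes()) {
    return node.innerText;
}
var res = '';
for (var n = 0; n < node.childNodes.length; n++) {
    if (node.childNodes[n].nodeType == Node.TEXT_NODE) {
        res += ' ' + node.childNodes[n].nodeValue;
    }
}
return res.trim();
"""


def get_own_text(driver, elem):
    """Return the text of elem's direct text nodes, skipping child elements.

    Hypothetical helper: driver is a WebDriver, elem a WebElement from it.
    """
    return driver.execute_script(OWN_TEXT_SCRIPT, elem)
```

With a live session it would be called as get_own_text(driver, elem), where elem is any element you have already located.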

I had a similar problem recently, where Selenium always gave me all the text inside the element, including the spans. I ended up splitting the string on the newline character "\n" and keeping the first part. For example:

from selenium.webdriver.common.by import By

all_text = driver.find_element(By.XPATH, xpath).text
req_text = all_text.split("\n")[0]
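
Note that taking the first line only isolates the parent's text when that text comes first and the child's text starts on a new line. Here is a small sketch with assumed .text values (first_line is a hypothetical helper name):

```python
def first_line(text):
    """Return everything before the first newline."""
    return text.split("\n")[0]


# Assumed .text values, not fetched from a real page.
block_child = "Hello World\ne"   # child's text rendered on its own line
inline_child = "Hello World e"   # child's text rendered inline, no newline

print(repr(first_line(block_child)))   # 'Hello World'
print(repr(first_line(inline_child)))  # 'Hello World e'
```

In the second case the child's text is still there, so the JS approach is the safer choice when the child is an inline element.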
