Getting DOM node text with Puppeteer and headless Chrome

Using await/async and $eval, the syntax looks like the following:

await page.goto('https://stackoverflow.com/users/4794')
const nameElement = await context.page.$eval('h2.user-card-name', el => el.text())
console.log(nameElement)

I found three solutions to this problem, depending on how complicated your extraction is. The simplest option is a related function that I hadn't noticed: page.$eval(). It basically does what I was trying to do: combines page.$() and page.evaluate(). Here's an example that works:

const puppeteer = require('puppeteer');

puppeteer.launch().then(function(browser) {
    browser.newPage().then(function(page) {
        page.goto('https://stackoverflow.com/users/4794').then(function() {
            page.$eval('h2.user-card-name', function(heading) {
                return heading.innerText;
            }).then(function(result) {
                console.info(result);
                browser.close();
            });
        });
    });
});

That gives me the expected result:

$ node get_user.js 
Don Kirkby top 2% overall

I wanted to extract something more complicated, but I finally realized that the evaluation function is running in the context of the page. That means you can use any tools that are loaded in the page, and then just send strings and numbers back and forth. In this example, I use jQuery in a string to extract what I want:

const puppeteer = require('puppeteer');

puppeteer.launch().then(function(browser) {
    browser.newPage().then(function(page) {
        page.goto('https://stackoverflow.com/users/4794').then(function() {
            page.evaluate("$('h2.user-card-name').text()").then(function(result) {
                console.info(result);
                browser.close();
            });
        });
    });
});

That gives me a result with the whitespace intact:

$ node get_user.js 

                            Don Kirkby

                                top 2% overall

In my real script, I want to extract the text of several nodes, so I need a function instead of a simple string:

const puppeteer = require('puppeteer');

puppeteer.launch().then(function(browser) {
    browser.newPage().then(function(page) {
        page.goto('https://stackoverflow.com/users/4794').then(function() {
            page.evaluate(function() {
                return $('h2.user-card-name').text();
            }).then(function(result) {
                console.info(result);
                browser.close();
            });
        });
    });
});

That gives the exact same result. Now I need to add error handling, and maybe reduce the indentation levels.


I use page.$eval

const text = await page.$eval('h2.user-card-name', el => el.innerText );
console.log(text);