Act

mtrunkat/example-hacker-news

  • Latest build: 0.0.9 (2018-04-05)
  • Created: 2018-04-05
  • Last modified: 2018-04-05
  • Grade: 5

Description

Example crawler for news.ycombinator.com built using the Apify SDK


API

To run the act, send an HTTP POST request to:

https://api.apify.com/v2/acts/mtrunkat~example-hacker-news/runs?token=<YOUR_API_TOKEN>

The POST payload will be passed to the act as its input. For more information, read the docs.
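A minimal sketch of triggering a run from Node.js, assuming Node 18+ with the global fetch API; the buildRunUrl helper and the APIFY_TOKEN environment variable are illustrative names, not part of the act:

```javascript
// Hypothetical helper: builds the run-trigger URL for this act.
const buildRunUrl = (token) =>
    `https://api.apify.com/v2/acts/mtrunkat~example-hacker-news/runs?token=${encodeURIComponent(token)}`;

// The input object is serialized to JSON and sent as the POST body.
const input = { hello: 123 };

// Uncomment to actually start a run (requires a valid API token):
// const res = await fetch(buildRunUrl(process.env.APIFY_TOKEN), {
//     method: 'POST',
//     headers: { 'Content-Type': 'application/json; charset=utf-8' },
//     body: JSON.stringify(input),
// });

console.log(buildRunUrl('<YOUR_API_TOKEN>'));
```

The act's input is whatever JSON you send; this example act simply receives it and ignores it.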


Example input

Content type: application/json; charset=utf-8

{ "hello": 123 }

Source code

Based on the apify/actor-node-chrome:beta Docker image (see docs).

const Apify = require('apify');

Apify.main(async () => {
    // Get queue and enqueue first url.
    const requestQueue = await Apify.openRequestQueue();
    const enqueueUrl = async url => requestQueue.addRequest(new Apify.Request({ url }));
    await enqueueUrl('https://news.ycombinator.com/');

    // Create crawler.
    const crawler = new Apify.PuppeteerCrawler({
        requestQueue,
        disableProxy: true,

        // This function is executed for each request.
        // If a request fails, it is retried up to 3 times.
        // The page parameter is Puppeteer's Page object with the page loaded.
        handlePageFunction: async ({ page, request }) => {
            console.log(`Request ${request.url} succeeded!`);

            // Extract all posts.
            const data = await page.$$eval('.athing', (els) => {
                return els.map(el => el.innerText);
            });
            
            // Save data.
            await Apify.pushData({
                url: request.url,
                data,
            });
            
            // Enqueue next page.
            try {
                const nextHref = await page.$eval('.morelink', el => el.href);
                await enqueueUrl(nextHref);
            } catch (err) {
                console.log(`Url ${request.url} is the last page!`);
            }
        },

        // If a request failed 4 times, then this function is executed.
        handleFailedRequestFunction: async ({ request }) => {
            console.log(`Request ${request.url} failed 4 times`);

            await Apify.pushData({
                url: request.url,
                errors: request.errorMessages,
            });
        },
    });
    
    // Run crawler.
    await crawler.run();
});
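Each call to Apify.pushData() above stores one record in the run's default dataset. A hedged sketch of retrieving those records over the Apify API v2; the buildItemsUrl helper is an illustrative name, and the "last run dataset items" endpoint path is an assumption based on Apify API conventions:

```javascript
// Hypothetical helper: builds the URL for the items of the act's
// last run's default dataset.
const buildItemsUrl = (token) =>
    `https://api.apify.com/v2/acts/mtrunkat~example-hacker-news/runs/last/dataset/items?token=${encodeURIComponent(token)}&format=json`;

// Uncomment with a real token (Node 18+ for global fetch):
// const items = await (await fetch(buildItemsUrl(process.env.APIFY_TOKEN))).json();
// Each item has the shape { url, data } produced by handlePageFunction,
// or { url, errors } for requests that exhausted their retries.

console.log(buildItemsUrl('<YOUR_API_TOKEN>'));
```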