Actor

apify/example-process-crawl-results

  • latest 0.0.7 / 2018-04-01
  • Created 2017-11-15
  • Last modified 2018-04-01
  • grade 7

Description

Example actor that iterates through all results from a crawler run and counts them. The actor is meant to be called from the crawler's finish webhook. To do so, add the following URL to the finish webhook of your crawler:

https://api.apify.com/v2/acts/apify~example-process-crawl-results/runs?token=<YOUR_API_TOKEN>

You can use this actor as a starting point for developing custom post-processing of crawler data.


API

To run the actor, send an HTTP POST request to:

https://api.apify.com/v2/acts/apify~example-process-crawl-results/runs?token=<YOUR_API_TOKEN>

The POST payload will be passed as input for the actor. For more information, read the docs.
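
As a rough illustration, the request could be sent from Node.js along the lines sketched below. The helper names `buildRunRequest` and `startActorRun` are hypothetical, and the sketch assumes Node.js 18 or later, where `fetch` is available globally. Only the endpoint URL and the JSON payload shape come from this page.

```javascript
// Hypothetical helper: builds the URL and fetch options for starting the actor.
// Only the endpoint URL and the { _id } payload shape are taken from this page.
function buildRunRequest(token, crawlerRunId) {
    const url = 'https://api.apify.com/v2/acts/apify~example-process-crawl-results/runs'
        + `?token=${encodeURIComponent(token)}`;
    return {
        url,
        options: {
            method: 'POST',
            headers: { 'Content-Type': 'application/json; charset=utf-8' },
            body: JSON.stringify({ _id: crawlerRunId }),
        },
    };
}

// Hypothetical helper: sends the request (assumes a global fetch, Node.js 18+).
async function startActorRun(token, crawlerRunId) {
    const { url, options } = buildRunRequest(token, crawlerRunId);
    const response = await fetch(url, options);
    if (!response.ok) {
        throw new Error(`Request failed with HTTP status ${response.status}`);
    }
    return response.json();
}
```

Keeping the request construction separate from the network call makes it possible to inspect the URL and payload without contacting the API.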


Example input

Content type: application/json; charset=utf-8

{ "_id": "YOUR_CRAWLER_RUN_ID" }
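
The actor's source code rejects any input that lacks the `_id` attribute. A minimal sketch of that check as a standalone function follows; the name `getCrawlerRunId` is hypothetical and is not part of the actor or the Apify library.

```javascript
// Hypothetical helper mirroring the actor's input check:
// returns the crawler run ID, or throws if the input has no "_id".
function getCrawlerRunId(input) {
    if (!input || !input._id) {
        throw new Error('Input is missing the "_id" attribute.');
    }
    return input._id;
}

// Parse the example payload from this page and extract the run ID.
const exampleInput = JSON.parse('{ "_id": "YOUR_CRAWLER_RUN_ID" }');
console.log(getCrawlerRunId(exampleInput));
```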

Source code

Based on the apify/actor-node-basic Docker image (see docs).

const Apify = require('apify');

Apify.main(async () => {
    // Get act input and validate it
    const input = await Apify.getValue('INPUT');
    console.log('Input:');
    console.dir(input);
    if (!input || !input._id) {
        throw new Error('Input is missing the "_id" attribute. Did you start it from crawler finish webhook?');
    }
    const executionId = input._id;
    
    // Print info about crawler run
    const crawlerRunDetails = await Apify.client.crawlers.getExecutionDetails({ executionId });
    if (!crawlerRunDetails) {
        throw new Error(`There is no crawler run with ID: "${executionId}"`);
    }
    console.log(`Details of the crawler run (ID: ${executionId}):`);
    console.dir(crawlerRunDetails);
    
    // Iterate through all crawler results and count them
    // Here is the place where you can add something more adventurous :)
    console.log('Counting results from crawler run...');
    
    const limit = 100;
    let offset = 0;
    let totalItems = 0;
    let results;
    
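    do {
        // Fetch one page of up to `limit` results, starting at `offset`
        results = await Apify.client.crawlers.getExecutionResults({
            executionId,
            limit,
            offset
        });

        // Advance past the items just received; an empty page ends the loop
        offset += results.count;
        totalItems += results.items.length;
    } while (results.count > 0);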
    
    // Save results
    console.log(`Found ${totalItems} records`);
    await Apify.setValue('OUTPUT', {
        crawlerRunDetails,
        totalItems
    });
    
});
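
As one possible starting point for custom post-processing, the paging loop above can be separated from the per-page work. The sketch below is an assumption-laden illustration, not part of the actor: `forEachResultPage`, `makeFakeFetchPage` and `countByCategory` are hypothetical names, the `category` field is invented for the example, and the fake page function stands in for the crawler results API so the code runs without an Apify account. It relies only on the `{ count, items }` response shape that the source code above uses.

```javascript
// Hypothetical helper: pages through results with limit/offset until a page
// comes back empty. `fetchPage` is an assumed callback returning
// { count, items }, the shape the source code above relies on.
async function forEachResultPage(fetchPage, limit, onPage) {
    let offset = 0;
    let page;
    do {
        page = await fetchPage({ limit, offset });
        if (page.items.length > 0) await onPage(page.items);
        offset += page.count;
    } while (page.count > 0);
}

// In-memory stand-in for the crawler results API, for illustration only.
function makeFakeFetchPage(allItems) {
    return async ({ limit, offset }) => {
        const items = allItems.slice(offset, offset + limit);
        return { count: items.length, items };
    };
}

// Example post-processing: count results per (hypothetical) `category` field.
async function countByCategory(fetchPage, limit) {
    const counts = {};
    await forEachResultPage(fetchPage, limit, (items) => {
        for (const item of items) {
            counts[item.category] = (counts[item.category] || 0) + 1;
        }
    });
    return counts;
}
```

Inside the actor, `fetchPage` could wrap the same call the source code makes, for example `({ limit, offset }) => Apify.client.crawlers.getExecutionResults({ executionId, limit, offset })`.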