Act

petr_cermak/crawl-manager

  • Latest build: 0.0.2 / 2017-12-14
  • Created 2017-12-14
  • Last modified 2018-05-22
  • grade 7

Description

This act runs a list of Apify Crawlers as efficiently as it can: it keeps as many of them running in parallel as possible.


API

To run the act, send an HTTP POST request to:

https://api.apify.com/v2/acts/petr_cermak~crawl-manager/runs?token=<YOUR_API_TOKEN>

The POST payload will be passed as input for the act. For more information, read the docs.


Example input

Content type: application/json; charset=utf-8

{
    "parallel": "N_OF_RUNNING",
    "finalWebhook": "FINAL_WEBHOOK",
    "crawlers": [
        {
            "id": "CRAWLER_ID",
            "settings": "CRAWLER_SETTINGS"
        },
        ...
    ]
}
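
As an illustration of the call described above, here is a minimal Python sketch that builds the POST request using only the standard library. The helper name `build_run_request`, the example webhook URL, and the empty `settings` object are assumptions made for this example; only the endpoint and the input property names come from this page.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.apify.com/v2/acts/petr_cermak~crawl-manager/runs"


def build_run_request(token, act_input):
    """Build (but do not send) the POST request that starts the act."""
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps(act_input).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_URL}?{query}",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Placeholder values (assumptions); substitute real crawler IDs and settings.
example_input = {
    "parallel": 2,
    "finalWebhook": "https://example.com/crawl-finished",
    "crawlers": [
        {"id": "CRAWLER_ID", "settings": {}},
    ],
}

request = build_run_request("YOUR_API_TOKEN", example_input)
# To actually start the run, send it: urllib.request.urlopen(request)
```

The request is only constructed here, so the sketch can be inspected without a valid token; sending it is left as the commented-out last step.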

Readme

act-crawl-manager

Apify act for running a list of crawlers in an optimal manner.

This act takes a list of crawlers and runs them in parallel. It always tries to run as many of the crawlers as possible, until all of them are finished. You can limit the maximum number of crawlers running in parallel.
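
The scheduling idea described above can be sketched in Python with a bounded worker pool. This is not the act's actual implementation, only an illustration of the technique; `run_all` and the `run_crawler` callable are hypothetical names introduced for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(crawlers, run_crawler, parallel):
    """Run every crawler, keeping at most `parallel` of them active at once.

    `run_crawler` is a hypothetical callable that starts one crawler,
    blocks until it finishes, and returns its execution ID.
    """
    # The pool starts the next queued crawler as soon as a worker frees up,
    # so the number of running crawlers stays at the limit until the queue
    # is empty.
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        # map() returns results in the same order as the input list.
        return list(pool.map(run_crawler, crawlers))
```

With `parallel` set to 2 and six crawlers in the list, two start immediately and each of the remaining four starts as soon as an earlier one finishes.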

INPUT

Input is a JSON object with the following properties:

{
    // maximum number of crawlers running in parallel
    "parallel": N_OF_RUNNING,

    // final webhook
    "finalWebhook": FINAL_WEBHOOK,

    // list of crawlers
    "crawlers": [
        {
            "id": CRAWLER_ID,
            "settings": CRAWLER_SETTINGS
        },
        ...
    ]
}

If you set the "finalWebhook" attribute, a POST request is sent to that URL once all of the crawlers have finished. The body of the request is as follows:

{
    // list of finished executions
    "executionIds": [
        EXECUTION_ID_1,
        EXECUTION_ID_2,
        ...
    ]
}

This JSON will also be saved as the act's OUTPUT value.
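
On the receiving side, the webhook body shown above could be handled along these lines. This is a minimal sketch: the function name `parse_final_webhook` is hypothetical, and the sample IDs in the usage line are made up for illustration.

```python
import json


def parse_final_webhook(body):
    """Return the list of execution IDs from a final-webhook request body."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object in the webhook body")
    execution_ids = payload.get("executionIds")
    if not isinstance(execution_ids, list):
        raise ValueError('expected an "executionIds" list in the webhook body')
    return execution_ids


# Usage with a made-up body; real IDs come from the finished executions.
finished = parse_final_webhook('{"executionIds": ["ID_1", "ID_2"]}')
```

Because the same JSON is stored as the act's OUTPUT value, the same parser could be applied to that value as well.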