Act

apify/crawler-results-to-s3

  • Builds: latest 0.0.9 / 2018-02-05
  • Created 2017-08-09
  • Last modified 2018-02-05
  • grade 8

Description

An act that uploads results from an Apify crawler to AWS S3. It is designed to be run from the crawler's finish webhook.


API

To run the act, send an HTTP POST request to:

https://api.apify.com/v2/acts/apify~crawler-results-to-s3/runs?token=<YOUR_API_TOKEN>

The POST payload will be passed as input for the act. For more information, read the docs.
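
As a minimal sketch, the request above could be built in Python with only the standard library. The helper name `build_run_request` and the placeholder token are illustrative assumptions, not part of the act. The request is constructed but not sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2/acts"
ACT_ID = "apify~crawler-results-to-s3"


def build_run_request(token, act_input):
    """Build (but do not send) the POST request that starts a run of the act."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/{ACT_ID}/runs?{query}"
    # The POST payload is passed to the act as its input.
    body = json.dumps(act_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("YOUR_API_TOKEN", {})
print(request.get_method(), request.full_url)
# To actually start a run: urllib.request.urlopen(request)
```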


Example input

Content type: application/json

{}

Readme

act-crawler-results-to-s3

An Apify act that uploads results from an Apify crawler to AWS S3. It is designed to be run from the crawler's finish webhook.

Usage

In the settings of the crawler whose results you want to upload, set the following parameters:

Finish webhook URL (finishWebhookUrl)

https://api.apify.com/v2/acts/wLuJuoFw3g3YPgqHf/runs?token=APIFY_API_TOKEN

You can find your API token on your Apify account page.

Finish webhook data (finishWebhookData)

{
  "awsS3Params": {
    "params": {
      "Bucket": "my-bucket"
    },
    "accessKeyId": "JighjGHklkfjh79dfds80",
    "secretAccessKey": "DA4dgweds56hdasdasd",
    "region": "us-west-2"
  },
  "executionResultsParams": {
    "format": "json",
    "simplified": 1
  },
  "itemsPerFile": 1000
}

Note: The AWS user identified by these credentials must have permission to upload to the S3 bucket.

Parameters:

awsS3Params - Parameters passed to the AWS SDK's S3 constructor used for the upload. Note that accessKeyId, secretAccessKey and params.Bucket are required.

executionResultsParams - Overrides the parameters of the Apify crawler execution results API call (for example format and simplified).

itemsPerFile - Number of web pages to store per file in S3. Defaults to 1000.
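
The required fields and the default can be checked before the webhook data is saved. The following is a rough sketch only: the function names `missing_required` and `items_per_file` are hypothetical helpers, not part of the act, and the act's own validation may behave differently.

```python
DEFAULT_ITEMS_PER_FILE = 1000  # default value stated in the parameter list


def missing_required(webhook_data):
    """Return the dotted names of required awsS3Params fields that are absent or empty."""
    s3 = webhook_data.get("awsS3Params") or {}
    params = s3.get("params") or {}
    missing = []
    if not s3.get("accessKeyId"):
        missing.append("awsS3Params.accessKeyId")
    if not s3.get("secretAccessKey"):
        missing.append("awsS3Params.secretAccessKey")
    if not params.get("Bucket"):
        missing.append("awsS3Params.params.Bucket")
    return missing


def items_per_file(webhook_data):
    """Return itemsPerFile, falling back to the documented default."""
    return webhook_data.get("itemsPerFile", DEFAULT_ITEMS_PER_FILE)


webhook_data = {
    "awsS3Params": {
        "params": {"Bucket": "my-bucket"},
        "accessKeyId": "EXAMPLE_KEY_ID",       # placeholder, not a real key
        "secretAccessKey": "EXAMPLE_SECRET",   # placeholder, not a real secret
        "region": "us-west-2",
    },
    "executionResultsParams": {"format": "json", "simplified": 1},
}
print(missing_required(webhook_data))  # empty list: all required fields present
print(items_per_file(webhook_data))    # falls back to 1000
```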

Files on AWS S3

The act saves files to the specified bucket, naming each file executionId_fileNumber.resultsFormat (e.g. gjGZ6hdj6ZHhs_000000001.json).
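
To illustrate how itemsPerFile and the naming pattern combine, here is a small sketch that lists the file names one execution would be expected to produce. It rests on two assumptions inferred only from the single example name above: file numbers start at 1 and are zero-padded to nine digits. `file_name` and `plan_files` are hypothetical helpers, not functions of the act.

```python
def file_name(execution_id, file_number, results_format):
    """Format one name as executionId_fileNumber.resultsFormat.

    Assumption: nine-digit zero padding, as in the example name.
    """
    return f"{execution_id}_{file_number:09d}.{results_format}"


def plan_files(execution_id, item_count, items_per_file=1000, results_format="json"):
    """List the file names needed to hold item_count pages.

    Assumption: numbering starts at 1.
    """
    if items_per_file < 1:
        raise ValueError("items_per_file must be at least 1")
    file_count = -(-item_count // items_per_file)  # ceiling division
    return [
        file_name(execution_id, number, results_format)
        for number in range(1, file_count + 1)
    ]


# 2500 pages at 1000 pages per file need three files.
for name in plan_files("gjGZ6hdj6ZHhs", 2500):
    print(name)
```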