crawler-results-to-s3

apify/crawler-results-to-s3

An Apify act that uploads results from an Apify crawler to AWS S3. It is designed to be run from the crawler's finish webhook.

Usage

For a specific crawler, set the following parameters:

Finish webhook URL (finishWebhookUrl)

https://api.apify.com/v2/acts/wLuJuoFw3g3YPgqHf/runs?token=APIFY_API_TOKEN

You can find your API token on your Apify account page.
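As a small illustration, the finish webhook URL can be assembled from the act ID shown above and your API token. This is only a sketch: the function name `build_finish_webhook_url` is hypothetical, and `"APIFY_API_TOKEN"` is the same placeholder used above, not a real token.

```python
from urllib.parse import urlencode

# Act ID copied from the finish webhook URL above.
ACT_ID = "wLuJuoFw3g3YPgqHf"


def build_finish_webhook_url(api_token: str, act_id: str = ACT_ID) -> str:
    """Return the URL to paste into the crawler's finishWebhookUrl setting.

    Hypothetical helper for illustration; it only formats the URL and does
    not call the Apify API.
    """
    if not api_token:
        raise ValueError("api_token must not be empty")
    # urlencode escapes any characters in the token that are unsafe in a query string.
    query = urlencode({"token": api_token})
    return f"https://api.apify.com/v2/acts/{act_id}/runs?{query}"


if __name__ == "__main__":
    # Replace the placeholder with the token from your Apify account page.
    print(build_finish_webhook_url("APIFY_API_TOKEN"))
```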

Finish webhook data (finishWebhookData)

{
  "awsS3Params": {
    "params": {
      "Bucket": "my-bucket"
    },
    "accessKeyId": "JighjGHklkfjh79dfds80",
    "secretAccessKey": "DA4dgweds56hdasdasd",
    "region": "us-west-2"
  },
  "executionResultsParams": {
    "format": "json",
    "simplified": 1
  },
  "itemsPerFile": 1000
}

Note: The AWS user identified by these credentials must have permission to upload to the S3 bucket.

Parameters:

awsS3Params - Parameters passed to the AWS SDK's S3 constructor used for the upload. Note that accessKeyId, secretAccessKey, and params.Bucket are required.

executionResultsParams - Overrides the parameters of the Apify crawler execution results API call.

itemsPerFile - Number of web pages to store per file in S3. The default is 1000.
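Before pasting the finish webhook data into the crawler settings, it may help to check it against the rules above. The following is a minimal sketch, not the act's own validation: the function name `prepare_webhook_data` is hypothetical, and the credentials in the usage example are made-up placeholders.

```python
import copy
import json

DEFAULT_ITEMS_PER_FILE = 1000  # default stated in the parameter description


def prepare_webhook_data(data: dict) -> dict:
    """Check the required awsS3Params fields and fill in the itemsPerFile default.

    Hypothetical helper; the act itself may validate its input differently.
    """
    s3 = data.get("awsS3Params") or {}
    required = (
        ("awsS3Params.accessKeyId", s3.get("accessKeyId")),
        ("awsS3Params.secretAccessKey", s3.get("secretAccessKey")),
        ("awsS3Params.params.Bucket", (s3.get("params") or {}).get("Bucket")),
    )
    missing = [name for name, value in required if not value]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))

    # Work on a copy so the caller's dictionary is left untouched.
    result = copy.deepcopy(data)
    result.setdefault("itemsPerFile", DEFAULT_ITEMS_PER_FILE)
    return result


if __name__ == "__main__":
    webhook_data = {
        "awsS3Params": {
            "params": {"Bucket": "my-bucket"},
            "accessKeyId": "EXAMPLE_ACCESS_KEY_ID",  # placeholder
            "secretAccessKey": "EXAMPLE_SECRET_KEY",  # placeholder
            "region": "us-west-2",
        },
        "executionResultsParams": {"format": "json", "simplified": 1},
    }
    # Prints JSON that can be used as finishWebhookData.
    print(json.dumps(prepare_webhook_data(webhook_data), indent=2))
```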

Files on AWS S3

The act saves files to the specified bucket using the file name pattern executionId_fileNumber.resultsFormat (e.g., gjGZ6hdj6ZHhs_000000001.json).
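To show how results could map to file names, here is a rough sketch of the naming pattern combined with itemsPerFile. It assumes, based only on the example name above, that file numbers are zero-padded to nine digits and start at 1; the act's actual numbering may differ. Both function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def s3_file_name(execution_id: str, file_number: int, results_format: str) -> str:
    """Build a name of the form executionId_fileNumber.resultsFormat.

    Assumption: fileNumber is zero-padded to 9 digits, as in the example above.
    """
    return f"{execution_id}_{file_number:09d}.{results_format}"


def split_into_files(
    items: Sequence, items_per_file: int = 1000
) -> Iterator[Tuple[int, List]]:
    """Yield (file_number, chunk) pairs with at most items_per_file items each.

    Assumption: file numbers start at 1.
    """
    if items_per_file < 1:
        raise ValueError("items_per_file must be at least 1")
    for start in range(0, len(items), items_per_file):
        yield start // items_per_file + 1, list(items[start:start + items_per_file])


if __name__ == "__main__":
    pages = [{"url": f"https://example.com/{i}"} for i in range(2500)]
    for number, chunk in split_into_files(pages, items_per_file=1000):
        # Prints one file name per chunk together with the chunk size.
        print(s3_file_name("gjGZ6hdj6ZHhs", number, "json"), len(chunk))
```

With 2500 page results and the default itemsPerFile of 1000, this sketch produces three files, the last one holding the remaining 500 results.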