Actor

mtrunkat/crawler-timeline

  • Builds
  • latest 0.0.36 / 2018-04-23
  • legacy 0.0.27 / 2017-11-02
  • Created 2017-10-27
  • Last modified 2018-09-05
  • grade 4

Description

This act creates a timeline spreadsheet from crawler results. The main use case is a spreadsheet that records how a web page changes over time.


API

To run the actor, send an HTTP POST request to:

https://api.apify.com/v2/acts/mtrunkat~crawler-timeline/runs?token=<YOUR_API_TOKEN>

The POST payload will be passed as input for the actor. For more information, read the docs.
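The request described above can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names `build_run_request` and `start_run` are hypothetical, and the response is assumed to be JSON.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from this page; the token is appended as a query parameter.
API_URL = "https://api.apify.com/v2/acts/mtrunkat~crawler-timeline/runs"


def build_run_request(token, payload):
    """Build the POST request that starts the actor (nothing is sent yet)."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def start_run(token, payload):
    """Send the request; assumes the API answers with a JSON body."""
    with urllib.request.urlopen(build_run_request(token, payload)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `start_run("<YOUR_API_TOKEN>", {...})` with the JSON payload of your choice would start a run; the payload is passed to the actor as its input.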


Example input

Content type: application/json; charset=utf-8

{ 
    "_id": "PLKF4jLKkFD3o49xZ",
    "actId": "nLohN5kGnNPK9tuRa"
}

Readme

Apify Crawler Timeline

This act creates a timeline spreadsheet from crawler results. The main use case is a spreadsheet that records how a web page changes over time.

The crawler has to satisfy the following conditions:

  • It returns exactly one page
  • The page function result for that page is a simple object

For example, if the output of your crawler is ...

{
   "someString": "something",
   "someNumber": 123.456
}

... and you set this act as the finish webhook of that crawler, then it creates a key-value store named after your crawler and generates the following table there in JSON and CSV formats:

| Date              | someString              | someNumber |
| 5.6.2017 22:00:00 | some value at that date | 123.456    |
| 5.6.2017 23:00:00 | some other value        | 42         |
| 6.6.2017 00:00:00 | some other value        | 2          |

On each run of your crawler the table gets updated.
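To make the update step concrete, here is a minimal sketch of how one crawler result could be appended to such a timeline and rendered as CSV. It is not the act's actual implementation: the functions `append_row` and `to_csv` are hypothetical, and the date format (day.month.year without leading zeros) is inferred from the example table.

```python
import csv
import io
from datetime import datetime


def append_row(table, result, when):
    """Append one crawler result to the timeline, stamped with `when`."""
    # Date format inferred from the example table, e.g. "5.6.2017 22:00:00".
    row = {"Date": f"{when.day}.{when.month}.{when.year} {when:%H:%M:%S}"}
    row.update(result)
    table.append(row)
    return table


def to_csv(table):
    """Render the timeline as CSV, one column per key seen so far."""
    columns = []
    for row in table:
        for key in row:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(table)
    return out.getvalue()
```

Each crawler run would correspond to one `append_row` call, so the table grows by one row per run.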

To execute this act from a crawler, use the following URL as the webhook: https://api.apify.com/v2/acts/mtrunkat~crawler-timeline/runs?token=[YOUR_API_TOKEN]

As webhook data, you can pass JSON containing two properties:

| Name           | Type    | Description                                                                                |
| dontAddNewKeys | boolean | If true, keys that first appear after the first execution are not added to the spreadsheet. |
| maxRowsPerPage | integer | If set, a new spreadsheet is created once the maximum number of rows is exceeded.          |

Example webhook data:

{
    "dontAddNewKeys": true,
    "maxRowsPerPage": 10000
}
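
The two options can be illustrated with a small sketch. This is one possible reading of the descriptions above, not the act's real code: the function `add_result` and the in-memory `pages` structure are hypothetical, and it assumes a new spreadsheet starts as soon as the current one already holds `maxRowsPerPage` rows.

```python
def add_result(pages, result, date, dont_add_new_keys=False, max_rows_per_page=None):
    """Add one result to `pages`, a list of {"columns": [...], "rows": [...]}."""
    if not pages:
        # The first execution defines the initial set of columns.
        pages.append({"columns": ["Date"] + list(result), "rows": []})
    page = pages[-1]
    # Assumed reading of maxRowsPerPage: a full page triggers a new spreadsheet.
    if max_rows_per_page is not None and len(page["rows"]) >= max_rows_per_page:
        page = {"columns": list(page["columns"]), "rows": []}
        pages.append(page)
    # dontAddNewKeys: keys unseen so far are ignored instead of becoming columns.
    if not dont_add_new_keys:
        for key in result:
            if key not in page["columns"]:
                page["columns"].append(key)
    row = {"Date": date}
    row.update({k: v for k, v in result.items() if k in page["columns"]})
    page["rows"].append(row)
    return pages
```

With `dont_add_new_keys=True`, a later result such as `{"a": 2, "b": 9}` contributes only `a` when the first result contained just `a`; with `max_rows_per_page=2`, the third result lands in a second spreadsheet.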