Act

petr_cermak/executions-compare

  • Builds
  • latest 0.0.86 / 2018-03-28
  • Created 2017-11-21
  • Last modified 2018-02-28
  • grade 12

Description

Act for comparing crawler execution results. By default the final result set will contain only new and updated records.


API

To run the act, send an HTTP POST request to:

https://api.apify.com/v2/acts/petr_cermak~executions-compare/runs?token=<YOUR_API_TOKEN>

The POST payload will be passed as input for the act. For more information, read the docs.
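As a minimal sketch of the call described above, the following Python snippet builds the POST request using only the standard library. The helper name build_run_request, the placeholder token and execution IDs, and the "url" ID attribute are illustrative assumptions; only the endpoint URL and the input property names come from this page.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.apify.com/v2/acts/petr_cermak~executions-compare/runs"


def build_run_request(token, act_input):
    """Build (but do not send) the POST request that starts the act.

    Hypothetical helper for illustration; not part of any Apify client.
    """
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps(act_input).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_URL}?{query}",
        data=body,  # the POST payload becomes the act input
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Placeholder values (assumptions): substitute a real token and execution IDs.
request = build_run_request(
    "YOUR_API_TOKEN",
    {"oldExec": "OLD_EXECUTION_ID", "newExec": "NEW_EXECUTION_ID", "idAttr": "url"},
)
# To actually start the run: urllib.request.urlopen(request)
```

The request is only constructed here, so the snippet can be inspected or logged before anything is sent.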


Example input

Content type: application/json; charset=utf-8

{
  "oldExec": OLD_EXECUTION_ID,
  "newExec": NEW_EXECUTION_ID,
  "idAttr": ID_ATTRIBUTE_NAME,
  "return": WHICH_RECORDS_TO_RETURN,
  "addStatus": ADD_TEXT_STATUS
}

Readme

act-executions-compare

Apify act for comparing crawler execution results

This act fetches results from two crawler executions ("old" and "new"), compares them and creates a new result set based on the act settings. By default the final result set will contain only new and updated records.

INPUT

Input is a JSON object with the following properties:

{
  "oldExec": OLD_EXECUTION_ID,
  "newExec": NEW_EXECUTION_ID,
  "idAttr": ID_ATTRIBUTE_NAME,
  "return": WHICH_RECORDS_TO_RETURN,    // optional, default: "new, updated"
  "addStatus": ADD_TEXT_STATUS,         // optional, default: false
  "statusAttr": STATUS_ATTR_NAME,       // optional, default: "status"
  "addChanges": ADD_CHANGE_INFO,        // optional, default: false
  "changesAttr": CHANGES_ATTR_NAME,     // optional, default: "changes"
  "updatedIf": [                        // optional, column list
    "column_1",
    "column_2",
    ...
  ],
  "useDataset": USE_DATASET_STORE       // optional, default: false
}

The idAttr parameter is the name of the attribute in each record that will be used as its ID.
The return parameter tells the act which records to include in the final result set. Possible values are new, updated, deleted and unchanged; you can provide more than one, separated by commas.
The addStatus parameter sets whether the act should add a status attribute to each of the resulting records. If true, its value will be one of NEW, UPDATED, DELETED or UNCHANGED, depending on the value of the return parameter.
The statusAttr parameter overrides the default name of the column where the status will be stored.
The addChanges parameter tells the act to include a list of columns that contained changes. This list will be added to a new changes column.
The changesAttr parameter overrides the default name of the column where the changes will be stored.
The updatedIf parameter can contain an array of column names. If set, the record will be recognized as UPDATED if and only if there was a change in one of those columns. If addChanges is set to true, the changes array will contain the column names that had changes and are also present in the updatedIf array.
The useDataset parameter sets whether the result will be stored in an Apify dataset or in the key-value store under the OUTPUT key.
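To make the interplay of these parameters concrete, here is a rough Python sketch of one way the comparison could work. It is not the act's actual implementation: the function name compare_records, the Python argument names and the choice to attach an empty changes list to NEW and DELETED records are all assumptions made for illustration. Only the parameter semantics and their defaults are taken from the descriptions above.

```python
def compare_records(old, new, id_attr, ret="new, updated", add_status=False,
                    status_attr="status", add_changes=False,
                    changes_attr="changes", updated_if=None):
    """Classify records from `new` against `old` and return the selected ones."""
    # "new, updated" -> {"new", "updated"}
    wanted = {part.strip().lower() for part in ret.split(",") if part.strip()}
    old_by_id = {rec[id_attr]: rec for rec in old}
    new_by_id = {rec[id_attr]: rec for rec in new}
    result = []

    def emit(record, status, changes):
        if status.lower() not in wanted:
            return
        out = dict(record)  # copy so the input records stay untouched
        if add_status:
            out[status_attr] = status
        if add_changes:
            out[changes_attr] = changes  # assumption: empty list for NEW/DELETED
        result.append(out)

    for rec_id, rec in new_by_id.items():
        previous = old_by_id.get(rec_id)
        if previous is None:
            emit(rec, "NEW", [])
            continue
        # With updatedIf set, only the listed columns can mark a record as updated.
        columns = updated_if if updated_if is not None else sorted(set(rec) | set(previous))
        changes = [col for col in columns if rec.get(col) != previous.get(col)]
        emit(rec, "UPDATED" if changes else "UNCHANGED", changes)

    for rec_id, rec in old_by_id.items():
        if rec_id not in new_by_id:
            emit(rec, "DELETED", [])
    return result


# Made-up sample data for illustration only.
old = [{"id": 1, "price": 10, "title": "A"}, {"id": 2, "price": 5, "title": "B"}]
new = [{"id": 1, "price": 12, "title": "A"}, {"id": 3, "price": 7, "title": "C"}]
selected = compare_records(old, new, "id", add_status=True, add_changes=True)
```

With the default return value, the sample yields record 1 as UPDATED (with "price" in its changes list) and record 3 as NEW, while the deleted record 2 is left out.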

This act can also be run from a crawler webhook. In that case, the current execution will be compared with the directly preceding execution (unless overridden). To use this act from a webhook, use the Finish webhook data in the crawler's advanced settings to set up the act.

Example webhook data:

{
  "idAttr": ID_ATTRIBUTE_NAME,
  "return": WHICH_RECORDS_TO_RETURN,
  "addStatus": ADD_TEXT_STATUS,
  "addChanges": ADD_CHANGE_INFO,
  ...
}

If you want to compare the current execution with a specific execution (not the one directly preceding it), you can use the oldExec parameter to override the default.
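Since the webhook data is ordinary JSON, it can be generated rather than typed by hand. The sketch below uses a hypothetical helper, build_webhook_data; the "url" ID attribute and the placeholder execution ID are assumptions, and only the property names come from this readme.

```python
import json


def build_webhook_data(id_attr, old_exec=None, **options):
    """Return JSON text for the crawler's Finish webhook data.

    Hypothetical helper: `options` are passed through unchanged, so they
    should use property names from the INPUT section (e.g. addStatus).
    """
    data = {"idAttr": id_attr}
    data.update(options)
    if old_exec is not None:
        # Compare against a specific execution instead of the preceding one.
        data["oldExec"] = old_exec
    return json.dumps(data, indent=2)


# Default case: compare with the directly preceding execution.
webhook_data = build_webhook_data("url", addStatus=True, addChanges=True)

# Override case: compare with a specific (placeholder) execution ID.
webhook_data_override = build_webhook_data("url", old_exec="OLD_EXECUTION_ID")
```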