Storage

Key-value store

The key-value store is a simple storage that can be used for string or file (buffer) records.

Basic usage

The key-value store provides an HTTP API to list, get and put records. If you are developing in Node.js or developing an act, you can also use the Apify JavaScript client.

1. Using JavaScript Client
const ApifyClient = require('apify-client');

// Initiate Apify client with your API token.
const apifyClient = new ApifyClient({
    token: '[YOUR_API_TOKEN]',
});
const keyValueStores = apifyClient.keyValueStores;

// Get or create store with given name:
const store = await keyValueStores.getOrCreateStore({ storeName: 'my-store' });

// Set obtained store ID to be used in all following commands:
apifyClient.setOptions({ storeId: store.id });

// Put some record into the store:
await keyValueStores.putRecord({
    key: 'my-json-record',
    body: JSON.stringify({ foo: 'bar' }),
    contentType: 'application/json; charset=utf-8',
});
await keyValueStores.putRecord({
    key: 'my-text-record',
    body: 'This record contains plain text!',
    contentType: 'text/plain; charset=utf-8',
});

// Get record value from store.
const record = await keyValueStores.getRecord({ key: 'my-json-record' });

The resulting record will be the following object:

{
  "contentType": "application/json; charset=utf-8",
  "body": {
    "foo": "bar"
  }
}
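
Note that the record was stored as a JSON string but comes back with its body as an object. The following is a minimal sketch of that behavior, assuming the body is parsed whenever the content type is JSON. The function parseRecordBody is a hypothetical helper written for this illustration and is not part of the Apify JavaScript client:

```javascript
// Hypothetical helper, NOT part of apify-client: shows how a raw body
// could be turned into a value based on its Content-Type.
function parseRecordBody(contentType, rawBody) {
    // Strip parameters such as "; charset=utf-8" and normalize case.
    const mediaType = contentType.split(';')[0].trim().toLowerCase();
    if (mediaType === 'application/json') return JSON.parse(rawBody);
    // Anything else (for example text/plain) is returned unchanged.
    return rawBody;
}

const jsonRecord = {
    contentType: 'application/json; charset=utf-8',
    body: parseRecordBody('application/json; charset=utf-8', JSON.stringify({ foo: 'bar' })),
};
const textRecord = {
    contentType: 'text/plain; charset=utf-8',
    body: parseRecordBody('text/plain; charset=utf-8', 'This record contains plain text!'),
};

console.log(jsonRecord.body.foo); // bar
console.log(typeof textRecord.body); // string
```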

For more detailed information, check the Apify JavaScript client documentation.

2. Using HTTP API

Send the following HTTP POST request to get or create a store with the given name:

POST /v2/key-value-stores?token=[YOUR_API_TOKEN]&name=my-store HTTP/1.1
Content-Type: application/json; charset=utf-8
Host: api.apify.com

The response is a JSON object containing the store ID:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Location: https://api.apify.com/v2/key-value-stores/yyJuWzcZvJpMt6nsH

{
  "data": {
    "id": "yyJuWzcZvJpMt6nsH",
    "name": "my-store",
    "userId": "9jwB4uQBWJQCKDkMi",
    "createdAt": "2017-10-09T08:03:52.351Z",
    "modifiedAt": "2017-10-09T08:03:52.351Z",
    "accessedAt": "2017-10-09T08:03:52.351Z"
  }
}

Then you can use the store ID to put records into the store with the following HTTP PUT requests:

PUT /v2/key-value-stores/yyJuWzcZvJpMt6nsH/records/my-json-record HTTP/1.1
Content-Type: application/json
Host: api.apify.com

{ "foo": "bar" }

PUT /v2/key-value-stores/yyJuWzcZvJpMt6nsH/records/my-text-record HTTP/1.1
Content-Type: text/plain
Host: api.apify.com

This record contains plain text!
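
The two PUT requests differ only in the record key, the Content-Type header and the body. A minimal sketch of a helper that describes such a request follows. The name buildPutRecordRequest and the rule "strings are sent as text/plain, everything else as JSON" are assumptions made for this illustration. Authentication is left out, as in the requests above:

```javascript
// Base URL taken from the Host header in the requests above.
const API_BASE_URL = 'https://api.apify.com';

// Hypothetical helper: describes the PUT request for one record.
// Strings are sent as plain text, all other values as JSON.
function buildPutRecordRequest(storeId, key, value) {
    const isText = typeof value === 'string';
    return {
        method: 'PUT',
        // encodeURIComponent keeps a key with special characters inside one path segment.
        url: `${API_BASE_URL}/v2/key-value-stores/${encodeURIComponent(storeId)}/records/${encodeURIComponent(key)}`,
        headers: { 'Content-Type': isText ? 'text/plain' : 'application/json' },
        body: isText ? value : JSON.stringify(value),
    };
}

const jsonRequest = buildPutRecordRequest('yyJuWzcZvJpMt6nsH', 'my-json-record', { foo: 'bar' });
const textRequest = buildPutRecordRequest('yyJuWzcZvJpMt6nsH', 'my-text-record', 'This record contains plain text!');

console.log(jsonRequest.url);
console.log(jsonRequest.body); // {"foo":"bar"}
console.log(textRequest.headers['Content-Type']); // text/plain
```

The returned descriptor is a plain object, so it can be handed to whichever HTTP client you prefer.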

To read a record from the store, use the following HTTP GET request:

GET /v2/key-value-stores/yyJuWzcZvJpMt6nsH/records/my-json-record HTTP/1.1
Accept-Encoding: gzip
Host: api.apify.com

The response is the record value with the appropriate Content-Type header:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{ "foo": "bar" }

For more detailed information, check the API V2 Reference.

Use in actor

Each act run is assigned its own key-value store containing its input and possibly its output. The ID of this key-value store is available under run.defaultKeyValueStoreId.

In an act you can use two shorthand methods to save and read records from its default key-value store: Apify.setValue() [see docs] and Apify.getValue() [see docs]. To fetch the act's INPUT and set its OUTPUT value, call:

const Apify = require('apify');

Apify.main(async () => {
    // Get input of your act
    const input = await Apify.getValue('INPUT');

    ...

    await Apify.setValue('OUTPUT', output);
});
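
The read-INPUT, write-OUTPUT pattern can be tried outside the platform with a stand-in store. The sketch below is only an illustration: createFakeStore and runAct are hypothetical names, the in-memory store is not the real SDK, and returning null for a missing key is an assumption. In a real act the same body would call Apify.getValue() and Apify.setValue() directly:

```javascript
// Hypothetical in-memory stand-in exposing the same two method names
// as Apify.getValue() and Apify.setValue(); it is NOT the real SDK.
function createFakeStore(initialRecords) {
    const records = new Map(Object.entries(initialRecords));
    return {
        // Assumption: a missing key yields null.
        getValue: async (key) => (records.has(key) ? records.get(key) : null),
        setValue: async (key, value) => { records.set(key, value); },
    };
}

// The act body: read INPUT, compute a result, write OUTPUT.
async function runAct(store) {
    const input = await store.getValue('INPUT');
    const output = { greeting: `Hello, ${input.name}!` };
    await store.setValue('OUTPUT', output);
}

const fakeStore = createFakeStore({ INPUT: { name: 'world' } });
runAct(fakeStore)
    .then(() => fakeStore.getValue('OUTPUT'))
    .then((output) => console.log(output.greeting)); // Hello, world!
```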

If you want to use a key-value store other than the default one, for example a store shared between acts or between act runs, an instance of the Apify JavaScript client preconfigured with your API token is available at Apify.client:

...

const record = await Apify.client.keyValueStores.getRecord({
    storeId: 'yyJuWzcZvJpMt6nsH',
    key: 'my-record',
});

...

Dataset

The dataset is a storage that enables saving and retrieval of sequential data objects, typically the results of a long-running operation such as scraping or data extraction.

Basic usage

The dataset provides an HTTP API to list, get and put items. If you are developing in Node.js or developing an act, you can also use the Apify JavaScript client.

1. Using JavaScript Client
const ApifyClient = require('apify-client');

// Initiate Apify client with your API token.
const apifyClient = new ApifyClient({
    token: '[YOUR_API_TOKEN]',
});
const datasets = apifyClient.datasets;

// Get or create dataset with given name:
const dataset = await datasets.getOrCreateDataset({ datasetName: 'my-dataset' });

// Set obtained dataset ID to be used in all following commands:
apifyClient.setOptions({ datasetId: dataset.id });

// Put some items into the dataset:
await datasets.putItems({ data: { foo: 'bar' } });
await datasets.putItems({ data: { hello: ['world', 'universe'] } });

// Put multiple items in one API call:
await datasets.putItems({
    data: [
        { foo: 'hotel' },
        { foo: 'restaurant' },
    ],
});

// Get items from dataset.
const items = await datasets.getItems();

The resulting items will be the following array:

[
    { foo: 'bar' },
    { hello: ['world', 'universe'] },
    { foo: 'hotel' },
    { foo: 'restaurant' },
]
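
The result illustrates that a dataset is append-only and keeps items in the order they were stored, with a batch contributing one item per array element. A minimal in-memory model of that behavior is sketched below. The class InMemoryDataset is hypothetical and exists only for this illustration; it says nothing about how the real storage is implemented:

```javascript
// Hypothetical in-memory model of a dataset, for illustration only.
class InMemoryDataset {
    constructor() {
        this.items = [];
    }

    // Accepts one object or an array of objects and appends them in order.
    putItems(data) {
        const batch = Array.isArray(data) ? data : [data];
        this.items.push(...batch);
    }

    // Returns a copy so callers cannot reorder the stored items.
    getItems() {
        return this.items.slice();
    }
}

const model = new InMemoryDataset();
model.putItems({ foo: 'bar' });
model.putItems({ hello: ['world', 'universe'] });
model.putItems([{ foo: 'hotel' }, { foo: 'restaurant' }]);

console.log(model.getItems().length); // 4
console.log(model.getItems()[2].foo); // hotel
```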

For more detailed information, check the Apify JavaScript client documentation.

2. Using HTTP API

Send the following HTTP POST request to get or create a dataset with the given name:

POST /v2/datasets?token=[YOUR_API_TOKEN]&name=my-dataset HTTP/1.1
Content-Type: application/json; charset=utf-8
Host: api.apify.com

The response is a JSON object representing the dataset and containing the dataset ID:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Location: https://api.apify.com/v2/datasets/yyJuWzcZvJpMt6nsH

{
  "data": {
    "id": "yyJuWzcZvJpMt6nsH",
    "name": "my-dataset",
    "userId": "9jwB4uQBWJQCKDkMi",
    "createdAt": "2017-10-09T08:03:52.351Z",
    "modifiedAt": "2017-10-09T08:03:52.351Z",
    "accessedAt": "2017-10-09T08:03:52.351Z",
    "itemsCount": "0",
    "inflatedBytes": "0"
  }
}

Then you can use the dataset ID to put items into the dataset with the following HTTP POST requests:

POST /v2/datasets/yyJuWzcZvJpMt6nsH/items HTTP/1.1
Content-Type: application/json
Host: api.apify.com

{ "foo": "bar" }

POST /v2/datasets/yyJuWzcZvJpMt6nsH/items HTTP/1.1
Content-Type: application/json
Host: api.apify.com

{ "hello": ["world", "universe"] }

You can also put multiple items in one call by sending them as an array:

POST /v2/datasets/yyJuWzcZvJpMt6nsH/items HTTP/1.1
Content-Type: application/json
Host: api.apify.com

[{ "foo": "hotel" }, { "foo": "restaurant" }]
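
When there are many items, they can be grouped so that each POST carries an array body like the one above. The sketch below shows one way to do that. The helper buildPushItemsRequests is hypothetical, and the batch size of 2 is arbitrary and chosen only for the demonstration; this document does not state any limit on the number of items per call:

```javascript
// Hypothetical helper: turns a list of items into POST request descriptors,
// each carrying a JSON array body as in the request above.
function buildPushItemsRequests(datasetId, items, batchSize) {
    const requests = [];
    for (let start = 0; start < items.length; start += batchSize) {
        requests.push({
            method: 'POST',
            url: `https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`,
            headers: { 'Content-Type': 'application/json' },
            // slice() copies at most batchSize items starting at this offset.
            body: JSON.stringify(items.slice(start, start + batchSize)),
        });
    }
    return requests;
}

const requests = buildPushItemsRequests(
    'yyJuWzcZvJpMt6nsH',
    [{ foo: 'bar' }, { foo: 'hotel' }, { foo: 'restaurant' }],
    2, // arbitrary batch size chosen for the demo
);

console.log(requests.length); // 2
console.log(requests[1].body); // [{"foo":"restaurant"}]
```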

To read items from the dataset, use the following HTTP GET request:

GET /v2/datasets/yyJuWzcZvJpMt6nsH/items HTTP/1.1
Accept-Encoding: gzip
Host: api.apify.com

The response is a JSON array of items:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

[
    { "foo": "bar" },
    { "hello": ["world", "universe"] },
    { "foo": "hotel" },
    { "foo": "restaurant" }
]

For more detailed information and an explanation of how to get items in other formats, check the API V2 Reference.

Use in actor

Each act run is assigned its own dataset, which is created when the first item is stored into it. The ID of this dataset is available under run.defaultDatasetId.

In an act you can use a shorthand method to save items into the default dataset: Apify.pushData() [see docs].

const Apify = require('apify');

Apify.main(async () => {
    // Put one item into the dataset:
    await Apify.pushData({ foo: 'bar' });

    // Put multiple items into the dataset:
    await Apify.pushData([
        { foo: 'hotel' },
        { foo: 'restaurant' },
    ]);
});

If you want to use a dataset other than the default one, for example a dataset shared between acts or between act runs, an instance of the Apify JavaScript client preconfigured with your API token is available at Apify.client:

...

await Apify.client.datasets.putItems({
    datasetId: 'yyJuWzcZvJpMt6nsH',
    data: { foo: 'bar' }
});

...