Storage

The Apify platform provides three storage types. You can use them from within your act and also from outside the Apify platform, through the HTTP API or the JavaScript client.

Key-value store

The key-value store is a simple storage for string or file (buffer) records.

Basic usage

Each act run is assigned its own key-value store, which contains its input and possibly its output. The ID of this key-value store is available under run.defaultKeyValueStoreId.

In an act you can use two shorthand methods to save and read records in its default key-value store: Apify.setValue() [see docs] and Apify.getValue() [see docs]. To fetch the act's INPUT and set its OUTPUT value, call:

const Apify = require('apify');

Apify.main(async () => {
    // Get input of your act
    const input = await Apify.getValue('INPUT');

    // ... (code that produces imageBuffer)

    await Apify.setValue('OUTPUT', imageBuffer, { contentType: 'image/jpeg' });
});

To use a key-value store other than the default one, for example a store shared between acts or between act runs, use Apify.openKeyValueStore() [see docs]:

const store = await Apify.openKeyValueStore('some-name');

const value = await store.getValue('some-value-key');
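A shared store is typically read, updated and written back. The sketch below shows that pattern with a hypothetical helper, incrementCounter(), which is not part of the Apify library. It assumes only that the store object exposes async getValue(key) and setValue(key, value) methods and that a missing record reads as null or undefined. The update is not atomic, so two runs writing at the same moment could overwrite each other.

```javascript
// Hypothetical helper (not part of the Apify library): increments a numeric
// record in a key-value store and returns the new value.
// `store` is any object with async getValue(key) and setValue(key, value),
// for example the object returned by Apify.openKeyValueStore().
async function incrementCounter(store, key) {
    // Assumption: a record that does not exist yet reads as null or undefined.
    const current = (await store.getValue(key)) || 0;
    const next = current + 1;
    await store.setValue(key, next);
    return next;
}
```

Inside an act this could be called as incrementCounter(await Apify.openKeyValueStore('some-name'), 'run-count'), where 'run-count' is an illustrative key.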

API and JavaScript client

The key-value store also provides an HTTP API to manage key-value stores and their records. If you are developing a Node.js application, you can also use the Apify JavaScript client.
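As a rough illustration of access over HTTP, the sketch below downloads one record without any Apify library. The base URL, the key-value-stores/{storeId}/records/{key} path and the token query parameter are assumptions based on version 2 of the Apify API and should be checked against the API reference. recordUrl() and fetchRecord() are hypothetical helper names, and fetchRecord() needs a runtime with a global fetch (Node.js 18 or newer).

```javascript
// Assumption: base URL of Apify API version 2.
const API_BASE = 'https://api.apify.com/v2';

// Builds the URL of a single record in a key-value store.
// `token` is optional and only needed for stores that are not public.
function recordUrl(storeId, key, token) {
    const url = new URL(
        `${API_BASE}/key-value-stores/${encodeURIComponent(storeId)}/records/${encodeURIComponent(key)}`,
    );
    if (token) url.searchParams.set('token', token);
    return url.toString();
}

// Downloads the record body as a Buffer, whatever its content type is.
async function fetchRecord(storeId, key, token) {
    const response = await fetch(recordUrl(storeId, key, token));
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`);
    }
    return Buffer.from(await response.arrayBuffer());
}
```

For example, fetchRecord(run.defaultKeyValueStoreId, 'OUTPUT') would retrieve the OUTPUT record of a run, provided the store is readable by the caller.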

Dataset

The dataset is a storage that enables saving and retrieval of sequential data objects, typically the results of a long-running operation such as scraping or data extraction. A dataset is append-only: data can be added to it, but existing data cannot be changed.

Basic usage

Each act run is assigned its own dataset, which is created when the first item is stored in it. The ID of this dataset is available under run.defaultDatasetId.

In an act you can use a shorthand method to save items into the default dataset: Apify.pushData() [see docs].

const Apify = require('apify');

Apify.main(async () => {
    // Put one item into the dataset:
    await Apify.pushData({ foo: 'bar' });

    // Put multiple items into the dataset:
    await Apify.pushData([
        { foo: 'hotel' },
        { foo: 'restaurant' },
    ]);
});

To use a dataset other than the default one, for example a dataset shared between acts or between act runs, use Apify.openDataset() [see docs]:

const dataset = await Apify.openDataset('some-name');

await dataset.pushData({ foo: 'bar' });

API and JavaScript client

The dataset provides an HTTP API to manage datasets and to add and retrieve their data. If you are developing a Node.js application, you can also use the Apify JavaScript client.
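Because a dataset can hold many items, reading it over HTTP usually means paging. The following sketch reads all items page by page. The datasets/{datasetId}/items path and the format, offset and limit parameters are assumptions based on version 2 of the Apify API. datasetItemsUrl(), fetchAllItems() and defaultGetJson() are hypothetical helpers, and the default downloader needs a global fetch (Node.js 18 or newer).

```javascript
// Assumption: base URL of Apify API version 2.
const API_BASE = 'https://api.apify.com/v2';

// Builds the URL of one page of dataset items in JSON format.
function datasetItemsUrl(datasetId, { offset = 0, limit = 1000 } = {}) {
    const url = new URL(`${API_BASE}/datasets/${encodeURIComponent(datasetId)}/items`);
    url.searchParams.set('format', 'json');
    url.searchParams.set('offset', String(offset));
    url.searchParams.set('limit', String(limit));
    return url.toString();
}

// Default downloader: fetches a URL and parses the body as JSON.
async function defaultGetJson(url) {
    const response = await fetch(url);
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`);
    }
    return response.json();
}

// Reads the dataset page by page until a page comes back shorter than pageSize.
// `getJson` can be replaced, for example to add authentication or for testing.
async function fetchAllItems(datasetId, pageSize = 1000, getJson = defaultGetJson) {
    const items = [];
    for (let offset = 0; ; offset += pageSize) {
        const page = await getJson(datasetItemsUrl(datasetId, { offset, limit: pageSize }));
        items.push(...page);
        if (page.length < pageSize) return items;
    }
}
```

Since items are only ever appended, offsets stay stable between requests, which is what makes this simple offset-based loop workable.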

Request queue

The request queue is a storage type that enables enqueueing and retrieval of requests (i.e. URLs with an HTTP method and other parameters). This is useful not only for web crawling but anywhere you need to process a large number of URLs and enqueue newly found links along the way.

Basic usage

Each act run is assigned its own request queue, which is created when the first request is added to it. The ID of this request queue is available under run.defaultRequestQueueId. You can also create a named queue, which can be shared between acts or between runs.

To open a request queue, use Apify.openRequestQueue() [see docs].

const Apify = require('apify');

Apify.main(async () => {
    // Open default request queue of the run:
    const queue1 = await Apify.openRequestQueue();

    // Open request queue with name "my-queue":
    const queue2 = await Apify.openRequestQueue('my-queue');
});

Once a queue is opened, you can use it:

// Add requests to the queue
await queue.addRequest(new Apify.Request({ url: 'http://example.com/aaa' }));
await queue.addRequest(new Apify.Request({ url: 'http://example.com/bbb' }));
// The forefront option puts the request at the beginning of the queue
await queue.addRequest(new Apify.Request({ url: 'http://example.com/foo/bar' }), { forefront: true });

// Get requests from the queue (each call returns a promise)
const request1 = await queue.fetchNextRequest();
const request2 = await queue.fetchNextRequest();
const request3 = await queue.fetchNextRequest();

// Mark some of them as handled
await queue.markRequestHandled(request1);

// If processing fails then reclaim it back to the queue
await queue.reclaimRequest(request2);
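In practice these calls tend to be combined into a processing loop. The sketch below is one possible shape of such a loop, using only the three queue methods shown above. processQueue(), handleRequest and maxRetries are hypothetical names, the retry counter is kept locally rather than by the queue, and the loop assumes that fetchNextRequest() resolves to null once no request is waiting.

```javascript
// Hypothetical helper (not part of the Apify library): processes requests
// until the queue is empty and returns the URLs that kept failing.
// `queue` must expose async fetchNextRequest(), markRequestHandled(request)
// and reclaimRequest(request).
async function processQueue(queue, handleRequest, maxRetries = 3) {
    const failures = new Map(); // request URL -> number of failed attempts
    const gaveUp = [];
    let request;
    // Assumption: fetchNextRequest() resolves to null when nothing is pending.
    while ((request = await queue.fetchNextRequest())) {
        try {
            await handleRequest(request);
            await queue.markRequestHandled(request);
        } catch (err) {
            const attempts = (failures.get(request.url) || 0) + 1;
            failures.set(request.url, attempts);
            if (attempts < maxRetries) {
                // Return the request to the queue for another attempt.
                await queue.reclaimRequest(request);
            } else {
                // Give up on this request so that the loop can finish.
                await queue.markRequestHandled(request);
                gaveUp.push(request.url);
            }
        }
    }
    return gaveUp;
}
```

Capping the retries matters here: reclaiming a request that always fails would otherwise keep the loop running forever.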

API and JavaScript client

The request queue provides an HTTP API to manage queues and to add and retrieve requests. If you are developing a Node.js application, you can also use the Apify JavaScript client.