Storage

The Apify platform provides three storage types that you can use in your actors, and also outside the Apify platform via the HTTP API and the JavaScript client.

Key-value store

The key-value store is simple storage that can be used for string or file (buffer) records.

Basic usage

Each actor run is assigned its own key-value store containing its input and possibly output. The ID of this key-value store is available under run.defaultKeyValueStoreId.

In an actor, you can use two shorthand methods to save and read records in its default key-value store: Apify.setValue() [see docs] and Apify.getValue() [see docs]. For example, to fetch the actor's INPUT and set its OUTPUT value, call:

const Apify = require('apify');

Apify.main(async () => {
    // Get input of your actor
    const input = await Apify.getValue('INPUT');

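    // ... (the actor's own logic, assumed to produce imageBuffer)

    // Save the result as the actor's output
    await Apify.setValue('OUTPUT', imageBuffer, { contentType: 'image/jpeg' });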
});

If you want to use something other than the default key-value store, e.g. some store that you share between actors or between actor runs, then you can use Apify.openKeyValueStore() [see docs]:

const store = await Apify.openKeyValueStore('some-name');

const value = await store.getValue('some-value-key');
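
As a rough illustration of sharing a named store between runs, the sketch below keeps a counter under a single key. The helper bumpCounter(), the key 'run-count' and the in-memory stand-in createMemoryStore() are hypothetical names introduced only so that the snippet runs without the SDK. It also assumes that getValue() resolves to null when the record does not exist.

```javascript
// Hypothetical in-memory stand-in exposing the same getValue()/setValue()
// shape as the store object used above. It is not part of the Apify SDK.
function createMemoryStore() {
    const records = new Map();
    return {
        getValue: async (key) => (records.has(key) ? records.get(key) : null),
        setValue: async (key, value) => { records.set(key, value); },
    };
}

// Reads the counter stored under `key`, increments it and writes it back.
// A missing record is treated as zero.
async function bumpCounter(store, key) {
    const current = await store.getValue(key);
    const next = (current === null || current === undefined ? 0 : current) + 1;
    await store.setValue(key, next);
    return next;
}
```

In an actor, you would pass the object returned by Apify.openKeyValueStore('some-name') instead of the stand-in. The read-modify-write above is not atomic, so runs executing at the same time could overwrite each other's update.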

API and JavaScript client

The key-value store also provides an HTTP API to manage key-value stores and their records. If you are developing a Node.js application, then you can also use the Apify JavaScript client.
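
For reading a record over HTTP, a small URL helper could look like the sketch below. The base URL, the path layout and the token query parameter are assumptions based on version 2 of the Apify API and should be checked against the API reference. buildRecordUrl() is a hypothetical helper, not part of any Apify library.

```javascript
// Assumed base URL of the Apify API (version 2); verify it in the API docs.
const API_BASE_URL = 'https://api.apify.com/v2';

// Builds the URL of a single record in a key-value store.
// `token` is optional and only appended when provided.
function buildRecordUrl(storeId, recordKey, token) {
    const path = `/key-value-stores/${encodeURIComponent(storeId)}`
        + `/records/${encodeURIComponent(recordKey)}`;
    const url = new URL(API_BASE_URL + path);
    if (token) url.searchParams.set('token', token);
    return url.toString();
}
```

The resulting URL can be requested with any HTTP client. The store ID would typically be the run.defaultKeyValueStoreId value mentioned above.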

Dataset

The dataset is a storage type that enables the saving and retrieval of sequential data objects, typically the results of some long-running operation such as scraping or data extraction. The dataset is append-only, i.e. data can only be added and existing items cannot be changed.

Basic usage

Each actor run is assigned its own dataset, created when the first item is stored to it. The ID of this dataset is available under run.defaultDatasetId.

In your actor, you can use the shorthand method Apify.pushData() [see docs] to save items into the default dataset.

const Apify = require('apify');

Apify.main(async () => {
    // Put one item into the dataset:
    await Apify.pushData({ foo: 'bar' });

    // Put multiple items into the dataset:
    await Apify.pushData([
        { foo: 'hotel' },
        { foo: 'restaurant' },
    ]);
});

If you want to use something other than the default dataset, e.g. some dataset that you share between actors or between actor runs, then you can use Apify.openDataset() [see docs]:

const dataset = await Apify.openDataset('some-name');

await dataset.pushData({ foo: 'bar' });
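
Because pushData() accepts either a single object or an array, a large result set can be written in batches. The sketch below shows one way to do that. The helpers chunk() and pushInChunks() and the in-memory stand-in createMemoryDataset() are hypothetical names, and the default batch size of 100 is an arbitrary choice, not a platform limit.

```javascript
// Splits `items` into arrays of at most `chunkSize` elements.
function chunk(items, chunkSize) {
    const chunks = [];
    for (let i = 0; i < items.length; i += chunkSize) {
        chunks.push(items.slice(i, i + chunkSize));
    }
    return chunks;
}

// Pushes items batch by batch. `dataset` is any object with an async
// pushData() method, e.g. the one returned by Apify.openDataset().
async function pushInChunks(dataset, items, chunkSize = 100) {
    for (const batch of chunk(items, chunkSize)) {
        await dataset.pushData(batch);
    }
}

// Hypothetical in-memory stand-in so the sketch runs without the SDK.
// It only ever appends, mirroring the fact that data can only be added.
function createMemoryDataset() {
    const items = [];
    return {
        items,
        pushData: async (data) => {
            items.push(...(Array.isArray(data) ? data : [data]));
        },
    };
}
```

In an actor, the call might look like await pushInChunks(await Apify.openDataset('some-name'), results), where results is the array your code produced.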

API and JavaScript client

The dataset provides an HTTP API to manage datasets and to add/retrieve their data. If you are developing a Node.js application, then you can also use the Apify JavaScript client.

Request queue

The request queue is a storage type that enables the enqueueing and retrieval of requests (i.e. URLs with an HTTP method and other parameters). This is useful not only for web crawling, but anywhere you need to process a large number of URLs and be able to enqueue new links along the way.

Basic usage

Each actor run is assigned its own request queue, created when the first request is added to it. The ID of this request queue is available under run.defaultRequestQueueId. You can also create a named queue, which can be shared between actors or between actor runs.

To open a request queue, use Apify.openRequestQueue() [see docs].

const Apify = require('apify');

Apify.main(async () => {
    // Open default request queue of the run:
    const queue1 = await Apify.openRequestQueue();

    // Open request queue with name "my-queue":
    const queue2 = await Apify.openRequestQueue('my-queue');
});

Once a queue is open, you can use it:

// `queue` is a request queue opened with Apify.openRequestQueue()

// Add requests to the queue
await queue.addRequest(new Apify.Request({ url: 'http://example.com/aaa' }));
await queue.addRequest(new Apify.Request({ url: 'http://example.com/bbb' }));
// With forefront: true, the request is added to the beginning of the queue
await queue.addRequest(new Apify.Request({ url: 'http://example.com/foo/bar' }), { forefront: true });

// Get requests from the queue
const request1 = await queue.fetchNextRequest();
const request2 = await queue.fetchNextRequest();
const request3 = await queue.fetchNextRequest();

// Mark some of them as handled
await queue.markRequestHandled(request1);

// If processing fails, reclaim the request back to the queue
await queue.reclaimRequest(request2);
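
The calls above can be combined into a simple processing loop. The following is only a sketch: drainQueue(), its maxAttempts parameter and the in-memory stand-in createMemoryQueue() are hypothetical names. It assumes that fetchNextRequest() resolves to null when no request is pending. The stand-in only mimics the method names used above and is not how the platform implements queues.

```javascript
// Hypothetical in-memory stand-in with the same method names as the queue
// used above, so that the sketch runs without the SDK.
function createMemoryQueue() {
    const pending = [];
    const handled = [];
    return {
        handled,
        addRequest: async (request, { forefront = false } = {}) => {
            if (forefront) pending.unshift(request);
            else pending.push(request);
        },
        fetchNextRequest: async () => (pending.length > 0 ? pending.shift() : null),
        markRequestHandled: async (request) => { handled.push(request); },
        reclaimRequest: async (request) => { pending.push(request); },
    };
}

// Processes requests until the queue reports nothing pending. A request whose
// handler throws is reclaimed, up to `maxAttempts` tries in total; after that
// it is given up on and marked as handled so that the loop can finish.
async function drainQueue(queue, handleRequest, maxAttempts = 3) {
    const attempts = new Map();
    let request;
    while ((request = await queue.fetchNextRequest()) !== null) {
        const tries = (attempts.get(request.url) || 0) + 1;
        attempts.set(request.url, tries);
        try {
            await handleRequest(request);
            await queue.markRequestHandled(request);
        } catch (err) {
            if (tries < maxAttempts) await queue.reclaimRequest(request);
            else await queue.markRequestHandled(request);
        }
    }
}
```

With a real queue, a null result may only mean that nothing is pending at that moment, so a production loop would likely need a more careful stop condition than this one.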

API and JavaScript client

The request queue provides an HTTP API to manage queues and to add/retrieve requests. If you are developing a Node.js application, then you can also use the Apify JavaScript client.