Keboola Uploader

hckr-studio/keboola-uploader
Reliable uploader of Apify Datasets to Keboola Connection (aka KBC). Integration-ready.

Reliable uploader of Apify Datasets to Keboola Connection. It uses the Storage API Importer with optimal defaults. This Actor is useful in integration workflows or for ad-hoc data uploads.

This Actor is a generalisation of the custom-made uploaders we built for many of our projects. It uses minimal dependencies and is optimized for speed and reliability.

  • gracefully handles migrations
  • implements a retry policy for failed uploads
  • supports Actor Integrations
  • lets you fine-tune the batch size for optimal resource usage

Your Apify Dataset will be split into batches, converted to CSV, and uploaded with gzip compression enabled. Choose the batchSize according to the nature of your data. Primitive properties from your Dataset are mapped 1:1 to CSV columns. Complex properties (arrays and objects) are serialized to JSON, so you can use Snowflake's JSON support in your transformations.
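To make that mapping concrete, here is a small illustrative sketch (the item and the mapping code are hypothetical, not taken from this Actor's source): primitive values become plain CSV cells, while arrays and objects end up as JSON strings in a single cell.

```typescript
// Hypothetical Dataset item, for illustration only.
const item = {
    url: 'https://example.com',   // primitive -> plain CSV cell
    price: 19.9,                  // primitive -> plain CSV cell
    tags: ['sale', 'new'],        // array  -> JSON string in one cell
    meta: { currency: 'EUR' },    // object -> JSON string in one cell
};

// Sketch of the primitive-vs-complex rule described above.
const row = Object.values(item).map((value) =>
    typeof value === 'object' && value !== null ? JSON.stringify(value) : value,
);
// row: ['https://example.com', 19.9, '["sale","new"]', '{"currency":"EUR"}']
```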

Inputs

Dataset ID

ID of the Apify Dataset that should be uploaded to Keboola. When you use this Actor in an Integrations workflow, this field is optional; the default Dataset of the previous Actor in the flow will be used.

Keboola Stack

Hostname of your Keboola stack's import endpoint. See the Keboola documentation for more details. The default is import.keboola.com (AWS US East region). Alternatively, you can set the KEBOOLA_STACK environment variable instead.

Current multi-tenant stacks are:

Region              Hostname
US Virginia AWS     import.keboola.com
US Virginia GCP     import.us-east4.gcp.keboola.com
EU Frankfurt AWS    import.eu-central-1.keboola.com
EU Ireland Azure    import.north-europe.azure.keboola.com
EU Frankfurt GCP    import.europe-west3.gcp.keboola.com

If you are a single-tenant user, your hostname is in the format import.CUSTOMER_NAME.keboola.com.

Keboola Storage API Key

Your API key for the Keboola project where you want to upload the data. You should generate a new API key just for this Actor, with rights limited to writing to the destination bucket only. Alternatively, you can set the KEBOOLA_STORAGE_API_KEY environment variable instead.

Bucket

Name of the destination Keboola bucket, e.g. in.c-apify.

Table

Name of the destination Keboola table, e.g. scrape_results.

Headers

Array of header names for the destination Keboola table. You can use this to select a subset of properties for the result table or to reorder the columns; the order of headers is preserved in the result table. You can leave it blank if your Dataset items always have all properties specified (no undefined values); in that case, the properties of the first Dataset item are used. Our recommendation is to be explicit to prevent unexpected data loss (see the sketch below).
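A quick illustrative sketch of what an explicit headers array does (the item and code are hypothetical): it both selects columns and fixes their order, regardless of key order in the Dataset items.

```typescript
// Hypothetical Headers value: 'meta' is omitted, column order is fixed.
const headers = ['url', 'price', 'tags'];

const item: Record<string, unknown> = {
    meta: { source: 'web' },
    price: 19.9,
    url: 'https://example.com',
    tags: ['sale'],
};

// Columns come out in the order of `headers`, not in the item's key order.
const row = headers.map((name) => item[name]);
// row: ['https://example.com', 19.9, ['sale']]
```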

Batch Size

Size of one upload batch. The Dataset will be split into multiple batches if it has more items than this number; batches are uploaded sequentially. Choose the batch size according to the nature of your data and the parallelization of your process. Generally speaking, the Keboola Importer works best if you send bigger portions of data (dozens of MB gzipped) less frequently. On the other hand, you are constrained by the Actor's memory: you can easily hit an OOM condition if this number is too high.
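As a rough, hypothetical back-of-the-envelope check of what a given Batch Size means for a run:

```typescript
// Hypothetical numbers: 250k items uploaded 10k at a time
// means 25 sequential batch uploads in one run.
const itemCount = 250_000;  // items in the Dataset
const batchSize = 10_000;   // the Batch Size input
const batches = Math.ceil(itemCount / batchSize); // 25
```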

Incremental load

When enabled, imported data will be appended to the existing table. When disabled, the table will be truncated first, i.e. all existing data will be deleted from the table. Default is enabled (true).
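For reference, a programmatic run might look like the minimal sketch below, using the public apify-client package. The input field names are assumptions inferred from the field labels above, not taken from the Actor's input schema, so verify them against the Actor's Input tab before use.

```typescript
import { ApifyClient } from 'apify-client';

// Minimal sketch of calling the uploader from your own code.
// Field names are guesses based on the input labels documented above.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor('hckr-studio/keboola-uploader').call({
    datasetId: 'YOUR_DATASET_ID',                       // Dataset ID
    keboolaStack: 'import.eu-central-1.keboola.com',    // Keboola Stack
    keboolaStorageApiKey: process.env.KEBOOLA_API_KEY,  // Keboola Storage API Key
    bucket: 'in.c-apify',                               // Bucket
    table: 'scrape_results',                            // Table
    headers: ['url', 'price', 'tags'],                  // Headers
    batchSize: 10000,                                   // Batch Size
    incrementalLoad: true,                              // Incremental load
});

console.log(`Upload run finished with status: ${run.status}`);
```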
