This crawler takes the results of the last crawler run and stores the new items in a Google Docs spreadsheet.
Extracts article text and other meta info from a given URL. Uses https://github.com/ageitgey/node-unfluff, a Node.js implementation of https://github.com/grangier/python-goose.
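A minimal sketch of the unfluff usage pattern, assuming `request-promise` as the HTTP client (the URL is a placeholder):

```javascript
const extractor = require('unfluff');
const request = require('request-promise'); // assumed HTTP client choice

// Fetch a page and extract article data from its HTML.
request('https://example.com/article.html').then((html) => {
    const data = extractor(html);
    console.log(data.title);  // article title
    console.log(data.text);   // extracted article text
    console.log(data.author); // other meta info
});
```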
Aliexpress.com - own orders
Get all your orders from aliexpress.com in a machine-readable format.
Example crawler for news.ycombinator.com built using the Apify SDK.
Extracts all tweets for a given hashtag.
This act accepts a URL list and downloads the HTML of each page. It has one input parameter, "sources" (see the sources parameter of RequestList: https://www.apify.com/docs/sdk/apify-runtime-js/beta#RequestList).
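A hypothetical input, assuming the plain URL-object form of RequestList sources (the URLs are placeholders):

```json
{
    "sources": [
        { "url": "https://example.com/page-1" },
        { "url": "https://example.com/page-2" }
    ]
}
```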
Crawls a given list of URLs, running one crawler execution per URL.
Skoda-auto.cz - model variants
Get all model-engine-equipment package variants of Škoda Auto cars.
Scrapes links and their ranks from Show HN. Created for this blog post: https://medium.com/p/8cccfa25f5cb/edit
This act creates a timeline spreadsheet from crawler results. The main use case is building a spreadsheet that captures changes to a web page over time.
An example of how to run Puppeteer in parallel using the 'es6-promise-pool' npm package.
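A minimal sketch of the pattern, not the actor's actual code; the URL list and concurrency limit are illustrative:

```javascript
const puppeteer = require('puppeteer');
const PromisePool = require('es6-promise-pool');

const urls = ['https://example.com/a', 'https://example.com/b']; // illustrative
const CONCURRENCY = 2;

puppeteer.launch().then(async (browser) => {
    // The producer returns a new promise for each URL, or null when done;
    // the pool keeps at most CONCURRENCY promises pending at once.
    const producer = () => {
        const url = urls.shift();
        if (!url) return null;
        return browser.newPage().then(async (page) => {
            await page.goto(url);
            console.log(url, await page.title());
            await page.close();
        });
    };
    const pool = new PromisePool(producer, CONCURRENCY);
    await pool.start();
    await browser.close();
});
```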
This act loads a list of URLs from INPUT.sources. Each link should point to an XML file. The act downloads all the files and saves them to its default dataset. The groups parameter in INPUT allows you to choose which Apify proxy groups to use.
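A hypothetical INPUT illustrating both parameters (the URLs and the proxy group name are placeholders):

```json
{
    "sources": [
        { "url": "https://example.com/sitemap-1.xml" },
        { "url": "https://example.com/sitemap-2.xml" }
    ],
    "groups": ["GROUP_NAME"]
}
```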
This actor simply tests a given array of URLs against selected proxy URLs or Apify proxy groups.
This act can be used as a synchronous API. It returns JSON containing actor runs finished in the last 24 hours, along with information about their default datasets and request queues. Actors can be filtered via the "actIds" input array.
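The filtering input is a JSON object with the "actIds" array; for example (the IDs are placeholders):

```json
{
    "actIds": ["YOUR_ACT_ID_1", "YOUR_ACT_ID_2"]
}
```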
Deletes all untitled acts from your account. In a minute. For free. With one click!
This act can be used as a crawler's finish webhook. It transforms the crawler's results into a sitemap XML file and stores it in a key-value store named "sitemaps".
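For illustration, a sketch of the core transformation under the assumption that the crawler results reduce to a list of page URLs; this is not the act's actual code:

```javascript
// Turn a list of crawled page URLs into sitemap XML.
const toSitemap = (urls) =>
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    urls.map((url) => `  <url><loc>${url}</loc></url>`).join('\n') +
    '\n</urlset>';

console.log(toSitemap(['https://example.com/', 'https://example.com/about']));
```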