- Actor: The memory option for actor runs now supports only values that are powers of 2 (i.e. 128 MB, 256 MB, 512 MB, 1024 MB, 2048 MB, ...).
- Crawler: The crawler's proxy configuration now offers an "automatic" mode that rotates all the proxies available to a user.
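
The power-of-2 rule for the memory option can be checked before starting a run. The following is a minimal sketch, assuming 128 MB is the smallest allowed value; the function names are hypothetical helpers and not part of any Apify API.

```javascript
// Hypothetical helpers for illustration only, not part of the Apify API.

// Returns true if the value is an integer of at least 128 that is a power of 2.
function isValidMemoryMbytes(memoryMbytes) {
    return Number.isInteger(memoryMbytes)
        && memoryMbytes >= 128
        // A power of 2 has exactly one bit set, so n & (n - 1) is zero.
        && (memoryMbytes & (memoryMbytes - 1)) === 0;
}

// Rounds a requested amount up to the nearest allowed value.
function roundUpMemoryMbytes(memoryMbytes) {
    let allowed = 128;
    while (allowed < memoryMbytes) allowed *= 2;
    return allowed;
}

console.log(isValidMemoryMbytes(1024)); // true
console.log(isValidMemoryMbytes(1000)); // false
console.log(roundUpMemoryMbytes(1000)); // 1024
```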
- Actor: Each actor run can now start a web server accessible at a certain unique URL. This enables you to run a web server inside the actor to provide real-time snapshots or receive tasks on the fly. See documentation for more details.
- API: Added API endpoints to abort Actor run and build.
- Proxy: New Apify Proxy service launched!
- Keboola integration: Added support for running Actors. Check the knowledge base article for more information.
- Actor: Minimum memory for actor runs is now 128MB.
- SDK: A bunch of improvements and new features. Check the changelog.
- Crawler: It is no longer possible to combine custom proxies and Apify Proxy groups.
- Actor: Run console now shows information about current/max/average CPU and memory.
- Actor: Actors are now notified 120s before migration to another worker machine. Check documentation for more information.
- API: Added a new API endpoint to obtain information about a user account.
- API: Storage API now also supports use of [username]~[storage-name] instead of Dataset ID and Key-value store ID.
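
The [username]~[storage-name] form can be assembled and taken apart with plain string handling. This is only a sketch: the helper names and the sample values are hypothetical, and it assumes a value without a tilde is a plain storage ID.

```javascript
// Hypothetical helpers for illustration only, not part of the Apify API.

// Builds the "[username]~[storage-name]" identifier from its two parts.
function buildStorageId(username, storageName) {
    if (!username || !storageName) {
        throw new Error('Both username and storageName are required');
    }
    return `${username}~${storageName}`;
}

// Splits an identifier at the first tilde. A value without a tilde is
// treated as a plain storage ID (an assumption of this sketch).
function parseStorageId(value) {
    const index = value.indexOf('~');
    if (index === -1) return { id: value };
    return {
        username: value.slice(0, index),
        storageName: value.slice(index + 1),
    };
}

console.log(buildStorageId('john', 'my-dataset')); // john~my-dataset
```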
- CLI: We have just released an Apify CLI (command line tool) to simplify local development, debugging and deployment to Apify.
- Request queue: New storage type for the Actor platform that helps manage a dynamic queue of URLs to be processed. Check storage documentation for more information.
- The apify NPM package contains a lot of new features. Check its changelog for details.
- Actor: The limit on the number of processes per actor run was increased to 2 × [memory megabytes], so with 2 GB of memory your limit is 4000 processes.
- Actor: The host machine now sends a migrating event to the actor process in case of an upcoming restart or shutdown. Check documentation.
- Actor: Actor runs now have a fixed amount of CPU capacity reserved, so each run should take about the same time. We also added a new "Use spare CPU capacity" checkbox in the actor settings that lets actors use spare CPU capacity on the host machine as a free boost.
- Community: We released a new version of our open source apify npm package, which contains a lot of new stuff to help you with your web scraping and automation projects. Check its npm page, the source code in the GitHub repository, and the documentation.
- The apify/actor-node-puppeteer Docker image is now deprecated. Use the apify/actor-node-chrome image instead.
- Actor: We have added the apify/actor-node-chrome-xvfb image, which supports non-headless Chrome. If you choose this image, Apify.launchPuppeteer() opens Puppeteer with non-headless Chrome by default.
- Actor: We improved our infrastructure to speed up actor starts and improve overall performance.
- Actor: Logs are now rate-limited. Each actor run and build has a log credit of 10,000 lines, with 10 lines added each second. Log lines over the limit won't be available in either the UI or the API.
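
The log credit described above behaves like a token bucket. The sketch below illustrates the idea with the stated numbers (10,000 lines of initial credit, 10 lines added per second). The class name is hypothetical, and capping the credit at its initial value is an assumption, since the entry does not say whether unused credit accumulates beyond 10,000 lines.

```javascript
// Hypothetical illustration of a log credit, not the platform's actual code.
class LogCredit {
    constructor({ initialLines = 10000, refillPerSecond = 10 } = {}) {
        this.maxLines = initialLines; // assumption: credit never exceeds the initial amount
        this.credit = initialLines;
        this.refillPerSecond = refillPerSecond;
        this.lastRefillMs = 0;
    }

    // Adds credit for the time elapsed since the previous call.
    refill(nowMs) {
        const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
        this.credit = Math.min(this.maxLines, this.credit + elapsedSec * this.refillPerSecond);
        this.lastRefillMs = nowMs;
    }

    // Returns true if the line fits in the credit, false if it is dropped.
    tryLog(nowMs) {
        this.refill(nowMs);
        if (this.credit < 1) return false;
        this.credit -= 1;
        return true;
    }
}
```

With this model, a burst of 10,005 lines at time zero keeps the first 10,000 and drops the rest, and one second later there is room for 10 more lines.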
- Web: Launched the Page Analyzer tool to enable setting up crawlers with fewer manual steps. Read more on
- Infrastructure: Major improvements to our Linux server configuration to improve stability and performance of the system.
- Actor: Actors can now run with 16 GB of memory (available for users with Medium and Large plans, see https://www.apify.com/docs/actor#limits).
- Actor: Actor runs and their default key-value stores and datasets are now deleted after the data retention period.
- App: We've added support for PayPal payments for all subscription plans.
- Actor: The actor source code can now come from a GitHub Gist, which is much simpler than having a full Git repository (read the docs).
- Support: We have re-launched the Knowledge base with a new design and much better search options.
- API: Added API endpoint to run an actor and get its output in a single HTTP request.
- Actor: We've added a new storage type Dataset. This enables you to store results in a way similar to Apify Crawler.
- Actor: Actor usage statistics are now available in user account.
- Community: Released the proxy-chain NPM package as open
- Actor: Smarter allocation of tasks to servers to improve performance
- Actor: Environment variables can now also be passed to actor builds (as docker
- Actor: Added option to automatically restart actor runs on error
- Crawler: Fixed the URL in the link element of the RSS-formatted last crawler execution result. This bug caused some RSS readers to never refresh the data.
- Crawler: Added support for automatic rotation of user agents.
- Open source: Released a new NPM package called proxy-chain to support usage of proxies with password from headless Chrome.
- API: Added support for XLSX output format for crawler results.
- App: Upgraded the web app to Meteor 1.6 and thus greatly improved the speed of the app.
- Internal: Improved internal notifications, performance and infrastructure
- Actor: Added a feature that allows an actor to be run anonymously.
- Apifier is dead, long live Apify! On 9th October we launched our
biggest upgrade yet.
- The old website at www.apifier.com was
public static website www.apify.com and the app running at my.apify.com
- A new product called Actor was
introduced. Read more in our blog
- Added actor support to scheduler.
- Git and Zip file source type added to actor.
- API: API endpoint providing results in XML format now allows to set XML tag
- API: Added support for JSONL output format
- Web: Created a Crawler request form to help customers specify the crawlers they would like to have built.
- Crawler: Added the finish webhook data feature, which enables sending additional info in the webhook request.
- Web: Added a feature to delete user account
- Internal: Improvements in logging system
- General: Officially launched Zapier integration
- Crawler: Added a new context.actId property that enables users to fetch information about their crawler.
- Internal: Consolidated logging in the web application, improvements in Admin
- Crawler: Added proxy groups crawler setting to simplify usage of proxy
- Web: Added Schedule button to the crawler details page to simplify
scheduling of the crawlers
- Internal: Improvements in administration interface
- Web: Performance optimizations in UI
- Web: Added a tool to test the crawler on a single URL only (see Run
console on the crawler details page)
- Internal: Improved reports in admin section
- Web: Changed Twitter handle from @ApifierInfo to @apifier.
- Crawler: Bugfix - cookies set in the last page function were not persisted
- Internal: Deployed some upgrades in data storage infrastructure to improve
performance and reduce costs
- Web: Added sorting to Community crawlers.
- Web: Bugfixes, performance and cosmetic improvements.
- Internal: improvements in administration interface.
- Web: Extended public user profile pages in Community
- API: Bugfix in exports of results in XML format.
- Crawler: Added a new context.actExecutionId property that enables users to stop the crawler during its execution, fetch results, etc.
- Web: Improvements in internal administration interface.
- Web: Launched an external Apifier Status page to keep our users informed about system status and potential outages.
- Web: Numerous improvements on Community
crawlers page, added user profile page, enabled anonymous sharing
- API: Improved sorting of columns in CSV/HTML results table - values are now
sorted according to numerical indexes (e.g. "val/0", ..., "val/9", "val/10")
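
The column ordering described above can be illustrated with a comparator that treats numeric path segments as numbers. This is a sketch only, assuming column names are slash-separated paths as in the "val/0" example; the function name is hypothetical and this is not the API's actual implementation.

```javascript
// Hypothetical comparator for illustration only.
// Compares slash-separated column names segment by segment, numerically
// where both segments are non-negative integers and as plain strings otherwise.
function compareColumnNames(a, b) {
    const partsA = a.split('/');
    const partsB = b.split('/');
    const shared = Math.min(partsA.length, partsB.length);
    for (let i = 0; i < shared; i++) {
        const partA = partsA[i];
        const partB = partsB[i];
        if (/^\d+$/.test(partA) && /^\d+$/.test(partB)) {
            const diff = Number(partA) - Number(partB);
            if (diff !== 0) return diff;
        } else if (partA !== partB) {
            return partA < partB ? -1 : 1;
        }
    }
    // A shorter path sorts before a longer one with the same prefix.
    return partsA.length - partsB.length;
}

console.log(['val/10', 'val/9', 'val/0'].sort(compareColumnNames));
// [ 'val/0', 'val/9', 'val/10' ]
```

A plain string sort would instead place "val/10" before "val/9", which is the ordering this change avoids.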
- Web: Launched Apifier community page
- General: Invoices are now in the PDF format and are sent to customers by email
- We didn't launch anything today, just wishing you a happy Valentine's Day
- Web: New testimonials from ePojisteni.cz
on our Customers page. Thanks Dušan and Andy!
- Web: Released a major upgrade of billing and invoicing infrastructure to support
European value-added tax (VAT)
- Web: Added a new Video tutorials page
- Crawler: Improved normalization of URLs which is used by the crawler to
determine whether a page has already been visited
(see Request.uniqueKey property in
docs for more details)
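
To show why URL normalization matters for detecting already-visited pages, here is a minimal sketch. The specific rules used (lowercasing the scheme and host, dropping the fragment, sorting query parameters) are assumptions chosen for illustration; the actual rules behind Request.uniqueKey are described in the docs and may differ. The function name is hypothetical.

```javascript
// Hypothetical normalization for illustration only. The real rules behind
// Request.uniqueKey may differ from the ones assumed here.
function normalizeUrl(url) {
    // The WHATWG URL parser lowercases the scheme and host on its own.
    const parsed = new URL(url);
    parsed.hash = '';           // assumption: the fragment is ignored
    parsed.searchParams.sort(); // assumption: query parameter order is ignored
    return parsed.toString();
}

// Two spellings of the same page map to a single key.
const visited = new Set();
visited.add(normalizeUrl('HTTP://Example.com/path?b=2&a=1#section'));
console.log(visited.has(normalizeUrl('http://example.com/path?a=1&b=2'))); // true
```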
- Infrastructure: changed CDN provider from CloudFlare to AWS CloudFront to
improve performance of web and API
- API: Bugfix in the start execution API endpoint - synchronous wait would sometimes time out after 60 seconds.
- Internal: further improvements in administration interface
- Web: improved aggregation of usage statistics, now it refreshes automatically
- Crawler: Request.proxy is now available even inside of the page function
- Web: improved Invoices page
- Internal: improvements in administration interface
- Web: displaying snapshot of the crawling queue in the Run console
- API: all paginated API
endpoints now support
desc=1 query parameter to sort records in
- API: added support for XML attributes in results
- General: added support for RSS output format to enable creating RSS feeds
for any website
- General: launched a new discussion forum
- Crawler: custom proxy used by a particular request is now saved in
(see Custom proxies in docs)
- Crawler: performance improvements
- API: enabled rate limiting
Major API upgrades:
- added new endpoints to update and delete crawlers
- support for synchronous execution of crawlers
- all endpoints that return lists now support pagination
- API Reference was greatly improved
- Web: Added new Tag and Do not start crawler if previous still
running settings to schedules
- General: Added new Initial
setting to enable users to edit cookies used by their crawlers
- Web: Added a list of invoices to Account page
- Web: Added a new usage stats chart to Account page
- Internal: Large improvements in the deployment system completed
- General: Increased the length limit for Start URLs to 2000 characters
- Web: Showing more relevant statistics in crawler progress bar
- Web: Released a new shiny API reference
- Internal: Performance and usability improvements in admin interface
- Internal: Migrated our main database to MongoDB 3.2, deployed new integration
test suite, new metrics in admin interface
- Web: Showing current service limits on the Account page, various internal
improvements in user handling code
- Web: Added new example crawlers to demonstrate how to use page's internal
- New feature: Released Schedules that enable you to automatically run crawlers at certain times.
- Web: Switched to Intercom to manage communication with our
- Web: Added functionality to test finish webhooks
- Web: Security fix - added
rel="noopener" to all external links
in order to avoid exploitation of the
- Web: Displaying Internal ID field on crawler details page, and User ID and API token on the Account page to simplify setup of
- Web: Added a new Jobs page, because we're hiring!
- Web: Deployed various performance optimizations and bugfixes
- Internal: Updated our Meteor application to use ES2015 modules
- Web: Published a new testimonial from Shopwings on our Customers page. Thanks Guillaume!
queuePosition can now also be overridden in
- Web: Performance improvements of results exports
- Web: Added new example crawler to demonstrate a basic SEO analysis tool
- Internal: Upgraded Meteor platform from version 1.3 to 1.4
- Docs: Added API property name and type next to each crawler settings (see docs)
- Crawler: Added a new context.stats property to pass statistics from the current crawler to user code
- Crawler: Added a new signature for
that enables placing new pages
to beginning of the crawling queue and overriding
- Crawler: Enabled users to define a custom User-Agent HTTP header, updated the default value to resemble the latest Chrome on Windows.
- Web: Implemented optimization that enables user to export even large result sets
to CSV/HTML format.
- Web: Created this wonderful page to keep our users up-to-date with new