Act

jancurn/url-to-pdf

  • Builds
  • latest 0.0.8 / 2017-11-02
  • beta 0.1.4 / 2017-11-18
  • Created 2017-11-01
  • Last modified 2017-11-18
  • grade 71

Description

Opens a web page in headless Chrome using Puppeteer and prints it to PDF. The input is a JSON object such as:

{ "url": "https://www.wikipedia.org/", "sleepMillis": 2000, "pdfOptions": { ... } }

The optional "sleepMillis" setting specifies how many milliseconds the act should wait after loading the page before printing it to PDF. The "pdfOptions" object is passed to Puppeteer's page.pdf() function; see https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagepdfoptions for the available options.

The output is a PDF file.


API

To run the act, send an HTTP POST request to:

https://api.apify.com/v2/acts/jancurn~url-to-pdf/runs?token=<YOUR_API_TOKEN>

The POST payload will be passed as input for the act. For more information, read the docs.
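
The request described above could be sent from Node.js roughly as follows. This is a minimal sketch, not part of the act itself: the helper names buildRunRequest and startRun are hypothetical, and the code assumes a Node.js version with a global fetch() (18 or later). The shape of the JSON response is not covered here, so the sketch simply returns whatever the API sends back.

```javascript
// Hypothetical helper (not part of the act): builds the URL and fetch()
// options for starting a run of jancurn/url-to-pdf.
function buildRunRequest(token, input) {
    const url = 'https://api.apify.com/v2/acts/jancurn~url-to-pdf/runs'
        + `?token=${encodeURIComponent(token)}`;
    return {
        url,
        options: {
            method: 'POST',
            headers: { 'Content-Type': 'application/json; charset=utf-8' },
            // The POST payload is passed to the act as its input.
            body: JSON.stringify(input),
        },
    };
}

// Hypothetical helper: sends the request. Assumes a global fetch() exists.
async function startRun(token, input) {
    const { url, options } = buildRunRequest(token, input);
    const response = await fetch(url, options);
    if (!response.ok) {
        throw new Error(`Request failed with HTTP status ${response.status}`);
    }
    return response.json();
}
```

Calling startRun('<YOUR_API_TOKEN>', { url: 'https://www.wikipedia.org/' }) would start a run with that input. Keeping the request construction separate from the network call makes it easy to inspect the URL and payload without contacting the API.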


Example input

Content type: application/json; charset=utf-8

{
  "url": "https://www.wikipedia.org/",
  "sleepMillis": 2000,
  "pdfOptions": {
    "format": "a4"
  }
}
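
To make the input rules concrete, here is a small sketch that mirrors the checks in the act's source code: "url" must be a string, "sleepMillis" is optional, and any "path" inside "pdfOptions" is dropped so the PDF is not written to a file. The function name normalizeInput is hypothetical, and unlike the act it copies "pdfOptions" instead of modifying the input object.

```javascript
// Hypothetical helper that mirrors the act's input handling.
function normalizeInput(input) {
    if (!input || typeof input.url !== 'string') {
        throw new Error('Input must be an object with the "url" property');
    }
    // Treat a missing, non-numeric, or non-positive value as "do not sleep".
    const millis = Number(input.sleepMillis);
    const sleepMillis = millis > 0 ? millis : 0;
    // Copy the options and drop "path" so Puppeteer returns the PDF in memory.
    const pdfOptions = { ...(input.pdfOptions || {}) };
    delete pdfOptions.path;
    return { url: input.url, sleepMillis, pdfOptions };
}
```

For the example input above, this returns the same "url" and "sleepMillis" together with { "format": "a4" } as the PDF options.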

Source code

Based on the apify/actor-node-puppeteer Docker image (see docs).

const Apify = require('apify');

Apify.main(async () => {
    console.log('Fetching input...');
    const input = await Apify.getValue('INPUT');
    if (!input || typeof input.url !== 'string') {
        throw new Error('Input must be an object with the "url" property');
    }
    
    console.log('Launching headless Chrome...');
    const browser = await Apify.launchPuppeteer();
    const page = await browser.newPage();
    
    console.log(`Loading page (url: ${input.url})...`);
    await page.goto(input.url);
    
    if (input.sleepMillis > 0) {
        console.log(`Sleeping ${input.sleepMillis} millis...`);
        await new Promise((resolve) => setTimeout(resolve, input.sleepMillis));
    }
    
    const opts = input.pdfOptions || {};
    delete opts.path; // Don't store to file
    console.log(`Printing to PDF (options: ${JSON.stringify(opts)})...`);
    const pdfBuffer = await page.pdf(opts);
    await browser.close(); // Release the browser once the PDF is in memory
    
    console.log(`Saving PDF (size: ${pdfBuffer.length} bytes) to output...`);
    await Apify.setValue('OUTPUT', pdfBuffer, { contentType: 'application/pdf' });
    
    const storeId = process.env.APIFY_DEFAULT_KEY_VALUE_STORE_ID;
    
    // NOTE: The disableRedirect=1 param is added because Chrome won't open URLs to PDF files
    // that redirect when they are pasted into the browser address bar.
    console.log('PDF file has been stored to:');
    console.log(`https://api.apify.com/v2/key-value-stores/${storeId}/records/OUTPUT?disableRedirect=1`);
});
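
Once a run has finished, the stored PDF can be fetched from the URL that the act prints at the end. The sketch below rebuilds that URL from a key-value store ID and downloads the record. The helper names outputRecordUrl and downloadPdf are hypothetical, and a global fetch() (Node.js 18 or later) is assumed.

```javascript
// Hypothetical helper: rebuilds the record URL that the act logs on completion.
function outputRecordUrl(storeId) {
    return `https://api.apify.com/v2/key-value-stores/${storeId}/records/OUTPUT?disableRedirect=1`;
}

// Hypothetical helper: downloads the PDF into a Buffer.
// Assumes a global fetch() exists.
async function downloadPdf(storeId) {
    const response = await fetch(outputRecordUrl(storeId));
    if (!response.ok) {
        throw new Error(`Download failed with HTTP status ${response.status}`);
    }
    return Buffer.from(await response.arrayBuffer());
}
```

The returned Buffer can then be written to disk, for example with fs.writeFileSync('page.pdf', buffer).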