A simple function to download a PDF robustly.

download_pdf(url, file, quiet = FALSE, overwrite = FALSE, pause = TRUE)

Arguments

url
The URL of the PDF to download
file
File to which the PDF will be downloaded
quiet
Suppress a message about which URL is being processed [default=FALSE]
overwrite
Overwrite an existing file of the same name [default=FALSE]
pause
Whether to pause for a random 0.5-3 seconds between requests during scraping [default=TRUE]

Value

A data.frame with columns url, destination, success, and pdfCheck

Details

Scraping PDFs from the web can run into small hitches that make writing a scraper annoying. This function simplifies PDF scraping by wrapping the download in a dedicated function, with support functions to, e.g., test that a downloaded file is actually a PDF. It ensures URLs are properly encoded and handles missing URLs gracefully. The filename is the basename of the URL with " " replaced with "_". The pause parameter limits the rate at which requests hit the hosting servers.
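The behaviour described above can be sketched roughly as follows. This is a hypothetical illustration of the logic, not the package's actual implementation; the helper is_pdf() and the internal structure are assumptions.

```r
# Sketch of the core logic (assumption: not the package's real code).
# Uses only base R plus utils::download.file().
is_pdf <- function(path) {
  # A valid PDF file begins with the magic bytes "%PDF".
  if (!file.exists(path)) return(FALSE)
  identical(readBin(path, "raw", n = 4L), charToRaw("%PDF"))
}

download_pdf_sketch <- function(url, file = NULL, quiet = FALSE,
                                overwrite = FALSE, pause = TRUE) {
  # Handle missing URLs gracefully instead of erroring out mid-scrape.
  if (length(url) != 1L || is.na(url) || !nzchar(url)) {
    return(data.frame(url = NA_character_, destination = NA_character_,
                      success = FALSE, pdfCheck = FALSE))
  }
  url <- utils::URLencode(url)                  # ensure URL encoding
  if (is.null(file)) {
    file <- gsub(" ", "_", basename(url))       # basename, " " -> "_"
  }
  if (!quiet) message("Downloading: ", url)
  if (file.exists(file) && !overwrite) {
    # Existing file of the same name: skip the download.
    return(data.frame(url = url, destination = file,
                      success = TRUE, pdfCheck = is_pdf(file)))
  }
  if (pause) Sys.sleep(runif(1, 0.5, 3))        # rate-limit requests
  ok <- tryCatch({
    utils::download.file(url, file, mode = "wb", quiet = quiet)
    TRUE
  }, error = function(e) FALSE, warning = function(w) FALSE)
  data.frame(url = url, destination = file,
             success = ok, pdfCheck = ok && is_pdf(file))
}
```

The PDF check reads only the first four bytes, so it is cheap even for large downloads; it catches the common failure mode where a server returns an HTML error page with a 200 status.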

TODO: Have the overwrite check compare the MD5 hashes of files in the download directory rather than relying on file names.
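One way the TODO above might be implemented is with tools::md5sum(). The helper name and the assumption that the candidate file has already been downloaded to a temporary path are hypothetical:

```r
# Hypothetical MD5-based duplicate check (assumption: not package code).
# Returns TRUE if a byte-identical file already exists in download_dir,
# regardless of its file name.
already_downloaded <- function(tmp_file, download_dir) {
  existing <- list.files(download_dir, full.names = TRUE)
  if (length(existing) == 0L) return(FALSE)
  unname(tools::md5sum(tmp_file)) %in% unname(tools::md5sum(existing))
}
```

Hashing content rather than comparing names would catch the case where the same PDF is served under two different URLs.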

Examples

## Not run: ------------------------------------
# result <- download_pdf(url = "https://goo.gl/I3P3A3",
#                        file = "~/Downloads/test.pdf")
## ---------------------------------------------