Simple function to download a PDF, robustly.
download_pdf(url, file, quiet = FALSE, overwrite = FALSE, pause = TRUE)
A data.frame with columns url, destination, success, and pdfCheck.
Scraping PDFs from the web can run into small hitches that make
writing a scraper tedious. This function simplifies PDF scraping by
wrapping the download in a dedicated function, with support functions to,
e.g., test for PDFness. It URL-encodes the address and handles missing URLs
gracefully. The filename is the basename of the URL with " " replaced
with "_". The pause parameter limits the rate at which requests hit the
hosting servers.
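The file naming and PDFness checks described above can be sketched roughly as follows. Both helper names (pdf_filename, is_pdf) are hypothetical and are not taken from the package. The check assumes a well-formed PDF starts with the bytes "%PDF-", which can miss valid files that carry leading junk bytes.

```r
# Hypothetical helpers for illustration; not the package's internals.

# Derive a local file name from a URL: take the basename and
# replace spaces with underscores.
pdf_filename <- function(url) {
  gsub(" ", "_", basename(url), fixed = TRUE)
}

# Test for "PDFness" by reading the first five bytes of the file.
# Assumption: a PDF begins with the header "%PDF-".
is_pdf <- function(path) {
  if (!file.exists(path)) return(FALSE)
  con <- file(path, open = "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 5L)
  identical(header, charToRaw("%PDF-"))
}
```

A header check like this catches the common case where a server returns an HTML error page saved under a .pdf name.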
TODO: Have the overwrite check work on the MD5 hash of files in the download
subdirectory rather than relying on file names.
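One possible shape for a hash-based check, as a sketch only: the helper name already_downloaded and its arguments are hypothetical, and it assumes the candidate file has been saved somewhere (e.g., a temporary location) before being compared. It relies on tools::md5sum, which ships with base R.

```r
# Hypothetical helper: TRUE if some file in `dir` has the same MD5
# hash as `path`, regardless of file name.
already_downloaded <- function(path, dir) {
  existing <- list.files(dir, full.names = TRUE)
  # Do not compare the candidate file against itself.
  existing <- setdiff(normalizePath(existing), normalizePath(path))
  if (length(existing) == 0L) return(FALSE)
  unname(tools::md5sum(path)) %in% unname(tools::md5sum(existing))
}
```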
## Not run: ------------------------------------
# result <- download_pdf(url = "https://goo.gl/I3P3A3",
#                        file = "~/Downloads/test.pdf")
## ---------------------------------------------