Robust Downloads of Portable Document Format (PDF) Files • pdfdown

An R package to simplify Portable Document Format (PDF) file downloads.

Scraping PDFs from the web often runs into small hitches that make writing a scraper tedious. pdfdown simplifies PDF scraping by providing a dedicated download function plus support functions that, e.g., test whether a file is actually a PDF. It ensures that URLs are properly encoded and handles missing URLs gracefully. The main function, download_pdf, includes a pause parameter (a random delay of 0-3 s) to limit the rate at which requests hit the host server. We mostly use the package to scrape U.S. Government documents that are only available as PDFs.
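To make those checks concrete, here is a minimal base-R sketch of what a PDF test, URL encoding with missing-value handling, and a random pause could look like. The function names looks_like_pdf, encode_url, and polite_pause are hypothetical and chosen for illustration; they are not pdfdown's functions, and the package's own implementation may differ.

```r
# Hypothetical helpers for illustration only; NOT part of pdfdown's API.

# Return TRUE if the file starts with the PDF magic bytes "%PDF-".
looks_like_pdf <- function(path) {
  if (is.na(path) || !file.exists(path)) return(FALSE)
  header <- readBin(path, what = "raw", n = 5L)
  identical(header, charToRaw("%PDF-"))
}

# Percent-encode a URL; a missing or empty URL yields NA instead of an error.
encode_url <- function(url) {
  if (is.na(url) || !nzchar(url)) return(NA_character_)
  utils::URLencode(url)
}

# Sleep for a random interval between 0 and max_seconds to limit request rate.
polite_pause <- function(max_seconds = 3) {
  Sys.sleep(stats::runif(1, min = 0, max = max_seconds))
}
```

Checking the leading bytes is cheap and catches the common case where a server returns an HTML error page under a .pdf name, though it does not validate the rest of the file.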

Installation

Use devtools to install pdfdown:

devtools::install_github("Defenders-ESC/pdfdown")

Usage

Get the five-year review for the Pecos puzzle sunflower:

library(pdfdown)

# Download the review to a local file
url <- "https://ecos.fws.gov/docs/five_year_review/doc4599.pdf"
helpar5y <- download_pdf(url, "~/Downloads/doc4599.pdf")

Help

Find a bug or have a question? Submit an issue on GitHub! Alternatively, get in touch.

Contributing

Want to add features or fix a bug? Fork the repo and submit a pull request! Thanks!
