An R package to simplify Portable Document Format (PDF) file downloads.
Scraping PDFs from the web often runs into small hitches that make writing a scraper tedious. This package simplifies PDF scraping by providing a dedicated download function plus support functions that, e.g., test for PDFness. It ensures URLs are properly encoded and handles missing URLs gracefully. The main function, download_pdf, includes a pause parameter (a random 0-3 s delay) to limit the rate at which requests hit the host server. We mostly use this to facilitate scraping U.S. Government documents that are only available as PDFs.
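To make the ideas above concrete, here is a minimal base-R sketch of the three behaviors described: a PDF check, URL encoding with graceful handling of missing URLs, and a random pause. It is not the package's actual code. The function names looks_like_pdf and polite_pdf_download are hypothetical, and the check assumes that "PDFness" can be approximated by looking for the `%PDF-` signature at the start of the file.

```r
# Hypothetical helpers for illustration only; not this package's implementation.

# TRUE if the file exists and starts with the PDF signature "%PDF-".
looks_like_pdf <- function(path) {
  if (!file.exists(path)) return(FALSE)
  header <- readBin(path, what = "raw", n = 5L)
  identical(header, charToRaw("%PDF-"))
}

# Download a single URL; returns the destination path, or NA on any failure.
polite_pdf_download <- function(url, dest, pause = TRUE) {
  # Missing URL: return NA instead of raising an error.
  if (is.null(url) || is.na(url) || !nzchar(url)) return(NA_character_)
  # Escape spaces and other unsafe characters in the URL.
  url <- utils::URLencode(url)
  # Random 0-3 s delay to limit the request rate on the host server.
  if (pause) Sys.sleep(stats::runif(1, min = 0, max = 3))
  status <- tryCatch(
    suppressWarnings(
      utils::download.file(url, dest, mode = "wb", quiet = TRUE)
    ),
    error = function(e) 1L
  )
  # Treat a failed transfer or a non-PDF payload as a failed download.
  if (status != 0L || !looks_like_pdf(dest)) return(NA_character_)
  dest
}
```

Returning NA rather than stopping is one possible design choice: it lets a scraper loop over a long vector of URLs, some of them missing or broken, and inspect the failures afterwards.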
Get the five-year review for the Pecos puzzle sunflower:
url <- "https://ecos.fws.gov/docs/five_year_review/doc4599.pdf"
helpar5y <- download_pdf(url, "~/Downloads/doc4599.pdf")

Find a bug or have a question? Submit an issue on GitHub! Alternatively, get in touch.
Want to add features or fix a bug? Fork the repo and submit a pull request! Thanks!