Trove newspaper harvester

Download large quantities of digitised newspaper articles from Trove using the Trove Harvester tool.


Tools, tips, and examples

  • Using TroveHarvester to get newspaper articles in bulk
    An easy introduction to the Trove Harvester command-line tool. Edit a few cells and you'll be harvesting the metadata and full text of thousands of newspaper articles in minutes.

  • Exploring your TroveHarvester data
    This notebook shows some ways to analyse and visualise the article metadata you've harvested: show the distribution of articles over time and space, or find which newspapers published the most articles. (Under construction)

  • Exploring harvested text files
    This notebook suggests some ways to aggregate and analyse the individual OCR'd text files for each article: look at word frequencies, or calculate TF-IDF values. (Under construction)
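
As a rough illustration of the word frequency and TF-IDF analysis mentioned above, here is a minimal sketch using only the Python standard library. It is not taken from the notebooks themselves: the function names `tokenise`, `word_counts`, and `tf_idf`, the file names, and the sample texts are all hypothetical, and the TF-IDF formula used (relative term frequency multiplied by the natural log of inverse document frequency) is just one common variant.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(docs):
    """Return {doc_name: Counter of word frequencies} for a {doc_name: text} dict."""
    return {name: Counter(tokenise(text)) for name, text in docs.items()}


def tf_idf(docs):
    """Return {doc_name: {term: score}} using tf * ln(N / df)."""
    counts = word_counts(docs)
    n_docs = len(counts)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for c in counts.values() for term in c)
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            term: (freq / total) * math.log(n_docs / df[term])
            for term, freq in c.items()
        }
    return scores


# Hypothetical sample data standing in for harvested article text files.
docs = {
    "article-1.txt": "The flood destroyed the bridge",
    "article-2.txt": "The council met to discuss the flood",
}

counts = word_counts(docs)
scores = tf_idf(docs)
print(counts["article-1.txt"].most_common(3))
print(sorted(scores["article-1.txt"].items(), key=lambda kv: -kv[1])[:3])
```

Words that appear in every document (such as "the" here) score zero, while words unique to one article score highest. With real data you would build `docs` by reading each harvested text file into a string, and OCR errors in the text will usually inflate the number of rare "words".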

Useful apps

These are Jupyter notebooks designed to run in ‘app mode’ with the code cells hidden. The Binder buttons will automatically open the notebooks in app mode, but you can always view and edit the code by clicking the ‘Edit App’ button.

  • Trove Harvester web app
    A simple web interface to the TroveHarvester, the easiest way to harvest data from Trove.

    [Image: screen capture of the Trove Harvester web app]