GLAM data from government portals
This is an attempt to assemble some useful information about Australian GLAM (Galleries, Libraries, Archives, Museums) datasets.
As a first step, I've harvested GLAM-related datasets from the various national and state data portals. I did this by identifying relevant organisations and groups, and then harvesting all the packages associated with them. I also added in a few extra packages that looked relevant.
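The harvesting step described above could be sketched roughly as follows. This assumes the portals expose a CKAN-style action API with a `package_search` endpoint; the text only mentions packages, organisations, and groups, so that is an assumption, not a description of the actual harvester. The portal URL and organisation slug passed in are hypothetical placeholders, and `harvest_org`, `search_url`, and `fetch_json` are names made up for this illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(portal, org_slug, start=0, rows=100):
    """Build a package_search URL for one organisation (assumes a CKAN v3 action API)."""
    query = urlencode({"fq": f"organization:{org_slug}", "start": start, "rows": rows})
    return f"{portal.rstrip('/')}/api/3/action/package_search?{query}"


def fetch_json(url):
    """Fetch and decode a JSON response; pass a different callable to test offline."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def harvest_org(portal, org_slug, fetch=fetch_json, rows=100):
    """Page through every package listed for an organisation and return them as a list."""
    packages = []
    start = 0
    while True:
        data = fetch(search_url(portal, org_slug, start, rows))
        results = data["result"]["results"]
        packages.extend(results)
        start += rows
        # Stop when a page comes back empty or we've passed the reported total
        if not results or start >= data["result"]["count"]:
            break
    return packages
```

Running `harvest_org` once per relevant organisation on each portal, and adding a `portal` key to each returned package, would give a single combined list to analyse.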
Tools, tips, and examples
- Harvesting GLAM data from government portals
This notebook attempts to harvest the details of GLAM datasets from state and national data portals. It also attempts some analysis of the results.
Results (April 2018)
There are duplicates in the data because some datasets are listed on more than one portal. While my interest is in datasets containing collection data, the list also includes datasets created by the operations of GLAM organisations, such as borrowing data or FOI reports. I might filter these out later on.
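One way to get a feel for how many datasets appear on more than one portal is to group them by title. This is only a heuristic sketch, not something the harvest itself does: it assumes each dataset is a dictionary with hypothetical `title` and `portal` keys, and it will miss duplicates whose titles differ between portals.

```python
from collections import defaultdict


def find_cross_portal_duplicates(datasets):
    """Return {normalised title: sorted portals} for titles seen on two or more portals."""
    portals_by_title = defaultdict(set)
    for dataset in datasets:
        # Normalise case and whitespace so trivial differences don't hide a match
        key = " ".join(dataset["title"].lower().split())
        portals_by_title[key].add(dataset["portal"])
    return {
        title: sorted(portals)
        for title, portals in portals_by_title.items()
        if len(portals) > 1
    }
```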
There are currently 790 datasets in this list.
Here's the number of datasets by data portal:
- data.gov.au: 271
- data.qld.gov.au: 214
- data.sa.gov.au: 173
- data.wa.gov.au: 96
- data.nsw.gov.au: 30
- data.vic.gov.au: 6
And the number of datasets by organisation:
- State Library of South Australia: 121
- Housing and Public Works: 117
- State Library of Western Australia: 114
- Natural Resources, Mines and Energy: 79
- State Library of Queensland: 78
- LINC Tasmania: 74
- State Records: 41
- State Records Office of Western Australia: 41
- South Australian Governments: 26
- State Library of New South Wales: 21
- State Archives NSW: 19
- Environment and Science: 14
- History Trust of South Australia: 12
- State Library of NSW: 6
- State Library of Victoria: 6
- National Library of Australia: 5
- Aboriginal and Torres Strait Islander Partnerships: 4
- Museum of Applied Arts and Sciences: 3
- National Archives of Australia: 3
- National Portrait Gallery: 2
- Mount Gambier Library: 2
- Australian Museum: 1
- City of Sydney: 1
I've attempted to identify the format of each dataset by checking the file extension. If there's no file extension, I use the format value in the package metadata. These values don't always seem reliable. Here's the number of datasets by format:
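The extension-first, metadata-second rule for guessing formats might look something like the sketch below. The function name `guess_format` and the choice to upper-case the result are assumptions made for this example, not details taken from the harvesting notebook.

```python
import os
from urllib.parse import urlparse


def guess_format(url, metadata_format=""):
    """Guess a dataset's format from its URL extension, else fall back to metadata."""
    # Parse out the path so query strings and fragments don't confuse the extension check
    path = urlparse(url).path
    extension = os.path.splitext(path)[1].lstrip(".")
    if extension:
        return extension.upper()
    # No extension in the URL, so trust the (not always reliable) metadata value
    return metadata_format.strip().upper()
```

For example, `guess_format("https://example.org/data/loans.csv", "XML")` prefers the extension and returns `"CSV"`, while a URL with no extension falls back to whatever the metadata says.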
For each dataset, I've fired off a HEAD request for the URL to see if the link still works. Here's the number of datasets by HTTP status code (200 is OK, 404 is not found):
- 200: 746
- 404: 39
- 400: 3
- 403: 2
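A link check along these lines can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: `head_status` and `tally_statuses` are hypothetical names, and the 30-second timeout is an arbitrary choice. Note that some servers reject HEAD requests even when a GET would succeed, so a non-200 code doesn't always mean the dataset is gone.

```python
from collections import Counter
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def head_status(url, timeout=30):
    """Send a HEAD request and return the HTTP status code, or None on failure."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions but still carry a code
        return error.code
    except (OSError, ValueError):
        # DNS failures, refused connections, timeouts, or malformed URLs
        return None


def tally_statuses(urls, check=head_status):
    """Count how many URLs returned each status code."""
    return Counter(check(url) for url in urls)
```

Calling `tally_statuses(urls).most_common()` on the harvested URLs would produce a list of (status code, count) pairs like the one above.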
Just the CSVs
There are 499 CSV-formatted datasets in this list.
Here are the results of the HEAD requests for CSV-formatted datasets:
- 200: 493
- 404: 4
- 400: 2