GLAM data from government portals
This is an attempt to assemble some useful information about Australian GLAM (Galleries, Libraries, Archives, Museums) datasets.
As a first step, I've harvested GLAM-related datasets from the various national and state data portals.
You can visualise the contents of the CSV datasets I've harvested by using the GLAM CSV Explorer.
Tools, tips, and examples
Harvesting GLAM data from government portals
This notebook attempts to harvest the details of GLAM datasets from state and national data portals, and to do some analysis of the results. I identified relevant organisations and groups, then harvested all the packages associated with them. I also added a few extra packages that looked relevant.
Harvest GLAM datasets from data.gov.au
This is a quick attempt to harvest datasets published by GLAM institutions using the new data.gov.au API. To create the list of organisations, I searched the organisations on the data.gov.au site for 'library', 'archives', 'records', and 'museum'.
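The approach described for the first notebook above (collecting every package associated with an organisation) might look something like the following on a portal that exposes the CKAN Action API. This is a minimal sketch rather than the notebook's actual code: it assumes the portal is CKAN-based, and the function names, the portal URL, and the organisation identifier are all hypothetical.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url):
    """Download a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def harvest_org_packages(portal_url, org_id, rows=100, fetch=fetch_json):
    """Page through CKAN's package_search action for one organisation.

    Assumes the portal supports /api/3/action/package_search and that
    org_id is the organisation's identifier on that portal.
    """
    packages = []
    start = 0
    while True:
        params = urllib.parse.urlencode(
            {"fq": f"organization:{org_id}", "rows": rows, "start": start}
        )
        data = fetch(f"{portal_url}/api/3/action/package_search?{params}")
        results = data["result"]["results"]
        packages.extend(results)
        start += rows
        # Stop when a page comes back empty or every result has been seen.
        if not results or start >= data["result"]["count"]:
            break
    return packages
```

A call might look like `harvest_org_packages("https://portal.example", "state-library")`, where both arguments are placeholders. The `fetch` parameter is only there so the paging logic can be tried out without touching the network.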
Results (March 2019)
Datasets by format:
Datasets by organisation:
Queensland State Archives 172
State Library of Western Australia 147
State Library of South Australia 128
State Library of Queensland 101
Libraries Tasmania 71
State Records Office of Western Australia 44
State Records 41
South Australian Museum 33
State Library of New South Wales 21
NSW State Archives 19
History Trust of South Australia 17
Western Australian Museum 14
State Library of Victoria 6
State Library of NSW 6
National Library of Australia 5
Museum of Applied Arts and Sciences 3
National Archives of Australia 3
Tasmanian Museum and Art Gallery 2
Mount Gambier Library 2
Datasets by licence:
Creative Commons Attribution 310
Creative Commons Attribution 4.0 168
Creative Commons Attribution 3.0 Australia 156
Creative Commons Attribution 4.0 International 110
License not specified 41
Creative Commons Attribution 2.5 Australia 15
Creative Commons Attribution-NonCommercial 10
notspecified 5
Other (Open) 4
Creative Commons Attribution 3.0 3
Creative Commons Attribution Share-Alike 3
Creative Commons Non-Commercial (Any) 2
Other (Non-Commercial) 1
Creative Commons Attribution Share Alike 4.0 International 1
Results (April 2018)
There are duplicates in the data because some datasets are listed on more than one portal. While my interest is in datasets containing collection data, the list also includes datasets created by the operations of GLAM organisations, such as borrowing data or FOI reports. I might filter these out later on.
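One way to spot the cross-portal duplicates mentioned above is to group the harvested records by download URL and keep any URL that turns up on more than one portal. The sketch below assumes each record is a dictionary with `portal` and `url` keys. Those field names are hypothetical, and matching on URL is only one possible definition of a duplicate.

```python
from collections import defaultdict


def find_duplicates(datasets, key="url"):
    """Return {value: [portals]} for values of `key` that occur more than once.

    Assumes each dataset is a dict with a "portal" entry plus the chosen key.
    """
    groups = defaultdict(list)
    for dataset in datasets:
        groups[dataset[key]].append(dataset["portal"])
    # Keep only the values shared by two or more records.
    return {value: portals for value, portals in groups.items() if len(portals) > 1}
```

Grouping on a different field, such as a title, only needs a different `key` argument, though titles are more likely to collide by accident.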
There are currently 790 datasets in this list.
Here's the number of datasets by data portal:
data.gov.au 271
data.qld.gov.au 214
data.sa.gov.au 173
data.wa.gov.au 96
data.nsw.gov.au 30
data.vic.gov.au 6
And the number of datasets by organisation:
State Library of South Australia 121
Housing and Public Works 117
State Library of Western Australia 114
Natural Resources, Mines and Energy 79
State Library of Queensland 78
LINC Tasmania 74
State Records 41
State Records Office of Western Australia 41
South Australian Governments 26
State Library of New South Wales 21
State Archives NSW 19
Environment and Science 14
History Trust of South Australia 12
State Library of NSW 6
State Library of Victoria 6
National Library of Australia 5
Aboriginal and Torres Strait Islander Partnerships 4
Museum of Applied Arts and Sciences 3
National Archives of Australia 3
National Portrait Gallery 2
Mount Gambier Library 2
Australian Museum 1
City of Sydney 1
I've attempted to identify the format of each dataset by checking the file extension. If there's no file extension I use the format value in the package metadata. These values don't always seem reliable. Here's the number of datasets by format:
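The format check described above (file extension first, metadata `format` value as the fallback) could be sketched roughly as follows. It assumes each dataset is a dictionary with `url` and `format` keys, as in CKAN-style resource metadata. The function name and the `"unknown"` label for records with neither value are my own additions.

```python
import os
from urllib.parse import urlparse


def guess_format(resource):
    """Guess a dataset's format from its URL, falling back to its metadata."""
    # Look only at the path so query strings don't hide the extension.
    path = urlparse(resource.get("url") or "").path
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension:
        return extension
    # No extension: use the declared format, or a placeholder if it's empty.
    return (resource.get("format") or "").strip().lower() or "unknown"
```

Lower-casing both values means `CSV` and `csv` are counted together, which matters when tallying formats across portals that label things inconsistently.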
For each dataset, I've fired off a HEAD request for the URL to see if the link still works. Here's the number of datasets by HTTP status code (200 is ok, 404 is not found):
200 746
404 39
400 3
403 2
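A link check along these lines can be sketched with Python's standard library. This is an illustration under assumptions, not the code used to produce the figures above: the function names are hypothetical, and the `opener` parameter exists only so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from collections import Counter


def head_status(url, timeout=10, opener=urllib.request.urlopen):
    """Send a HEAD request and return the HTTP status code, or None on failure."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions that carry the code.
        return err.code
    except (urllib.error.URLError, TimeoutError, ValueError):
        # Unreachable host, timeout, or a string that isn't a usable URL.
        return None


def count_statuses(urls, **kwargs):
    """Tally status codes across a list of URLs."""
    return Counter(head_status(url, **kwargs) for url in urls)
```

A HEAD request asks for the response headers only, so the check avoids downloading each file. Some servers handle HEAD differently from GET, so an error code here doesn't always mean the file is gone.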
There are 499 CSV-formatted datasets in this list.
Here are results of the HEAD requests for CSV-formatted datasets:
200 493
404 4
400 2