GLAM data from government portals¶
This is an attempt to assemble some useful information about Australian GLAM (Galleries, Libraries, Archives, Museums) datasets.
As a first step, I've harvested GLAM-related datasets from the various national and state data portals.
You can visualise the contents of the CSV datasets I've harvested by using the GLAM CSV Explorer.
Tools, tips, and examples¶
This notebook harvests the details of GLAM datasets from state and national data portals. I identified relevant organisations and groups, harvested all the packages associated with them, and added a few extra packages that looked relevant. The notebook also attempts some analysis of the results.
This is a quick attempt to harvest datasets published by GLAM institutions using the new data.gov.au API. To create the list of organisations, I searched the organisations on the data.gov.au site for 'library', 'archives', 'records', and 'museum'.
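The organisation-based harvesting described in these notebooks could be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: it assumes the portal exposes a CKAN-style action API (a `package_search` endpoint accepting `fq`, `start`, and `rows`), and the function names, the `api_base` value, and the organisation slug are all hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Search terms taken from the description above.
KEYWORDS = ["library", "archives", "records", "museum"]


def is_glam_org(title, keywords=KEYWORDS):
    """Return True if an organisation title contains any of the keywords."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in keywords)


def package_search_url(api_base, org_slug, start=0, rows=100):
    """Build a package_search URL for one organisation (assumes a CKAN-style API)."""
    params = {"fq": f"organization:{org_slug}", "start": start, "rows": rows}
    return f"{api_base}/package_search?{urlencode(params)}"


def flatten_package(package):
    """Turn one package record into one row per resource (file or link)."""
    org = (package.get("organization") or {}).get("title", "")
    return [
        {
            "dataset_title": package.get("title", ""),
            "organisation": org,
            "licence": package.get("license_title", ""),
            "url": resource.get("url", ""),
            "format": resource.get("format", ""),
        }
        for resource in package.get("resources", [])
    ]


def harvest_org(api_base, org_slug, rows=100):
    """Page through every package for an organisation (makes network requests)."""
    start, harvested = 0, []
    while True:
        with urlopen(package_search_url(api_base, org_slug, start, rows)) as response:
            result = json.load(response)["result"]
        for package in result["results"]:
            harvested.extend(flatten_package(package))
        start += rows
        if start >= result["count"]:
            return harvested
```

Under these assumptions, calling `harvest_org` with a portal's API base URL and an organisation slug would return one row per file or link. Note that a plain substring match on 'library' would miss a name such as 'Libraries Tasmania', which is one reason a few packages may need to be added by hand.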
Results (July 2019)¶
- Human readable list of all GLAM datasets harvested from data.gov.au
- GLAM datasets from data.gov.au – all formats (CSV)
- GLAM datasets from data.gov.au – CSVs only (CSV)
Datasets by format:
Datasets by organisation:
- State Library of Queensland: 204
- Queensland State Archives: 172
- State Library of Western Australia: 147
- State Library of South Australia: 140
- Libraries Tasmania: 71
- State Records: 41
- PROV Public Record Office: 33
- South Australian Museum: 33
- State Library of New South Wales: 21
- NSW State Archives: 19
- History Trust of South Australia: 19
- State Records Office of Western Australia: 7
- State Library of NSW: 6
- Western Australian Museum: 6
- National Library of Australia: 5
- State Library of Victoria: 5
- Australian Museum: 4
- Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS): 3
- National Archives of Australia: 3
- Museum of Applied Arts and Sciences: 3
- Mount Gambier Library: 2
- Tasmanian Museum and Art Gallery: 2
- National Portrait Gallery: 2
Datasets by licence:
- Creative Commons Attribution: 250
- Creative Commons Attribution 3.0 Australia: 244
- Creative Commons Attribution 4.0: 237
- Creative Commons Attribution 4.0 International: 146
- Creative Commons Attribution 2.5 Australia: 32
- Creative Commons Attribution-NonCommercial: 10
- Other (Open): 5
- notspecified: 5
- Creative Commons Attribution Share-Alike 4.0: 3
- Creative Commons Attribution 3.0: 3
- Creative Commons Attribution Non-Commercial 4.0: 2
- Custom (Other): 1
Results (April 2018)¶
There are duplicates in the data because some datasets are listed on more than one portal. While my interest is in datasets containing collection data, the list also includes datasets created by the operations of GLAM organisations, such as borrowing data or FOI reports. I might filter these out later on.
There are currently 790 datasets in this list.
Here's the number of datasets by data portal:
- data.gov.au: 271
- data.qld.gov.au: 214
- data.sa.gov.au: 173
- data.wa.gov.au: 96
- data.nsw.gov.au: 30
- data.vic.gov.au: 6
And the number of datasets by organisation:
- State Library of South Australia: 121
- Housing and Public Works: 117
- State Library of Western Australia: 114
- Natural Resources, Mines and Energy: 79
- State Library of Queensland: 78
- LINC Tasmania: 74
- State Records: 41
- State Records Office of Western Australia: 41
- South Australian Governments: 26
- State Library of New South Wales: 21
- State Archives NSW: 19
- Environment and Science: 14
- History Trust of South Australia: 12
- State Library of NSW: 6
- State Library of Victoria: 6
- National Library of Australia: 5
- Aboriginal and Torres Strait Islander Partnerships: 4
- Museum of Applied Arts and Sciences: 3
- National Archives of Australia: 3
- National Portrait Gallery: 2
- Mount Gambier Library: 2
- Australian Museum: 1
- City of Sydney: 1
I've attempted to identify the format of each dataset by checking the file extension. If there's no file extension I use the format value in the package metadata. These values don't always seem reliable. Here's the number of datasets by format:
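As an illustration of the extension-first check described above, here is a minimal sketch. It is not the code used for the harvest: the function name `guess_format` and the "UNKNOWN" label for datasets with neither an extension nor a metadata value are assumptions made for this example.

```python
import posixpath
from urllib.parse import urlparse


def guess_format(url, metadata_format=""):
    """Prefer the file extension in the URL path; fall back to the metadata value."""
    # Only look at the path, so query strings like ?download=1 are ignored.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lstrip(".")
    if extension:
        return extension.upper()
    # No extension: use the package metadata, or a placeholder if that is empty too.
    return metadata_format.strip().upper() or "UNKNOWN"
```

For example, `guess_format("https://example.org/data/items.csv?download=1", "XLSX")` returns `"CSV"`, because the extension takes priority over the metadata value.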
For each dataset, I've fired off a HEAD request for the URL to see if the link still works. Here's the number of datasets by HTTP status code (200 is OK, 404 is not found):
- 200: 746
- 404: 39
- 400: 3
- 403: 2
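A link check of this kind could be sketched with the standard library as below. This is only an assumed approach, not the harvest's own code: the function names are hypothetical, `urlopen` follows redirects (so the code reported is the final one), and connection failures are recorded as `None` rather than as a status code.

```python
from collections import Counter
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def head_status(url, timeout=10):
    """Send a HEAD request and return the HTTP status code, or None if it fails."""
    try:
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions but still carry a code.
        return error.code
    except (OSError, ValueError):
        # Connection errors, timeouts, or malformed URLs.
        return None


def count_statuses(urls, check=head_status):
    """Tally the status returned by `check` for each URL."""
    return Counter(check(url) for url in urls)
```

Passing the checker in as a parameter keeps the tally separate from the network call, so the counting can be tried out on stub data before any requests are sent.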
There are 499 CSV-formatted datasets in this list.
Here are results of the HEAD requests for CSV-formatted datasets:
- 200: 493
- 404: 4
- 400: 2