GLAM data from government portals

This is an attempt to assemble some useful information about Australian GLAM (Galleries, Libraries, Archives, Museums) datasets.

As a first step, I've harvested GLAM-related datasets from the various national and state data portals.

You can visualise the contents of the CSV datasets I've harvested by using the GLAM CSV Explorer.

Tools, tips, and examples

  • Harvesting GLAM data from government portals
    This notebook attempts to harvest the details of GLAM datasets from state and national data portals. I did this by identifying relevant organisations and groups, then harvesting all the packages associated with them, along with a few extra packages that looked relevant. The notebook also attempts some analysis of the results.

  • Harvest GLAM datasets from data.gov.au
    This is a quick attempt to use the new data.gov.au API to harvest datasets published by GLAM institutions. To create the list of organisations, I searched the organisations on the data.gov.au site for 'library', 'archives', 'records', and 'museum'.
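
To give a rough idea of what harvesting packages by organisation might look like, here's a minimal sketch. It assumes the portal runs CKAN and exposes the standard `package_search` action, which is my assumption rather than something the notebooks confirm for every portal. The function names, the portal hostname, and the organisation identifier are all hypothetical, and the real notebooks may work quite differently.

```python
from urllib.parse import urlencode


def package_search_url(portal, organisation, start=0, rows=100):
    """Build a CKAN package_search URL for one organisation (assumes a CKAN portal)."""
    params = {
        "fq": f"organization:{organisation}",  # filter by the organisation's identifier
        "start": start,                        # offset, for paging through results
        "rows": rows,                          # number of packages per page
    }
    return f"https://{portal}/api/3/action/package_search?{urlencode(params)}"


def extract_datasets(response, portal):
    """Flatten a package_search JSON response into one row per resource."""
    datasets = []
    for package in response["result"]["results"]:
        organisation = (package.get("organization") or {}).get("title")
        for resource in package.get("resources", []):
            datasets.append(
                {
                    "portal": portal,
                    "organisation": organisation,
                    "package": package.get("title"),
                    "licence": package.get("license_title"),
                    "format": resource.get("format"),
                    "url": resource.get("url"),
                }
            )
    return datasets
```

The URL could then be fetched with any HTTP client, and the decoded JSON passed to `extract_datasets()`. Paging would mean increasing `start` by `rows` until the number of packages collected reaches the `count` value in the response.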

Results (March 2019)

Datasets by format:

CSV           447
XML            79
JSON           73
XLSX           54
ESRI REST      41
HTML           34
DOCX           33
PLAIN          16
ZIP            13
GEOJSON         8
API             8
DATA            6
OTHER           4
RSS             2
JPEG            2
KML             2
MPK             2
APP             1
CSS             1
JAVASCRIPT      1
PDF             1
HMTL            1
WFS             1
WMS             1

Datasets by organisation:

Queensland State Archives                    172
State Library of Western Australia           147
State Library of South Australia             128
State Library of Queensland                  101
Libraries Tasmania                            71
State Records Office of Western Australia     44
State Records                                 41
South Australian Museum                       33
State Library of New South Wales              21
NSW State Archives                            19
History Trust of South Australia              17
Western Australian Museum                     14
State Library of Victoria                      6
State Library of NSW                           6
National Library of Australia                  5
Museum of Applied Arts and Sciences            3
National Archives of Australia                 3
Tasmanian Museum and Art Gallery               2
Mount Gambier Library                          2

Datasets by licence:

Creative Commons Attribution                                  310
Creative Commons Attribution 4.0                              168
Creative Commons Attribution 3.0 Australia                    156
Creative Commons Attribution 4.0 International                110
License not specified                                          41
Creative Commons Attribution 2.5 Australia                     15
Creative Commons Attribution-NonCommercial                     10
notspecified                                                    5
Other (Open)                                                    4
Creative Commons Attribution 3.0                                3
Creative Commons Attribution Share-Alike                        3
Creative Commons Non-Commercial (Any)                           2
Other (Non-Commercial)                                          1
Creative Commons Attribution Share Alike 4.0 International      1

Results (April 2018)

Here's a CSV containing details of all the datasets I found. I've also uploaded it to Google Sheets.

There are duplicates in the data because some datasets are listed on more than one portal. While my interest is in datasets containing collection data, the list also includes datasets created by the operations of GLAM organisations, such as borrowing data or FOI reports. I might filter these out later on.

There are currently 790 datasets in this list.

Here's the number of datasets by data portal:

data.gov.au        271
data.qld.gov.au    214
data.sa.gov.au     173
data.wa.gov.au      96
data.nsw.gov.au     30
data.vic.gov.au      6

And the number of datasets by organisation:

State Library of South Australia                      121
Housing and Public Works                              117
State Library of Western Australia                    114
Natural Resources, Mines and Energy                    79
State Library of Queensland                            78
LINC Tasmania                                          74
State Records                                          41
State Records Office of Western Australia              41
South Australian Governments                           26
State Library of New South Wales                       21
State Archives NSW                                     19
Environment and Science                                14
History Trust of South Australia                       12
State Library of NSW                                    6
State Library of Victoria                               6
National Library of Australia                           5
Aboriginal and Torres Strait Islander Partnerships      4
Museum of Applied Arts and Sciences                     3
National Archives of Australia                          3
National Portrait Gallery                               2
Mount Gambier Library                                   2
Australian Museum                                       1
City of Sydney                                          1

I've attempted to identify the format of each dataset by checking the file extension. If there's no file extension I use the format value in the package metadata. These values don't always seem reliable. Here's the number of datasets by format:

csv                           499
xml                            66
wms                            35
xlsx                           27
json                           25
docx                           17
xls                            16
txt                            15
zip                            14
doc                            12
api                            12
geojson                         8
other                           7
data                            6
pdf                             4
jpg                             2
html                            2
rss                             2
website link                    2
kml                             2
rtf                             2
kmz                             1
css, java, php, javascript      1
php                             1
xsd                             1
csv, json, web services         1
mp3                             1
js                              1
museum                          1
website                         1
app                             1
jpeg                            1
url                             1
.txt                            1
wfs                             1
plain                           1
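
The format check described above (file extension first, then the format value from the package metadata) can be sketched roughly like this. The function name and the lower-casing of values are my own assumptions, so treat it as an illustration rather than the code that produced the table.

```python
import posixpath
from urllib.parse import urlparse


def guess_format(url, metadata_format=None):
    """Guess a dataset's format from its URL, falling back to the metadata value."""
    # Look only at the path, so query strings such as '?download=1' are ignored
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1]
    if extension:
        return extension.lstrip(".").lower()
    # No file extension, so fall back to whatever the package metadata says
    if metadata_format:
        return metadata_format.strip().lower()
    return None
```

A fallback value is only as good as the metadata behind it, which is why entries like 'museum' and 'website link' turn up in the counts.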

For each dataset, I've fired off a HEAD request to its URL to see if the link still works. Here's the number of datasets by HTTP status code (200 is OK, 404 is not found):

200    746
404     39
400      3
403      2
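
Here's a minimal sketch of how a link check like this could be done. It uses Python's standard `urllib` library, which is a choice made for this example and not necessarily what the notebook uses. The function names and the ten-second timeout are assumptions.

```python
from collections import Counter
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def head_status(url, timeout=10):
    """Send a HEAD request and return the HTTP status code, or None if it fails."""
    try:
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # Responses such as 404 or 403 are raised as errors, but still carry a status code
        return error.code
    except (OSError, ValueError):
        # Connection problems, timeouts, or malformed URLs
        return None


def tally_statuses(urls, fetch_status=head_status):
    """Count how many URLs return each status code."""
    return Counter(fetch_status(url) for url in urls)
```

Passing the checker in as `fetch_status` makes it possible to try the tally without touching the network, for example by supplying a function that looks up canned status codes.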

I've created a CSV of just the CSV-formatted datasets. I've also uploaded it to Google Sheets.

There are 499 CSV-formatted datasets in this list.

Here are the results of the HEAD requests for CSV-formatted datasets:

200    493
404      4
400      2