Latest update: 18 September 2024
This page gives access to reports generated by the GLAMorousToHTML tool for a variety of cultural heritage and scientific institutions, such as museums, libraries, archives and similar public organisations (GLAMs for short).
For each GLAM, the report lists all Wikipedia articles that use images from that institution (more precisely, from the Wikimedia Commons category tree associated with it). The same data is also provided as an Excel file. The structure of the reports is explained below.
The section on Image contributions, category trees and template contamination explains who contributed the images, how accurate the category trees are, and how image thumbnails in templates contaminate the reports.
For the broader context, see the home page of the GLAMorousToHTML repo.
Currently, reports are available for the following GLAM institutions, countries and regions:
See also this LinkedIn post
All Excel files live in the /data folder and its subfolders.
All images related to a GLAM institution and its collections are gathered in a corresponding Wikimedia Commons category, and its subcategories when finer categorization is needed.
Roughly speaking, these category trees range from holding ‘professional’ (collection) images taken and uploaded by the GLAM itself (so-called media donations) to holding ‘amateur’ photos of the GLAM’s collections, building exteriors and interiors, staff members etc., taken and uploaded by members of the Wikimedia community.
Examples
So, in the context of the GLAMorousToHTML tool, when we speak about “images from museum X”, “images related to library Y” or “archive Z and its images”, we need to take the above distinctions into account, so that we do not falsely imply or claim ownership of, or credit for, images that were not taken or uploaded by that GLAM itself. And of course we only refer to images that are available on Wikimedia Commons!
When defining the top-level category to be used as input for generating a report, we first look for a single category, and its subcategories, containing (mainly) ‘professional’ images related to the collection(s), contributed by the GLAM itself. The name of such a category typically starts with something like
With such a focus, we aim to avoid images related to the institution’s buildings (example), staff members & directors (example) or events that took place in the institution (example), as well as other non-collection files related to the GLAM.
When no such single collection-specific top-level category is available, we look for a more general top-level category related to the GLAM, and its subcategories. Examples:
When crawling down such category trees, unintended capturing of non-collection or extra-institutional subcategories and images (‘overshooting’) cannot always be avoided, especially at larger category depths, where trees may branch out quickly. One might also capture images in subcategories that were erroneously subcategorized by Wikimedia community members or bots. Unfortunately, there is no workaround for this, as we rely on the data provided by the GLAMorous tool, which does not provide straightforward ways to filter out specific subcategories.
For example, because a collection-specific top-level category is missing, we selected Category:NEMO and one level of subcategories as the category tree for the report of NEMO. This undesirably but unavoidably captured images showing Amsterdam city views from the rooftop of NEMO. On the plus side, the odds of these images actually being used in Wikipedia articles are slim (in fact, zero).
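To make the effect of category depth concrete, the sketch below walks a small in-memory category tree breadth-first, down to a given depth. The tree, its subcategory names and the function `collect_categories` are invented for illustration. The real category data lives on Wikimedia Commons, and this is not how GLAMorousToHTML or GLAMorous is implemented.

```python
from collections import deque

# Hypothetical category tree: parent -> list of direct subcategories.
# All names below the top level are invented for this example.
TREE = {
    "Category:NEMO": [
        "Category:Exhibits (hypothetical)",
        "Category:Views from the roof (hypothetical)",
    ],
    "Category:Exhibits (hypothetical)": [
        "Category:Exhibit details (hypothetical)",
    ],
    "Category:Views from the roof (hypothetical)": [],
    "Category:Exhibit details (hypothetical)": [],
}


def collect_categories(tree, root, max_depth):
    """Return root plus all subcategories up to max_depth levels below it."""
    seen = {root}
    order = [root]
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend further than max_depth
        for child in tree.get(category, []):
            if child not in seen:  # guard against cycles in the category graph
                seen.add(child)
                order.append(child)
                queue.append((child, depth + 1))
    return order


for depth in (0, 1, 2):
    print(depth, collect_categories(TREE, "Category:NEMO", depth))
```

With `max_depth=1` the (hypothetical) rooftop-views subcategory is already included. A depth limit bounds how far the crawl goes, but it cannot exclude an unwanted branch at a permitted level.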
When interpreting GLAM reports, be aware of the effect that (small) image thumbnails in templates have on the number of listed articles. For instance, when inspecting the 3,915 articles in Minangkabau that use images from the Wereldmuseum Amsterdam (the former Tropenmuseum), you will notice that almost all of them seem to be devoid of any images, let alone recognisably Wereldmuseum-related ones. So why are these articles included in the report?
If you look closely at the bottom left of the article Geringging Baru, Benai, Kuantan Singingi, you will see a small thumbnail image of a house, next to “Artikel batopik…”.
It turns out that this footer is rendered by Templat:Kelurahan-stub, a template containing this original image from the collection of the museum. This template, and similar ones containing other thumbnail images from the Wereldmuseum Amsterdam, are used in 548 Minangkabau articles. These articles are therefore included in the report, even though they have no connection to the Wereldmuseum Amsterdam at all.
Another example of “template contamination” is the 1,550 Persian articles related to art in which this portrait of Van Gogh, inserted via this template, serves as a ‘meta-personification of all things art’ (*). Most of these art-related articles are about topics other than the life and works of Van Gogh. For instance, see the template-inserted Van Gogh thumbnail at the bottom right of the article about Adolf Ulrik Wertmüller, a Swedish 18th-century painter.
Unfortunately, there is no easy way to filter out template contamination, as we rely on the data provided by the GLAMorous tool, which does not provide straightforward ways to filter out templates or articles using templates.
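Although contaminated articles cannot be filtered out reliably, a rough heuristic can at least flag candidates for manual inspection. An image that appears in an unusually large number of articles within one Wikipedia language version is likely to be inserted by a template. The sketch below assumes the report data is available as `(language, image, article)` tuples. That layout, the threshold, the sample file names and the function name `flag_suspect_images` are assumptions for illustration, not part of GLAMorousToHTML or GLAMorous.

```python
from collections import defaultdict


def flag_suspect_images(rows, threshold=100):
    """Return {(language, image): article_count} for images used in at
    least `threshold` distinct articles of one language version."""
    articles = defaultdict(set)
    for language, image, article in rows:
        articles[(language, image)].add(article)
    return {
        key: len(titles)
        for key, titles in articles.items()
        if len(titles) >= threshold
    }


# Made-up sample data: one image used in many articles, one used once.
rows = [("min", "House.jpg", f"Village {i}") for i in range(150)]
rows.append(("nl", "Painting.jpg", "Some painter"))

suspects = flag_suspect_images(rows, threshold=100)
print(suspects)
```

A high count is only a hint. A genuinely popular image can exceed the threshold too, so flagged images still need to be checked by hand.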
(*) In a similar way, the physicist Albert Einstein has become a ‘meta-icon’ for all fields of science and for being smart/brainy in general.