GLAMorousToHTML

Creates an HTML page and a corresponding Excel file listing all Wikipedia articles (in all languages) in which one or more images from a given category on Wikimedia Commons are used.

Latest update: 1 March 2024

What does it do?

The script GLAMorousToHTML.py creates an HTML page and a corresponding Excel file listing all Wikipedia articles (in all languages) in which one or more images or other media files from a given category on Wikimedia Commons are used. It does so by converting the XML output of the GLAMorous tool.

What problem does it solve?

The KB uses the GLAMorous tool to measure the use of KB media files (as stored in Wikimedia Commons) in Wikipedia articles. This tool reports four things:

Please note: ‘Total image usages’ does NOT equal the number of unique WP articles! A single KB image can illustrate multiple WP articles, and conversely a single WP article can contain multiple KB images. In other words, images and articles have a many-to-many relationship.
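
To make the difference concrete, here is a minimal sketch that counts usages, unique images and unique articles from a handful of (image, article) pairs. The file names and article names are invented for illustration; they are not real GLAMorous output.

```python
# Hypothetical (image, article) usage pairs. Each pair is one "image usage".
# All names below are made up for illustration only.
usages = [
    ("Image_A.jpg", "nl:Amsterdam"),
    ("Image_A.jpg", "en:Amsterdam"),  # same image, second article
    ("Image_B.jpg", "en:Amsterdam"),  # same article, second image
    ("Image_C.jpg", "en:Amsterdam"),  # same article, third image
]

total_image_usages = len(usages)
unique_images = {image for image, _ in usages}
unique_articles = {article for _, article in usages}

print("Total image usages:", total_image_usages)
print("Unique images:", len(unique_images))
print("Unique articles:", len(unique_articles))
```

In this toy data set there are four usages, but only three unique images and two unique articles, so none of the three numbers can be derived from the others.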

What was still missing was the functionality to measure

That is why we made the GLAMorousToHTML tool. This script uses the XML output of GLAMorous to make an HTML page listing the unique WP articles in which one or more KB media files are used, grouped by language.
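
The grouping step can be sketched as follows. This is not the code from GLAMorousToHTML.py: the function name `group_articles_by_language` and the (language, title, image) record layout are assumptions made for this illustration, and the sketch starts from already-parsed records rather than from the GLAMorous XML itself.

```python
from collections import defaultdict


def group_articles_by_language(records):
    """Return {language: sorted list of unique article titles}.

    `records` is an iterable of (language_code, article_title, image_name)
    tuples. This layout is a hypothetical stand-in for parsed GLAMorous data.
    """
    grouped = defaultdict(set)
    for language, title, _image in records:
        # A set keeps each article once, however many images it contains.
        grouped[language].add(title)
    return {lang: sorted(titles) for lang, titles in sorted(grouped.items())}


# Made-up example records.
records = [
    ("nl", "Amsterdam", "Image_A.jpg"),
    ("en", "Amsterdam", "Image_A.jpg"),
    ("en", "Amsterdam", "Image_B.jpg"),
    ("en", "Rotterdam", "Image_C.jpg"),
]

overview = group_articles_by_language(records)
for language, titles in overview.items():
    print(language, len(titles), titles)
```

Using a set per language is what turns image usages into unique articles: the English article "Amsterdam" appears twice in the records but only once in the overview.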

As of 14 February 2024 it also delivers an Excel file with equivalent data.

Configuration of GLAMorous

The script relies on the XML output of GLAMorous, which needs to be configured so that it only lists pages from Wikipedia

1) that are in the main namespace (a.k.a. Wikipedia articles) (&ns0=1)

2) and not pages from Wikimedia Commons, Wikidata or other Wikimedia projects (projects[wikipedia]=1)

The base URL looks like https://glamtools.toolforge.org/glamorous.php?doit=1&use_globalusage=1&ns0=1&projects[wikipedia]=1&format=xml&category=. The Commons category of interest needs to be added to the end, omitting the Category: prefix. It is defined (and can be adapted) in the xml_base_url variable in setup.py.

By default the depth of the GLAMorous output is set to 0, meaning no subcategories are read. If you want to include images from subcategories in your outputs, you can change the depth variable in setup.py.
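
As a rough sketch of how such a request URL could be assembled: the helper name `build_glamorous_url` is hypothetical, and passing the depth as a `depth` query parameter is an assumption about the GLAMorous URL format. Check setup.py for how xml_base_url and depth are actually combined.

```python
from urllib.parse import quote

# Fixed part of the query, taken from the base URL described above.
GLAMOROUS_QUERY = (
    "https://glamtools.toolforge.org/glamorous.php"
    "?doit=1&use_globalusage=1&ns0=1&projects[wikipedia]=1"
)


def build_glamorous_url(category, depth=0):
    """Build a GLAMorous XML request URL for one Commons category.

    Hypothetical helper, not part of the repo.
    """
    # The category is passed without its "Category:" prefix (Python 3.9+).
    name = category.removeprefix("Category:").strip()
    # Assumption: GLAMorous reads the subcategory depth from `depth=`.
    # The category stays at the end of the URL, percent-encoded.
    return f"{GLAMOROUS_QUERY}&depth={depth}&format=xml&category={quote(name)}"


url = build_glamorous_url("Category:Atlas de Wit 1698")
print(url)
```

Percent-encoding the category name keeps spaces and non-ASCII characters from breaking the query string.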

Running the script yourself

If you want to run this script for your own Commons category and create HTML and Excel overviews for your own institution, you can clone/download the repo and run it on your own machine. You will need to make some simple adaptations to the existing code to make it work for the Commons category of your choice. These are:

1) Adapt the category_logo_dict.json for your own needs, making sure the existing syntax is maintained.

2) Add a small logo of the institution (256x256 px or so) as a .png or .jpg to the site/logos folder, and add the filename “icon_xxxxx.png/jpg” to the json file.

3) In setup.py, change

That’s all. You should now be able to run the main GLAMorousToHTML script. The generated HTML page will be added to the site/ folder and the Excel file to the data/ folder.

In case you can’t get the script up and running, please open an issue in this repo.

Examples

KB, national library of the Netherlands

Media contributed by Koninklijke Bibliotheek

Atlas de Wit 1698

Atlas van der Hagen

Media from Atlas of Mutual Heritage - Koninklijke Bibliotheek

Nederlandsche vogelen van Nozeman en Sepp

Der naturen bloeme - KB KA 16

Catchpenny prints from Koninklijke Bibliotheek

Bookbindings from Koninklijke Bibliotheek

Other institutions

Netherlands

See also this LinkedIn post

USA

See also this LinkedIn post

Nordic countries

See also this LinkedIn post

Norway
Sweden
Finland
Denmark

Australia and New Zealand

See also this LinkedIn post

Australia
New Zealand

See also

Change log

14 March 2024

29 February 2024

14 February 2024

Features to add