wikimedia-commons_copyright-templates

Technical notes (under construction)

Latest update: xx May 2025

Work on this: This page gives more information about:

1. The scripts ‘extract_copyright_templates.py’ and ‘template_usage_summary.py’
2. The data files in the data folder
3. The data visualisation from Datawrapper, created via the Datawrapper API
…xxx

TO ADD

Uses the MediaWiki API to search for Commons files in the desired category.
Fetches the raw wikitext of each file page.
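
The two API steps above can be sketched as follows. This is a minimal illustration rather than the code in ‘extract_copyright_templates.py’: it assumes the standard library `urllib` as the HTTP client, and the function names and the `USER_AGENT` string are hypothetical placeholders. The query parameters themselves (`list=categorymembers`, `prop=revisions` with `rvslots=main`) are standard MediaWiki Action API parameters.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://commons.wikimedia.org/w/api.php"
# Hypothetical placeholder: replace with a User-Agent that identifies your own tool.
USER_AGENT = "ExampleCommonsBot/0.1 (illustrative placeholder)"


def category_files_params(category, cont=None):
    """Build the query parameters that list the files in one Commons category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmtype": "file",
        "cmlimit": "500",
        "format": "json",
        "formatversion": "2",
    }
    if cont:
        # Continuation token taken from the previous response, if there was one.
        params["cmcontinue"] = cont
    return params


def wikitext_params(title):
    """Build the query parameters that fetch the raw wikitext of one page."""
    return {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }


def wikitext_from_response(data):
    """Pull the wikitext out of a formatversion=2 revisions response."""
    pages = data.get("query", {}).get("pages", [])
    if not pages or "revisions" not in pages[0]:
        return ""  # missing page or no revision returned
    return pages[0]["revisions"][0]["slots"]["main"]["content"]


def api_get(params):
    """Send one GET request to the Commons API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A caller would loop over `api_get(category_files_params(...))` until the response no longer carries a `continue` block, then call `api_get(wikitext_params(title))` for each file title and pass the result to `wikitext_from_response`.
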
Isolates wrapper templates like , , , and .
Extracts relevant templates from top-level usage or embedded fields like:
- permission=
- date=
- publication date=
Handles multiline and nested template values reliably.
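
One way to handle nested and multiline values is to count brace depth instead of relying on a single regular expression. The sketch below shows that idea only; it is an assumption about a workable approach, not necessarily how the script does it. The function names and the sample wikitext are made up for illustration, and triple-brace parameters (`{{{x}}}`) are not handled.

```python
def find_templates(text):
    """Return every outermost {{...}} span; nested templates stay inside it."""
    spans, depth, start, i = [], 0, 0, 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append(text[start:i + 2])
            i += 2
        else:
            i += 1
    return spans


def split_params(template):
    """Split one template into its name and its top-level |-separated parts."""
    inner = template[2:-2]
    parts, current, depth, i = [], [], 0, 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif inner[i] == "|" and depth == 0:
            # Only a pipe outside nested templates/links ends a parameter.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(inner[i])
            i += 1
    parts.append("".join(current))
    return parts[0].strip(), parts[1:]


def field_values(template, wanted=("permission", "date", "publication date")):
    """Map the wanted field names to their raw, possibly multiline, values."""
    _, params = split_params(template)
    fields = {}
    for param in params:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() in wanted:
            fields[name.strip().lower()] = value.strip()
    return fields


def template_names(text):
    """Names of the outermost templates found in text."""
    return [split_params(t)[0] for t in find_templates(text)]


# Made-up sample page, for illustration only.
sample = """{{Information
|description={{en|1=A map}}
|date={{other date|circa|1650}}
|permission={{PD-old-100}}
{{PD-US-expired}}
}}
{{CC-zero}}"""

outer = find_templates(sample)
fields = field_values(outer[0])
print(template_names(sample))
print(template_names(fields["permission"]))
```

Because the pipe inside `{{other date|circa|1650}}` sits at depth 1, it does not split the `date=` field, and the two-line `permission=` value is returned whole.
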
Extracts a simplified creation date from various formats:
- , , , etc.
Supports date formats: YYYY, YYYY-MM, YYYY-MM-DD
Returns the most recent valid year if multiple are present.
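
The date handling described above could look roughly like this. It is a sketch under stated assumptions: "valid" is taken to mean a four-digit year between 1000 and the current year with a plausible month and day, which may differ from the rule the script applies. `latest_year` and `DATE_PATTERN` are hypothetical names.

```python
import re
from datetime import date

# Matches YYYY, YYYY-MM or YYYY-MM-DD that is not part of a longer digit run.
DATE_PATTERN = re.compile(r"(?<!\d)(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?(?!\d)")


def latest_year(value, max_year=None):
    """Return the most recent plausible year found in value, or None."""
    if max_year is None:
        max_year = date.today().year  # assumption: future years are not valid
    years = []
    for year, month, day in DATE_PATTERN.findall(value):
        if month and not 1 <= int(month) <= 12:
            continue  # e.g. 2011-13 is rejected
        if day and not 1 <= int(day) <= 31:
            continue
        if 1000 <= int(year) <= max_year:
            years.append(int(year))
    return max(years) if years else None


print(latest_year("2011-03-17, reprinted from 1999"))  # 2011
print(latest_year("{{other date|circa|1650}}"))        # 1650
print(latest_year("unknown"))                          # None
```
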
Excludes known irrelevant templates via a robust filtering system.
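
A filter of this kind can be as simple as a set of exact names plus a tuple of prefixes, compared after normalising case and underscores. The entries below are hypothetical examples chosen for illustration; the script's real exclusion list is not reproduced here.

```python
# Hypothetical exclusion lists, for illustration only.
EXCLUDED_NAMES = {"information", "artwork", "en", "nl", "other date"}
EXCLUDED_PREFIXES = ("lang-", "institution:", "creator:")


def normalise(name):
    """Lower-case a template name and treat underscores like spaces."""
    return " ".join(name.replace("_", " ").split()).lower()


def keep_relevant(names):
    """Drop excluded templates and duplicates, keeping the original order."""
    kept = []
    for name in names:
        key = normalise(name)
        if key in EXCLUDED_NAMES or key.startswith(EXCLUDED_PREFIXES):
            continue
        if name not in kept:
            kept.append(name)
    return kept


found = ["Information", "PD-old-100", "en", "Creator:Rembrandt",
         "PD-old-100", "Cc-by-sa-4.0"]
print(keep_relevant(found))  # ['PD-old-100', 'Cc-by-sa-4.0']
```
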
Outputs results to:
- Console (one line per file with all extracted info)
- Excel file (*_commons_templates_output_<date>.xlsx) with URLs and linked templates
  - Excel file (*_commons_templates_output_<date>-cleaned.xlsx): a manually cleaned version of the first file. Non-copyright templates, incorrect dates and other ‘noise’ that the Python script did not filter out were removed by hand as a post-processing step.
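
The output step can be illustrated with a small standard-library sketch that prepares one row per file, including the file page URL and a link for each template, and formats the console line. The row layout, the `YYYYMMDD` date stamp in the file name and all function names are assumptions; writing the rows to an actual .xlsx file needs a spreadsheet library and is left out here.

```python
from datetime import date
from urllib.parse import quote

WIKI_BASE = "https://commons.wikimedia.org/wiki/"


def page_url(title):
    """URL of a Commons page; spaces become underscores as in wiki URLs."""
    return WIKI_BASE + quote(title.replace(" ", "_"), safe=":/")


def build_row(file_title, templates, year):
    """Collect everything known about one file in a single dict."""
    return {
        "file": file_title,
        "file_url": page_url(file_title),
        "year": year,
        "templates": templates,
        "template_urls": [page_url("Template:" + t) for t in templates],
    }


def console_line(row):
    """One line per file with all extracted info."""
    year = row["year"] if row["year"] is not None else "?"
    return f'{row["file"]} | {year} | {", ".join(row["templates"])}'


def output_filename(prefix, day=None):
    """File name following the *_commons_templates_output_<date>.xlsx pattern."""
    day = day or date.today()
    return f"{prefix}_commons_templates_output_{day:%Y%m%d}.xlsx"  # date format assumed


row = build_row("File:Map of Delft.jpg", ["PD-old-100"], 1650)
print(console_line(row))
```
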

Olaf Janssen, Wikimedia coordinator @KB national library of the Netherlands (via ChatGPT)
Last updated: 9 April 2025
User-Agent: OlafJanssenBot/1.0

This script is CC0, so released into the public domain. You may freely use, adapt, and redistribute it.
