Free to use? Exploring public domain claims in Wikimedia Commons files sourced from Delpher (May 2025)

Olaf Janssen, xx May 2025

This article is also available as PDF.

Delpher offers access to millions of digitized pages from Dutch historical newspapers, books, and magazines — a valuable resource frequently used on Wikimedia Commons. In the first part of this data story, we examine how the Wikimedia community has assigned public domain status to Commons files that have been sourced from Delpher.
In the second part, we explore the validity of these claims and assess whether they align with the actual copyright status of the works. We identify common mistakes made by the Wikimedia community when applying public domain templates to files. Finally, we examine whether these errors have resulted in any serious copyright violations.

Key figures and findings

The most important key figures and findings of this story are:

Intro, preamble, and background

Why did I write this article?

Much of the historical content from Delpher falls into the public domain due to its age and can therefore be uploaded to Wikimedia Commons without concern. At the same time, the KB — being the operator of Delpher — has a contractual obligation towards authors and publishers to monitor potential copyright infringements and to prevent them as much as possible, including Delpher content that has been uploaded to Wikimedia Commons.

For this reason, the KB wants to gain a better understanding of which newspaper articles, books, magazines and other materials from Delpher have been uploaded to Wikimedia Commons, and how public domain claims to those files have been assigned by the Wikimedia community. In doing so, it is important to emphasize that the KB has absolutely no intention to act as a copyright police force. The goal is to work together with the Wikimedia community to handle copyright matters responsibly, with respect for both creators and users.

What this article aims to do

  1. Provide a practical case study of how public domain claims are applied in a real-world environment — specifically, how Wikimedia Commons contributors handle copyright claims for files sourced from Delpher.
  2. Offer insight into the complexity of public domain claims on Wikimedia Commons — even for the relatively simple case where files originate from a single source (Delpher) from a single country (the Netherlands).
  3. Explore how accurately the Wikimedia community applies public domain claims, and assess to what extent potential copyright violations may occur — including whether serious violations are present.
  4. Share a practical data story of how to machine-analyze and visualize copyright claims for files in (subsets of) Wikimedia Commons using data analysis and visualization techniques.

What this article does not aim to do

  1. Provide a comprehensive overview of all public domain claims on Wikimedia Commons. This article focuses specifically on files sourced from Delpher, which is a manageable subset of the total number of files in Commons.
  2. Provide a formal and/or detailed legal analysis of every public domain claim for these files — such an approach would be far too extensive for the scope of this data story.
  3. Identify and flag every potential copyright infringement, aside from highlighting a few obvious and illustrative cases mentioned later in this story.
  4. Offer recommendations or proposals on how to simplify public domain claims on Wikimedia Commons. This article takes the current public domain landscape “as is,” observing how it functions in practice without suggesting reform.

Who is this article relevant for?

This analysis of public domain template usage on Wikimedia Commons applied to files sourced from Delpher may be of interest to:

Copyright templates in Wikimedia Commons

Wikimedia Commons is one of the largest open-access media repositories in the world, used daily by Wikipedia and countless other businesses and projects. To protect the open and reusable nature of its content, strict legal rules must be followed for files that are uploaded to Commons:

1) All uploaded files must either be:

2) These copyright claims must be explicitly and unambiguously added to the file description page. See for instance the public domain claim stated in this portrait made by the Dutch photographer Toni Arens-Tepe (1883–1947).

3) These claims are typically expressed through standardized copyright templates (also known as license tags). These templates are meant to ensure clarity and uniformity when declaring the copyright status of files. Templates on Commons can be recognized by the double curly brackets they are called by, for instance {{PD-old-70-expired}}.

But here’s the problem: although the purpose of these templates is to provide a clear and standardized way to declare copyright status, the practical reality is that the license tagging system — built over the years by the international Wikimedia community — has become very complex. The number of different copyright templates in use on Commons is enormous.

To get a sense of this complexity, take a look at this summary of the most common template types or explore this nested overview of several thousand(!) copyright templates being used on Commons.

Both insiders and outsiders will struggle to find their way in this system. It can feel like working through a jungle of overlapping licensing options and confusing terminology, which undermines the intended simplicity and standardization.

However, this complexity is not entirely surprising. Wikimedia Commons accepts media from any country and any historical period, and must therefore be able to handle the copyright rules and exceptions from dozens of legal systems worldwide. This elaborate system is necessary because Wikimedia Commons is a global, evolving platform. Templates are regularly added or updated as contributors find new sources for uploads or as local copyright regulations change.

Zooming in: public domain templates in Wikimedia Commons

Even if we narrow the scope and look only at public domain templates, things remain pretty complicated: the variety and the number of templates are still large.

The page https://commons.wikimedia.org/wiki/Commons:Copyright_tags/General_public_domain gives an overview of over 70 general public domain templates.

The complexity becomes even clearer in https://commons.wikimedia.org/wiki/Category:PD_license_tags and its subcategories, where many country-specific PD templates (reflecting local copyright law) are listed.

On top of all these country-specific templates, there is an extra layer of complexity: every file on Wikimedia Commons must also include a U.S. public domain justification. Because the Commons servers are located in the United States, U.S. copyright law applies, so all files stored there must also comply with U.S. copyright and public domain regulations. These can be very complicated and can differ from the rules in the jurisdiction of the country of origin.

In practice, this means that for many Commons files, multiple templates are required:

A (non-exhaustive) collage of screenshots of public domain template description pages, as used in Wikimedia Commons files that have been sourced from Delpher.
Image license: CC-BY-SA 4.0 / Olaf Janssen, KB national library of the Netherlands.

Zooming in further: licensing templates used in Delpher files

Creating the dataset

The Wikimedia community has been uploading newspaper articles, advertisements, obituaries, book pages, portraits from magazines, and other materials from Delpher (and its predecessor projects) to Wikimedia Commons since March 2008. Because these files were originally scattered across Commons without consistent categorization, the first step was to bring them together into a single, central place: Category:Media from Delpher. This category currently contains just over 62K files.
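
For readers who want to list the files in this category themselves, the sketch below shows one possible way to do so through the public MediaWiki API of Wikimedia Commons. It is not the method used for this article, which the text does not describe. The helper names `category_query_url` and `titles_from_response` are hypothetical, and the sample response is made up for illustration.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"

def category_query_url(category: str, cmcontinue: str = "") -> str:
    """Build a MediaWiki API URL that lists the files in one category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "file",      # only files, no subcategories or pages
        "cmlimit": "500",      # batch size per request
        "format": "json",
    }
    if cmcontinue:
        # Continuation token from the previous batch, if any.
        params["cmcontinue"] = cmcontinue
    return f"{API_URL}?{urlencode(params)}"

def titles_from_response(data: dict) -> list[str]:
    """Pull the file titles out of one decoded JSON response."""
    members = data.get("query", {}).get("categorymembers", [])
    return [member["title"] for member in members]

url = category_query_url("Category:Media from Delpher")

# Made-up response in the shape the API returns, for illustration only.
sample_response = {
    "query": {"categorymembers": [{"pageid": 1, "ns": 6, "title": "File:Example.jpg"}]}
}
titles = titles_from_response(sample_response)
print(url)
print(titles)
```

A category of this size has to be fetched in batches, passing the continuation token from each response into the next request until no token is returned.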

Delpher source template
We added a {{Delpher}} source template to all of these files (example). This not only communicates visually and textually that Delpher is the source of these files, but also automatically adds the files to that category.

A screenshot of the rendered Delpher source template on Wikimedia Commons.

Excluding scans from the Internet Archive
As you can see in the category, a significant part of it consists of files that are presented as uploads from the Internet Archive but actually originate from Delpher. These are the PDFs with IA ddd …mpeg21 in their titles (example). In total, 55,761 files from the Internet Archive (as of 9 April 2025) were originally sourced from Delpher.

All of these files are marked with the {{PD-old-70-expired}} copyright template, which means that they are safely in the public domain in the Netherlands (and its predecessors), the rest of the EU and the United States.

Because such a large part (89.6%) of the files in Category:Media from Delpher comes from the Internet Archive, we decided to exclude them from our further analysis. Since all of them are marked with the exact same copyright template, including them would skew our analysis heavily towards these files and this one template.
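
As a quick sanity check, the 89.6% share can be recomputed from the counts given in this section. The sketch below assumes that the category total equals the 55,761 Internet Archive files plus the 6,496 remaining files mentioned in the next step. That sum is an assumption, although it matches the "just over 62K" figure.

```python
# Counts taken from the text; the total is an assumption (sum of both groups).
ia_files = 55_761       # files that came via the Internet Archive
non_ia_files = 6_496    # remaining files sourced from Delpher
total_files = ia_files + non_ia_files

# Share of Internet Archive files, as a percentage with one decimal.
ia_share = round(100 * ia_files / total_files, 1)
print(f"{ia_share}% of {total_files:,} files come from the Internet Archive")
```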

Extracting copyright templates
This left us with 6,496 ‘non-Internet Archive’ files from Delpher. For these files, we wanted to extract the associated copyright templates. With some help from ChatGPT, we developed a (rather monstrous) Python script to extract public domain or public domain-like license templates (e.g., Creative Commons). This script was not 100% perfect, so some manual post-processing was needed to clean up the data.
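
To give an impression of what such an extraction involves, here is a minimal sketch. It is not the actual script used for this article. It assumes that the wikitext of a file page is already available as a string, and that license templates can be recognized by a name starting with "PD-" or "Cc-". That prefix rule is a simplification, and the function name `extract_license_templates` is hypothetical.

```python
import re

# Matches the template name right after "{{", up to the first "|" or "}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\||\}\})")

def extract_license_templates(wikitext: str) -> list[str]:
    """Return names of PD-like or Creative Commons templates found in wikitext."""
    names = [match.group(1).strip() for match in TEMPLATE_NAME.finditer(wikitext)]
    prefixes = ("pd-", "cc-")  # assumed naming convention, not exhaustive
    return [name for name in names if name.lower().startswith(prefixes)]

# Made-up file description, for illustration only.
sample = "{{Information|source={{Delpher}}|date=1901}}\n{{PD-old-70-expired}}\n{{Cc-zero}}"
print(extract_license_templates(sample))
```

A rule this simple misses license templates with other names and templates that are wrapped inside other templates, which is one reason why manual clean-up remains necessary.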

Excluding files without publication/creation dates
As we plan to assess the validity of copyright claims against the actual publication or creation dates of the underlying works, we also designed the script to extract simplified date information. Files that provided no publication or creation dates were excluded from further analysis. We will discuss the date extraction process in more detail in SECTIONXXXXXXXXXXXXXXXXXXX.
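
The idea of simplified date information can be illustrated with a small sketch. This is not the actual extraction logic. It assumes that reducing a date string to its first four-digit year between 1500 and 2099 is good enough, and the function name `extract_year` and the sample strings are hypothetical.

```python
import re
from typing import Optional

# A four-digit year from 1500 to 2099 that is not part of a longer number.
YEAR = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")

def extract_year(date_text: str) -> Optional[int]:
    """Return the first plausible four-digit year in date_text, or None."""
    match = YEAR.search(date_text)
    return int(match.group(1)) if match else None

# Made-up date strings, for illustration only.
dates = ["1955-12-15", "ca. 1901", "12 mei 1887", "unknown"]
years = {text: extract_year(text) for text in dates}

# Files without a usable year are left out of further analysis.
kept = {text: year for text, year in years.items() if year is not None}
print(kept)
```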

The final dataset
In the end, we were able to retrieve 6,248 distinct files that contained both (one or more) copyright templates and a publication or creation date. This is the dataset used in our further analysis.

Zoom in on Category:Media from Delpher

Section 3

After running the scan, we could quite easily use the Excel file to look for publications that possibly constituted copyright violations (copyvio), based on the year of publication. We looked for content that was published in the last 70 years (after 1955) but was still marked as public domain or as CC-licensed. We found 4 such files. This led to 4 deletion requests due to copyvio, all of which were granted, and the files were deleted very quickly.
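
This kind of screening can be sketched in a few lines. The rows below are hypothetical and only illustrate the filter. The column name DateOfPublicationOrCreation is the one used in the Excel file, the other keys and the function name `needs_review` are assumptions. A flagged file is only a candidate for manual review, not a confirmed violation.

```python
CUTOFF_YEAR = 1955  # "after 1955", as stated in the text

# Hypothetical rows; only the DateOfPublicationOrCreation column name is real.
rows = [
    {"File": "Example_A.jpg", "DateOfPublicationOrCreation": 1901, "Template": "PD-old-70-expired"},
    {"File": "Example_B.jpg", "DateOfPublicationOrCreation": 1955, "Template": "PD-anon-70-EU"},
    {"File": "Example_C.jpg", "DateOfPublicationOrCreation": 1971, "Template": "Cc-zero"},
]

def needs_review(row: dict) -> bool:
    """Flag files published after the cutoff that carry a PD or CC template."""
    free_template = row["Template"].lower().startswith(("pd-", "cc-"))
    return free_template and row["DateOfPublicationOrCreation"] > CUTOFF_YEAR

candidates = [row["File"] for row in rows if needs_review(row)]
print(candidates)
```

The year alone does not settle the matter: a recent file can legitimately carry a public domain or CC template, for example when the rights holder released it or when the content is too simple to be copyrightable.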

Quote 1

Quote 2

Section 2.1

Explain which templates have been found

Section 2.1 - Zooming in on copyrights expired because of age

Next, let's zoom in on the copyright templates that are used for files that are in the public domain because of age (the blue colors). In total, this group comprises 24 templates, used 6,191 times in xx distinct files.

Section 3: Compliance of the community with the copyright statements

Are there any violations or big mistakes?

For the “PD because of age” group (98% of uses), we will look at the year in which the original work was published or created (column F “DateOfPublicationOrCreation” in the Excel).

Interesting cases to study, in the Excel

Make a Datawrapper table for that:

File:Proclamatie1955-Amigoe.jpg | M147748690 | 1 | 1955 | Copyrights expired because of age
https://commons.wikimedia.org/wiki/File:Proclamatie1955-Amigoe.jpg
https://commons.wikimedia.org/wiki/Template:PD-anon-70-EU

File:1957_Foto-album_van_burgemeester_P.M.J.S._Cremers,_1957_18.jpg | M150325640 | 1 | 1957 | Copyrights waived or made free
https://commons.wikimedia.org/wiki/File:1957_Foto-album_van_burgemeester_P.M.J.S._Cremers%2C_1957_18.jpg
https://commons.wikimedia.org/wiki/Template:Cc-by-4.0

File:Hindeloopen_vlag_1650.svg | M81840054 | 1 | 1957 | Copyrights waived or made free
https://commons.wikimedia.org/wiki/File:Hindeloopen_vlag_1650.svg
https://commons.wikimedia.org/wiki/Template:PD-self

File:Handtekening_George_van_den_Bergh.jpg | M89816821 | 1 | 1960 | Not eligible for copyrights due to lack of sufficient originality
https://commons.wikimedia.org/wiki/File:Handtekening_George_van_den_Bergh.jpg
https://commons.wikimedia.org/wiki/Template:PD-signature

File:Expositie_van_18_jonge_Nederlandse_striptekenaars_in_Kunstcentrum_Lijnbaan,_1971.jpg | M112239095 | 1 | 1971 | Copyrights waived or made free
https://commons.wikimedia.org/wiki/File:Expositie_van_18_jonge_Nederlandse_striptekenaars_in_Kunstcentrum_Lijnbaan%2C_1971.jpg
https://commons.wikimedia.org/wiki/Template:Cc-zero

File:IJ_with_two_acute_accents_in_Staatsblad_van_het_Koninkrijk_der_Nederlanden,_no._394,_1996,_p._17.png | M129412274 | 1 | 1996 | Not eligible for copyrights due to lack of sufficient originality
https://commons.wikimedia.org/wiki/File:IJ_with_two_acute_accents_in_Staatsblad_van_het_Koninkrijk_der_Nederlanden%2C_no._394%2C_1996%2C_p._17.png
https://commons.wikimedia.org/wiki/Template:PD-text

Section 4: Commonly made mistakes of the community when applying PD templates to Delpher files

Section 5: Recommendations to the

Raw data

All data used for the visualisations and analytics in this article is available on Github. You can also download the main Excel file directly.

About the authors

Portrait of Olaf Janssen in 2018.

Logo of the KB, the national library of the Netherlands

Olaf Janssen is the Wikimedia coordinator of the KB, the national library of the Netherlands. He contributes to Wikipedia, Wikimedia Commons and Wikidata as User:OlafJanssen. ORCID: 0000-0002-9058-9941.

Reusing this article

The text and data visualisations of this article have been released under the Creative Commons Attribution (CC-BY 4.0) license.

Citation: Janssen, O.D. (2025). ‘xxxxxx. https://doi.org/10.5281/zenodo.xxxx.

Attribution: KB, national library of the Netherlands / Olaf Janssen, CC-BY 4.0

Raw data: CC0, so released into the public domain. You may freely use, adapt, and redistribute it.

Identifiers and URLs of this article

Persistent:

Non-persistent: