Free to use? Exploring public domain claims in Wikimedia Commons files sourced from Delpher (April 2025)

^{Olaf Janssen (KB) & Maarten Zeinstra (IPsquared), 22 June 2025}

Logo Delpher

Delpher offers access to millions of digitized pages from Dutch historical newspapers, books, and magazines — a valuable resource frequently used on Wikimedia Commons.
*MZ: I am still missing a clear position. What do you want to communicate with this article? That this has great societal value? That there is a legal risk? Etc.*
In the first part of this article, we examine how the Wikimedia community has assigned public domain status to Commons files that have been sourced from Delpher.
In the second part, we explore the validity of these claims and assess whether they align with the actual copyright status of the works. We identify common mistakes made by the Wikimedia community when applying public domain templates to files. Finally, we examine whether these errors have resulted in any serious copyright violations.

Key figures and findings

The most important key figures and findings of this story are:

kf 1
kf 2
kf 3

Introduction and background

Introduction to Delpher

An introductory sentence on what exactly Delpher is
How many records it contains,
Which types of media it provides access to and
What its origin history is.
And also how Delpher itself marks PD?

Much of the historical content from Delpher falls into the public domain due to its age and can therefore be uploaded to Wikimedia Commons without much concern. At the same time, the KB, as the operator of Delpher, has contractual obligations towards authors and publishers to monitor potential copyright infringements and to prevent them as much as possible. This includes Delpher content that has been uploaded to Wikimedia Commons by Wikimedia contributors. Because of this obligation, content on Delpher can be seen as high-quality material that is rightfully marked as public domain.

The KB wants to gain a better understanding of which newspaper articles, books, magazines and other materials from Delpher have been uploaded to Wikimedia Commons, and how public domain claims to those files have been assigned by the Wikimedia community. In doing so, it is important to emphasize that the KB has absolutely no intention of policing potentially infringing copies on Wikimedia Commons. The goal of the study is to improve collaboration with the Wikimedia community to handle copyright matters responsibly, with respect for both the rights of creators and rightsholders and the needs and desires of users.

What this article aims to do

This article aims to

Provide a practical case study of how public domain claims are applied in a real-world open environment — specifically, how Wikimedia Commons contributors handle copyright claims for files sourced from Delpher.
Offer insight into the complexity of public domain claims on Wikimedia Commons — even for the relatively simple case where files originate from a single source (Delpher) from a single country (the Netherlands).
Explore how accurately Wikimedia contributors apply public domain claims, and assess to what extent potential copyright violations may occur — including whether any really serious violations are present.
Share a practical data story of how to machine-analyze and visualize copyright claims for files in (subsets of) Wikimedia Commons using data analysis and visualization techniques.

At the same time, this article is not trying to

Provide a comprehensive overview of all public domain claims on Wikimedia Commons. This article focuses specifically on files sourced from Delpher, which is a manageable subset of the total number of files in Commons.
Provide a formal and/or detailed legal analysis of every public domain claim for these files — such an approach would be far too deep for the scope of this data story.
Identify and flag every potential or small copyright infringement — aside from highlighting a few obvious and illustrative cases mentioned in the Deleting obvious copyright violations paragraph.
Offer recommendations or proposals on how to simplify public domain claims on Wikimedia Commons. This article takes the current public domain landscape “as is”, observing how it functions in practice. Suggesting reforms or improvements is out of scope for this article. We do, however, make some recommendations to stakeholders (see below) in Section 5: Recommendations to stakeholders, based on our findings.

Who is this article relevant for?

This analysis of public domain template usage on Wikimedia Commons applied to files sourced from Delpher may be of interest to the following stakeholders:

The Wikimedia community – to gain insights into how (accurately) they have implemented public domain copyright templates, especially for Delpher-sourced files.
The Delpher development team and user community – to better understand how a decentralized, international community of content reusers deals with public domain Delpher-sourced materials in a real-world scenario, i.e. on Wikimedia Commons.
Other GLAM institutions with collections on Wikimedia Commons – to explore how this Delpher case study could be replicated for their own Wikimedia Commons files, supported by the freely available code, data and documentation shared via this article.
KB copyright lawyers and the wider legal/copyright community – to see how copyright law and public domain issues play out in a real-world, open, community-driven environment, and to reflect on the practical implications for heritage institutions like the KB.
Rights holders, publishers and collective rights organizations – to assess whether there should be reasons for serious concern about large-scale copyright violations by the Wikimedia community (spoiler: our findings suggest there is little to no cause for such concern).

In the next section we will take a closer look at public domain claims used in files on Wikimedia Commons in general, and in Commons files sourced from Delpher in particular.

Copyright templates on Wikimedia Commons

Wikimedia Commons is one of the largest open-access media repositories in the world, used daily by Wikipedia and countless other businesses and projects. To protect the open and reusable nature of its content, strict legal rules must be followed for files that are uploaded to Commons:

1) All uploaded files must either be:

Out of copyright — meaning they are in the public domain, either passively because copyrights have expired, or because the rights holders have waived any copyrights on the files, actively releasing them into the public domain, for instance by using a CC0 license.
Freely licensed — under licenses that allow reuse and modification, most commonly CC-BY, CC-BY-SA, or equivalent.

2) These copyright claims must be explicitly and unambiguously added to the file description page. See for instance the public domain claim stated in this portrait made by the Dutch photographer Toni Arens-Tepe (1883–1947).

_{Public domain claim used in the portrait of Jos. Schrijnen on Wikimedia Commons.
Image license: CC-BY-SA 4.0 / Olaf Janssen, KB national library of the Netherlands.}

3) These claims are typically expressed through standardized copyright templates (also known as license tags). These templates are meant to ensure clarity, uniformity and standardization when declaring copyright status of files. Templates on Commons can be recognized by the double curly brackets they are called by, for instance:

{{PD-old-70}} — The file is in the public domain because the creator of the underlying work died more than 70 years ago.
{{CC-BY-SA-4.0}} — Creative Commons Attribution-ShareAlike 4.0 license.
{{PD-ineligible}} — The file is in the public domain because it (and/or its underlying work) lacks sufficient originality to be eligible for copyright protection.
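
Because templates are always called with double curly brackets, their names can be picked out of a file page's wikitext mechanically. The snippet below is a minimal sketch of that idea, not the script used for this article; the function name `template_names` and the sample wikitext are illustrative assumptions.

```python
import re

# Matches the name part of a template call such as {{PD-old-70}} or
# {{PD-because|some reason}}: everything after "{{" up to "|" or "}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\||\}\})")


def template_names(wikitext: str) -> list:
    """Return the names of all templates called in a piece of wikitext."""
    return [match.group(1) for match in TEMPLATE_NAME.finditer(wikitext)]


# Hypothetical wikitext of a file description page, for illustration only.
sample = "== Licensing ==\n{{PD-old-70}}\n{{PD-US-expired}}\n{{Delpher}}"
print(template_names(sample))
```

A simple regular expression like this also picks up non-copyright templates (such as source or information templates), so a real analysis would still need to filter the result down to license tags.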

The copyright template jungle

But here’s the problem: although the purpose of these templates is to provide a clear and standardized way to declare copyright status, the practical reality is that the license tagging system — built over the years by the international Wikimedia community — has evolved into a very complex beast. The number of different copyright templates in use on Commons is enormous.

To get a sense of this complexity, take a look at this summary of the most common template types or explore this nested overview of several thousand(!) copyright templates being used on Commons.

Both insiders and outsiders will struggle to find their way in this system. It can feel like working through a jungle of overlapping licensing options and confusing terminology, which undermines the intended simplicity and standardization.

This complexity is not entirely surprising. Wikimedia Commons accepts media in many formats (image, document, audio, video etc.) from any country, any jurisdiction and any historical period, and must therefore be able to handle the copyright rules and exceptions from dozens of legal systems worldwide. The elaborate system is necessary because Wikimedia Commons is a global, evolving platform. Templates are regularly added or updated as contributors find new sources for uploads or as local copyright regulations change.

Zooming in: public domain templates

To somewhat trim down this jungle, we can narrow the scope and look only at public domain templates, used for files that are out of copyright. Yet even within this limited scope, things remain complicated, as the number and variety of such templates is still pretty large.

The general public domain templates page provides an overview of more than 70 templates based on general criteria, not tied to a specific country or source of the work. The complexity becomes more apparent when examining the Category:PD license tags and its subcategories. These include numerous country-specific public domain templates, each reflecting the legal nuances of copyright legislation in the country of origin.

Adding to this complexity is a crucial requirement: Every file on Wikimedia Commons must also include a justification for its public domain status under U.S. law. This requirement arises from the fact that Wikimedia’s servers are located in the United States. Therefore, all hosted content must comply not only with the copyright laws of the country of origin but also with those of the U.S., which can be particularly intricate and often differ substantially from other jurisdictions.

Text box for Maarten:
aaaaa bbbbb
ccccccccccccc

dddddddd

In practice, this means that many Commons files require multiple templates:

One or more templates describing the copyright status in the country of origin;
An additional template confirming the file’s public domain status in the United States.

Zooming in further: templates used in Delpher files

For the purposes of this article, we aim to narrow the scope even further. We are interested only in public domain copyright templates used for files sourced from Delpher, the Dutch platform providing access to millions of full-text pages from Dutch historical newspapers, books, and magazines. Delpher is a frequently used resource for illustrating Wikipedia articles and for uploads to Wikimedia Commons.

By limiting our focus to this single source, our dataset and analysis become relatively straightforward: we are primarily dealing with materials from one provider (Delpher) and largely from one country (the Netherlands). Nevertheless, as we will explore below, there remains sufficient complexity to make this investigation both meaningful and nuanced.

_{A (non-exhaustive) collage of screenshots of public domain template description pages, as used in Wikimedia Commons files that have been sourced from Delpher.
Image license: CC-BY-SA 4.0 / Olaf Janssen, KB national library of the Netherlands.}

Creating the dataset

To examine how the Wikimedia community has assigned public domain status to Commons files sourced from Delpher, we first needed a robust and reliable dataset. Let’s look at the steps we took to create it.

The Commons community has been uploading newspaper articles, advertisements, obituaries, book pages, portraits from magazines, and other materials from Delpher (and its predecessor projects) to Wikimedia Commons since March 2008. Because these files were originally scattered across Commons without consistent categorization, the first step was to bring them together into a single, central place: Category:Media from Delpher. This category currently contains just over 62K files.
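
Once all files sit in one category, their titles can be listed through the MediaWiki Action API (`list=categorymembers`). The sketch below shows one way to page through such a listing. It is an illustration under assumptions rather than the code used for this article: the function name `iter_category_files` and the injected `fetch_json` callable are hypothetical, and only files placed directly in the category (not in subcategories) are returned.

```python
from typing import Callable, Iterator

API_URL = "https://commons.wikimedia.org/w/api.php"


def iter_category_files(fetch_json: Callable[[dict], dict],
                        category: str = "Category:Media from Delpher") -> Iterator[str]:
    """Yield the titles of all files in a category, following API continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "file",     # files only, no pages or subcategories
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        data = fetch_json(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        # The API hands back the parameters to merge into the next request.
        params = {**params, **data["continue"]}
```

The `fetch_json` argument keeps the network call separate from the paging logic. With the `requests` library it could, for example, be a small function that calls `requests.get(API_URL, params=params, headers=...)` with a descriptive User-Agent header and returns `.json()`.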

Delpher source template

Adding a {{Delpher}} source template to all of these files (example) allows us to visually and textually communicate that Delpher is the source of these files. It also automatically includes the files into the Delpher category. This in turn allows researchers to investigate the dataset and the uploaded media files.

_{A screenshot of the rendered Delpher source template on Wikimedia Commons.}

Excluding scans from the Internet Archive

As you can see in the category, a significant part of it consists of files that are stated to have been uploaded from the Internet Archive, but that find their real origins in Delpher. These are the PDFs with IA ddd …mpeg21 in their titles (example). In total there are 55,761 files from the Internet Archive (as of 9 April 2025) that were originally sourced from Delpher.

XXXXXXXXXXXXX MZ: this could use some more explanation. How do the files get from Delpher to the Internet Archive, and what kind of marking do they have there?

All of these files are marked with the {{PD-old-70-expired}} copyright template, which means that they are safely in the public domain in the Netherlands (and its predecessors), the rest of the EU and the United States.

Because such a large part of the files in the Category:Media from Delpher (89.6%) came via the Internet Archive, we decided to exclude all of them from our further analysis. As they are all marked with the exact same copyright template, including them would skew our analysis heavily towards these files and this template.
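
The 89.6% share follows directly from the counts reported in this article: 55,761 Internet Archive files against the 6,496 remaining files. The sketch below reproduces that arithmetic and shows a simple title-based filter. The filter is an assumption: it only checks for the two title fragments mentioned above (`IA ddd` and `mpeg21`), and the example title is hypothetical.

```python
def is_internet_archive_title(title: str) -> bool:
    """Heuristic: the title contains both the 'IA ddd' and 'mpeg21' markers."""
    return "IA ddd" in title and "mpeg21" in title


ia_files = 55_761     # files that came via the Internet Archive (9 April 2025)
other_files = 6_496   # remaining 'non-Internet Archive' files from Delpher

share = ia_files / (ia_files + other_files)
print(f"Share of Internet Archive files: {share:.1%}")

# Hypothetical title, for illustration only.
print(is_internet_archive_title("File:Example (IA ddd 0000 mpeg21).pdf"))
```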

Extracting copyright templates

This left us with 6,496 ‘non-Internet Archive’ files from Delpher. For these files we wanted to detect the associated copyright templates. Assisted by ChatGPT, we developed a (rather monstrous) Python script to extract public domain or public domain-like (e.g. Creative Commons) license templates. As this script was not 100% perfect, we needed to do some manual post-processing to clean up the data.

Excluding files without publication/creation dates

As we plan to assess the validity of copyright claims against the actual publication or creation dates of the underlying works, we also designed the script to extract simplified date information. Files that provided no publication or creation dates were excluded from further analysis. We will discuss the date extraction process in more detail in SECTIONXXXXXXXXXXXXXXXXXXX.

Deleting obvious copyright violations

After the extraction of templates and associated dates, we did a preliminary scan to identify obvious instances of copyright infringement, which we wanted to exclude from our dataset. Specifically, we examined files with content published within the last 70 years (post-1955) that were nonetheless marked as public domain or Creative Commons-licensed. This process led to the identification of four copyvio files, for which we subsequently submitted deletion requests to Wikimedia Commons administrators:

An article from the Dutch newspaper De Telegraaf from 1985, still protected by copyright, as it was published less than 70 years ago. It cannot have a CC0 license. XXXXXXXXXXXXX MZ: why not?
An article from the Dutch newspaper Trouw from 1974. Still under copyright, as it was published less than 70 years ago. We must assume that the copyright is held by the newspaper publisher, unless proven otherwise.
An article from the Dutch newspaper Algemeen Dagblad from 1966. Still under copyright, as it was published less than 70 years ago. We must assume that the copyright is held by the newspaper publisher, unless proven otherwise. Furthermore, it cannot have a CC0 license.
The text Het Binnenhof en Het Vaderland from 1956 by Duco Wilhelm Sickinghe. According to Dutch copyright law, this article is still under copyright, as the author died in 1983 and the article was published less than 70 years ago. So we must assume that the copyright is still with the (heirs of the) author or with the newspaper publisher, unless proven otherwise. Furthermore, it cannot have a CC-BY license.

All deletion requests were granted immediately and the files were deleted quickly.
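
A preliminary scan of this kind boils down to a filter on the extracted year. The snippet below is a minimal sketch of such a filter, assuming each file has already been reduced to a title, a simplified year and a template name. The function name `needs_manual_check`, the inclusive cut-off of 1955 and the sample records are all illustrative assumptions.

```python
def needs_manual_check(records, cutoff_year=1955):
    """Return the records whose underlying work dates from cutoff_year or later."""
    return [record for record in records if record["year"] >= cutoff_year]


# Hypothetical records, for illustration only.
records = [
    {"title": "File:Example article 1863.jpg", "year": 1863, "template": "PD-old-70"},
    {"title": "File:Example article 1985.jpg", "year": 1985, "template": "Cc-zero"},
]

for record in needs_manual_check(records):
    print(record["title"], record["year"], record["template"])
```

The filter only produces candidates. Whether a flagged file is really a copyright violation still has to be judged by a person.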

The final dataset

In the end, we were able to retrieve 6,248 distinct files that contained (one or more) copyright templates (6,329 in total), as well as a publication or creation date. This is the XXXXXXXXXXXX dataset used in our further analysis. XXXXX FIX LINK XXXXXXXX

Why are Delpher-sourced files in the public domain, according to Wikimedia Commons?

One of the main reasons to write this data story is to gain a better understanding of which Delpher-sourced materials have been uploaded to Wikimedia Commons and how public domain claims to those files have been assigned by the Wikimedia community. What insights can be gained from the dataset?

Five main reasons for public domain classification

For this we first needed to group the copyright templates according to their underlying rationale for placing files in the public domain. This resulted in five distinct main reasons:

Copyrights expired because of age: This is the most common reason, because the underlying work is too old to carry copyrights. Its digital reproduction (2D scan, photo) is generally also considered to be in the public domain.
Example templates: {{PD-old-70}} or {{PD-old-70-expired}}.
Copyrights waived or made free: For files that have been released into the public domain or under free licences by their creators or rights holders.
Example templates: {{CC-zero}} or {{CC-BY-SA-4.0}}.
Note: for the readability and flow of this article, we will not make further distinctions between files that were given CC0(-like) templates and the (very limited number of) files that were given CC-BY or CC-BY-SA copyright claims. For the purposes of this article, these are all considered to be part of the public domain.
Government work, not subject to copyright: For files that are created by government employees in the course of their official duties, which are not subject to copyright protection in many jurisdictions, including the Netherlands and the United States.
Example template: {{PD-DutchGov}}.
Not eligible for copyrights due to lack of sufficient originality: This includes files that are not eligible for copyright protection because they lack sufficient originality, such as simple logos, signatures, or other works that do not meet the threshold for copyrightability.
Example templates: {{PD-ineligible}} or {{PD-textlogo}}.
Other reasons: For files that are in the public domain for other reasons, such as being published before the introduction of copyright laws, or because they are not eligible for copyright protection for other reasons.
Example template: {{PD-because}}.
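
The grouping described above can be thought of as a lookup table from template name to reason. The sketch below covers only the example templates named in this list, not all 39 templates in the dataset. The names `REASON_BY_TEMPLATE` and `no_copyright_reason` are illustrative, and the case-insensitive matching is an assumption about how template names are best compared.

```python
from typing import Optional

# Example templates from the list above, keyed in lower case.
REASON_BY_TEMPLATE = {
    "pd-old-70": "Copyrights expired because of age",
    "pd-old-70-expired": "Copyrights expired because of age",
    "cc-zero": "Copyrights waived or made free",
    "cc-by-sa-4.0": "Copyrights waived or made free",
    "pd-dutchgov": "Government work, not subject to copyright",
    "pd-ineligible": "Not eligible for copyrights due to lack of sufficient originality",
    "pd-textlogo": "Not eligible for copyrights due to lack of sufficient originality",
    "pd-because": "Other reasons",
}


def no_copyright_reason(template: str) -> Optional[str]:
    """Return the main reason for a template name, or None if it is not in the table."""
    return REASON_BY_TEMPLATE.get(template.strip().lower())


print(no_copyright_reason("PD-DutchGov"))
print(no_copyright_reason("Cc-zero"))
```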

If we break down the data, we see that our 6,248 files are classified into the public domain by 6,329 templates for the following reasons:

Reason for public domain classification (NoCopyrightReason)	Number of template usages	Percentage
Copyrights expired because of age	6,191	97.8%
Copyrights waived or made free	97	1.5%
Government work, not subject to copyright	20	0.3%
Not eligible for copyrights due to lack of sufficient originality	18	0.3%
Other reasons	3	0.05%
Total	6,329	100%
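
The percentages in this table can be recomputed from the usage counts. The short sketch below does so for the five reasons, using only the counts given in the table. The variable names are illustrative.

```python
usages = {
    "Copyrights expired because of age": 6191,
    "Copyrights waived or made free": 97,
    "Government work, not subject to copyright": 20,
    "Not eligible for copyrights due to lack of sufficient originality": 18,
    "Other reasons": 3,
}

total = sum(usages.values())
percentages = {reason: 100 * count / total for reason, count in usages.items()}

for reason, pct in percentages.items():
    # Two decimals, so that the very small 'Other reasons' share stays visible.
    print(f"{reason}: {pct:.2f}%")
print(f"Total usages: {total}")
```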

These results are also visualized in the donut chart below. For instance, the files in the blue sector use 6,191 templates indicating they are in the public domain because the historical newspapers, books and magazines they were sourced from, are too old to carry copyrights. Please note that a single file can contain multiple out-of-copyright claims, see this example.

39 distinct public domain claims

Next, let’s look in more detail at the templates that are in each of the five categories. It turns out that Delpher-sourced files use a total of 39 distinct templates to communicate their public domain status. These are detailed in the table below, where they are sorted and grouped by the reason why they categorise files into the public domain (NoCopyrightReason). Each group is color-coded for clarity. You can click on the names of the templates in the first column to view their description pages on Wikimedia Commons.

TODO: add static version of this chart for PDF

Usage of public domain templates

We can also look at how often each of the 39 copyright templates is used. The bar chart below shows the number of usages for each template, grouped and color-coded by the reason why the files are in the public domain, as explained before. The total number of template usages is 6,329 across 6,248 distinct files.

We can for instance see that the template {{PD-anon-70-EU}} is used most frequently, 2,044 times, to indicate copyright has expired in (among others) the EU and the author’s identity was never disclosed. The second most used template is {{PD-old-70-expired}}, which is used 1,329 times to indicate that the author died more than 70 years ago and the work was first published in the US more than 95 years ago.
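
The number of template usages is higher than the number of distinct files because one file can carry several templates. The sketch below shows how both figures can be derived from a list of (file, template) pairs. The pairs are hypothetical and serve only to illustrate the counting.

```python
from collections import Counter

# Hypothetical (file, template) pairs, for illustration only.
pairs = [
    ("File:A.jpg", "PD-anon-70-EU"),
    ("File:A.jpg", "PD-US-expired"),
    ("File:B.jpg", "PD-anon-70-EU"),
]

usage_per_template = Counter(template for _, template in pairs)
distinct_files = {file for file, _ in pairs}

print(usage_per_template.most_common())
print(f"{sum(usage_per_template.values())} usages across {len(distinct_files)} distinct files")
```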

Zooming in: copyrights expired because of age

Let’s now zoom in on the blue bars, that is on the copyright templates for files that are in the public domain because of age. This group represents 24 templates, used 6,191 times in 6,114 distinct files.

The table below shows the number of usages for each template, similar to the previous bar chart. Where possible and applicable, it also lists the specific expiration period(s) implied by the template, as well as any remarks. Copyrights on a work can expire when enough time has passed since

the author died: we see values of 100 years (e.g. in Mexico), 70 years (e.g. in the EU), or N/A because the author’s year of death is unknown, or because the author is unknown for anonymous or pseudonymous works;
the work was first published: we see values of 120 years (for collective works), 95 years (e.g. in the US), 70 years (e.g. in the EU for anonymous or pseudonymous works) and 50 or 25 years in Indonesia;
the work was created: we see values of 120 years (if the author’s death date is unknown, or for unpublished US works) or N/A (for anonymous or pseudonymous works).

Finally, when we look at the Remarks column, we can identify groups of templates for anonymous or pseudonymous works, for faithful digital reproductions of 2D public domain (art)works, and for Indonesian copyright laws.

Overall, this table makes clear that for Delpher-sourced materials on Wikimedia Commons, there are a lot of variables and jurisdictional differences at play when it comes to the expiration periods of copyrights. As a result, it is not always straightforward for Wikimedia contributors to determine which template(s) are best suited for which file. Suboptimal choices or even mistakes can easily be made, even by experienced Wikimedians who contribute with the best intentions.
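
To make the effect of these expiration periods concrete, the sketch below computes the first year in which a work is free of copyright, given a reference year (death of the author, first publication or creation) and a term in years. It is a simplification under the assumption that the term runs until the end of the calendar year, so that the work enters the public domain on 1 January of the following year. The function name is illustrative, and real cases can depend on further rules that are not modelled here.

```python
def first_public_domain_year(reference_year: int, term_years: int) -> int:
    """Year on whose 1 January the work becomes free, assuming the term
    runs until the end of the calendar year (a simplification)."""
    return reference_year + term_years + 1


# Author died in 1983, term of 70 years after death.
print(first_public_domain_year(1983, 70))

# Work first published in 1929, term of 95 years after publication.
print(first_public_domain_year(1929, 95))
```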

In the second part of this article, we will explore how accurately contributors have applied public domain templates, and assess to what extent any serious copyright violations have occurred.

Section xx: Compliance of the community with the copyright statements

Are there any violations or big mistakes?

For the “PD because of age” group (98% of usages), we will look at the year in which the original work was published or created (column F, “DateOfPublicationOrCreation”, in the Excel file).

The years of publication or creation have been extracted from the wikitext, typically from the “Date” field.
For most files the date could be extracted as a single year, for instance files taken from newspapers that were published in 1863, 1926 or 1952.
In case the date was not a single year, the latest (most recent) year has been taken, with a safety margin where possible. For instance: 1920s (→ 1930), before 1880 (→ 1880) or circa 1949 (→ 1949).
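
The simplification rules above can be sketched as a small function. This is an illustration of the three stated rules, not the extraction script used for the dataset. The function name `simplify_year` and the regular expressions are assumptions, and date formats not mentioned above are only handled by the generic "latest four-digit year" fallback.

```python
import re
from typing import Optional


def simplify_year(date_text: str) -> Optional[int]:
    """Reduce a free-text date to a single, conservative (latest) year."""
    text = date_text.strip().lower()

    # Decades such as "1920s": take the start of the next decade as safety margin.
    decade = re.search(r"\b(\d{3}0)s\b", text)
    if decade:
        return int(decade.group(1)) + 10

    # Otherwise take the latest four-digit year found in the text, which covers
    # single years, "before 1880", "circa 1949" and ranges such as "1914-1918".
    years = [int(year) for year in re.findall(r"\b(1\d{3}|20\d{2})\b", text)]
    return max(years) if years else None


for example in ["1863", "1920s", "before 1880", "circa 1949"]:
    print(example, "->", simplify_year(example))
```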

Interesting cases to study, in the Excel

1. Files from publications published or created in 1955 or later, that are marked as public domain

File	Year of publication or creation	License Template	Copyright Status	Remarks
Proclamatie1955-Amigoe.jpg	1955	PD-anon-70-EU	Copyrights expired because of age	aa
1957 Foto-album van burgemeester P.M.J.S. Cremers, 1957 18.jpg	1957	Cc-by-4.0	Copyrights waived or made free	aa
Hindeloopen vlag 1650.svg	1957	PD-self	Copyrights waived or made free	aa
Handtekening George van den Bergh.jpg	1960	PD-signature	Not eligible for copyrights due to lack of sufficient originality	aa
Expositie van 18 jonge Nederlandse striptekenaars in Kunstcentrum Lijnbaan, 1971.jpg	1971	Cc-zero	Copyrights waived or made free	aa
IJ with two acute accents in Staatsblad van het Koninkrijk der Nederlanden, no. 394, 1996, p. 17.png	1996	PD-text	Not eligible for copyrights due to lack of sufficient originality	aa

TODO: Make Datawrapper for this table

2. Files classified “Copyrights waived or made free”

Section 4: Commonly made mistakes of the community when applying PD templates to Delpher files

Common mistakes:

Failing to include a separate template for public domain status in the US. See for instance this file, which is marked with the {{PD-old-70}} template but lacks a US-specific template such as {{PD-US-expired}}.
Applying Creative Commons licenses to files that are in the public domain because their copyrights have expired. This includes the use of CC0, to make sure any rights are waived, where normally you would use the Public Domain Mark (PDM).
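
The first mistake lends itself to an automated check: given the templates found on a file page, flag the file when none of them covers the United States. The sketch below assumes a deliberately short list of US-covering templates, taken from the ones named in this article. The set `US_TEMPLATES` and the function name are illustrative, and a real check would need a more complete list.

```python
# Templates named in this article that cover public domain status in the US.
# This list is an assumption for illustration and is far from complete.
US_TEMPLATES = {"pd-us-expired", "pd-old-70-expired"}


def lacks_us_template(templates) -> bool:
    """True when none of the given templates covers the United States."""
    return not any(t.strip().lower() in US_TEMPLATES for t in templates)


print(lacks_us_template(["PD-old-70"]))
print(lacks_us_template(["PD-old-70", "PD-US-expired"]))
```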

Section 5: Recommendations to stakeholders

Wikimedia community: improve the copyright templates and their usage, to avoid the common mistakes described above.
Delpher / KB team: do not worry, the community is doing a good job in applying the copyright templates to Delpher-sourced files.
KB copyright lawyers: same as above.
Collective rights organizations and publishers: same as above.

Quote 1

Quote 2

Raw data

All data used for the visualisations and analytics in this article is available on GitHub. You can also download the main Excel file directly.

About the authors

Portrait of Olaf Janssen in 2018.

Logo of the KB, the national library of the Netherlands

Olaf Janssen is the Wikimedia coordinator of the KB, the national library of the Netherlands. He contributes to Wikipedia, Wikimedia Commons and Wikidata as User:OlafJanssen. ORCID: 0000-0002-9058-9941.

Reusing this article

The text and data visualisations of this article have been released under the Creative Commons Attribution 4.0 (CC-BY 4.0) license.
Logo of the CC-BY license

Citation: Janssen, O.D. (2025). ‘xxxxxx. https://doi.org/10.5281/zenodo.xxxx.

Attribution: KB, national library of the Netherlands / Olaf Janssen, CC-BY 4.0

Raw data: CC0, so released into the public domain. You may freely use, adapt, and redistribute it.

Identifiers and URLs of this article

Persistent:

DOI (Zenodo): https://doi.org/10.5281/zenodo.xxx
Wikimedia Commons: https://commons.wikimedia.org/entity/xxxx

Non-persistent:

Github: https://kbnlwikimedia.github.io/xxxx.html
