Wikidata & KB national library of the Netherlands, an overview
A (non-exhaustive) overview of how Wikidata is used by/in/for both the linked open datasets (thesauri) and public domain heritage collections of the KB, national library of the Netherlands.
Latest update: 21 November 2023
This page is a textual summary of the course Verdieping: Wikidata & de KB for employees of KB, national library of the Netherlands on 14 November 2023, 15:00-16:15.
See also
Intro
Required basic knowledge about Wikidata
See the course Wegwijzer in Wikidata (Introduction to Wikidata), June 6, 2023 (in Dutch): https://commons.wikimedia.org/w/index.php?title=File:Wegwijzer_in_Wikidata,_Introductiecurus_Wikidata_-_Koninklijke_Bibliotheek,_6_juni_2023.pdf
Course objectives
To provide a better understanding of
- Why we use Wikidata at KB
- How we use Wikidata for KB thesauri & heritage collections
- What value this adds for KB
Course layout
BLOCK 1 - What does Wikidata add for the KB?
Open doors (stating the obvious)
(Captain Obvious mode) For KB & its services: Be findable in Google - Be present on Facebook - Be present on Instagram - Be present on YouTube - Be present on Twitter. –> Summary (open door): Be present on the large (web-scale) platforms
So, equally obvious:
- Add your collection knowledge to Wikipedia
- Add your collection images to Wikimedia Commons
- Add your collection data to Wikidata
Wikidata characteristics
Wikidata is one of the largest and most popular LOD platforms in the world.
Characteristics:
- Central part of the (web-scale) Wikimedia infrastructure (Wikipedia, Commons, 700+ Wikimedia platforms)
- Free, public utility for data (no IT costs)
- Centralized, no data silos, 1 language (w.r.t. SPARQL and API calling)
- Global scope, (much) broader than KB/library/heritage/Netherlands domain
- Connection point for 8330+ external databases worldwide
- Multilingual, language independent, 300+ languages
- Collaborative –> International community, 25K+ content creators
- For humans (GUI) and machines (API, SPARQL, JSON, RDF, Python etc.)
- LOD, the least scary of all LOD platforms –> Understandable & warm, thanks to community!
- No copyright on data (CC0)
- Strongly growing, positive outlook & sustainable
Effective result: advantages of scale and community & network effects
Added value of Wikidata for KB
What value does Wikidata add for the KB & its services?
- Increased visibility, findability and reusability of our collections
- Greater public reach of KB collections, worldwide
- KB data in cross-domain, global, multilingual context –> Increasing KB's interoperability with the outside world
- Community: External expertise, skills, tools and enthusiasm to enrich & connect KB data
- New functionalities for our data (and images) –> See block 4
- Functionalities that we do not or cannot offer in our own KB services
- Regarding Search, Data enrichment, data quality control, data visualization and data formats, Image metadata, Machine interactions
- Both for our thesauri and heritage collections
- For people and machines
- ‘KB collections as LEGO’
- Toolkit & platform to create and publish new KB LOD
- Internal KB LOD renewal process is not yet delivering public results
- Developing and sharing knowledge & skills related to LOD
- Both internally and externally
- Strengthening our cooperation with KB network partners via Wikidata/media
BLOCK 2 - Wikidata & KB thesauri (NTA + DBNLa)
KB datasets (thesauri): http://data.bibliotheken.nl/
Criteria for suitability of KB thesauri for Wikidata
- Persons (authors) are more popular and in demand on Wikidata than (e.g.) keywords or organizations
- NTA is internationally the only major authoritative dataset on ‘Dutch authors’
- NTA is very useful for Wikidata, in an international context
- Flat/simple data is more suitable than layered/complex data
- People are easier to add to Wikidata than titles (WEMI = complex)
- Small datasets are easier than large ones
Ergo: Focus on NTA and DBNL authors with regard to the KB thesauri-Wikidata activities.
a) From the NTA to Wikidata
Persons in the NTA with a Wikidata URI:
- E.g. Darlene Dixon : http://data.bibliotheken.nl/doc/thes/p208140131 –> schema:sameAs –> http://www.wikidata.org/entity/Q88505402
- All persons via this SPARQL query in http://data.bibliotheken.nl:
# Which NTA items have a link to Wikidata?
SELECT * WHERE {
?nta schema:mainEntityOfPage/schema:isPartOf <http://data.bibliotheken.nl/id/dataset/persons> .
?nta rdfs:label ?ntaLabel.
?nta schema:sameAs ?wikidata .
FILTER(regex(?wikidata, 'wikidata', 'i'))
} LIMIT 1000
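To run a query like this from a script instead of the web interface, a minimal Python sketch could look as follows. It assumes the endpoint follows the standard SPARQL 1.1 protocol and returns SPARQL JSON results. The endpoint URL `http://data.bibliotheken.nl/sparql` and all function names are assumptions for illustration, not taken from the course material.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint URL; verify it against the data.bibliotheken.nl documentation.
ENDPOINT = "http://data.bibliotheken.nl/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks the endpoint for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP and return the flattened rows."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return bindings_to_rows(json.load(response))
```

Calling `run_query(...)` with the query text above would then yield one dict per result row, with the keys `nta`, `ntaLabel` and `wikidata`.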
- 499K of 2.75M NTA items have a Wikidata link (source)
b) From the DBNLa to Wikidata
Persons in DBNLa with a Wikidata URI (via the NTA)
- E.g. Hans Aarsman (1951-) : http://data.bibliotheken.nl/id/dbnla/aars001 –> owl:sameAs –> http://data.bibliotheken.nl/id/thes/p068680937 –> schema:sameAs –> http://www.wikidata.org/entity/Q325922
- All persons via this SPARQL query in http://data.bibliotheken.nl:
# Which DBNLa authors have a link to Wikidata?
SELECT *
WHERE {
?dbnl schema:mainEntityOfPage/schema:isPartOf <http://data.bibliotheken.nl/id/dataset/dbnla> .
?dbnl rdfs:label ?dbnlLabel.
?dbnl owl:sameAs ?nta .
?nta schema:mainEntityOfPage/schema:isPartOf <http://data.bibliotheken.nl/id/dataset/persons> .
?nta rdfs:label ?ntaLabel.
?nta schema:sameAs ?wikidata .
FILTER(regex(?wikidata, 'wikidata', 'i'))
} LIMIT 1000
- 14.5K of 109K DBNLa items have a Wikidata link (source)
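To put the two coverage figures side by side, the quoted counts can be turned into percentages. This is plain arithmetic on the rounded numbers given above, so the results are approximate.

```python
def coverage(linked, total):
    """Share of items that carry a Wikidata link, as a percentage."""
    return 100 * linked / total


# Rounded figures quoted above, in thousands of items.
nta = coverage(499, 2750)     # NTA: 499K of 2.75M
dbnla = coverage(14.5, 109)   # DBNLa: 14.5K of 109K

print(f"NTA:   {nta:.1f}%")    # NTA:   18.1%
print(f"DBNLa: {dbnla:.1f}%")  # DBNLa: 13.3%
```

So roughly one in five to six NTA items, and one in seven to eight DBNLa items, is linked to Wikidata.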
- Achille Van Acker ‘acke001’ in the DBNLa and in Wikidata
- Get additional data about ‘acke001’ from Wikidata. We want to retrieve the following data from the Wikidata item:
- Image (P18) – Educated at (P69) – Member of political party (P102)
- We use this SPARQL query in https://data.bibliotheken.nl:
# Get supplementary data about DBNL author 'acke001' from Wikidata
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT *
WHERE {
?dbnl schema:mainEntityOfPage/owl:sameAs <http://data.bibliotheken.nl/doc/dbnla/acke001> .
?dbnl rdfs:label ?dbnlLabel.
?dbnl owl:sameAs ?nta .
?nta schema:mainEntityOfPage/schema:isPartOf <http://data.bibliotheken.nl/id/dataset/persons> .
?nta rdfs:label ?ntaLabel.
?nta schema:sameAs ?wikidata .
FILTER(regex(?wikidata, 'wikidata', 'i'))
SERVICE <https://query.wikidata.org/sparql> {
?wikidata wdt:P18 ?imageURL. #P18 = image
?wikidata wdt:P69 ?educatedAt. #P69 = educated at
?wikidata wdt:P102 ?MemberOfPoliticalParty. #P102 = member of political party
}
}
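Note that the three patterns inside the SERVICE block are all mandatory, so the query only returns rows for authors whose Wikidata item has an image, an education and a party membership. As an alternative sketch (not the method used in the course), the same three properties can be read from the JSON export that Wikidata offers for every item. The function names are illustrative, and the Q-id of the author has to be looked up first, for instance with the query above.

```python
def entity_data_url(qid):
    """URL of the JSON export of one Wikidata item."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def claim_values(entity, prop):
    """Return the values of one property; item values are returned as Q-ids."""
    values = []
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim["mainsnak"]
        if snak.get("snaktype") != "value":
            continue  # skip 'unknown value' and 'no value' statements
        value = snak["datavalue"]["value"]
        # Item-valued properties (P69, P102) carry a dict with an "id" key;
        # P18 carries the Commons file name as a plain string.
        if isinstance(value, dict) and "id" in value:
            value = value["id"]
        values.append(value)
    return values


def supplementary_data(entity, props=("P18", "P69", "P102")):
    """Collect the requested properties; missing ones give an empty list."""
    return {prop: claim_values(entity, prop) for prop in props}
```

The JSON export wraps each item as `{"entities": {"Q…": {…}}}`, so `supplementary_data` expects the inner dict for the item. Unlike the federated query, a missing property gives an empty list instead of dropping the whole result.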
Checks are OK:
c) From Wikidata to the NTA - P1006
Persons in Wikidata with an NTA id
- P1006 = Nationale Thesaurus voor Auteursnamen ID
- E.g. Harry Mulisch : https://www.wikidata.org/wiki/Q927#P1006 –> P1006 –> https://data.bibliotheken.nl/doc/thes/p06854796X
- All persons via this SPARQL query
SELECT ?item ?itemLabel ?NTAurl
{
?item wdt:P1006 ?NTAid.
BIND(IRI(CONCAT('http://data.bibliotheken.nl/doc/thes/p', ?NTAid)) AS ?NTAurl)
SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en" }
}
LIMIT 1000
Insights in the usage of P1006
https://www.wikidata.org/wiki/Property_talk:P1006
- Wikidata contains 550K links to the NTA: see ‘Current uses’ at bottom of this page, or via this SPARQL query
- Map of birthplaces of people with an NTA id: https://w.wiki/7rsT
- Famous people with an NTA id: https://w.wiki/85si (famous people have extensive Wikidata entries with many statements)
P1006 and data quality
Two pages provide insight into the data quality (and possible improvements) of both Wikidata and the NTA
For example:
- Missing birth date
- https://www.wikidata.org/wiki/Wikidata:Database_reports/Humans_with_missing_claims/P1006#Missing_date_of_birth_(P569)
- The missing dates of birth may be added to Wikidata from the NTA
- Missing Dutch labels
- https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P1006#%22Label_in_'nl'_language%22_violations
- Via SPARQL: https://w.wiki/85xT
- E.g.: Anna Bhau Sathe, https://www.wikidata.org/wiki/Q55759 –> NL label is missing
- The missing NL label can be added from the NTA
- The same NTA id appears in multiple Q items
- https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P1006#Unique_value
- Via SPARQL: https://w.wiki/85zm
- E.g. Andreas Kaiser: https://data.bibliotheken.nl/doc/thes/p068685564 appears in both Q498631 (error) and in Q106361537 (good)
- Q498631 should get a different value at P1006
- One Wikidata item with multiple NTA ids
- https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P1006#Single_value
- Via SPARQL: https://w.wiki/85$o
- E.g. Douglas Adams : Q42 contains both http://data.bibliotheken.nl/doc/thes/p339433876 and http://data.bibliotheken.nl/doc/thes/p068744307
- These NTA items are almost identical, consider merging them in the NTA
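The last two checks (unique value and single value) boil down to grouping item/id pairs in both directions. The following sketch shows that logic on a small hand-made list. It is not how the Wikidata database reports are generated. The Andreas Kaiser pairs come from the example above, while `item-A` and `item-B` are hypothetical placeholders.

```python
from collections import defaultdict


def unique_value_violations(pairs):
    """NTA ids that are attached to more than one Wikidata item."""
    items_by_id = defaultdict(set)
    for item, nta_id in pairs:
        items_by_id[nta_id].add(item)
    return {i: sorted(items) for i, items in items_by_id.items() if len(items) > 1}


def single_value_violations(pairs):
    """Wikidata items that carry more than one NTA id."""
    ids_by_item = defaultdict(set)
    for item, nta_id in pairs:
        ids_by_item[item].add(nta_id)
    return {q: sorted(ids) for q, ids in ids_by_item.items() if len(ids) > 1}


pairs = [
    ("Q498631", "068685564"),     # Andreas Kaiser example from the text
    ("Q106361537", "068685564"),
    ("item-A", "339433876"),      # hypothetical item holding two NTA ids
    ("item-A", "068744307"),
    ("item-B", "000000000"),      # hypothetical pair without any violation
]

print(unique_value_violations(pairs))
print(single_value_violations(pairs))
```

In practice the pairs would come from a P1006 query such as the one shown earlier, with one `(item, NTA id)` tuple per result row.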
Usage of NTA ids in Wikipedia articles
Wikidata: Category:Articles with NTA identifiers
- English Wikipedia: 234K articles on WP:EN have NTA ids. E.g. https://en.wikipedia.org/wiki/50_Cent –> http://data.bibliotheken.nl/id/thes/p262032139
- Turkish Wikipedia: 25K articles on WP:TR have NTA ids
- Czech Wikipedia: 36K articles on WP:CS have NTA ids
- Japanese Wikipedia: 51K articles on WP:JA have NTA ids
In summary: via Wikidata the NTA is used as an authority in hundreds of thousands of Wikipedia articles in many languages, but not in Dutch! The four language versions listed above alone account for roughly 346K articles.
Summary for NTA/P1006
- By integrating NTA data into Wikidata we gain many new functionalities for data quality, connections and visualization that we cannot offer via our own KB LOD service data.bibliotheken.nl! Wikipedia also benefits from the NTA!
- Project to include the NTA in Wikidata and vice versa: WikiProject Dutch National Thesaurus for Author Names
d) From Wikidata to the DBNLa - P723
Persons in Wikidata with a DBNLa id
- P723 = Digitale Bibliotheek voor de Nederlandse Letteren author ID
- E.g. Harry Mulisch : https://www.wikidata.org/wiki/Q927#P723 –> P723 –> http://www.dbnl.org/auteurs/auteur.php?id=muli002
- All persons via this SPARQL query
SELECT ?item ?itemLabel ?DBNLaUrl
{
?item wdt:P723 ?DBNLaId.
BIND(IRI(CONCAT('http://data.bibliotheken.nl/id/dbnla/', ?DBNLaId)) AS ?DBNLaUrl)
SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en" }
}
LIMIT 1000
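Outside SPARQL, the `BIND(IRI(CONCAT(...)))` step in the P1006 and P723 queries is simple string concatenation. A minimal Python equivalent, reusing the two URL prefixes from those queries (the function names are illustrative):

```python
# URL prefixes copied from the BIND expressions in the two queries.
NTA_PREFIX = "http://data.bibliotheken.nl/doc/thes/p"
DBNLA_PREFIX = "http://data.bibliotheken.nl/id/dbnla/"


def nta_url(nta_id):
    """Turn a P1006 value into a data.bibliotheken.nl URL."""
    return NTA_PREFIX + nta_id.strip()


def dbnla_url(dbnla_id):
    """Turn a P723 value into a data.bibliotheken.nl URL."""
    return DBNLA_PREFIX + dbnla_id.strip()


# The Harry Mulisch identifiers quoted in the examples.
print(nta_url("06854796X"))
print(dbnla_url("muli002"))
```

This is handy when the identifiers are fetched in bulk (for instance from a data dump) and the links have to be rebuilt locally.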
Insights into the usage of P723
https://www.wikidata.org/wiki/Property_talk:P723
- Wikidata contains 31K links to the DBNLa: https://www.wikidata.org/wiki/Property_talk:P723 (bottom, ‘Current uses’)
P723 and data quality
Two pages provide insight into the data quality (and possible improvements) of both Wikidata and the DBNLa
Historical metrics of Wikidata and NTA & DBNLa
Historical metrics of the usage of NTA and DBNLa identifiers in Wikidata, and vice versa: https://nl.wikipedia.org/wiki/Wikipedia:GLAM/Koninklijke_Bibliotheek_en_Nationaal_Archief/Resultaten/KPIs/KPI10#Historische_ontwikkeling_van_KPI_10
Look at File:Atlas_de_Wit_1698-pl017-Leiden-St_Pancraskerk.jpg on Wikimedia Commons (=Saint Pancras Church in Leiden, now called Hooglandse Kerk )
- Manifest textual and visual KB source references
- ‘Manual’ multilingualism of the title in Latin, Dutch and French
- Source code appears to be structured, but really is unstructured metadata (free text)
- Tab ‘Structured Data’
Structured Data on Commons
Structured Data on Commons (SDoC) is a project to add multilingual structured information from Wikidata to files on Wikimedia Commons that can be understood by humans, with enough consistency that it can also be uniformly processed by machines.
Added value of SDoC
- Images are linked to Wikidata
- Images are provided with real structured (and therefore machine-readable) data
- Linked open data for Commons files is created, files become part of the LOD cloud
- Not only for images, e.g. see the structured data on this PDF file
- Files are made searchable via SPARQL
- For KB: Structured 5* LOD metadata for 31,348 KB files
SPARQL queries for KB images
What is depicted on KB images?
Let’s summarize: KB images on Commons are searchable in 3 ways
1) Via regular metadata (= free text search)
2) Via structured metadata
3) By content (What is depicted in KB images?)
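The third option, search by content, can also be tried directly in the regular Commons search box via the `haswbstatement` search keyword, which matches files by their structured data statements. The sketch below only builds such a search URL for the depicts property (P180). The function name is illustrative, and Q146 (house cat) is used purely as an example value, not as something taken from the KB collections.

```python
import urllib.parse

COMMONS_SEARCH = "https://commons.wikimedia.org/w/index.php"


def depicts_search_url(qid, extra_terms=""):
    """Commons search URL for files whose structured data says they depict qid."""
    # P180 = depicts; extra_terms adds an ordinary free-text filter.
    query = f"haswbstatement:P180={qid} {extra_terms}".strip()
    return COMMONS_SEARCH + "?" + urllib.parse.urlencode({"search": query})


print(depicts_search_url("Q146"))
print(depicts_search_url("Q146", "atlas"))
```

Combining the structured part with free text, as in the second call, mixes search options 1 and 3 in a single query.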
The (super handy!) tool Hay’s Structured Search offers all three options. It is a visual, multilingual search engine to find images with (and without) structured data in Wikimedia Commons.
In summary: The search functionalities shown (SPARQL, structured search, multilingual search, search by content) are much more advanced than the proprietary KB (image) services such as Het Geheugen!
This manual from 2020 explains step by step how to make images from the KB collection more discoverable, visible and reusable by indicating (tagging) which things (entities) can be seen on those images. This is done by connecting Wikidata items to those things. Available on Wikimedia Commons and Zenodo
Results as of 1 November 2023
BLOCK 4 - Wikidata & KB heritage collections
Examples of KB heritage collections: Medieval manuscripts - Maps and atlases - Armorials - Alba amicorum - Catchpenny prints - Children’s picture books - Flora and fauna books
Criteria for suitability of KB heritage collections for Wikidata
1) Collection highlights, canonical objects: The most important objects of the KB must be present on Wikidata (and Commons)
2) Copyright free objects: Public domain = no hassle with copyright
3) Limited collection size: 10-100s of images are easier to process than 10-100Ks
4) Visually rich collections: What is depicted on the images, see Block 3
5) Connectable to other things: Making semantic links between the KB collections and persons, places, events etc. described in Wikidata
6) Collections consisting of similar, unique objects with narrow, flat, well-defined data models/classes: Similar values for instance of and/or subclass of. Not OK: heterogeneous ephemera.
WikiProject KB Collection highlights (2020-present)
KB collection highlights are part of our national heritage, just like e.g.
Presentation of collection highlights on kb.nl
Collection highlights on (previous) KB website from February 2020
Typical presentation of collection highlights on kb.nl, for instance for Atlas Ortelius 1571
1) Catalog record –> Metadata
2) Hi-res flip book –> Images
3) Contextual article –> Stories, context
This presentation on kb.nl has limited functionalities and reuse options. It represents an old way of thinking: Collection highlights (on kb.nl) are only for reading and viewing, inviting passive consumption. More explanation in this article.
A new paradigm for collection highlights
A new way of thinking:
- KB collection highlights are building blocks that invite active reuse and creation.
- Building blocks for tech community: Developers, app builders, tech companies, AIs, digital humanities, data scientists, hackathons, Wikimedia communities, LOD world, NDE, Europeana etc.
- KB collection highlights as a toolbox of Technical LEGO
- Contents of this toolbox: Eg. 5-star Linked Open Data - Automatic image recognition (AI) - Semantic tagging - Data dumps & bulk downloads - SPARQL - Images searchable by content - Data visualizations - Python - Machine-readable data - Flexible REST APIs - Manifest legal terms - IIIF - Data as JSON, XML, CSV - Automatic multilingualism - External LOD Identifiers
- All these building blocks are available in the Wikimedia infrastructure: the combination of Wikidata (for metadata), Wikimedia Commons (for images) and Wikipedia (for contextual stories) - and their associated international communities - providing a coherent technical and social infrastructure to make KB’s collection highlights much more visible, findable and reusable.
Wikification of KB collection highlights
Wikifying KB’s collection highlights
E.g. Atlas Ortelius:
1) Catalog record KB –> Metadata to Q67465742 on Wikidata, with collection = Koninklijke Bibliotheek, and qualifier subject has role = collection highlight
2) Hi-res flip book KB –> Images to Atlas Ortelius 1571 on Wikimedia Commons
3) Contextual article KB –> Context to Theatrum Orbis Terrarum on Dutch Wikipedia
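The modelling in step 1 (collection = Koninklijke Bibliotheek, with the qualifier subject has role = collection highlight) can be queried on the Wikidata Query Service. The sketch below generates such a query, assuming the statement uses P195 (collection) with qualifier P2868 (subject has role). The Q-ids for the KB and for "collection highlight" are not given in this overview, so they are left as parameters to be filled in by the caller. The function names are illustrative.

```python
import re

_QID = re.compile(r"Q[1-9][0-9]*")


def _check_qid(qid):
    """Reject anything that is not a plain Wikidata Q-id."""
    if not _QID.fullmatch(qid):
        raise ValueError(f"not a Wikidata Q-id: {qid!r}")
    return qid


def highlights_query(collection_qid, role_qid, limit=100):
    """SPARQL for items in a collection whose statement has a given role qualifier."""
    collection_qid = _check_qid(collection_qid)
    role_qid = _check_qid(role_qid)
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        "  ?item p:P195 ?statement .\n"                  # P195 = collection
        f"  ?statement ps:P195 wd:{collection_qid} .\n"
        f"  ?statement pq:P2868 wd:{role_qid} .\n"       # P2868 = subject has role
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en" }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )
```

The statement node (`p:` / `ps:` / `pq:`) is needed because the role is stored as a qualifier on the collection statement, which the simpler `wdt:` form cannot reach.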
The WikiProject KB Collection highlights (2020-present) aims to improve the findability, visibility and reusability of KB’s collection highlights for both humans and machines by
- creating and improving the Wikidata descriptions for all digitised KB collection highlights,
- uploading their public domain images to Wikimedia Commons, reusing data from Wikidata as much as possible to create image metadata
- creating and improving the Wikipedia articles about them on Dutch and English Wikipedia
Result of the project: All cool and value-adding functionalities, tools and community capacities of the Wikimedia infrastructure are now available for our KB collection highlights. The party can start!
50 cool new things you can do now with KB’s collection highlights
The party can start, let’s build cool new things! –> See the article 50 cool new things you can do now with KB’s collection highlights
In this series of 5 articles we show the added value of putting images and metadata of digitised collection highlights of the KB, national library of the Netherlands, into the Wikimedia infrastructure. By putting our collection highlights into Wikidata, Wikimedia Commons and Wikipedia, dozens of new functionalities have been added. As a result of Wikifying this collection in 2020, you can now do things with these highlights that were not possible before.
This article has 5 parts:
Examples
- All functionalities for KB images regarding SPARQL, structured search, multilingual search, search by content, as explained in Block 3
- Gallery of KB collection highlights on Dutch Wikipedia (never mind the new WP layout!)
- Persons/roles involved in each collection highlight
- Contributors to the Album Jacob Heyblocq
- Works by these contributors in DBNL
- Works by these contributors elsewhere, via Europeana, as Excel: See for example Govert Flinck on Europeana + this explanation, see Point 48
Questions or remarks can be sent to Olaf Janssen, Wikimedia coordinator of the KB - olaf.janssen@kb.nl - @ookgezellig
Reuse and licensing
This overview can be reused freely and openly. It is available under the CC-BY 4.0 license, so attribution is required. Use something like
Wikidata & KB national library of the Netherlands, an overview, Olaf Janssen & KB national library of the Netherlands, https://github.com/KBNLwikimedia/Wikidata-KB-Overview
