KBCollectionHighlights

50 cool new things you can now do with KB’s collection highlights - Part 5, Reuse

Latest update 16-06-2021

In this series of 5 articles I show the added value of putting images and metadata of digitised collection highlights of the KB, national library of the Netherlands, into the Wikimedia infrastructure. By putting our collection highlights into Wikidata, Wikimedia Commons and Wikipedia, dozens of new functionalities have been added. As a result of Wikifying this collection, you can now do things with these highlights that were not possible before.

In the previous (fourth) part of this series I discussed 11 tools of the right hand knife. We looked at which new functionalities have become available for individual highlight images. We talked about the ability to download images in multiple resolutions, file level descriptive metadata with manifest attributions and copyright status, geo coordinates linking images to various map services, linking images to Wikidata, as well as enabling multilingual search by content (What is depicted in the images?).

In this fifth part I am going to unfold the last group of tools. I am going to illustrate how you can programmatically reuse KB’s collection highlights, for instance in your own websites, services, apps, hackathons and projects. I’m going to talk about SPARQL, APIs, Python scripts, JSON, XML, image bulk downloading and machine interactions with our highlights. Cool LEGO Technic® blocks for KB’s target group of developers, app builders, digital humanists, data scientists, LOD aficionados and other nice nerds.

I’ll try to follow the same order as in Parts 2, 3 and 4: I’ll illustrate how you can retrieve the same images, data and texts we requested via the GUI (so in HTML) in these previous parts, but now in their raw, machine-readable formats (JSON, XML etc.) using Wikimedia’s APIs and SPARQL services. This will give you more control and flexibility over the exact outputs, custom-made for your needs.

Reuse - all highlights

38) Let’s start with recreating the image grid we started out with in Part 2 using the Wikidata SPARQL query service. A short SPARQL query does the job:

   # Thumbnail gallery of KB collection highlights
   #defaultView:ImageGrid
   SELECT DISTINCT ?item ?itemLabel ?image WHERE {
     # the thing is part of the KB collection, and has role 'collection highlight' within that collection
     ?item (p:P195/ps:P195) wd:Q1526131; p:P195 [pq:P2868 wd:Q29188408]. 
     OPTIONAL{?item wdt:P18 ?image.}
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
   } ORDER BY ?itemLabel

This query results in a SPARQL-driven thumbnail gallery of KB highlights.


The image grid of KB highlights for the above SPARQL query. Screenshot Wikidata query service d.d. 23-04-2021


39) Next, let’s look at lists and tables. The list of highlights on the KB website is only available as HTML. For effective reuse you’d prefer it in a structured and open format such as JSON, XML or RDF. Let’s look at how we can request structured lists of KB highlights, both simple and more elaborate, from the Wikidata query service:


40) You might want to programmatically check for Wikipedia articles about KB highlights, for instance in Dutch, using this query:

   #Articles about KB collection highlights on Dutch Wikipedia
   select ?item ?itemLabel ?articleNL where {
   ?item (p:P195/ps:P195) wd:Q1526131; p:P195 [pq:P2868 wd:Q29188408]. 
   OPTIONAL {
     ?articleNL schema:about ?item.
     ?articleNL schema:isPartOf <https://nl.wikipedia.org/>.
   }
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,nl". }
   }

As before, the results can be requested in JSON and in XML as well.
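
As a rough sketch of what requesting JSON looks like from a script, the snippet below builds a GET URL for the Wikidata query service with `format=json` and flattens the returned bindings into plain dictionaries. The function names (`build_wdqs_url`, `simplify_bindings`, `fetch`) are my own, not part of any library, and I replaced the GUI placeholder `[AUTO_LANGUAGE]` with fixed language codes on the assumption that it is only filled in by the query service's web interface.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above; [AUTO_LANGUAGE] is left out (assumption: GUI-only placeholder)
QUERY = """
SELECT ?item ?itemLabel ?articleNL WHERE {
  ?item (p:P195/ps:P195) wd:Q1526131; p:P195 [pq:P2868 wd:Q29188408].
  OPTIONAL {
    ?articleNL schema:about ?item.
    ?articleNL schema:isPartOf <https://nl.wikipedia.org/>.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,nl". }
}
"""

def build_wdqs_url(query):
    """Return a GET URL that asks the query service for a JSON response."""
    return WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})

def simplify_bindings(response):
    """Turn SPARQL JSON results into a list of plain {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in response["results"]["bindings"]
    ]

def fetch(query, user_agent):
    """Run the query (needs network access). user_agent is your own identifier."""
    request = urllib.request.Request(build_wdqs_url(query), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as reply:
        return simplify_bindings(json.load(reply))
```

Calling `fetch(QUERY, "your-tool-name (your contact)")` should return one dictionary per highlight, with an `articleNL` key only for highlights that have a Dutch Wikipedia article.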

Reuse - individual highlights

41) In Part 3 we looked at individual pages (14), double page openings (15) and miniatures/page details (16) that are available for public domain highlights. Let’s see how we can request image URLs from the Wikimedia Commons query API using this documentation.

	#!/usr/bin/python3
	import requests, json
	S = requests.Session()
	URL = "https://commons.wikimedia.org/w/api.php"
	PARAMS = {
	   "action": "query",
	   "gcmtitle":"Category:Chroniques_de_Froissart,_vol_1_-_Den_Haag,_KB_:_72_A_25_(details)",
	   "gcmlimit":"500",
	   "generator":"categorymembers",
	   "format":"json",
	   "gcmtype":"file",
	   "prop":"imageinfo",
	   "iiprop":"url"
	}
	R = S.get(url=URL, params=PARAMS)
	DATA = R.json()
	PAGES = DATA['query']['pages']
	print(json.dumps(PAGES, indent=2))

Running this script in my Python IDE gives the following output:


Titles and URLs of miniatures from Chroniques de Froissart formatted as JSON. Screenshot Pycharm IDE, d.d. 29-04-2021
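
If you only need the titles and image URLs rather than the full JSON, a small helper can pick them out of the `PAGES` dictionary. This is a minimal sketch: `image_urls` is a name I made up, and the sample data below is invented to mimic the shape of the API response, so the titles and URLs are placeholders, not real files.

```python
def image_urls(pages):
    """Map each file title to its full-size image URL."""
    urls = {}
    for page in pages.values():
        info = page.get("imageinfo")
        if info:  # pages without imageinfo are skipped
            urls[page["title"]] = info[0]["url"]
    return urls

# Made-up sample in the shape of DATA['query']['pages'] (placeholder titles and URLs)
sample_pages = {
    "101": {"pageid": 101, "ns": 6, "title": "File:Example detail 1.jpg",
            "imageinfo": [{"url": "https://upload.wikimedia.org/example/detail1.jpg"}]},
    "102": {"pageid": 102, "ns": 6, "title": "File:Example detail 2.jpg",
            "imageinfo": [{"url": "https://upload.wikimedia.org/example/detail2.jpg"}]},
}

for title, url in sorted(image_urls(sample_pages).items()):
    print(title, url)
```

In the script above you would call `image_urls(PAGES)` instead of printing the raw JSON.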


42) If you don’t want to use the Wikimedia Commons API for getting image URLs, no problem: there are some readily available bulk image download tools for obtaining the hires image URLs and/or the hires images themselves from a specific KB highlight category on Wikimedia Commons.

Using the Wiki Loves Downloads tool you can easily get all the direct download URLs of the hires images of e.g. the Reward letter of King Filip II of Spain to family of Balthasar Gerards, 1590. Because this tool was developed by Wikimedia Deutschland, the default user interface is in German. We can use Google Translate to get an English user interface for international audiences. As stated in the tool, it divides the images of a category from Wikimedia Commons into a desired number of lists and generates these in the form of (zipped) text files with links to the respective images, so that they can be downloaded with the help of a download manager (or a script).


Getting the direct download URLs of the hires images of the Reward letter of King Filip II of Spain to family of Balthasar Gerards, 1590. Screenshot Wiki Loves Downloads with translated English user interface, d.d. 11-05-2021

If you prefer the images themselves rather than only the URLs, the Java based Imker tool is the way to go. It downloads all images from a specific category (or page) on Wikimedia Commons (or any other Wikimedia site) to your local machine.


Downloading the hires images of the Reward letter of King Filip II of Spain to family of Balthasar Gerards, 1590. Screenshot of the Imker tool, d.d. 11-05-2021


43) Every KB highlight is described by a Wikidata item (Qnumber). Let’s see how we can request highlight information from the Wikidata API directly from that Qnumber. We can use the wbgetentities action for that.
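
To give an idea of what such a wbgetentities request could look like from Python, here is a small sketch that builds the API URL for one or more Qnumbers and reads a label from the JSON response. The helper names (`wbgetentities_url`, `label`) are mine, and the sample response is trimmed by hand to the parts the helper uses; the English label is the one shown for Q16641064 further down in this article.

```python
import urllib.parse

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def wbgetentities_url(qnumbers, props="labels|descriptions", languages="en|nl"):
    """Build a wbgetentities URL; several Qnumbers are joined with '|'."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(qnumbers),
        "props": props,
        "languages": languages,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)

def label(response, qnumber, language="en"):
    """Return the label of an entity in one language, or None if it is missing."""
    entity = response.get("entities", {}).get(qnumber, {})
    return entity.get("labels", {}).get(language, {}).get("value")

# Hand-trimmed sample of a wbgetentities response for the Haags liederenhandschrift
sample_response = {
    "entities": {
        "Q16641064": {
            "type": "item",
            "id": "Q16641064",
            "labels": {"en": {"language": "en", "value": "Haags liederenhandschrift"}},
        }
    }
}

print(wbgetentities_url(["Q16641064"]))
print(label(sample_response, "Q16641064"))
```

Because `ids` accepts several Qnumbers at once, one request can cover a whole batch of highlights instead of one request per item.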


44) An alternative way is to request full Wikidata items directly from the Qnumber via a Special:EntityData URL. The output can be obtained in no fewer than seven different formats:

For exploring the JSON response in further detail we can tweak this Python script that Matt Miller from the Library of Congress explains in Demo: Programmatic Wikidata from his YouTube series Programming for Cultural Heritage.

For instance, we can make a list of all Wikidata properties that are used in Q16641064

  import requests
  import json
  url = "https://www.wikidata.org/wiki/Special:EntityData/"
  qnumbers = ['Q16641064'] # Haags liederenhandschrift // The Hague song manuscript
  for qnum in qnumbers:
      useurl = url + qnum + '.json'
      headers = {
          'Accept': 'application/json',
          'User-Agent': 'User OlafJanssen - Haags liederenhandschrift'
      }
      r = requests.get(useurl, headers=headers)
      data = json.loads(r.text)
      properties = list(data['entities'][qnum]['claims'].keys())
      print(properties)

returning a list in Python

  ['P31', 'P18', 'P195', 'P217', 'P571', 'P973', 'P953', 'P373', 'P5008', 'P276', 'P1476', 'P170', 'P127', 'P767', 
  'P291', 'P2670', 'P2048', 'P2049', 'P186', 'P1104', 'P935', 'P8791', 'P1343', 'P528', 'P6216', 'P2671']

If we modify the last couple of code lines into

  # .... as previous
  data = json.loads(r.text)
  nlen = len(data['entities'][qnum]['claims']['P170'])
  for i in range(0, nlen):
      creatorid = data['entities'][qnum]['claims']['P170'][i]['mainsnak']['datavalue']['value']['id']
      # Parentheses let the URL string continue on the next line
      creatorurl = ("https://www.wikidata.org/w/api.php?action=wbgetentities&ids=" + str(creatorid) +
                    "&props=labels&languages=en&format=json")
      creatorresponse = requests.get(creatorurl, headers=headers)
      creatordata = json.loads(creatorresponse.text)
      print(str(i+1) + ": " + creatordata['entities'][creatorid]['labels']['en']['value'])

we can retrieve the English names (labels) of the three creators (P170) of this manuscript:

  1: Noydekijn
  2: Augustijnken
  3: Freidank
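
The loop above assumes that every P170 statement points to a Wikidata item. A statement can also be marked as 'unknown value' or 'no value', in which case the mainsnak carries no `datavalue` and the lookup would fail with a KeyError. A slightly more defensive sketch, with a helper name (`claim_item_ids`) and sample data that are my own and purely illustrative:

```python
def claim_item_ids(entity, prop):
    """Return the Qnumbers a property points to, skipping snaks without a value.

    Only meant for properties whose values are Wikidata items (such as P170).
    """
    ids = []
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim["mainsnak"]
        if snak.get("snaktype") == "value":
            ids.append(snak["datavalue"]["value"]["id"])
    return ids

# Invented sample entity: one regular value and one 'unknown value' statement.
# "Q_EXAMPLE" is a placeholder, not a real Qnumber.
sample_entity = {
    "claims": {
        "P170": [
            {"mainsnak": {"snaktype": "value", "property": "P170",
                          "datavalue": {"type": "wikibase-entityid",
                                        "value": {"entity-type": "item", "id": "Q_EXAMPLE"}}}},
            {"mainsnak": {"snaktype": "somevalue", "property": "P170"}},
        ]
    }
}

print(claim_item_ids(sample_entity, "P170"))
```

With the script above you would call `claim_item_ids(data['entities'][qnum], 'P170')` and then look up the labels for the returned Qnumbers.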

45) Talking about creators, let’s see how we can request a structured overview of persons and institutions related to a set of highlights, such as authors, makers, contributors, publishers, printers, illustrators, translators, owners, collectors etc. This is actually the machine readable equivalent of points 9 and 10 from Part 2.

Let’s do this for three KB highlights at the same time: 1) Admirandorum quadruplex spectaculum (Q42302438), 2) Kunst en samenleving (Art and society, Q72752446) and the above mentioned 3) Haags liederenhandschrift (Q16641064).

We use this SPARQL query in Wikidata:

# Overview of persons & institutions related to 
# 1 Admirandorum quadruplex spectaculum (Q42302438), 
# 2 Kunst en samenleving (Q72752446) and 
# 3 Haags liederenhandschrift (Q16641064) 
# such as authors, makers, contributors, publishers, printers, illustrators, translators, owners etc. 
SELECT DISTINCT ?hl ?hlLabel
(GROUP_CONCAT(DISTINCT ?creatorLabel ; separator = " ---- ") as ?creators)
(GROUP_CONCAT(DISTINCT ?authorLabel ; separator = " ---- ") as ?authors)
(GROUP_CONCAT(DISTINCT ?contributorLabel ; separator = " ---- ") as ?contributors)
(GROUP_CONCAT(DISTINCT ?editorLabel ; separator = " ---- ") as ?editors)
(GROUP_CONCAT(DISTINCT ?translatorLabel ; separator = " ---- ") as ?translators)
(GROUP_CONCAT(DISTINCT ?illustratorLabel ; separator = " ---- ") as ?illustrators)
(GROUP_CONCAT(DISTINCT ?publisherLabel ; separator = " ---- ") as ?publishers)
(GROUP_CONCAT(DISTINCT ?owned_byLabel ; separator = " ---- ") as ?owned_bys)
WHERE {
  # the thing is part of the KB collection, and has role 'collection highlight' within that collection
  ?hl (p:P195/ps:P195) wd:Q1526131; p:P195 [pq:P2868 wd:Q29188408].
  # limit to Q42302438, Q72752446 and Q16641064
  VALUES ?hl {wd:Q42302438 wd:Q72752446 wd:Q16641064}
  OPTIONAL{?hl wdt:P170 ?creator.}
  OPTIONAL{?hl wdt:P50 ?author.}
  OPTIONAL{?hl wdt:P767 ?contributor.}
  OPTIONAL{?hl wdt:P98 ?editor.}
  OPTIONAL{?hl wdt:P655 ?translator.}
  OPTIONAL{?hl wdt:P110 ?illustrator.}
  OPTIONAL{?hl wdt:P123 ?publisher.}
  OPTIONAL{?hl wdt:P127 ?owned_by.}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". ?hl rdfs:label ?hlLabel.
                           ?creator rdfs:label ?creatorLabel. ?author rdfs:label ?authorLabel. 
                           ?contributor rdfs:label ?contributorLabel. ?editor rdfs:label ?editorLabel. 
                           ?translator rdfs:label ?translatorLabel. ?illustrator rdfs:label ?illustratorLabel. 
                           ?publisher rdfs:label ?publisherLabel. ?owned_by rdfs:label ?owned_byLabel.}
  }
  GROUP BY ?hl ?hlLabel
  ORDER BY ?hlLabel

resulting in


Persons and institutions related to Q42302438, Q72752446 and Q16641064. Screenshot Wikidata query service, d.d. 11-05-2021

In this query, properties that return no results for any of these highlights have been omitted (e.g. none of the three highlights has a value for P872 - Printer, so P872 was not included in the query).

Of course you can also request the JSON response, or use a Python script to make the request to the Wikidata SPARQL query service:

# pip install sparqlwrapper
# https://rdflib.github.io/sparqlwrapper/

import sys, json
from SPARQLWrapper import SPARQLWrapper, JSON
endpoint_url = "https://query.wikidata.org/sparql"
query = """
SELECT DISTINCT ?hl ?hlLabel
(GROUP_CONCAT(DISTINCT ?creatorLabel ; separator = " ---- ") as ?creators)
(GROUP_CONCAT(DISTINCT ?authorLabel ; separator = " ---- ") as ?authors)
(GROUP_CONCAT(DISTINCT ?contributorLabel ; separator = " ---- ") as ?contributors)
(GROUP_CONCAT(DISTINCT ?editorLabel ; separator = " ---- ") as ?editors)
(GROUP_CONCAT(DISTINCT ?translatorLabel ; separator = " ---- ") as ?translators)
(GROUP_CONCAT(DISTINCT ?illustratorLabel ; separator = " ---- ") as ?illustrators)
(GROUP_CONCAT(DISTINCT ?publisherLabel ; separator = " ---- ") as ?publishers)
(GROUP_CONCAT(DISTINCT ?owned_byLabel ; separator = " ---- ") as ?owned_bys)
WHERE {
  ?hl (p:P195/ps:P195) wd:Q1526131; p:P195 [pq:P2868 wd:Q29188408].
  VALUES ?hl {wd:Q42302438 wd:Q72752446 wd:Q16641064}
  OPTIONAL{?hl wdt:P170 ?creator.}
  OPTIONAL{?hl wdt:P50 ?author.}
  OPTIONAL{?hl wdt:P767 ?contributor.}
  OPTIONAL{?hl wdt:P98 ?editor.}
  OPTIONAL{?hl wdt:P655 ?translator.}
  OPTIONAL{?hl wdt:P110 ?illustrator.}
  OPTIONAL{?hl wdt:P123 ?publisher.}
  OPTIONAL{?hl wdt:P127 ?owned_by.}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". ?hl rdfs:label ?hlLabel.
                           ?creator rdfs:label ?creatorLabel. ?author rdfs:label ?authorLabel. 
                           ?contributor rdfs:label ?contributorLabel. ?editor rdfs:label ?editorLabel. 
                           ?translator rdfs:label ?translatorLabel. ?illustrator rdfs:label ?illustratorLabel. 
                           ?publisher rdfs:label ?publisherLabel. ?owned_by rdfs:label ?owned_byLabel.}
  }
  GROUP BY ?hl ?hlLabel
  ORDER BY ?hlLabel
"""
def get_results(endpoint_url, query):
    user_agent = "WDQS-example Python/%s.%s" % (sys.version_info[0], sys.version_info[1])
    # User-Agent policy, see https://w.wiki/CX6
    sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()

results = get_results(endpoint_url, query)
for result in results["results"]["bindings"]:
    print(result)

giving an output of three Python dictionaries:

{'hl': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q42302438'}, 'hlLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Admirandorum quadruplex spectaculum'}, 'creators': {'type': 'literal', 'value': 'Jan van Call'}, 'authors': {'type': 'literal', 'value': ''}, 'contributors': {'type': 'literal', 'value': ''}, 'editors': {'type': 'literal', 'value': ''}, 'translators': {'type': 'literal', 'value': ''}, 'illustrators': {'type': 'literal', 'value': 'Jan van Call'}, 'publishers': {'type': 'literal', 'value': 'Peter Schenk the Elder'}, 'owned_bys': {'type': 'literal', 'value': 'Aleida Betsy Terpstra'}}
{'hl': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q16641064'}, 'hlLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Haags liederenhandschrift'}, 'creators': {'type': 'literal', 'value': 'Freidank ---- Augustijnken ---- Noydekijn'}, 'authors': {'type': 'literal', 'value': ''}, 'contributors': {'type': 'literal', 'value': 'Eerste stadhouderlijke binderij'}, 'editors': {'type': 'literal', 'value': ''}, 'translators': {'type': 'literal', 'value': ''}, 'illustrators': {'type': 'literal', 'value': ''}, 'publishers': {'type': 'literal', 'value': ''}, 'owned_bys': {'type': 'literal', 'value': 'William V ---- William IV, Prince of Orange ---- Matilda of Guelders ---- Stadhouderlijke bibliotheek'}}
{'hl': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q72752446'}, 'hlLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Kunst en samenleving'}, 'creators': {'type': 'literal', 'value': ''}, 'authors': {'type': 'literal', 'value': 'Walter Crane'}, 'contributors': {'type': 'literal', 'value': 'Gerrit Willem Dijsselhof'}, 'editors': {'type': 'literal', 'value': 'Jan Veth'}, 'translators': {'type': 'literal', 'value': 'Jan Veth'}, 'illustrators': {'type': 'literal', 'value': ''}, 'publishers': {'type': 'literal', 'value': 'Scheltema en Holkema'}, 'owned_bys': {'type': 'literal', 'value': ''}}
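
These dictionaries still carry the SPARQL JSON wrapping (`type`, `value`), and the GROUP_CONCAT columns are single strings. A possible post-processing step, sketched here with a function name (`flatten`) of my own choosing, is to unwrap the values and split the concatenated columns on the `" ---- "` separator used in the query. The sample is a shortened copy of the Haags liederenhandschrift row above.

```python
SEPARATOR = " ---- "  # must match the separator used in GROUP_CONCAT
CONCAT_COLUMNS = ["creators", "authors", "contributors", "editors",
                  "translators", "illustrators", "publishers", "owned_bys"]

def flatten(binding):
    """Unwrap one result row and turn the concatenated columns into lists."""
    row = {"hl": binding["hl"]["value"], "hlLabel": binding["hlLabel"]["value"]}
    for column in CONCAT_COLUMNS:
        value = binding.get(column, {}).get("value", "")
        row[column] = value.split(SEPARATOR) if value else []
    return row

# Shortened copy of one of the result rows shown above
sample = {
    "hl": {"type": "uri", "value": "http://www.wikidata.org/entity/Q16641064"},
    "hlLabel": {"xml:lang": "en", "type": "literal", "value": "Haags liederenhandschrift"},
    "creators": {"type": "literal", "value": "Freidank ---- Augustijnken ---- Noydekijn"},
    "authors": {"type": "literal", "value": ""},
    "contributors": {"type": "literal", "value": "Eerste stadhouderlijke binderij"},
}

print(flatten(sample))
```

In the script above, `flatten(result)` could replace `print(result)` inside the loop.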

46) In points 22-25 of Part 3 we looked at the portraits, genders, occupations and lifespans of the people who contributed to the Album amicorum Jacob Heyblocq and created an on-Wiki portrait gallery/facebook of album contributors directly from a Wikidata SPARQL query.

Let’s now look at three approaches for generating off-Wiki image galleries from the Wikimedia infrastructure. The following examples are detailed in (and extracted from) the article Reusing the album amicorum Jacob Heyblocq - Image gallery of album contributors here on Github.

1) An HTML facebook based on the Wikimedia Commons API - From the JSON response of the Wikimedia Commons query API and using this Python script we can generate a basic HTML portrait gallery of contributors.

2) An HTML facebook based on the Wikidata SPARQL service with a JSON response - From the JSON response of the Wikidata query service and using this Python script we can create this basic HTML portrait gallery.


Two approaches for making an HTML portrait gallery of contributors to the Album amicorum Jacob Heyblocq. Left: using the Wikimedia Commons API. Right: using the Wikidata SPARQL service with a JSON response. Screenshots d.d. 14-05-2021

3) An HTML facebook from a Wikidata SPARQL query using an embedded iframe - Using the same query as above, we can also embed the result into an HTML page by means of an iframe:

  <!DOCTYPE html>
  <html>
      <head>
          <title>Facebook of contributors to the album amicorum Jacobus Heyblocq - Wikidata SPARQL + HTML iframe</title>
      </head>
      <body>
      	<h1>Facebook of contributors to the album amicorum Jacobus Heyblocq - Wikidata SPARQL + HTML iframe</h1>
      	<iframe style="position: absolute; height: 100%; width: 100%; border: none" src="https://w.wiki/phx" referrerpolicy="origin" sandbox="allow-scripts allow-same-origin allow-popups"></iframe>
    </body>
  </html>

This results in a plain, unstyled facebook.

We can expand this approach into a design that fits seamlessly into the pages about the album on the KB website, resulting in a KB styled facebook.


Another approach for making an HTML portrait gallery of contributors to the Album amicorum Jacob Heyblocq, using a Wikidata SPARQL query and an embedded iframe. Left: A plain, unstyled facebook. Right: The same iframe, but now embedded into a KB styled portrait gallery. Screenshots d.d. 14-05-2021
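
For the two script-based approaches (1 and 2), the core step is turning a list of names and image URLs into HTML. The sketch below is not the script linked above, just a minimal illustration of that step; the function name `portrait_gallery` and the example image URL are assumptions of mine.

```python
import html

def portrait_gallery(title, people):
    """Build a minimal HTML page from (name, image URL) pairs."""
    figures = []
    for name, image_url in people:
        figures.append(
            '<figure><img src="{src}" alt="{name}" width="150">'
            "<figcaption>{name}</figcaption></figure>".format(
                src=html.escape(image_url, quote=True),
                name=html.escape(name, quote=True),
            )
        )
    return (
        "<!DOCTYPE html>\n<html>\n<head><title>{t}</title></head>\n"
        "<body>\n<h1>{t}</h1>\n{figs}\n</body>\n</html>"
    ).format(t=html.escape(title), figs="\n".join(figures))

# Placeholder image URL; in practice the pairs come from the API or SPARQL response
people = [("Govert Flinck", "https://example.org/flinck.jpg")]
print(portrait_gallery("Facebook of contributors", people))
```

The names and URLs are escaped with `html.escape`, so labels containing characters such as `&` or quotes do not break the page.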


47) In items 33 and 35 of Part 4 we already looked at things (Wikidata entities) that can be seen in KB collection highlights, making images not only discoverable via the regular metadata, but also multilingually searchable by content (What’s depicted in it?). Let’s now look at how we can retrieve depicted entities programmatically. We’ll use Atlas de Wit 1698 for this. We can do this either via a) the Wikimedia Commons SPARQL query service, b) the Wikimedia Commons API or c) the Petscan tool.

a) To retrieve the depicted entities via the Wikimedia Commons SPARQL query service, we use this query:

 #Things depicted in Atlas de Wit 1698
 SELECT ?file (GROUP_CONCAT(DISTINCT ?depictionLabel ; separator = " -- ") as ?ThingsDepicted)
 WHERE {
   ?file wdt:P6243 wd:Q2520345 .
   ?file wdt:P180 ?depiction .
   SERVICE <https://query.wikidata.org/sparql> {  
     SERVICE wikibase:label {
         bd:serviceParam wikibase:language "en" .
         ?depiction rdfs:label ?depictionLabel.
         ?file rdfs:label ?fileLabel.
     }
   }
 }
 GROUP BY ?file

giving this result, which can also be requested as JSON.


Things depicted in Atlas de Wit 1698. Screenshot Wikimedia Commons SPARQL query service d.d. 15-05-2021

b) The Wikimedia Commons API allows us to retrieve depicted entities for individual images. Let’s use https://commons.wikimedia.org/wiki/File:Atlas_de_Wit_1698-pl048-Montfoort-KB_PPN_145205088.jpg as an example. As can be seen from the Concept URI link in the Tools navigation on the left, this file can also be requested via the URI https://commons.wikimedia.org/entity/M32093127, where ‘32093127’ is the Page ID that is listed in the Page information, also in the left hand navigation. This Mnumber is Wikimedia Commons’ equivalent of the Wikidata Qnumber.

From that Mnumber (M+Page ID) we can request the (Wikidata Qnumbers of the) depicted entities via the API call https://commons.wikimedia.org/w/api.php?action=wbgetentities&format=json&ids=M32093127 as JSON:


Wikidata Qnumbers of things depicted in File:Atlas de Wit 1698-pl048-Montfoort-KB PPN 145205088.jpg (M32093127). Result of this API call. Screenshot Wikimedia Commons API, d.d. 15-05-2021

If we want to list all things depicted in all images in Category:Atlas de Wit 1698, we can write a small Python script to iterate over all images in that category, using the API call we saw in item 41 to request the pageIDs and titles of the files in that category:

import requests
import json
baseurl = "https://commons.wikimedia.org/w/api.php?action="
cat = "Category:Atlas_de_Wit_1698"
headers = {'Accept' : 'application/json', 'User-Agent': 'User OlafJanssen - Category:Atlas_de_Wit_1698'}

filesurl= baseurl + "query&generator=categorymembers&gcmlimit=500&gcmtitle=" + cat +  "&format=json&gcmnamespace=6"
files = requests.get(filesurl, headers=headers)
filesdata = json.loads(files.text)
pageids=list(filesdata['query']['pages'].keys())

for pageid in pageids:
    mnumber="M"+str(pageid)
    pageurl= baseurl + "wbgetentities&format=json&ids=" + str(mnumber)
    pageresponse = requests.get(pageurl, headers=headers)
    pagedata = json.loads(pageresponse.text)
    pagetitle=pagedata.get('entities').get(mnumber).get('title')
    p180s = pagedata.get('entities').get(mnumber).get('statements').get('P180', 'XX')
    if str(p180s) != "XX":
        depictslist=[]
        for p in range(0, len(p180s)):
            qnum= p180s[p]['mainsnak']['datavalue']['value']['id']
            depictsurl = "https://www.wikidata.org/w/api.php?action=wbgetentities&ids=" + str(qnum) + "&props=labels&languages=en&format=json"
            depictsresponse = requests.get(depictsurl, headers=headers)
            depictsdata = json.loads(depictsresponse.text)
            depicts = depictsdata.get('entities', 'XX').get(qnum).get('labels', 'XX').get('en', 'XX')
            if str(depicts) != "XX":
                a = str(depicts.get('value')) + " (" + str(qnum) + ")"
                depictslist.append(a)
        print(str(mnumber) + " ||  " + str(pagetitle) + " ||  " + ' -- '.join(depictslist))

This gives the following result:

M32246841 || File:Atlas de Wit 1698-pl017-Leiden-de burcht.jpg || tree (Q10884) -- peafowl (Q201251) -- dog (Q144) -- Burcht van Leiden (Q2345558) -- gate (Q53060)
M32092934 || File:Atlas de Wit 1698-pl017-Leiden-KB PPN 145205088.jpg || Leiden (Q43631) -- Rhine (Q584) -- Oude Rijn (Q2478570) -- Nieuwe Rijn (Q671841) -- Nieuwe Rijn (Q57945772) -- Pieterskerk (Q1537972) -- Hooglandse Kerk (Q1537970) -- Rapenburg (Q2597656) -- Academy Building (Q2515805) -- Hortus Botanicus Leiden (Q2468128) -- Zijlpoort (Q2326072) -- bolwerk (Q891475) -- fortified town (Q677678) -- Morschpoort, Leiden (Q2688448) -- Marepoort (Q1817627) -- Burcht van Leiden (Q2345558)
M32246845 || File:Atlas de Wit 1698-pl017-Leiden-Pieterskerk.jpg || Pieterskerk (Q1537972) -- tree (Q10884) -- dog (Q144) -- weather vane (Q524738)
M32246848 || File:Atlas de Wit 1698-pl017-Leiden-St Pancraskerk.jpg || Hooglandse Kerk (Q1537970) -- cloud (Q8074) -- weathercock (Q2157687) -- leadlight (Q488094) -- door (Q36794) -- woman (Q467) -- child (Q7569) -- dog (Q144) -- hat (Q80151) -- carriage (Q235356) -- walking stick (Q1347864) -- horse (Q726) -- tree (Q10884) -- crow-stepped gable (Q1939660) -- clock (Q376) -- Burcht van Leiden (Q2345558)
M32246852 || File:Atlas de Wit 1698-pl017-Leiden-stadhuis.jpg || Leiden City Hall (Q2191676) -- dog (Q144) -- crow-stepped gable (Q1939660) -- cow (Q11748378)
M32092941 || File:Atlas de Wit 1698-pl017a-Leiden, Stadhuis-KB PPN 145205088.jpg || Leiden City Hall (Q2191676) -- Hooglandse Kerk (Q1537970) -- Leiden (Q43631) -- Burcht van Leiden (Q2345558) -- dog (Q144) -- cow (Q11748378) -- peafowl (Q201251) -- Pieterskerk (Q1537972) -- coach (Q4655519) -- crow-stepped gable (Q1939660) -- gate (Q53060)
....
M32092951 || File:Atlas de Wit 1698-pl018-Amsterdam-KB PPN 145205088.jpg || Amsterdam (Q727) -- Royal Palace of Amsterdam (Q1056152) -- fortified town (Q677678)
M32092959 || File:Atlas de Wit 1698-pl018a-Amsterdam, Dam-KB PPN 145205088.jpg || Dam Square (Q839050) -- Royal Palace of Amsterdam (Q1056152) -- dog (Q144) -- horse (Q726) -- fire extinguisher (Q190672) -- fire department (Q6498663) -- Nieuwe Kerk (Q1419675) -- weigh house (Q1407236) -- Oude Kerk (Q623558) -- pump (Q134574) -- fire hose (Q1410061) -- firewater (Q5452025) -- coat of arms of Amsterdam (Q683829)
M32092960 || File:Atlas de Wit 1698-pl018b-Amsterdam, Stadhuis-KB PPN 145205088.jpg || Royal Palace of Amsterdam (Q1056152)
M32092964 || File:Atlas de Wit 1698-pl018c-Amsterdam, profiel (Joan de Ram)-KB PPN 145205088.jpg || Amsterdam (Q727) -- boat (Q35872) -- river (Q4022)
M32092969 || File:Atlas de Wit 1698-pl018d-Amsterdam, Oude Kerk-KB PPN 145205088.jpg || Royal Palace of Amsterdam (Q1056152) -- weigh house (Q1407236) -- Nieuwe Kerk (Q1419675) -- market (Q37654) -- horse (Q726) -- Euronext Amsterdam (Q478720) -- exchange building (Q10882966) -- Oude Kerk (Q623558) -- péniche (Q7578326) -- porter (Q1509714) -- coat of arms of Amsterdam (Q683829) -- dog (Q144)
.....

c) An alternative way of finding the pageIDs of the category members is by using the JSON response of the PetScan tool for the given category. I leave it to the reader to implement this approach into the Python script.


48) In item 46 we looked at portrait galleries of the contributors to the Album amicorum Jacob Heyblocq, where the portraits were stored in the Wikimedia infrastructure (Wikimedia Commons to be exact). Let’s now look at external (non-Wikimedia) databases describing these persons, their images, works and their lives. For instance, let’s look at Europeana, RKD, Biografisch Portaal and DBNL.

Each database has its associated Wikidata property: P7704 (Europeana), P650 (RKD), P651 (Biografisch Portaal) and P723 (DBNL).

Let’s start by querying Wikidata to see which AAJH contributors have any of these properties connected to them:

  SELECT DISTINCT ?contr ?contrLabel ?EuropeanaID ?EuropeanaURI ?RKDID ?RKDURI ?BPID ?BPURI ?DBNLaID ?DBNLaURI
  WHERE {
    wd:Q72752496 wdt:P767 ?contr.
    OPTIONAL { ?contr wdt:P7704 ?EuropeanaID.
             BIND(URI(CONCAT("https://www.europeana.eu/api/entities/",?EuropeanaID ,".json?wskey=apidemo")) as ?EuropeanaURI)}
    OPTIONAL {?contr wdt:P650 ?RKDID. 
             BIND(URI(CONCAT("https://api.rkd.nl/api/record/artists/",?RKDID ,"?format=json")) as ?RKDURI)} 
    OPTIONAL {?contr wdt:P651 ?BPID. 
             BIND(URI(CONCAT("http://www.biografischportaal.nl/persoon/json/", ?BPID)) as ?BPURI)} # http://www.biografischportaal.nl/about/bioport-api-documentation
    OPTIONAL {?contr wdt:P723 ?DBNLaID. 
             BIND(URI(CONCAT("http://data.bibliotheken.nl/doc/dbnla/",?DBNLaID ,".json")) as ?DBNLaURI)}  
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
  ORDER BY DESC(?EuropeanaID)

Via the BIND(URI(CONCAT())) operators we constructed direct URIs for retrieving JSON responses from the databases. This query results in this JSON result, or this table:


Europeana, RKD, Biografisch Portaal and DBNL identifiers and JSON URIs for the contributors of the Album amicorum Jacob Heyblocq, as retrieved from Wikidata. Screenshot Wikidata query service, d.d. 16-05-2021

Once retrieved, we can now use these JSON URIs as starting points to further dive into the REST APIs of these databases and retrieve information from them that is not available in the Wikimedia infrastructure. Let’s elaborate this for the Europeana REST APIs (columns 3 and 4 in the above screenshot).

For instance, let’s look at Govert Flinck with Wikidata Q550401, Europeana agent/base/63198 and EuropeanaURI https://www.europeana.eu/api/entities/agent/base/63198.json?wskey=apidemo.
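
If you already have the identifiers (for instance from the JSON result of the query), the same URIs can be built in Python as well. The sketch below mirrors the URL patterns from the BIND lines in the query; the function name `json_uris` and the dictionary keys are my own choices, and only the Europeana identifier of Govert Flinck is taken from this article.

```python
def json_uris(europeana_id=None, rkd_id=None, bp_id=None, dbnl_id=None):
    """Build the same JSON URIs as the BIND(URI(CONCAT())) lines in the query."""
    uris = {}
    if europeana_id:
        uris["Europeana"] = ("https://www.europeana.eu/api/entities/"
                             + europeana_id + ".json?wskey=apidemo")
    if rkd_id:
        uris["RKD"] = "https://api.rkd.nl/api/record/artists/" + rkd_id + "?format=json"
    if bp_id:
        uris["BP"] = "http://www.biografischportaal.nl/persoon/json/" + bp_id
    if dbnl_id:
        uris["DBNLa"] = "http://data.bibliotheken.nl/doc/dbnla/" + dbnl_id + ".json"
    return uris

# Govert Flinck: Europeana identifier agent/base/63198
print(json_uris(europeana_id="agent/base/63198"))
```

Identifiers that are missing for a person are simply left out of the result, just like the OPTIONAL blocks in the query leave the corresponding columns empty.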

The works by Govert Flinck can be retrieved via http://data.europeana.eu/agent/base/63198, which redirects to the Europeana website https://www.europeana.eu/en/collections/person/63198-govert-flinck. This result set can also be requested as JSON via the API call https://www.europeana.eu/api/v2/search.json?wskey=api2demo&media=true&start=1&rows=100&profile=minimal&query=%22http://data.europeana.eu/agent/base/63198%22, retrieving results 1-100 (&start=1&rows=100&), as documented here.


Works by Govert Flinck as listed on the Europeana website (left) and in the Europeana API, first 100 results as JSON (right). Screenshots Europeana website and API, d.d. 01-06-2021

One of Flinck’s works on Europeana is Schutters van de compagnie van kapitein Joan Huydecoper en luitenant Frans van Waveren, which can be represented in JSON via https://api.europeana.eu/record/2024903/photography_ProvidedCHO_KU_Leuven_9990688460101488.json?wskey=api2demo.

This work is in the collection of the KU Leuven University, where the full image is available via https://lib.is/IE2392190/stream?quality=LOW. A thumbnail can be generated using the Europeana API: https://api.europeana.eu/thumbnail/v2/url.json?uri=https%3A%2F%2Flib.is%2FIE2392190%2Fstream%3Fquality%3DLOW&type=IMAGE


Image and metadata for Schutters van de compagnie van kapitein Joan Huydecoper en luitenant Frans van Waveren by Govert Flinck on the Europeana website (left, middle) and as JSON from the Europeana API (right). Screenshots Europeana, d.d. 01-06-2021
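
Note that the image URL has to be percent-encoded before it can be passed as the `uri` parameter of the thumbnail call. A small sketch of how this could be done in Python; the function name `thumbnail_url` is mine, while the endpoint and parameters are the ones from the thumbnail URL above.

```python
import urllib.parse

THUMBNAIL_API = "https://api.europeana.eu/thumbnail/v2/url.json"

def thumbnail_url(image_url, media_type="IMAGE"):
    """Return the Europeana thumbnail URL for a given full image URL."""
    query = urllib.parse.urlencode({"uri": image_url, "type": media_type})
    return THUMBNAIL_API + "?" + query

print(thumbnail_url("https://lib.is/IE2392190/stream?quality=LOW"))
```

`urlencode` takes care of escaping the colons, slashes, question mark and equals sign in the KU Leuven URL.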

So using its API, we can use Europeana to programmatically find various pieces of interesting information about a single artwork by a single contributor to the Album amicorum Jacob Heyblocq. We can extend this approach to include all artworks (as far as they are known in Europeana) by all album contributors (as far as they are known in Europeana). We did this by writing this Python script. Please note this script is work in progress, so it is not fully finished, complete and/or reliable, but it should give the reader an idea of an approach for programmatically retrieving data from Europeana. Using Pycharm IDE for running it, the console output looks like this:


Console output of a Python script for finding works in Europeana by the contributors to the Album amicorum Jacob Heyblocq. The results for Govert Flinck are displayed. Screenshot Pycharm IDE, d.d. 01-06-2021

The output is also written to this Excel file.


49) We can combine a SPARQL query in Wikidata with simultaneous queries in other SPARQL endpoints. This is called federated SPARQL querying and we can use it to extract some base information from Wikidata and combine that with additional, enriching information from other, external (linked open) databases.

Let’s say we want to look for Dutch literary works written by the contributors of the Album amicorum Jacob Heyblocq, as stored in the DBNL website and retrieve (the URLs of) the first pages of those works. We can construct this federated SPARQL query for that:

# Look for Dutch literary works written by the contributors of the Album amicorum Jacob Heyblocq in www.dbnl.org
# and retrieve (the URLs of) the first pages of those works
PREFIX schema: <http://schema.org/>

SELECT DISTINCT ?WDcontr ?WDcontrLabel ?WDDBNLaID #Wikidata stuff
?DBNLauthorURL ?DBNLauthorName #DBNL author stuff, as is ?authorid
?DBNLworkID ?DBNLworkURL ?DBNLworkTitle #DBNL work stuff
?DBNLwebsiteURL ?DBNLtextURL #DBNL website stuff

WHERE {wd:Q72752496 wdt:P767 ?WDcontr.
      ?WDcontr wdt:P723 ?WDDBNLaID. 
      SERVICE <http://data.bibliotheken.nl/sparql>{
         ?DBNLauthorURL schema:identifier ?WDDBNLaID;
             schema:name ?DBNLauthorName;
             schema:mainEntityOfPage ?page.
         ?page schema:mainEntity ?authorid.
         ?DBNLworkURL schema:author ?authorid;
             schema:identifier ?DBNLworkID;
             schema:name ?DBNLworkTitle;
             schema:url ?DBNLwebsiteURL.
      }
      BIND(URI(CONCAT("http://www.dbnl.org/tekst/", ?DBNLworkID, "_01/",?DBNLworkID,"_01_0001.php")) as ?DBNLtextURL)
      SERVICE wikibase:label {bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
}
LIMIT 150

After retrieving the base information about a contributor (author) from Wikidata (?WDcontr, ?WDcontrLabel, ?WDDBNLaID), we use ?WDDBNLaID to find information about the author in http://data.bibliotheken.nl, the LOD triplestore of the KB (?DBNLauthorURL, ?DBNLauthorName, ?authorid). From there we find information about the works that the author wrote (?DBNLworkID, ?DBNLworkURL, ?DBNLworkTitle). In the last step we find two links to the DBNL website (?DBNLwebsiteURL, ?DBNLtextURL), where the latter is the link to the first page of the work.

This query results in a list or a JSON response (the query might be slow).


Dutch literary works in the DBNL website written by the contributors of the Album amicorum Jacob Heyblocq, obtained by a federated SPARQL query in both Wikidata and data.bibliotheken.nl, the LOD triple store of the KB. The last column shows (the URLs of) the first page of the works. Screenshot Wikidata query service, d.d. 19-05-2021
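The federated query can also be run from a script instead of the query service GUI. The sketch below assumes the public Wikidata SPARQL endpoint and the standard SPARQL JSON results format. The function names and the User-Agent string are made up for this illustration, and the query text itself is passed in as a string.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def run_sparql(query, endpoint=WDQS_ENDPOINT):
    """Send a SPARQL query and return the parsed JSON result (needs network access)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical identifier; use one that describes your own project.
            "User-Agent": "kb-highlights-example/0.1",
        },
    )
    # Federated queries can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=120) as reply:
        return json.load(reply)


def flatten_bindings(result):
    """Turn SPARQL JSON bindings into plain {variable: value} dictionaries."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in result["results"]["bindings"]
    ]
```

With the query text stored in a string named query, flatten_bindings(run_sparql(query)) gives one dictionary per result row, with keys such as DBNLworkTitle and DBNLtextURL. Variables that are unbound in a row are simply absent from that row's dictionary.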

Reuse - individual highlight images

Finally, to wrap up this (long) article, let’s look at an example of reusing individual highlight images. Let’s use the map of the Iberian Peninsula we looked at in the beginning of Part 4.

50) Using a Wikimedia Commons API tool created by (the great) Magnus Manske, we can programmatically request (meta)data associated with an individual image in XML, such as the URLs associated with that image:


URLs associated with the map of the Iberian Peninsula as returned by Magnus Manske’s Wikimedia Commons API tool. Note the URL of the thumbnail of 234px wide. Screenshot Wikimedia Commons API tool, d.d. 15-06-2021

We can also query the Commons API directly to retrieve information about an individual image. We use these examples and this imageinfo API documentation for inspiration. For example:


Comparison of the same Dutch language metadata snippet for the map of the Iberian Peninsula. Top part from the regular file page, bottom part from the Wikimedia Commons API. Screenshots from Wikimedia Commons (top) and its API (bottom), d.d. 15-06-2021
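A direct call to the Commons API could look roughly like the sketch below. It uses the imageinfo module with the url and extmetadata properties and asks for Dutch language metadata. The function names, the User-Agent string and the default thumbnail width of 234px are choices made for this illustration. The file title is left as a parameter, because the exact title of the map is not repeated here.

```python
import json
import urllib.parse
import urllib.request

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_imageinfo_url(file_title, thumb_width=234, language="nl"):
    """Build an imageinfo request URL for one file on Wikimedia Commons."""
    params = {
        "action": "query",
        "titles": file_title,
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",        # file URLs plus descriptive metadata
        "iiurlwidth": thumb_width,          # also request a thumbnail of this width
        "iiextmetadatalanguage": language,  # preferred language for the metadata
        "format": "json",
    }
    return COMMONS_API + "?" + urllib.parse.urlencode(params)


def extract_imageinfo(response):
    """Pick the URLs and the description out of an imageinfo response."""
    found = []
    for page in response.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            meta = info.get("extmetadata", {})
            found.append({
                "title": page.get("title"),
                "url": info.get("url"),
                "thumburl": info.get("thumburl"),
                "description": meta.get("ImageDescription", {}).get("value"),
            })
    return found


def fetch_imageinfo(file_title):
    """Request and summarise image information (needs network access)."""
    request = urllib.request.Request(
        build_imageinfo_url(file_title),
        # Hypothetical identifier; use one that describes your own project.
        headers={"User-Agent": "kb-highlights-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return extract_imageinfo(json.load(reply))
```

Calling fetch_imageinfo with a title of the form "File:….jpg" returns a list with one dictionary per file. A title that does not exist on Commons yields an empty list, because the API then returns a page entry without imageinfo.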

Summary

OK, we could have easily gone to 60+ examples, but that’s it for this fifth and last article. For convenience and overview, let me summarize all the cool new things for KB’s collection highlights we have seen in this article:

38) A SPARQL driven thumbnail gallery of KB highlights.
39) Structured lists of all KB highlights, both simple and more elaborate in JSON and XML.
40) Programmatically check for Wikipedia articles about KB highlights in Dutch.
41) Request multiple image URLs from the Wikimedia Commons query API for a specific highlight, both via URL query strings and Python scripts.
42) Readily available bulk image download tools for obtaining hires image URLs and/or the hires images themselves from a specific KB highlight category on Wikimedia Commons.
43) Request highlight information from the Wikidata API in multiple formats, directly from the highlight’s Qnumber.
44) Request full Wikidata items in seven different formats via a Special:EntityData URL, directly from the Qnumber: HTML, JSON, JSON-LD, RDF, NT, TTL (or N3) and PHP.
45) Get a structured, machine readable overview of persons and institutions related to KB highlights, such as authors, makers, contributors, publishers, printers, illustrators, translators, owners, collectors etc.
46) Multiple approaches for generating off-Wiki image galleries from the Wikimedia infrastructure, as detailed in the article Reusing the album amicorum Jacob Heyblocq - Image gallery of album contributors.
47) Programmatically retrieve things depicted in images, either via the Wikimedia Commons SPARQL query service, the Wikimedia Commons API or via the Petscan tool.
48) Starting from selected Wikidata biographical identifiers such as the Europeana entity (P7704), extract information from external (non-Wikimedia) databases using their REST APIs.
49) Extract information simultaneously from both Wikidata and external databases using federated SPARQL queries, such as this example.
50) Programmatically request (meta)data associated with an individual image via both the Wikimedia Commons API tool and directly from the Commons API, using these examples and this imageinfo API documentation for inspiration.

Part 6 - Summary of summaries

As a bonus - and for overview - I’ve created a summary of the individual summaries from parts 2, 3, 4 and the one above. See Part 6, Summary for all 50 new cool things in a super handy single list.


About the author

Olaf Janssen is the Wikimedia coordinator of the KB, the national library of the Netherlands. He contributes to Wikipedia, Wikimedia Commons and Wikidata as User:OlafJanssen

Reusing this article

The text of this article is available under the CC-BY 4.0 license.

Image sources & credits