Thinking through Metadata

This is a text that I started in March 2024 and wanted to publish on the Ludens Blog, but I haven’t found the time and motivation to push it through. So here you go; it might be rough around the edges.


After two dataset sprints and several weeks of working on adding Metadata to Wikidata, I felt the need to reflect on the process in the spirit of learning in public. Two aspects stood out for me: the Wikidata Videogame Project community and modelling ontologies as a way of thinking through a problem. What follows is a reflection on Wikidata from a perspective at the intersection of video game studies and digital humanities.

One of the backbones of research in digital humanities is Metadata: data about data. It is our way of making the stuff the world is made of accessible to the computer. Metadata and Metadata ontologies are an attempt at systematically describing our research subjects. After many iterations of such systems of description we arrived at linked open data, or LOD for short. I will not attempt to explain LOD in detail, since there are better resources for that [1]. Basically, it means that my way of describing things is compatible with another researcher’s and that both our descriptions are publicly referenceable.
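To make the idea of compatible, publicly referenceable descriptions a bit more tangible, here is a minimal sketch in Python. It is not how any particular LOD system is implemented. Everything in it is an assumption made up for illustration: the `ex:` prefix stands in for a public namespace, and the identifiers, property names and values are hypothetical. The only point is that two independently written descriptions can be combined because they refer to the same identifier.

```python
# A description is a set of (subject, property, value) statements.
# All identifiers and values below are hypothetical placeholders.
Triple = tuple[str, str, str]

# My description of a game.
my_description: set[Triple] = {
    ("ex:game/1", "ex:title", "Example Game"),
    ("ex:game/1", "ex:publishedIn", "1994"),
}

# Another researcher's description, written independently,
# but using the same public identifier for the same game.
other_description: set[Triple] = {
    ("ex:game/1", "ex:developer", "ex:studio/7"),
    ("ex:studio/7", "ex:name", "Example Studio"),
}


def describe(subject: str, *sources: set[Triple]) -> dict[str, str]:
    """Collect every statement about `subject` from all given sources."""
    merged = set().union(*sources)
    return {prop: value for subj, prop, value in merged if subj == subject}


combined = describe("ex:game/1", my_description, other_description)
```

Because both descriptions use `ex:game/1` for the same thing, `combined` ends up holding the title, the publication year and the developer, even though no single researcher recorded all three.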

The systematic and meticulous description of stuff through Metadata, together with the possibility of building on other researchers’ work, doesn’t make the computer comprehend the world, as in cognitive intelligence, but it helps the machine tremendously to better understand what we want from it when we query our data. Since such a Metadata corpus can be essential for certain computational research approaches, and no corpus that matched our research interest and also adhered to the FAIR/CARE principles existed, I set out to produce one.

The FAIR [2] and CARE [3] principles are not technical standards but guidelines to ensure that scientific data remains free (as in freedom of speech, not as in free beer), accessible and, most importantly, of benefit to communities and the public at large. The CARE principles came into being as a response to FAIR, to highlight the importance that knowledge and open data have for Indigenous communities.

Archival work is grunt work. After two months of working on this Metadata corpus, all I’ve achieved is to produce a meager sheet with a little over 150 entries and roughly three dozen columns. The work that goes into such a dataset involves not only scouring various sources, such as online databases, scans of packaging and video game end credits, or asking people about what they know. There is also the constant reflection on how to structure the dataset. During these past weeks I’ve had discussions on this topic with several experts and professionals from archival studies, and my already high esteem for that discipline and the people within it grew even more. My biggest takeaway from those discussions is well put by Michelle Caswell:

“There seems to be little understanding in the humanities that professional archivists have master’s degrees, that archival standards and best practices are culturally constructed artifacts, and […]. Like so many other feminized professions – education and nursing are prime examples – archivists have been relegated to the realm of practice, their work deskilled, their labor devalued, their expertise unacknowledged.” [@caswellArchiveNotArchives2016]

The position Wikidata holds in the digital humanities sphere and the way it structures data are both disputed. Some argue that it doesn’t adhere to important data structure standards, even though it is the most popular linked open data project. I decided to work with Wikidata after learning about the Wikidata WikiProject for video games and reading “Wikidata, the underground fungus in the vast forest that is the Internet” by Jean-Frédéric Berthelot. He put it this way:

“I believe Wikidata can be the underground fungus in the vast forest that is the Internet. With every identifier we cross-link, we are weaving this underground network, and in time we will allow these databases, even of different “species”, to talk to each other and exchange information and data.” [@berthelotWikidataUndergroundFungus2019]

How far “talking to each other” is actually of general interest is open to interpretation. I haven’t encountered any video game related databases or platforms that are interested in communicating with each other, i.e. sharing their knowledge and expertise. That said, gathering references on Wikidata has been indispensable to my own work. Cue the first key aspect of this reflection, the video game WikiProject community. I’ve been able to get in contact with Jean-Frédéric Berthelot and get direct support for a variety of questions I had in working with Wikidata, as well as an introduction to the community caring for the data. Thank you, Jean-Fred.

I will not go into the details of how Wikidata wants its data to be structured and will just refer to the Wikidata Introduction page, which explains that aspect well. The important bit is that, besides some basics, structure is up for discussion. A video game needs different properties to be described than a graphic artist. Wikidata lets these properties be defined by people and communities who are willing to spend their time and resources doing so. Although this is generally a democratic approach, it also has its problems, since openness doesn’t automatically create fair conditions for participation. There is, for example, a gender bias on Wikipedia, and others argue the need to question “modeling, collection, curation, and presentation of datasets” [@langDataFeminismDH2023]. In worst case scenarios, such community processes can turn into openly queer-phobic behaviour and actively create harm, as explained in a post on French-speaking Wikipedia deciding to deadname [4] trans people [@ramaTodayFrenchspeakingWikipedia2024] [5].

Log 2024 Week 25

Footnotes

  1. For example, the Wikipedia article on Linked data, or this fantastic primer: “What Are Linked Data and Linked Open Data?” (Ontotext Fundamentals).

  2. FAIR is short for findability, accessibility, interoperability, and reusability. See “FAIR data” on Wikipedia.

  3. CARE stands for collective benefit, authority to control, responsibility, and ethics. See “CARE Principles for Indigenous Data Governance” on Wikipedia.

  4. Read more in “Deadnaming” on Wikipedia.

  5. Read the full post via Mastodon: Bagarrosphère France.