Distant viewing: analyzing large visual corpora

Bibliography

Arnold, T., & Tilton, L. (2019). Distant viewing: Analyzing large visual corpora. Digital Scholarship in the Humanities, 34(Supplement_1), i3–i16. https://doi.org/10.1093/llc/fqz013

Abstract

These various features describe particular elements of each image by a specific system of codes, which can take on the form of either structured data (coordinates and labels, as in Fig. 1) or linguistic data. These extracted elements do not attempt to capture all of the elements of an image; as mentioned, the interpretive act of coding images is necessarily destructive. The metadata here, also, does not directly attempt to measure higher-order meanings such as the themes, mood, or power dynamics captured by an image. However, much like the relationship between words and cultural elements in text, these elements can often be discerned by studying patterns in the extracted features.
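As a note to self, a minimal sketch of what "structured data (coordinates and labels)" could look like in practice. The `Detection` schema, the image ids, and all values are invented for illustration and are not taken from Arnold and Tilton; a real corpus would get such records from a detector or a human coder.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One extracted element: a label plus bounding-box coordinates (hypothetical schema)."""
    image_id: str
    label: str
    x: int
    y: int
    width: int
    height: int


# Invented example annotations, standing in for the output of a coding step.
detections = [
    Detection("img_001", "person", 10, 20, 50, 120),
    Detection("img_001", "horse", 80, 40, 140, 100),
    Detection("img_002", "person", 5, 5, 60, 130),
    Detection("img_003", "tree", 0, 0, 200, 300),
]


def label_counts(items):
    """Count in how many distinct images each label appears."""
    seen = {(d.image_id, d.label) for d in items}
    return Counter(label for _, label in seen)


counts = label_counts(detections)
```

Everything the image contains beyond these few fields is gone at this point, which is the "destructive" part; patterns across a corpus are then read from counts like these.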

Notes

Reading Notes

  • reading the abstract makes me wonder whether it’s a problem that I didn’t go for something specific, like faces?

“Our framework, distant viewing, is distinguished from other approaches by making explicit the interpretive nature of extracting semantic metadata from images. In other words, one must ‘view’ visual materials before studying them.” (Arnold and Tilton, 2019, p. 3)

  • distant viewing always works with metadata. just sometimes, the metadata comes in the form of embeddings or annotations
  • “visual material is ‘pre-linguistic, a “truth” of vision before it has achieved formulation’ (Scott, 1999, p. 20)” (Arnold and Tilton, 2019, p. 4), which indicates that one must indeed see in order to know, and know what to look for, or know it when one sees it
  • how does this basic description of pre-linguistic imagery interact with the video game image? i can imagine that describing images in language (or code and math) is just completely unnatural and another indication why there are images missing in early video games; there were no tools yet
  1. images correspond with a reality (constructed or not) by their affinity
  2. An image has a connotation, the culture around why it was made the way it is

“The explicit code system of written language provides a powerful tool for the computational analysis of textual corpora. Methods such as topic modeling, term frequency-inverse document frequency, and sentiment analysis function directly by counting words, the smallest linguistic unit that can be meaningfully understood in isolation (Saussure, 1916)” (Arnold and Tilton, 2019, p. 5)

  • interesting with regard to distant viewing source code
  • raw pixels don’t hold the same meaningful value as single words
  • computational analysis makes sense because describing images individually by hand is an interpretive, individual act of encoding what is seen; see semantic gap
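To make "function directly by counting words" concrete, a small sketch of tf-idf in plain Python. It uses the textbook formulation tf × log(N / df), which is only one of several common variants; the toy documents are invented and the function name `tf_idf` is my own.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
docs = {
    "a": "the horse and the rider",
    "b": "the rider and the road",
    "c": "a portrait of a face",
}


def tf_idf(corpus):
    """Return {doc_id: {word: score}} using tf * log(N / df)."""
    tokens = {doc_id: text.split() for doc_id, text in corpus.items()}
    n_docs = len(tokens)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for words in tokens.values() for word in set(words))
    scores = {}
    for doc_id, words in tokens.items():
        counts = Counter(words)
        scores[doc_id] = {
            word: (count / len(words)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return scores


scores = tf_idf(docs)
top_word_a = max(scores["a"], key=scores["a"].get)
```

The whole method rests on `text.split()` yielding units that already mean something. Splitting an image into pixels gives no comparable unit, which is why a coding step has to come first.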

“The process of coding images in this way is both destructive and interpretive. Many elements of the image are lost in this description, and no amount of words could ever fully capture the entirety of the original photograph.” (Arnold and Tilton, 2019, p. 5)

  • since I was explicitly not interested in object or image classification, but only the formal layer, working only with embeddings made a lot of sense
  • distant viewing: distinguishes itself from text as the privileged material, is situated in cultural analytics, and differs from how we work with text because it is rooted in different theories
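Since the notes above mention working only with embeddings, a minimal sketch of how images might be compared on that basis, assuming each image has already been turned into a vector. The vectors below are invented; real embeddings would come from a vision model and usually have hundreds of dimensions. Cosine similarity is one common choice of measure, not the only one.

```python
import math

# Invented 4-dimensional vectors standing in for image embeddings.
embeddings = {
    "img_001": [0.9, 0.1, 0.0, 0.2],
    "img_002": [0.8, 0.2, 0.1, 0.3],
    "img_003": [0.0, 0.9, 0.7, 0.1],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query_id, vectors):
    """Return the id of the image most similar to query_id, excluding itself."""
    others = (other for other in vectors if other != query_id)
    return max(
        others,
        key=lambda other: cosine_similarity(vectors[query_id], vectors[other]),
    )


closest = nearest("img_001", embeddings)
```

No labels are involved here: the embedding is itself the metadata, and similarity is read off the vectors without naming what the images show.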