Text Analysis with Voyant

Text mining refers to a set of techniques that historians and other researchers use to draw meaning from a body of information or data. Lincoln A. Mullen of the America’s Public Bible project states, “The project uses the techniques of machine learning and text analysis to find the quotations, and it uses data analysis and visualization to make sense of them” (2). The types of sources a researcher can use vary. In the case of the America’s Public Bible and Signs@40 projects, researchers use written sources, newspapers and articles respectively, to gather their information. However, the sources do not always have to be written. In the case of the Robots Reading Vogue project at Yale University, researchers used visual sources in the form of magazine covers along with written sources to accumulate data. From the data collected, researchers then ask questions, usually by looking for trends and patterns and interpreting what they mean. Historians who want to understand the period or events associated with their sources, for example, would look for trends or patterns in the collected data to find answers.

Voyant is an excellent text mining tool for helping historians and other researchers find those trends and patterns. Researchers can load their texts into the free online tool and investigate the data using the five tools provided. The five tools all work on the same corpus (the set of texts a researcher uploads), making it easy to find trends and patterns in that material.

To try the tool, I used data from the WPA Slave Narratives collection. I wanted to compare different states and look at the similarities and differences between them, so I chose Georgia and Virginia and examined the most used words in their respective word clouds. Georgia’s word cloud contained a variety of words stereotypically associated with the South, including plantation, marster, and slaves. The most used word, however, was old.

In comparison, Virginia’s word cloud contained more dialect words, including ah, yo, tuh, and yer. However, the most used word was slaves.

Reading the narratives behind the word clouds, I found that Virginia’s focused more on the dialect of enslaved people, while Georgia’s were more about life in the South.
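A word cloud is essentially a picture of word counts, so the comparison above can be approximated with a few lines of Python. The following is only a minimal sketch, not Voyant's actual method: the stopword list, the function name top_words, and both sample sentences are invented for illustration and are not taken from the WPA narratives.

```python
import re
from collections import Counter

# Hypothetical, very short stopword list; Voyant's own list is longer and may differ.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "by", "was", "it"}


def top_words(text, n=3):
    """Return the n most frequent non-stopword tokens in text as (word, count) pairs."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


# Invented placeholder sentences standing in for two small corpora.
sample_a = "The old mill stood by the old bridge."
sample_b = "The river ran past the town, and the river froze in winter."

print(top_words(sample_a, 1))
print(top_words(sample_b, 1))
```

Running the same function on two sets of texts and comparing the results mirrors, in a rough way, what comparing two word clouds side by side does.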

Why Metadata Matters

Metadata is considered “data about data” (Carbajal and Caswell). According to the Jisc article, metadata “is usually structured textual information that describes something about the creation, content, or context of a digital resource—be it a single file, part of a single file, or a collection of many files” (Jisc). In other words, metadata can describe any aspect of a digital item, such as an image. It is essential in the field of digital humanities. If there is a picture of an object but no information attached to it, people would not be able to use that digital item. Not only that, but the item would not be discoverable (DPLA). For example, if I did not add any metadata to the images of my kitchen items, other people would not be able to search for and use them.

Several different metadata categories are essential in the realm of digital humanities. The first category is the description of an item. Carbajal and Caswell state, “For archivists, preservation and description are key ingredients in making a collection of records ‘archival.’” A digital item should have a thorough description so that a person will know what that item is. If I did not put a description on my kitchen items, people would not know what the images show. Two other metadata categories are equally essential and go hand in hand: creator and rights. People can find all kinds of digital images. However, one must know how to find the rights to those digital images. For example, I cannot just find an image on the internet and use it however I want. I need to make sure there is no copyright claim on the image. This leads directly to the creator element. Some rights claims require people to ask the original creator for permission to use the image, so it is essential to include the creator when creating metadata.
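To make these categories concrete, here is a small Python sketch of a metadata record and a check for missing fields. It is only an illustration under stated assumptions: the field names (including "title", which is not discussed above), the function name missing_fields, and the sample item are all hypothetical, and they do not represent the schema of Tropy, Omeka, or any other tool.

```python
# Hypothetical list of fields every item should carry; not an official schema.
REQUIRED_FIELDS = ("title", "description", "creator", "rights")


def missing_fields(record):
    """Return the required fields that are absent or blank in a metadata record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


# Invented example item: described, but with no creator or rights recorded.
item = {
    "title": "Wooden spoon",
    "description": "A wooden mixing spoon photographed on a kitchen counter.",
    "creator": "",
    "rights": "",
}

print(missing_fields(item))
```

An item that fails a check like this could still be displayed, but someone who finds it would have no way of knowing who made it or whether they are allowed to reuse it.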

Tropy is an excellent way to help digital humanities practitioners work with metadata. It is a program for organizing sources and describing them with metadata. I found it helpful in the kitchen items assignment, where it made it extremely easy to import the pictures and describe them. Omeka is another program that can help digital humanities practitioners work with metadata, especially when combined with Tropy. Omeka is a web platform where people can create digital exhibits. Researchers can export their data from Tropy, import it into Omeka, and then create exhibits using that data. I was able to do that with the items I exported from Tropy and imported into Omeka: I could add information, change the layout, and add pictures of the kitchen items.