What Can Digital Humanities Do With Crowdsourcing?

Crowdsourcing is one way historians in the Digital Humanities field can build projects that open new ways of analyzing material. The collaborators on The Collective Wisdom Handbook state, “Crowdsourcing can open up new possibilities for research that could not have existed otherwise, fundamentally changing authoritative practices, definitions, and classifications” (The Collective Wisdom 2). Along with changing the way one analyzes data, there is a component of collaboration. The collaborators explain, “At its best, crowdsourcing increases public access to cultural heritage collections while inviting the public to engage deeply with these materials” (The Collective Wisdom 3). The public’s collaboration with these projects is one of the main components of crowdsourcing. These projects come about in many different ways, as do the tasks the public undertakes. One of the best examples of these tasks is transcription.

Transcription in crowdsourcing projects puts the task directly into the public’s hands. The project creators upload documents and ask volunteers to help transcribe them for an online archive. One of the first projects to bring this idea forward was Transcribe Bentham. The project uses documents created by philosopher Jeremy Bentham “with the intention of engaging the public with Bentham’s thought and works, creating a searchable digital repository of the collection, and quickening the pace of transcription and publication by recruiting unpaid online volunteers to assist in transcribing the remaining manuscripts” (Causer, Tonra, and Wallace, Transcription maximized, 120). In other words, volunteers transcribe the documents the creators upload in a collaborative effort to build an archive of Bentham’s papers.

Another project that uses transcription is By the People. Created by the Library of Congress and designed similarly to Transcribe Bentham, it encourages volunteers to help transcribe uploaded documents into an archive. The documents vary widely and are organized “into ‘Campaigns’ and presented to volunteers along with transcription conventions, a discussion platform, and explanatory material to help folks learn a bit about the subjects of the documents” (Hyning and Jones, Data’s Destinations, 9). This learning is part of how these projects keep participants engaged in transcribing the documents. With Transcribe Bentham, the creators discovered that volunteers “were motivated by a sense of contributing to the greater good by contributing to the production of the Collected Works and making available Bentham’s writings to others, whereas some even found transcribing fun” (Causer, Tonra, and Wallace, Transcription maximized, 127). To stay excited and engaged in transcribing the material, volunteers need to feel like they play a significant role in the process, and the work has to be fun.

How to Read Crowdsourced Knowledge

Wikipedia has gained a reputation since its inception in 2001. To this day, most educational spaces treat the website as taboo, a site filled with errors that users should avoid at all costs. Alexandria Lockett, in her article “Why Do I Have Authority to Edit the Page? The Politics of User Agency and Participation on Wikipedia” (2020), states, “Wikipedia was clearly shaking up the education system back then, and it continues to be taught as a forbidden space. Throughout my undergraduate studies, my peers and I noticed and discussed that our professors were increasingly issuing threats and warnings about using and citing Wikipedia” (208). Despite those threats, students and others continue to use the website for information on various topics. Roy Rosenzweig, in his article “Can History Be Open Source? Wikipedia and the Future of the Past” (2006), believes this can be a good thing for educational and historical purposes. He states, “One reason professional historians need to pay attention to Wikipedia is because our students do . . . We should not view this prospect with undue alarm. Wikipedia for the most part gets its facts right . . . And the general panic about students’ use of Internet sources is overblown” (136). As Rosenzweig implies, Wikipedia can be a great starting point for research by students and historians alike, and one reason is its Talk and History pages.

The Talk page on a Wikipedia article is a section where users can raise concerns, propose edits, or discuss any other topics they feel are essential to the article. This section can give users a sense of how the article has changed over time and what different contributors see as its critical issues. When I looked at the Talk section for the “Digital Humanities” article, I noticed several concerns with the organization, particularly with the references and sources.

This attention to the sources tells me the contributors are concerned about where the information comes from and whether it is accurate. This conclusion lines up with what I found on the History page for the article. The History page records every revision made to an article: who made the edit, the size of the change, and when it took place.

When I looked at the page, I noticed three contributors had made significant changes to the article: Elijahmeeks, Gabrielbodard, and SimonMahony. I clicked on their names to learn more about them and found that all three were associated with the Digital Humanities in some capacity. This association lines up with the Talk page and its concerns about sources. People in the Digital Humanities field care about where information comes from and whether it is accurate, which is why the issue comes up repeatedly on the Talk page.

Recently, there have been concerns about using A.I. for crowdsourcing information, especially as a threat to Wikipedia. In his article “Wikipedia’s Moment of Truth” (2023), Jon Gertner states, “On a conference call in March that focused on A.I.’s threats to Wikipedia, as well as the potential benefits, the editors’ hopes contended with anxiety. While some participants seemed confident that generative A.I. tools would soon help expand Wikipedia’s articles and global reach, others worried about whether users would increasingly choose ChatGPT . . . .” (36). There are many reasons A.I. worries experts when it comes to crowdsourcing information. When I used ChatGPT, I noticed a lack of citations for the information it provided. That lack of citations highlights how well Wikipedia handles crowdsourced information, since the site emphasizes citing sources. Still, A.I. has its possibilities. When I looked at the main Wikipedia article for Digital Humanities, I noticed the Tools section was a little light, mentioning only a few tools. When I asked ChatGPT for examples of different tools, it gave me several more than the article did. I can see a use for ChatGPT and other A.I. programs in crowdsourcing information. Nonetheless, the problem with citations still gives Wikipedia the edge in proving accuracy.

Compare Tools

The digital tools Voyant, Kepler.gl, and Palladio are handy for historians and researchers analyzing digital material. Researchers use them to examine relationships within their material, find patterns and trends, and draw meaning and understanding from it. They can do this because of the visualizations these three tools create, which make specific patterns easier to find. While the goals behind these tools are similar, the tools themselves differ greatly in execution.

Voyant is a text-mining tool researchers use to draw meaning from a data set. It includes five tools researchers can use together to interpret selected texts. The Cirrus tool is a word cloud that shows the words that appear most frequently. The Trends tool is a graph that shows the frequency of a specific word across the document. The Reader tool shows the document itself, where users can select a specific word to locate in the text. The Summary tool gives an overview of the document and its most frequent words. Finally, the Contexts tool shows each occurrence of a word alongside its surrounding text. Used together, these tools can reveal how word choices reflect a period, place, people, and culture.
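The counting behind tools like Cirrus and Trends is, at bottom, simple word frequency. The sketch below is not Voyant’s actual code, just a minimal Python illustration of the idea; the sample text and stopword list are invented for the example.

```python
from collections import Counter
import re

def word_frequencies(text, stopwords=None):
    """Count how often each word appears, ignoring case and punctuation."""
    stopwords = stopwords or set()
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

# A made-up snippet; Voyant would load a real document or corpus.
doc = "The archive holds letters. The letters describe the harvest and the archive itself."
freqs = word_frequencies(doc, stopwords={"the", "and"})

# The most common words are what a word cloud like Cirrus would enlarge.
print(freqs.most_common(3))  # → [('archive', 2), ('letters', 2), ('holds', 1)]
```

Voyant layers visualization and a much larger stopword list on top of this, but the underlying counts drive everything from the word cloud to the trends graph.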

Kepler.gl is a digital mapping tool researchers can use to discover patterns and trends across space. Even a simple point map can reveal meaning in the data by showing the relationships between points. However, that is only part of what Kepler.gl can do. Researchers can change the map to view the data in different ways. Instead of points, one can switch to a cluster map to compare quantities of data, or a heat map, which shows the data’s density. Other elements can also be added, including a timeline showing how the data changed over time. All of this can give researchers more information than traditional research alone.
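Getting data into Kepler.gl usually means preparing a CSV with latitude and longitude columns, which the tool detects on upload. Below is a minimal sketch of that preparation step; the places, coordinates, and counts are invented placeholders, not real research data.

```python
import csv
import io

# Hypothetical point data: Kepler.gl looks for columns named like
# "latitude"/"longitude" when a CSV is uploaded.
rows = [
    {"place": "Richmond", "latitude": 37.5407, "longitude": -77.4360, "interviews": 12},
    {"place": "Petersburg", "latitude": 37.2279, "longitude": -77.4019, "interviews": 7},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["place", "latitude", "longitude", "interviews"])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Once uploaded, the same file can back a point layer, a cluster layer, or a heat map; a date column added to each row is what makes the timeline feature possible.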

Similar to Kepler.gl, Palladio can also make use of maps in a digital medium. However, the tool’s main component is the network graph, which shows the relationships between different pieces of data. These graphs, overlaid on a map, can show connections across space. Some of the most helpful information, though, comes from looking at the network graphs themselves and limiting which data is present. Limiting the data can reveal connections that are harder to see, or go unseen entirely, in traditional research. For example, when I worked with Palladio, I limited the information to the topics discussed by male and female interviewees. While both groups shared similar topics, some were limited to a particular group. These topics tell me a lot about the subjects themselves and the culture at the time of their interviews.

While these three tools differ in execution, researchers should not treat them as interchangeable or use them in isolation. Used together, they can uncover meaning, patterns, and trends in the material a researcher chooses to study. While Palladio can overlay network graphs on maps, those maps show only a limited amount of information; a researcher can pair it with Kepler.gl to gain the mapping capabilities Palladio lacks. At the same time, Palladio relies heavily on words, especially in its network graphs, and Voyant makes the perfect complement for delving deeper into those words in the data.

Network Analysis with Palladio

Network graphs are helpful tools that researchers can use to convey meaning from the connections within information. According to authors Ruth Ahnert, Sebastian E. Ahnert, Catherine Nicole Coleman, and Scott B. Weingart, in their book The Network Turn: Changing Perspectives in the Humanities (2020), “The conventional network graph of node and edge (points connected by lines) makes it possible to convey a tremendous amount of information all at once, in one view. Networks express an internal logic of relationships between entities that is inherently intuitive” (The Network Turn 57). The connections shown through these graphs can give researchers information about the material they are studying.
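The node-and-edge structure the authors describe can be represented with nothing more than an adjacency list. This is a minimal sketch of that representation, with invented people and topics standing in for real data:

```python
# A network as an adjacency list: each node maps to the set of nodes
# it connects to. The people and topics here are invented placeholders.
edges = [
    ("Ada", "farming"),
    ("Ada", "church"),
    ("Ben", "church"),
    ("Ben", "elections"),
]

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

# A node's degree (its number of edges) is one basic measure of
# how central it is in the network.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
print(degree)  # "church" links two people, so its degree is 2
```

Tools like Palladio build exactly this kind of structure from a spreadsheet’s columns and then lay it out visually, so that high-degree nodes stand out at a glance.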

Many projects have used network graphs to better understand, draw meaning from, and answer questions about the material they graph. One such project is Viral Texts. In this project, researchers including Ryan Cordell examined nineteenth-century American newspapers and their connections. They found that these newspapers recirculated many articles, which made the researchers wonder why. The answers lay in the different types of recirculated texts and how they related to the culture of the Antebellum period in the United States. Mapping the Republic of Letters is another project that used network graphs. Like Viral Texts, it overlaid network graphs on a map to show connections between the letters of historical figures. From these connections and the use of a map, researchers discovered the importance of travel and that the letters were a way “of communicating ideas and shaping opinion, and also as a process of intellectual self-definition” (“Historical Research in a Digital Age” 407-9). Another project that uses network graphs is Linked Jazz. Like the other two projects, Linked Jazz uses graphs to show connections within the researchers’ material, in this case different jazz musicians. The researchers discovered that “data about concert performances and recording dates gives . . . rich information about not just collaborations between musicians, but also about time and place, musical works, songs, and songwriters, and record labels and releases . . . .” (Hwang, Levay, and Provo, “Contributing to Linked Jazz,” 2015). Network graphs can show the connections between materials and convey meaning about a time, place, people, or even a genre of music.

When I started working with Palladio and network graphs, I discovered how much information and meaning these tools can surface. Even a simple network graph overlaid on a map shows connections across space; it gave me a sense of how and where those connections happened.

I then started experimenting with the network graphs, limiting certain information to see what I could find. One such graph shows the topics that male and female interviewees discussed. While they shared many topics, some were limited to either only male or only female interviewees. From that, I gathered that those topics were important to that gender for a specific reason. For example, only female interviewees discussed elections, which made me think about how they were denied the right to vote that their male counterparts had; that exclusion is probably why the topic mattered to them.
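Filtering a graph by group the way Palladio does amounts to simple set operations on the topics attached to each group. A sketch with invented topic lists, not the actual interview data:

```python
# Invented topic sets; the real data came from the interview archive.
male_topics = {"farming", "church", "work songs"}
female_topics = {"church", "work songs", "elections"}

shared = male_topics & female_topics       # topics both groups discussed
only_female = female_topics - male_topics  # topics unique to female interviewees
only_male = male_topics - female_topics    # topics unique to male interviewees

print(sorted(shared))       # → ['church', 'work songs']
print(sorted(only_female))  # → ['elections']
```

The topics left in `only_female` and `only_male` are exactly the ones that stand out when a filtered Palladio graph is compared against the full one.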

Another graph shows the topics categorized by the interviewees’ jobs. Again the topic of elections stood out to me, since it was limited to people who worked in the house. That led me to think it was because they overheard more gossip and politics from their masters while working in the house.

Mapping with Kepler.gl

Mapping in the Digital Humanities is a valuable tool for researchers looking for patterns and trends across space. Todd Presner and David Shepard, in their article “Mapping the Geospatial Turn” (2016), state, “On its most basic level, a map is a kind of visualization that uses levels of abstraction, scale, coordinate systems, perspective, symbology, and other forms of representation to convey a set of relations” (Presner and Shepard 247). Researchers can use those patterns and trends to build meaning and understanding about the material they are researching. Researchers use these maps in different ways to find patterns and form meanings, including “historical mapping of ‘time-layers’ to memory maps, linguistic and cultural mapping, conceptual mapping, community-based mapping, and forms of counter-mapping that attempt to de-ontologize cartography and imagine new worlds” (Presner and Shepard 247).

Many projects have used maps to find trends and show meaning in the material they present. One is the Photogrammar project, a website created and updated from 2012 to 2016 that showcases photographs from the FSA-OWI archive. The project “began as a response to the challenges of navigating the digital and physical archive at the LoC [Library of Congress]” (Arnold 2). It uses what its creators call “generous interfaces” to help gather meaning through different modes of visualization (Arnold 3). Another project that uses maps is Histories of the National Mall, a website that uses Google Maps to place markers around the National Mall in Washington, D.C. The markers identify buildings, statues, monuments, and areas that hold historical significance for the city and country. Sheila Brennan states, “Our key strategy for making the history of the National Mall engaging for tourists was to populate the website with surprising and compelling stories and primary sources that together build a textured historical context for the space and how it has changed over time” (Brennan). Mapping the Gay Guides is another project that uses maps to find trends and patterns in its material. The project draws on a popular, life-saving book called Bob Damron’s Address Book, placing markers on a map for all the bars mentioned in the book. The topics the project explores include race, gender, and sexuality, with each topic surfacing trends and meanings from the material and the map (Regan and Gonzaba).

When using maps with Kepler.gl, I was surprised by what I could discover from the material. I had only used maps once before in my projects, and those were simple Google Maps, like in Histories of the National Mall, with which I did little analysis. Working with Kepler.gl changed my perspective on using maps for research. Even a simple point map can show relationships between different points, but a map can do so much more. Cluster and heat maps can show quantity and density, which a point map lacks, and a timeline can be added to show how an area and its material changed over time. A map can reveal far more about the material than a researcher might expect.