Seeing Literary Texts through DH Tools
Hello readers! Welcome to our second blog post. Yes, OUR blog! This time, we’ve changed things up for our English 256B course at the American University of Beirut: we are a team of four (Dalia, Mirriam, Raghad, and Sara), and from now on we will be posting our blogs together! We hope you’re as excited as we are to hear about our challenges and our journey through the digital humanities.
During our Digital Humanities course, we were introduced to a variety of new and unfamiliar DH tools, such as CLiC and Voyant Tools, and learned how they can support different methods of analysis. We also learned a lot about why DH matters and what it can do for us now and in the future. The digital humanities have become an essential part of our learning: we apply computational methods to humanistic questions, which opens up new ways of pursuing research questions. In today’s blog, we’re going to introduce you to two of our favorite digital humanities tools, explain how they contributed to our work, and show how they could be useful to you. CLiC and Voyant Tools are text analysis tools with visual displays, and using them definitely helped us connect, visualize, and interpret the findings for our research questions. Keep reading to find out more about them!
The first tool we were introduced to in class was CLiC (Corpus Linguistics in Context), a DH tool that allows you to analyze texts through distant reading and “leads you to new insights into how readers perceive fictional characters”. This tool is extremely straightforward and easy to understand! Let’s give you a quick tour: when you first open the website, you’ll find general information about the application and a user guide for further help with how the tool works. You’ll also find a reference you can use to cite CLiC in your work (yes, we’ve done that too!). Next, if you look at the right-hand side of the page, you’ll see different tabs arranged vertically (Concordance, Subsets, Clusters, Keywords, Counts, and Text) that you can choose from to begin your analysis!
There are five corpora in total on CLiC. So far, the website includes Dickens’s novels, a 19th-century reference corpus, Children’s Literature, African American Writers, and ArTs (Additional Requested Texts!).
CLiC has several interesting features: you can choose more than one book or corpus to search (a corpus is a collection of texts; the plural is corpora), limit your search to quotes, or search for a single word in a specific text. For example, choosing ChiLit as your corpus means you want to search through children’s literature (a total of 71 books on CLiC!), and when you look at word sequences, a 2-gram means a sequence of two consecutive words. Next, the Concordance and Subsets tabs both display words and patterns from the selected books in context; this is where you can analyze how particular words and phrases are used. The Clusters and Keywords tabs both show lists of frequent patterns (without context), but they differ in their applications. The Clusters tab lists frequent words and word sequences in a single corpus (or several corpora if you have selected more than one). In the Keywords tab, you can compare the frequency of words and clusters in one corpus with their frequency in another.
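To make the idea of a 2-gram concrete, here is a minimal Python sketch of how frequent word sequences (like the ones the Clusters tab lists) can be counted. This is our own illustration, not CLiC’s code: the function names and the simple tokenizer, which treats runs of letters and apostrophes as words, are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as words (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(text, n=2):
    """Count every run of n consecutive words; n=2 gives 2-grams such as 'white rabbit'."""
    tokens = tokenize(text)
    # zip over n staggered copies of the token list to build each n-word window
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


sample = "The White Rabbit ran past the white rabbit hole"
for gram, count in ngram_counts(sample, 2).most_common(3):
    print(count, gram)
```

Changing n to 3 would count three-word clusters instead, such as “the white rabbit”.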
Now that we were familiar with the site, we started our research by selecting books from the CLiC corpora that we all knew very well. We chose the Children’s Literature corpus (ChiLit) and selected three books that we all loved as children and thought would be interesting to test out: The Jungle Book by Rudyard Kipling, and Alice’s Adventures in Wonderland and Through the Looking Glass by Lewis Carroll!
We were initially curious to check out the keywords of each text (using ChiLit as the reference corpus) to get a better sense of what research questions we could ask. This was not easy at first, but once we had tinkered with the website and searched for several words through the Concordance tab, we came up with some ideas that could yield interesting results.
The Concordance and Keywords tabs were the main features we used to build our research objectives; however, we were also curious about the other tabs, such as “Counts”, where we found counts for In Quotes, In non-Quotes, In short suspensions, Total Words, and more. We did not use the Clusters or Subsets tabs; however, we were introduced to them before completing our research exercise (and we gave you a short intro to them just a moment ago!).
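The Keywords comparison can be sketched in a similar way. CLiC computes its own keyness statistics; here we assume log-likelihood (G2), one widely used keyness measure, so this sketch will not necessarily reproduce CLiC’s numbers. The function names and the tiny example texts are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word seen a times in a target text of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the target text
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:  # 0 * log(0) is treated as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_tokens, reference_tokens, top=10):
    """Rank words that are relatively MORE frequent in the target than in the reference."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scores = {}
    for word, a in t.items():
        b = r[word]
        if a / c > b / d:  # keep only over-represented words
            scores[word] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


target = "mowgli and the wolf and mowgli in the jungle".split()
reference = "the girl and the house and the garden".split()
print(keywords(target, reference, top=3))
```

A higher score means the word is more over-represented in the target text relative to the reference corpus; common words like “the” drop out because they are not relatively more frequent in the target.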
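As a rough idea of what the Counts tab separates, here is a naive Python sketch that splits a word count into words inside and outside double quotation marks. It is an assumption-laden simplification: CLiC uses its own annotation of quotes and suspensions, while this sketch ignores suspensions, nested quotes, and speech that runs across paragraphs without a closing quotation mark. The names QUOTE, words, and quote_counts are our own.

```python
import re

# Matches a span between curly (“ ”) or straight (") double quotation marks.
QUOTE = re.compile(r'“[^”]*”|"[^"]*"')


def words(s):
    """Runs of letters/apostrophes count as words in this sketch."""
    return re.findall(r"[A-Za-z']+", s)


def quote_counts(text):
    """Split a text's word count into words inside quotation marks and words outside them."""
    inside = sum(len(words(span)) for span in QUOTE.findall(text))
    total = len(words(text))
    return {"in_quotes": inside, "non_quotes": total - inside, "total": total}


print(quote_counts("“I am late!” said the Rabbit, and he hurried away."))
```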
Picture from https://i.pinimg.com/originals/93/78/9e/93789ecb01d583dbdd4fa00d0223cde2.jpg
We started with The Jungle Book by Rudyard Kipling by looking at its keywords. We noticed that most of the keywords were related to the animals and characters in the jungle, which got us thinking about how we could take our analysis deeper. Knowing that the story is about a young boy in the jungle, we searched for the term “boy” in the Concordance tab and found that it was used frequently throughout the book.
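A concordance view like the one we used for “boy” is also known as keyword in context (KWIC). Below is a minimal Python sketch of the idea; the function name, the three-word window on each side, and the tokenizer are our own assumptions rather than CLiC’s settings.

```python
import re


def concordance(text, term, width=3):
    """Keyword-in-context lines: `width` words either side of each exact (case-insensitive) hit."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == term.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


text = "The boy ran. Every wolf in the pack knew the boy was a man-cub."
for line in concordance(text, "boy"):
    print(line)
```

Because it matches whole words only, a search for “boy” in this sketch will not also return “boys”; punctuation is dropped from the context lines.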


To take our research to another level, and to combine the previous two steps, we wanted to test how frequently the term “human” came up in the text. We were surprised to find that it came up only three times in the whole story. At that point, we turned to close reading, looking carefully at the sentences in which the terms “human” and “boy” appeared. We concluded that because the boy lived and grew up with the animals of the jungle, he was not considered human within the jungle kingdom.


The results from the Keywords tab also support our conclusion: they show mainly a list of animals and names, along with Mowgli, the boy’s name.


We found this analysis very interesting, so we continued our testing with Alice’s Adventures in Wonderland by Lewis Carroll.
When thinking about Alice’s Adventures in Wonderland by Lewis Carroll, the white rabbit is one of the first things that comes to mind. So we decided to search for the term “rabbit” through the Concordance tab and realized that it was frequently used.
We figured this was because he was obviously one of the main characters in the book. In fact, the white rabbit was what led Alice down the rabbit hole in the first place (we read the rabbit as symbolizing the spark of curiosity in Alice at the beginning of the story!), and he was always checking his pocket watch and rushing somewhere as if he were late. This got us thinking about the term “time”.
When the results showed that the word ‘time’ was used very frequently throughout the text, and not just when talking about the rabbit, we realized how important the concept of time was in this book. After further investigation (and possibly a bit of prior knowledge and some close reading…), we concluded that in some parts of the story the rabbit actually symbolizes time (MIND=BLOWN), and that Alice was never able to catch up to the rabbit (so she basically couldn’t catch up to time), whereas in other parts of the story ‘Time’ was a character that could be manipulated (the characters could change time as they pleased, since it was Wonderland). This made us realize how much meaning lies behind a simple children’s book!
Next, we searched for the same terms (‘time’ and ‘rabbit’) in the sequel to Alice’s Adventures in Wonderland, Through the Looking Glass. The Concordance tab on CLiC found no occurrences of ‘rabbit’ in the text, but ‘time’ was more abundant than in the first book. At this point, we could not interpret our results, so we resorted to old-fashioned reading of some passages to understand what was going on. Once we had learned more about the storyline by reading some of the Concordance results, we compared the two books and realized there might be a reason why the ‘rabbit’ is missing from the second book. As we mentioned above, the rabbit was always in a rush in Alice in Wonderland, whereas in Through the Looking Glass he disappeared while time became more abundant (according to our analysis on CLiC). We even wondered: maybe the rabbit finally caught up with time? We therefore inferred that the concept of time could play an even larger role in the second book. Combining the results of our close and distant reading, we concluded that the first book revolves around Alice always trying to catch up with time, whereas the second revolves around Alice manipulating time.
To up our game, we then decided to compare The Jungle Book with Alice’s Adventures in Wonderland! To choose our research questions, we turned to Google, our best friend, and read summaries of both books to refresh our memory and shape some ideas to explore. We chose to focus on attributes the two books have in common. Knowing that Mowgli was a young boy who lost his parents, and that Alice was lost in her own imagination, we chose to investigate to what extent these protagonists were lost, and how the word “lost” was used in the texts. We searched for “lost” in the Concordance tab and noticed that it appeared more frequently in The Jungle Book. To gain a deeper understanding, we decided to focus on the protagonists’ emotions as they go through their adventures. We also searched for terms such as “wonder” and “curious” in the Concordance tab, and found them only in Alice’s Adventures in Wonderland. Looking more closely at the sentences, we concluded that Alice was more inclined to involve herself in unknown situations.
Throughout all of the tests above, we had to alternate between distant reading (simply checking the results shown) and close reading (reading the sentences to understand the context in which the words appeared). Our most significant findings depended on both types of reading: without distant reading, we could not have gathered this information so quickly, and without close reading, our results would have been meaningless. Both are equally important when using CLiC. We think our comparison of the two books was the most interesting of our research trials.
Now that we’re done with CLiC, we can focus on Voyant Tools, which caught our attention even more! It is a free online site for analyzing digital texts, and it is very easy to use. Voyant Tools works a bit like a scanner, so to use it effectively we have to start with a research question, for instance: “Why did the author include so many negative feelings?” A question like this leads to further inquiries, which we can then pursue with the tools. Voyant accepts texts in several ways: you can paste in text or URLs, or upload files from your computer. You can also upload multiple files as a corpus or use one of Voyant’s built-in corpora, of which there are only two (Shakespeare’s Plays and Austen’s Novels). After that, you’ll be presented with five primary tools: Cirrus, Reader, Trends, Summary, and Contexts.
For our analysis, we chose three of Lewis Carroll’s works of nonsense fiction and included them in our corpus: Through the Looking Glass, Sylvie and Bruno, and Alice’s Adventures in Wonderland.
We found these texts on Gutenberg.org, saved each one as a plain-text file (named with its year of publication followed by its title), and uploaded them. We were curious to explore all the features Voyant offers, and Cirrus was definitely the first to catch our attention because it was the nicest to look at *mesmerized*.
Lewis Carroll is best known for his imaginative writing for children, and his Alice books were among the most popular children’s books of his time. His books are also a great exploration of language, famous for their lyrical nonsense. As a group, we were interested in learning more about the language of his books. Since Lewis Carroll’s main audience was children, we wanted to test whether this was really the case, because his sophisticated language and ideas can also be enjoyed by adults. We first thought about the term “children” (since we noticed it in the Cirrus), and this got us thinking that how often the term appears might help us judge whether Carroll was really addressing children. We answered our question about how frequently the word “children” appears using the Cirrus tool in Voyant Tools, a word cloud that displays word frequencies: the larger the word, the more frequently it occurs. We were also able to manage the list of words that Cirrus excludes (such as ‘the’, ‘is’, etc.) by clicking ‘Define Options’, editing the stopword list, and typing in the word(s) we wanted to exclude.
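What Cirrus does behind the scenes can be approximated by counting words after dropping a stopword list. This Python sketch is our own simplification, not Voyant’s code; the tiny STOPWORDS set and the top_terms function are hypothetical (Voyant’s default stopword list is much longer and, as described above, editable).

```python
import re
from collections import Counter

# Hypothetical mini stoplist for illustration; Voyant's default list is far longer.
STOPWORDS = {"the", "is", "a", "and", "of", "to", "she", "it", "was"}


def top_terms(text, stopwords=STOPWORDS, n=5):
    """Most frequent words after removing stopwords, i.e. what a word cloud sizes words by."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


sample = "The child and the children. The child was a child of the rabbit."
print(top_terms(sample))
```

Note that this sketch counts “child” and “children” as different words, since it does no lemmatization.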
The Cirrus was our starting point to highlight the presence of the term “Children”
Frequency of the term “Child” (197) and “Children” (103) in the texts of Lewis Carroll
Moving on, since the most frequent word was Alice (obviously, since she is the main character), we decided to look further into the contexts of her name to get a sense of the overall tone of the text. Given that it’s a children’s book, one would expect a playful tone. Something we stumbled upon was that the word Alice frequently appeared in context with the animals she interacted with. As we discussed in our CLiC analysis, the word ‘Rabbit’ occurred frequently, and other words like ‘Mouse’, ‘Caterpillar’, and ‘Gryphon’ were also on the list of most frequent words. Alice’s interactions and conversations with animals further demonstrate a playful and positive tone. Using animals is also expected in children’s literature, since they help spark children’s imagination, provide information, and even teach moral lessons. Voyant Tools helped us spot patterns and drew our attention to trends previously hidden in the texts, and the list of most frequent words and their contexts helped us work out the attitude of the book without actually reading it (this is called distant reading!). In this case, distant reading was more useful than close reading.
Another thing we found interesting and worth bringing up was the fifth chapter of Alice in Wonderland, ‘Advice from a Caterpillar’, where Alice meets a wise insect. Alice’s adventures express the importance of imagination and adventure in childhood, and the story traces how children grow into adults, both physically and emotionally. Lewis Carroll builds an image of Victorian England through the language he uses throughout the novel, and it is particularly evident in the conversation between Alice and the Caterpillar. In this chapter, the word “Caterpillar” occurs 25 times and “Alice” occurs 27 times.
The word trend graph illustrates the back-and-forth of the conversation as the Caterpillar gives Alice his advice.
The word “Caterpillar” itself suggests a theme of evolution or transformation. Words such as “youth”, “minute”, “life”, “old”, and “beginning” suggest a theme of a journey of self-discovery. Words like “inches”, “mushroom”, “height”, “little”, “grow”, and “size” reflect the Caterpillar’s instructions on how Alice can eat the mushroom to grow or shrink in any given situation.
On another note, Voyant Tools allowed us to study the different emotions present in the three texts, as well as their frequencies. We noticed that sadness and crying were far more present than moments of happiness and laughter among the characters throughout the stories.
Analysis of the emotions in Alice in Wonderland
Voyant and CLiC should yield similar results, but small differences between the sites and their tools may bring different findings to light. Once we uploaded Alice’s Adventures in Wonderland and Through the Looking Glass to Voyant, we realized we could easily confirm some of our earlier analysis through simple graphs and visualizations. We searched for ‘time’ and ‘rabbit’ again and focused on the trend graphs.
Looking at the bigger picture across all the texts we analyzed, what caught our attention in Cirrus and in the keywords was the use of animals. Animals have held an important place in written literature for hundreds of years, and the period from the mid-1800s to the early 1900s, including the late Victorian era, is often referred to as the golden age of children’s literature. This is when we realized we wanted to investigate further the role of animals in children’s literature and how those roles differ from one another. We decided to dive deeper by researching the significance of anthropomorphism (giving animals human characteristics) and how it affects children, since it is an effective method authors use to immerse children in the story and give characters more effective ways of communicating. To test this research question, we added The Jungle Book to the previous texts by Lewis Carroll.
According to some online articles, animal characters in children’s books are often more appealing than human characters and can sometimes make it easier for a child to grasp an idea or concept. We found a scholarly article claiming that animals such as dogs, cats, chickens, pigs, rabbits, ducks, and bears are very familiar in children’s literature. We decided to investigate whether these animals are also common in the stories we chose, and the results from Voyant easily showed that some of them are in fact present in these books. We chose Voyant for this analysis because it let us easily visualize the trends without reading the whole text.
Our analysis does not stop here, however, since we would like to know what roles these animals play and how they can affect children. So we started brainstorming simple concepts that could be connected to moral lessons, such as good, bad, evil, hero, creativity, art, fun, imagination, feelings, and many more…
Animals are considered the perfect medium for conveying tangible and intangible concepts in an entertaining way. In Alice’s Adventures in Wonderland the animals were there to teach her moral lessons, whereas in The Jungle Book the animals represented good and bad (Baloo the bear was good, whereas Shere Khan the tiger represented evil).
As beginners in the world of digital humanities, we found both CLiC and Voyant to be user-friendly, web-based tools that help dissect and understand texts. What’s interesting is that they have many more features that can surface information we might not have thought about when we first formed our research questions. For example, ‘vocabulary density’ in Voyant Tools can tell us a lot about a book or text. Vocabulary density is a quantitative value that indicates how varied a text’s vocabulary is: the higher the density, the more unique words the text uses relative to its length (often a sign of a more complex text), while a lower density indicates a simpler text that reuses common words. Features like this bring new ideas to the table and widen the scope of research. There are many more features on both CLiC and Voyant Tools that we have not used yet, and it takes a bit of investigation, and a sense of what is relevant, to find out what else these tools might reveal.
Although they have multiple features, we got the hang of both tools right after our first introductory session in class, as they’re both quite simple! They are trusted online educational tools that give quick, easy access to results in seconds, and both are remarkably powerful, handling large amounts of text with considerable speed and ease.
What’s special about both tools is that you can easily order texts by date of publication and view word trends. The data both tools provide are quantitative: they find word frequencies that would be hard for us humans to count by hand. Such tools can and should be used as supplements to close reading, not as a replacement for it. For example, they can provide quantitative confirmation of patterns you notice in a text and let you locate specific words or phrases within a large corpus as part of your own larger analysis. In other words, these tools provide extra information that you might not easily be able to extract yourself. It’s also great that you can easily compare two texts on both CLiC and Voyant Tools, which makes comparing stories interesting.
One feature where Voyant Tools stands out compared to CLiC is that it presents data visually, which makes results more attractive and easier to read, understand, and share. It can help you spot patterns, uncover trends previously hidden in texts, and prompt new questions. As a result, some people may prefer Voyant Tools, but this is a matter of personal preference and may also depend on what you are looking for.
Despite their advantages, both tools unfortunately have a few limitations, such as being unable to do the complete analysis for you, so some degree of close reading is still needed. For example, the tools cannot understand irony and sarcasm, which must sadly be detected by us humans. It would be pretty amazing if they could, though! On the bright side, these tools at least give you starting points through distant reading that could lead you to something bigger. The computer offers readers multiple lenses on a text that would be much harder to achieve through close reading on your own.
Through our practice in class and our own research with these tools, we have become very familiar with them, and we will surely be using them again, especially for our final project. This is the end for now, but not for long… We’ll be back soon with another blog, and yes, as a team 🙂
In the meantime, why not try the DH tools yourselves? Check out CLiC at clic.bham.ac.uk, and tinker with Voyant Tools at https://voyant-tools.org/. We’d love to hear about your experiences, so leave us a message in the comments below! Cheers!
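Trend graphs like the ones we looked at for ‘time’ and ‘rabbit’ plot how often a word appears across a text. The Python sketch below splits a text into equal consecutive segments and reports the relative frequency of a term in each; the function name, the default of ten segments, and the per-10,000-words scale are our assumptions, not necessarily Voyant’s settings.

```python
import re


def trend(text, term, segments=10):
    """Relative frequency of `term` (hits per 10,000 words) in each of `segments`
    equal, consecutive slices of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    values = []
    for i in range(segments):
        chunk = tokens[i * n // segments:(i + 1) * n // segments]
        hits = chunk.count(term.lower())
        values.append(10_000 * hits / len(chunk) if chunk else 0.0)
    return values


sample = "time " * 5 + "cat " * 5
print(trend(sample, "time", segments=2))
```

Using relative rather than raw counts keeps segments (or whole books) of different lengths comparable, which matters when comparing a word like ‘time’ across two novels.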
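Assuming vocabulary density is defined as the number of unique words divided by the total number of words (a type-token ratio), it takes only a few lines to compute. This Python sketch uses our own simple tokenizer, so its values will not exactly match Voyant’s; the function name is hypothetical.

```python
import re


def vocabulary_density(text):
    """Unique word types divided by total word tokens (between 0 and 1)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


print(vocabulary_density("Curiouser and curiouser!"))
```

Keep in mind that longer texts tend to repeat words more, so this ratio is most meaningful when comparing texts of similar length.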
Written By: Dalia Bekdache, Mirriam Hijazi, Sara Deeb, and Raghad Sheronick
Citations:
- Mahlberg, M., Stockwell, P., de Joode, J., Smith, C., & O’Donnell, M. B. (2016). CLiC Dickens: Novel uses of concordances for the integration of corpus stylistics and cognitive poetics. Corpora, 11(3), 433–463.
- Azmiry, N. (2014, December 28). Animals and Their Functions in Children’s Literature Since 1900. Retrieved from https://www.academia.edu/11220842/Animals_and_Their_Functions_in_Childrens_Literature_Since_1900