This week’s challenge for Digital History was to use Voyant, a web-based text analysis program. From the Voyant homepage one can paste a URL or text into the box, or upload a document or zip file. Once the reveal button is clicked, Voyant magically goes to work examining the text and determining its most common terms.
I copied and pasted the URL for the Toronto Star into the Voyant search box. After I hit the reveal button, a new page opened up that displayed the title ‘www.thestar.com|Toronto Star|Canada’s largest daily,’ along with the word tokens, word types and density. Opening the document further revealed a more detailed summary of the URL in seven different windows:
Each window acts as a different textual analysis tool for determining and comparing the common terms used in the text. The ‘Cirrus’ tool creates a pretty compilation of the most common words, while the ‘Corpus’ tool provides a word count, highlights particular common words in the text, then plots the information on a graph. Through this tool, one can compare how frequently different terms appear.
But wait… wouldn’t the most common terms be words like ‘a’, ‘the’, ‘is’? Voyant has figured that out…through its options one can omit certain words and refine the results so that they show the less obvious common words.
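To make the idea concrete, here is a rough Python sketch of what counting common terms while omitting certain words can look like. It is not Voyant's own code: the stop word list, the `summarize` function and the sample sentence are all made up for illustration, and the 'tokens' and 'types' figures are just total words and distinct words, which may not match exactly how Voyant calculates its summary.

```python
import re
from collections import Counter

# Hypothetical, tiny stop word list for illustration only;
# a real tool's list would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "and", "of", "to", "in"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def summarize(text, stop_words=STOP_WORDS, extra_omit=()):
    """Count tokens and types, then rank the words that are not omitted."""
    tokens = tokenize(text)
    # Combine the default stop words with any words the user chose to omit.
    omit = set(stop_words) | {w.lower() for w in extra_omit}
    kept = [t for t in tokens if t not in omit]
    return {
        "tokens": len(tokens),        # every word occurrence
        "types": len(set(tokens)),    # distinct words
        "top": Counter(kept).most_common(5),
    }

# Invented sample sentence, not text taken from any real page.
sample = "The Leafs won and the GTA is talking about the Leafs video"
summary = summarize(sample, extra_omit=["talking"])
print(summary)
```

Passing extra words through `extra_omit` plays the same role as omitting ‘newspaper’ or ‘subscribe’ by hand: the word still counts toward the totals, but it no longer crowds out the more interesting terms in the ranking.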
This was the Cirrus image I got from the Toronto Star URL (after omitting a number of common words like ‘newspaper’ and ‘subscribe’):
What I thought was really interesting was that ‘digital’ was the most prominent word on the main page of the Toronto Star…maybe they are catching on to the new digital sphere of media. Similarly, the word ‘video’ is common, which also points to a shift towards digital online news. The other common words are the ones I predicted would come up, including terms relating to Toronto and the newspaper itself, such as ‘leafs’ or ‘gta’. Apparently ‘bieber’ really is everywhere too.
After looking at the Toronto Star, I wanted to see what would happen when I put in the URL for its counterpart, the Globe and Mail. The following is the Cirrus image from the Globe and Mail (after omitting some words):
What is interesting about the Globe and Mail’s Cirrus image is that ‘business’ and ‘blackberry’ are the most used words on the main page. To me, this adds to the perception that the Globe and Mail is aimed at a more educated and business-savvy reader. Similar words appear in the two newspapers’ images, but at different sizes, which is interesting given that both focus on the same news of the day. Words such as ‘digital’ and ‘video’ appear in the Globe and Mail image as well, but are smaller than in the Toronto Star’s.
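One rough way to put numbers on that kind of side-by-side comparison, outside of Voyant, is to look at how often a word appears relative to the total number of words on each page. The sketch below assumes that approach; the `relative_freq` helper is hypothetical, and the two short strings are invented placeholders, not the actual text of either newspaper's home page.

```python
import re
from collections import Counter

def relative_freq(text, terms):
    """Return each term's share of all word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {term: counts[term] / total for term in terms}

# Invented placeholder text standing in for two scraped pages.
star_text = "digital video digital leafs gta"
globe_text = "business blackberry business digital video markets"

terms = ["digital", "video", "business"]
star = relative_freq(star_text, terms)
globe = relative_freq(globe_text, terms)

for term in terms:
    print(f"{term:10s} star={star[term]:.2f} globe={globe[term]:.2f}")
```

Using shares rather than raw counts matters because two pages are rarely the same length, so a word that appears ten times on a long page may actually be less prominent than one that appears five times on a short one.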
After voyaging around Voyant, comparing common terms and omitting certain words to see what changed, I started to wonder: how can this site be useful in our society?
1) Advertisement: I plugged in Tourism London’s URL, and it made a great image with the common terms, ‘museums,’ ‘sports’, ‘family’, and ‘culinary.’ This is a simple way to summarize what the city (or any place) has to offer to its interested visitors.
2) Trends in Society: I then put in the ‘American Top 40’ URL, as I wanted to see what the common terms were amongst the most popular songs today, and which artist would stand out. ‘Love’ of course was the most common word used, followed by ‘gone’, ‘night’ and ‘Justin.’ The problem with names is that one name could refer to different people, such as Justin Timberlake or Justin Bieber, so one has to look more closely at the Corpus listing of the term.
3) Themes in a book or journal article: Lastly, I copied and pasted a full article on ‘Picasso and Iberian Sculpture’ by James Sweeney into the Voyant search box. The common words that came up included ‘African’, ‘bronze’, ‘Paris’ and ‘art’. These words are all predictably common to the article, and are what one could extract from an abstract of it…but Voyant can do it in seconds, and then the terms can be compared on the graph and a word count is given for each.
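The name problem from point 2 is the kind of thing that looking at a word in its surrounding context can sort out. Here is a minimal sketch of that idea in Python, assuming a simple 'keyword in context' approach; the `keyword_in_context` function and the sample sentence are invented for illustration and are not how Voyant itself works.

```python
import re

def keyword_in_context(text, keyword, window=2):
    """List each occurrence of keyword with `window` words on either side."""
    tokens = re.findall(r"[\w']+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append(" ".join(left + [tok] + right))
    return hits

# Invented sample sentence, not taken from the American Top 40 page.
sample = "New single from Justin Timberlake climbs while Justin Bieber holds steady"
contexts = keyword_in_context(sample, "justin", window=1)
for line in contexts:
    print(line)
```

Seeing the neighbouring words makes it obvious which ‘Justin’ each hit belongs to, which a bare word count cannot show.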
In the end, I think that this is a cool tool to use, but really it is very similar to the tags that someone uses for a blog, or the #hashtags that someone would use on Twitter: it summarizes the key points for the viewer.