Author Archives: Jade Harris

Analyzing Subjects and Intention (Text Analysis)

After reading the class assignments related to text analysis, I found the idea of analyzing large sets of text to be potentially exciting, insomuch as esoteric subjects are dependably vaguely thrilling at the outset.  Learning about text analysis, and more specifically that practise called distant reading, with a mind to eventually imitate something like it seemed to me like peering into an exclusive clubhouse (which I can see in my mind’s eye as a cavernous hall furnished with ornate tapestry and filled with distinguished, singular types of people). How fortunate and how luxurious, to contemplate what vast amounts of characters might reveal once compiled and compared, much less set about the task itself!

So I humbly undertook to use what was advertised as the simplest and easiest tool for the job — that platform called Voyant. My first idea was a bit grandiose, inspired as I was by the course readings.  I wrote in my notebook for a bit about some thoughts I had on this trend in many parts of my own life around the topic now called “DE&I”, or diversity, equity and inclusion.  

I thought it might not be too difficult to pull some information down from the web for an analysis of the use of some key words and the percent change, if you will, from the 1950s and 60s (let’s say) to the 2000s and 2010s.  I thought I could focus this research around possible words used in children’s books, as I also had an idea that they might be more readily available than some other works (and more straightforward too).

With children’s books in mind I altered the premise somewhat from “DE&I” per se (which is rather adult) to words that execute on that stratagem, such as “encourage”, “together”, “different”, “share”.  
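As a rough sketch of the "percent change" idea, done outside Voyant: the snippet below counts each key word per 10,000 words in two bodies of text and compares the rates. The function names and the two sample strings are my own placeholders (not real book text), and matching is exact, so "share" would not count "shared".

```python
import re
from collections import Counter

KEYWORDS = ["encourage", "together", "different", "share"]

def rate_per_10k(text, keywords):
    """Return occurrences of each keyword per 10,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {k: (counts[k] / total * 10_000 if total else 0.0) for k in keywords}

def percent_change(old, new):
    """Percent change from old to new; None when the old rate is zero."""
    if old == 0:
        return None
    return (new - old) / old * 100

# Placeholder snippets standing in for an early and a later corpus.
early_text = "We share the ball. We play together."
later_text = "We share and share again. We are different and we play together."

early = rate_per_10k(early_text, KEYWORDS)
later = rate_per_10k(later_text, KEYWORDS)
for word in KEYWORDS:
    print(word, percent_change(early[word], later[word]))
```

Normalizing by total word count matters here because the two periods would almost certainly not contain the same amount of text.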

But.  Accessing children’s books as plain text is not practicably different from accessing adult books, nor is it particularly easy, in a short while, to comb through different kinds of these books and find ones that will provide even something close to a coherent sampling for study.

So I changed tack and decided: here we are in an election year, with a pattern of words basically doing our heads in.  Shall we compare the mention of key words on the front page of major newspapers on any given day, and can we draw from that any insight into a publication’s authors, audience, ownership, point of view?  What else could we see?

A few well-known papers:

On the NYtimes.com: we can see that the focus in text is almost the same on Trump vs Biden,  and on seats gained vs lost.   

Washingtonpost.com: very similar to NYtimes.com.

WSJ.com (the Wall Street Journal) provides a different view, with more emphasis on “election” in general.  

With the understanding and disclaimer that it may be over-simplistic and not thorough at all to use the data visualizations only as proofs in this case, for the purpose of illustration here I copy/paste examples of the Voyant-generated word clouds for the sites mentioned. See below:



WSJ.com (Wall Street Journal)

Distant Reading, Race, and Gender

Topics top of mind from this week’s readings:

distant reading. 

historical context and perspective.

big data.

text analysis.

gender and text analysis.

race and text analysis.

sexual harassment and misappropriation of power.

The gender and race docs were interesting (most interesting) to me this week; both strongly express that even distant reading can produce data that can be misread to the advantage of proving a prior theory. What should be undertaken (according to these authors), in the course of conducting this research, is an examination of how certain collections have come to be, and how they may represent falsely distinct collections. Using the examples of “white” and “black” race or “male” and “female” gender, we can see that quite a lot is missing from these attributes: not only the effect of identity intersectionalities, but also the definition of what these terms mean in a cultural context from time period “x” to time period “y”.

I thought the gender piece very clearly presented the case for more flexible gender classification and the case against a binary gender mindset and against the conflation of gender and sex. The analogy of gender to genre was particularly successful as an aid. Likewise the race piece argued that while it’s possible to find clear commonality among white authors and among black authors, the numbers on closer inspection show the very slight percentage that makes up the difference, and the reason for the separation in texts’ indicators is not quite as clear as it seems when reading the high-level findings of the data.

Distant reading and text analysis can provide such a wealth of insights, and with the expansion of computational power the possibilities for scholarly discovery are immense.  But this area of research going forward will be best served by attention to context, history, and representation in order to provide information that is as holistic and evenly considered as possible.

Getting acquainted with QGIS

The mapping quest: 

What to map: I’m interested in the population density of school-aged children compared to per-household property tax contribution (let’s use Brooklyn, NYC). The reason it’s on my mind: a recent conversation I had with a friend who remarked that his school taxes were high (he felt) compared to the number of students in (his) area, and (he suggested) mismanagement and/or misappropriation of funds.  Who knows, on all counts.  Certainly, not I.  I am curious. 

How to map: Sticking with the basics for now, I only aspire to plot some data onto a static graphical streetmap. A good way to show this information might be to illustrate school-aged density on a map (with color saturation), then overlay an average $ amount of property tax assessments in the same location(s).  This does suggest the map would need to be interactive, but I plan to start with a static map due to my complete lack of experience with mapping.
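Before any map exists, the two layers described above boil down to one table keyed by area. Here is a minimal sketch, assuming entirely made-up figures and area names (nothing below is real NYC data) and assuming density is measured in children per square kilometre: it joins the two data sets and turns density into one of five equal-interval shade classes, the kind of value a color-saturation layer would use.

```python
# Hypothetical figures for illustration only; not real NYC data.
children_per_sq_km = {"Area A": 1200, "Area B": 450, "Area C": 2300}
avg_property_tax = {"Area A": 5400.0, "Area B": 7100.0, "Area C": 3900.0}

def saturation_class(value, low, high, classes=5):
    """Map value onto an integer class 0..classes-1 using equal-interval bins."""
    if high == low:
        return 0
    position = (value - low) / (high - low)
    # The maximum value would land on `classes`, so clamp it into the top bin.
    return min(int(position * classes), classes - 1)

def build_rows(density, tax, classes=5):
    """Join the two data sets on area name and attach a shade class."""
    shared = sorted(set(density) & set(tax))
    if not shared:
        return []
    low = min(density[a] for a in shared)
    high = max(density[a] for a in shared)
    return [
        {
            "area": a,
            "density": density[a],
            "shade": saturation_class(density[a], low, high, classes),
            "avg_tax": tax[a],
        }
        for a in shared
    ]

for row in build_rows(children_per_sq_km, avg_property_tax):
    print(row)
```

Only areas present in both data sets survive the join, which is exactly where the trouble starts if one data set has location information and the other does not.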

Finding the data: I thought this information – child numbers and property tax figures – must surely be available in some form from NYC’s open data sets, so I set that aside for a moment and concentrated on the tools to eventually wield and display this data.  But to be on the safe side, I took a quick peek in NYC OpenData. Ugh, this is going to be tougher than it looks. Tax with location information is easier to find than child numbers. And child numbers with location data? ???

Acquiring the tools: Based on the summary of information in the “Finding the Right Tools for Mapping” article, I decided to try the QGIS application.  I believe in free, and I also have a Mac.  Having an older Mac, I found that I had to first download Python to use the version of QGIS available to me.  So I endeavored to install that. I had to choose an old Python too. Python told me cheerily that its installation was successful, and I took it at its word.  I moved on to QGIS.  Because my OS is so old, I had to look for “previously released” installers, which I searched for by date, using a date in about the same timeframe as the Python version I just installed.  I settled on the “official” installer vs the “kyngchaos” installer, not knowing the difference.  And ‘kyng chaos’ is not a computer-install friendly name, in my opinion.

Well, having downloaded an ancient (2018) version from the QGIS archive, I looked in the directory and opened the “read me” file, as I was bade. Besides a bunch of notes for people who know a lot more than me, of note is that the text was signed “William Kyngesburye”, so that’s kyngchaos and I don’t know, maybe he’s OK. I clicked over to his website (it’s not a virus) and he’s quoted Tarzan and brandishes a yin and yang symbol.  Admonishments duly noted, I proceeded to install.  After some typical “you’re not allowed unless you really want to” hijinx from my anxious computer settings, I proceeded to install buckets of software packages from kyngchaos. Once that was done, and becoming a bit nervous from ignoring several “Very Important” messages in large red text, I attempted to open the newly installed QGIS application. Well, it opened!  Amazed.

Learning QGIS: After I opened QGIS, of course I realized I had no idea how to use it.  I went off to find the training guide at the QGIS website.  … Some time later, I had a lovely export of a png map of the first training software exercise, using the provided test data (that I had to download separately – because old computer). The training set uses data about lakes on a map of Alaska. Of course I checked with the oracle, Google Maps, to confirm that these are actual lakes on an actual map of Alaska.

Because my version of QGIS is necessarily older than the current one, using the training guide presents a challenge, as I’m teaching myself to use an unfamiliar system with unfamiliar terms on a platform with not insignificant differences from the manual.  For example, project files are now saved in a zipped format, but in my version they are saved out uncompressed.  No big deal, but the file name they mention is different and for several minutes I searched and searched for something… that wasn’t there and wasn’t going to be.  Another example: the Navigation tool bar and Menu have been updated (in the manual and current version).  Disorienting, but OK. Another: the Layer properties inside some data types (notably the vector layer which used a .gml file) are different.  Well, I can pretty much guess… but you get the idea.  Builds character.
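For anyone hitting the same zipped-versus-uncompressed mismatch: as far as I understand it, the newer zipped project file (.qgz) is an ordinary zip archive with the uncompressed project file (.qgs) inside it. A minimal sketch using Python's standard zipfile module can show which kind of file you have. The function name and the example file name are my own, purely for illustration.

```python
import zipfile
from pathlib import Path

def describe_project(path):
    """Report whether a project file is a zip archive and what it contains."""
    p = Path(path)
    if zipfile.is_zipfile(p):
        with zipfile.ZipFile(p) as archive:
            # A zipped project should list the inner project file among its members.
            return {"zipped": True, "members": archive.namelist()}
    # Anything else is treated as a plain, uncompressed file.
    return {"zipped": False, "members": [p.name]}

# Example call with a hypothetical file name:
# print(describe_project("my_first_map.qgz"))
```

This only inspects the file; it does not convert between the two formats.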

I decided I need to learn more about QGIS to do anything with it from any available open data source.  But do I appreciate more what people do when they start from zero and build a map? I do!

… Then I thought about more WYSIWYG commercial solutions… I watched the Tableau software ‘sizzle reel’/”see it in action” video.  Wow technology!

Defining DH through Colored Conventions

“Colored Conventions”  

It is my idea that Digital Humanities (DH), taken as a whole, is a field central to providing an underpinning of empathy and service in the sometimes self-serving fields of academia and technology. It is a practice and continual process of outreach.  If DH can be briefly described as: an area of scholarship that provides support for and insights into societies by employing, manipulating, building, and advocating for digital technologies… then its continued propagation into the general ‘universal’ lexicon of our web-literate modern generation is assured.

The website “Colored Conventions”, which by its own introduction exists to: “[bring] buried African American history to digital life”, acknowledges its purpose being rooted in “social justice activism”.  Colored Conventions clearly illustrates an outgrowth of DH as defined above —  an active scholarship that explores – and often creates – communication pathways using the digital space.