Redefining Difficult: A Post-Bac’s Perspective on the Accessibility of Digital Scholarship Practices for Recent Grads

[The following is the text (and some fun gifs) from a presentation I gave last week at the Bucknell University Digital Scholarship Conference. Thank you to Bucknell for hosting such a fun, student-centered conference, the Mellon Foundation for providing me with funding to attend the conference (and my entire job), and Ben Daigle for helping me hone my message and feel confident in my presentation!]

I want to start by telling a story from when I was an undergrad. I was in a Literary Theory class and we were reading Lacan, not secondary explanations of Lacan. Actual Lacan. For those of you who haven’t read him, here are a couple of stereotypical sentences from one of his works, The Agency of the Letter in the Unconscious.

“And we will fail to pursue the question further as long as we cling to the illusion that the signifier answers to the function of representing the signified, or better, that the signifier has to answer for its existence in the name of any signification whatever.”

“It is not only with the idea of silencing the nominalist debate with a low blow that I use this example, but rather to show how in fact the signifier enters the signified, namely, in a form which, not being immaterial raises the question of its place in reality. For the blinking gaze of a short sighted person might be justified in wondering whether this was indeed the signifier as he peered closely at the little enamel signs that bore it, a signifier whose signified would in this call receive its final honours from the double and solemn procession from the upper nave.”

So my homework was to read 22 pages of that. I realize that I took both of these out of context, but believe me, it’s hard to understand regardless. I showed up to class early and I asked my friends, did you understand this? Did you get what he was saying? None of them had, so I felt a little better. Still, we frantically Googled anything about the chapter that we could find so that it didn’t seem like we hadn’t read it at all. We went into class and had an amazing discussion about the chapter and what it meant and how it was intentionally written to be kind of confusing because that’s the point he’s making. With the help of the professor, I came away with a much deeper understanding of Lacan than I had by simply Googling. Two years later, I was in a children’s literature course and we read Frindle, which is a children’s book about a boy who calls his pen a frindle and essentially changes the language in his school and then in his town. It’s about who has the power to decide which words are counted as real words. Who gets to make them up? How do they become legitimate? What does the power to determine the validity of a word mean and how is it enforced? It was about Lacan and it illustrated the concept of signifier and signified and the literal and figurative power struggle between them in a little chapter book that I read in like 45 minutes.

Alright, let’s all keep that story in mind while I start talking about digital humanities.

I’m going to talk about my position and trying to learn about digital scholarship, and then I’m going to talk a little bit about the issues I see within digital scholarship from an accessibility standpoint.

First of all, what is a post-bac?

A post-bac is basically a post-doc, only it takes place after getting a bachelor’s degree. This specific position was created because the members of the Digital Collaborations Group at the Five Colleges of Ohio found that when a faculty member undertook a research project, there was usually a student researcher who took on much of the technical work and helped the faculty member complete what were often highly interdisciplinary projects. The DCG decided it would be helpful to have a person who was fully dedicated to learning about digital scholarship in order to assist faculty members with their research. In short, it’s my job to research different options and opportunities within digital scholarship that I could help students and faculty incorporate into their pedagogy.

I’ve also been given the opportunity to develop my own research project that incorporates some of the digital scholarship techniques that I’ve been teaching myself and that project is currently underway.

So now your next question is probably going to be: why did they hire me?

Well, I graduated from Denison University in 2016 with a degree in creative writing. Prior to taking this position, I had had a lot of jobs.

  1. Taekwondo Instructor
  2. Babysitter/Chauffeur
  3. Math Tutor
  4. Traveling Mattress Salesman/Cashier
  5. Diner Waitress
  6. English Department Assistant
  7. Oral History Research Assistant
  8. Off-Campus Study Administrative Assistant
  9. Marketing Consultant

If you’ll notice, only one of those is directly related to digital scholarship. So the point I’m making is that I don’t have any background in coding, computer science, web design, math or statistics, or really anything having to do with the digital. I am a millennial; I do remember a time when I had to wait for my mom to get off the phone in order to get on the internet.

But despite my ability to set my grandma’s phone to easy mode or share a YouTube video on Facebook, I don’t actually know a whole lot about computers. Surprising, right?

I wouldn’t say this came as a huge surprise to me. I knew I didn’t know what I was doing when it came to coding or anything relating to computers that doesn’t have a GUI. But the point of this post-bac wasn’t to hire someone who already knew how to do everything related to digital scholarship; it was to hire someone who wanted to continue learning and to leverage the research undertaken by the post-bac to assist with student and faculty projects. Which is awesome. I’m basically being paid to teach myself. And that’s the background I’m coming from to talk about accessibility in digital humanities.

Continue reading “Redefining Difficult: A Post-Bac’s Perspective on the Accessibility of Digital Scholarship Practices for Recent Grads”

Week 28: Summer Reading Roundup

The internet is not a utopia. I know, this isn’t exactly a revolutionary new idea. A lot of people have made this critique. There are a lot of problems with the internet. It’s overrun with racist, sexist, homophobic language. I pretty strongly agree with Sarah Jeong’s theory that the internet is mostly garbage. Amazon and Google and Facebook are all collecting our data and selling it to companies and political campaigns. And no matter where you go on the web, endless ads flash in the corners and the middle of articles and interrupt the videos you watch.

But the internet is not the total dystopia that many critics have claimed it to be, either (see reading list). The internet and the digital in general seem to be forms that allow for infinite production with zero consumption (copies can be made endlessly without being used up), and while this isn’t exactly true, the abundance and availability of information and art have dramatically increased with the internet. The internet has created the opportunity for remixes and play and new forms of community. It has created new forms of communication and provides an unprecedented amount of instantly available information. Some bloggers have likened early YouTube to David Graeber’s concept of baseline communism, that is “the raw material of sociality, a recognition of our ultimate interdependence that is the ultimate substance of social peace.” On YouTube individuals create content not to make money but instead to help and entertain one another with tutorials or silly songs. Early YouTube–that is YouTube prior to the ad revenue–created a platform on which almost anyone could upload and share their knowledge, and millions of people did this, not for their own gain, but just to help other people because we are all ultimately dependent on one another, and it brings us joy to contribute to the greater good, however small the contribution. For all of the racism and sexism on the internet (which Sarah Jeong points out is mostly run by bots created by a small portion of the population), there are examples where humans help one another and give to one another without the expectation of receiving anything in return. If the internet has done any good, it has shown that humans are not innately selfish, calculating machines, but are instead constantly striving to build communities and help one another, flawed though their attempts might be.

This article is my attempt to provide an overview of what I believe are the most pressing issues created by the internet and the digital, based on the critiques I have read throughout the summer. My goal is to show how these problems relate to and feed into one another, and to show how we might begin to think of a world that actually changes and responds to them.

Continue reading “Week 28: Summer Reading Roundup”

Week 21: DHSI

I spent last week attending a class on Ethical Data Visualization at the Digital Humanities Summer Institute (DHSI) in Victoria, Canada. It was a fantastic experience, not just because I enjoyed my class but also because I met some amazing friends with similar academic interests to my own.

In class we learned how visualizations can be constructed to support a misleading narrative. Here are a few examples:

Problems with this visual (to name a few): It shows an arbitrary span of time to disguise the rise in the trend line that would be visible if the first big spike were taken away. (What does that “since 1750” even mean?) It’s showing change over time, so a straight line actually shows a steady increase. The arrow is an intentional optical illusion to trick you into seeing more of a horizontal pattern than there actually is.

Problems with this visual: There are two y-axes that aren’t labeled, which makes the two lines seem much more comparable when in reality they aren’t even close. The colors are highly suggestive and are intended to play off of the public’s bias to think of women as feminine and delicate (pink) and abortion as murder (blood red). It doesn’t even begin to take into account any of the other services provided by Planned Parenthood. Cancer screenings are only going down because there are more effective screening methods, which require fewer tests. There are no points between the two end points on each line, leaving out a significant part of the data.

Problems with this visual: For one thing, it doesn’t add up to 100% even though it’s a pie chart. The colors are misleading (the blue used in the chart is the same blue as the background, making Huckabee less prominent). It’s hard to read the source. It’s tilted, which distorts our sense of how much space each slice takes up and makes Palin look like she has the smallest chunk of the magical 193% pie when in reality she has the most support. The divisions between the pieces of pie are absolutely arbitrary. The word “Back” doesn’t need to appear on each label.

I would like to point out that the first two visuals were used by actual congressmen during congressional testimonies and the third was used on live TV. The bar is literally as low as it can possibly go. It’s not too hard to make visuals that are more ethical than these. 

At the same time, we debated the possibility of creating a value-neutral/non-political visualization. Is such a thing even possible? Is it what we should strive for? Or should we attempt to construct a narrative using our own values and politics with the understanding that one of our many values should be the truth? Most of the class seemed to agree the second was a more realistic and fruitful option.

I also learned a bit about R, including how to import data from a CSV and create charts using that data. However, I also learned that R isn’t the ideal program for the type of visualizations I would like to create. JavaScript would be better, but since I can’t learn it fast enough, I’m going to use Power BI for most of the visuals I intend to create for my free speech on campus project.
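
For reference, the basic workflow we practiced looked roughly like this. The numbers and column names below are made up; they’re just there to show the shape of the code.

  library(ggplot2)

  # In class we read real CSVs with something like:
  #   emissions <- read.csv("emissions_by_year.csv")
  # Here is a tiny made-up data frame instead, so the example runs on its own.
  emissions <- data.frame(
    year   = 2010:2015,
    tonnes = c(100, 104, 109, 113, 118, 124)
  )

  # a basic, honestly labeled line chart
  ggplot(emissions, aes(x = year, y = tonnes)) +
    geom_line() +
    labs(x = "Year", y = "Emissions (tonnes)")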

Finally, and I think most importantly, the idea that computers are human-made machines, not magical black boxes of science, was repeatedly reinforced throughout the course. It seems obvious, but given the amount of propaganda we consume on a daily basis, it’s an important message to reinforce. Just because a visualization looks official and uses numbers and cites a source doesn’t mean it’s an accurate reflection of reality. Every choice we make with our visualizations impacts the narrative they convey to the audience. Computers aren’t inherently value-neutral because the data we feed into them isn’t value-neutral and they’ve been programmed to parse it in a particular way: a systematic way, but not necessarily one that reflects the messiness of the world. It’s just like Miriam Posner’s quote about data-based work that I quoted in my first ever blog post:

I would like us to start understanding markers like gender and race not as givens but as constructions that are actively created from time to time and place to place. In other words, I want us to stop acting as though the data models for identity are containers to be filled in order to produce meaning and recognize instead that these structures themselves constitute data. That is where the work of DH should begin. What I am getting at here is a comment on our ambitions for digital humanities going forward. I want us to be more ambitious, to hold ourselves to much higher standards when we are claiming to develop data-based work that depicts people’s lives.

The data is what you make it. The visualizations are what you make them. On the one hand this is empowering. For so long, we’ve been led to believe that there are some things that are just set in stone when it comes to computers and data, but it doesn’t have to be this way. On the other hand, this is a little terrifying. It’s a lot of responsibility to make an ethical visualization. We can’t use computers as scapegoats for the inaccurate narratives our research may display. It’s up to you to be intentional with your work.

Outside of class, I talked with dozens of wonderful people and made some great friends. It was really helpful for me, personally, to talk with academics close to my age who are interested in areas of research similar to my own. I got a lot of advice and I’m more than a little excited to go on to grad school and maybe (probably) to get a PhD.

Overall, it was a great week (even with the cold weather)!

Week 18: Youtube Tutorials and Coding with R

I know absolutely nothing about R, so in order to get started, I decided to follow along with a tutorial video I had watched by Julia Silge on topic modeling in R using something called “tidytext” principles. You can check out Silge’s blog post and video here.

However, I wasn’t able to load my TEI copies of Capital into R (and actually crashed R to the point of having to reinstall it and edit the folders to run R from the command line). So I decided to follow the tutorial more closely by using the Project Gutenberg online copy of Jude the Obscure.

I found Silge’s video and the accompanying blog post incredibly useful. When I got stuck or a strange error appeared, I could easily Google the issue or rewind the video to make sure I was using the correct symbols for each step, and in the process I learned a lot about the coding nuances of R. For example, when working with dplyr, every line in a sequence except the last must end in %>%, because %>% acts like a link in a chain, passing the result of one function on to the next.
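
Here’s a tiny sketch of what I mean, using an invented stand-in for the tokenized text (the words and parts are made up):

  library(dplyr)

  # a toy stand-in for the tokenized novel: one row per word occurrence
  tidy_words <- data.frame(
    part = c(1, 1, 1, 2, 2),
    word = c("pig", "pig", "school", "school", "church")
  )

  word_counts <- tidy_words %>%          # start with the data frame
    count(part, word, sort = TRUE) %>%   # count each word within each part
    arrange(part)                        # final step, so no trailing %>%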

The first chart I created was a tf-idf chart for each part of the book, and I will admit that I was not entirely sure what tf-idf meant. Julia Silge briefly explained the tf-idf concept, but I found this page useful because it provided a formula in plain English.

  • TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)
  • IDF(t) = log_e(Total number of documents / Number of documents with term t in it)
  • tf-idf value (i.e. the length of the bar in the chart) = TF(t) * IDF(t)

In other words, tf-idf measures how important a term is to one document within a collection. Here it shows which words appear frequently in a given part of the book but relatively infrequently in the other parts. For example, the word “pig” is very important in the first part of the book, but doesn’t come up often in the remainder of the book.
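
For anyone following along, tidytext’s bind_tf_idf() function applies exactly these formulas. Here’s a minimal sketch with made-up counts (not Hardy’s actual numbers):

  library(dplyr)
  library(tidytext)

  # invented word counts per part of the book
  word_counts <- data.frame(
    part = c("Part 1", "Part 1", "Part 2", "Part 2"),
    word = c("pig", "school", "church", "school"),
    n    = c(10, 4, 8, 6)
  )

  # bind_tf_idf() applies the TF and IDF formulas above to each word/part pair
  word_counts %>%
    bind_tf_idf(word, part, n) %>%
    arrange(desc(tf_idf))

  # "school" appears in both parts, so its IDF is log(2/2) = 0 and its tf-idf
  # is 0; that is exactly where the pile of zeroes I mention below comes from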

As soon as I ran the code in R, I saw a bunch of results that came out as 0, which didn’t seem correct since the video didn’t show a bunch of terms with 0s. So I used the formula above to test a few of the results by hand and found that they were correct: any word that appears in all six parts of the book has an IDF of log_e(6/6) = 0, so its tf-idf is 0 no matter how often it appears. From here, I used ggplot2 to create this chart:

The next step was to create a topic model, which was fairly easy to do after making the tf-idf chart. As Silge explained each step, I felt I understood what I was doing and why the chart turned out the way that it did.

Finally, I created a gamma graph, which Silge explained shows that there is roughly one document (part of the story) per topic. I was unsure at the time why this should matter, but I managed to create the graph and it looked like the one in the video so I considered this a success.
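
For my own notes, here’s a minimal sketch of those two steps, following the same recipe as Silge’s tutorial: cast the word counts to a document-term matrix, fit an LDA model with the topicmodels package, then tidy the output. The toy counts are invented, not Hardy’s actual words.

  library(dplyr)
  library(tidytext)
  library(topicmodels)

  # invented word counts for three "parts" (the real run used the six parts of the novel)
  word_counts <- data.frame(
    part = rep(c("Part 1", "Part 2", "Part 3"), each = 4),
    word = c("pig", "school", "letter", "church",
             "stone", "school", "church", "marriage",
             "marriage", "letter", "doctor", "church"),
    n    = c(10, 4, 3, 2, 8, 5, 4, 3, 9, 6, 5, 2)
  )

  # cast the counts into the document-term matrix that LDA() expects
  dtm <- word_counts %>%
    cast_dtm(part, word, n)

  # fit the topic model; k is the number of topics (6 in my actual run)
  lda <- LDA(dtm, k = 3, control = list(seed = 1234))

  tidy(lda, matrix = "beta")    # per-topic word probabilities (the topic-model chart)
  tidy(lda, matrix = "gamma")   # per-document topic proportions (the gamma graph)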

Once I finished the tutorial and began writing this blog post, I realized running all of these lines of code by following a tutorial really isn’t that different from using a GUI, at least in terms of what I learned. I generated all of these charts and really didn’t need to look at what any of the packages and formulas I was using meant or how they worked (though I did look into tf-idf because I initially thought I had done something incorrectly). This isn’t to say I didn’t learn anything from this exercise. Certainly, it was helpful for me to better understand the syntax and minutiae of coding with R, and the resulting charts I generated were technically clear and accurate. But this isn’t good enough from a distant reading standpoint. Going back to one of the two quotes I provided in my last post, “In distant reading and cultural analytics the fundamental issues of digital humanities are present: the basic decisions about what can be measured (parameterized), counted, sorted, and displayed are interpretative acts that shape the outcomes of the research projects. The research results should be read in relation to those decisions, not as statements of self-evident fact about the corpus under investigation.” I don’t really understand any of the decisions I made aside from the fact that I wanted 6 topics and that I divided the book into 6 parts.

Certainly this isn’t a terribly impactful experiment. If I screwed up the statistics and misrepresented the data, thereby misrepresenting the book as a whole, there won’t be any negative consequences. It just means readers of this blog might think the words “pig” or “Gillingham” are more important to the story than they actually are. But let’s say I wasn’t working with a single nineteenth century novel (and admitting my gaps in knowledge as I proceeded) and was instead working with data about race or gender or a corpus of nineteenth century novels. Let’s say I wanted to derive meaning from this data and failed to do so accurately. Let’s say I didn’t understand the malleable nature of my data points, that race or gender or even vocabulary are not set in stone but are instead social constructs that can be interpreted in many ways. Let’s say I created some charts following Silge’s tutorial and (incorrectly) determined white male writers have a “better” or “larger” vocabulary than women writers of color, and that I used these charts to determine which novels I should read in the future to better understand the nineteenth century canon. That would definitely be a problem. It would be worse if anyone who read this blog post was convinced by my incorrect results and acted in accordance with them.

So let’s go back to the beginning to figure out what each of these visualizations means, not just to do my due diligence to Thomas Hardy, but also to distant reading and text analysis as a whole. I’m already fairly clear on what the tf-idf chart means and how it was created, so I’ve decided not to delve into that any further. Let’s start with the topic modeling chart. Continue reading “Week 18: Youtube Tutorials and Coding with R”

Weeks 15, 16, and 17: TEI Trials and Errors

About two and a half weeks ago, I submitted my Free Speech Research Project for IRB approval. While I wait for a response, I have decided to delve into more analytical work. I will be attending DHSI in early June to learn about ethical data visualization. Although the description for the program doesn’t list any prior experience with TEI or R as a prerequisite, I decided to go ahead and practice with both of them.

This post will be dedicated to my experience with TEI, and the following post will deal with R. I had been interested in learning TEI for a while as it seems to be one of the major cornerstones of digital humanities work, particularly when working with text. Although people often associate TEI with close reading techniques, I intended to automate the TEI conversion process, making this project more closely associated with distant reading. Distant reading was something I read about when I first started this position and I found the concept fascinating mainly because I feel like students and people my age instinctively employ distant reading techniques when faced with a large text or corpus. When I studied abroad, I didn’t have enough time to read Bleak House in three days, so I found secondary sources, word clouds, timelines, videos, any kind of visualization I could get my hands on so I could more quickly absorb the text. Essentially, I was trying to do this:

Artist: Jean-Marc Cote Source: http://canyouactually.com/100-years-ago-artists-were-asked-to-imagine-what-life-would-be-like-in-the-year-2000/

And while this is not exactly what digital humanists mean when they say distant reading, it’s a great illustration of the way the internet has drastically changed the way that we understand art and literature. Take This is America for example. So many news outlets and social media participants immediately attempted to dissect the video; none succeeded in capturing every nuance, but as a collective whole they created a fairly solid interpretation of the work within hours of its release, utilizing blog posts, Tweets, articles, videos, essays, and more. Because explanations and interpretations are so readily provided, artists like Donald Glover and Hiro Murai are pushed to create increasingly nuanced and layered works that resist simple interpretations and challenge their audience to dwell on their meaning. We’ve all witnessed times when every single news media outlet comes out with the exact same article on a pop culture phenomenon (just look up articles about Taylor Swift’s Reputation, all of which list each easter egg in the video with ease). The reactions to This is America were diverse and, cobbled together, still seem to fall short of encapsulating the sense of mesmerization that This is America provokes. The internet has so drastically changed the way we consume and interpret art that artists who want to challenge their audience have to create works that elude the media’s customary act of reacting and labelling. At the same time, scholars have to create new ways of understanding pieces of art. The distant reading techniques employed by digital scholars are an attempt to answer this call.

Before I started anything with my project, I decided to read more about distant reading and its current relationship to the Humanities. Two quotes from these readings really stood out to me.

  • “Computer scientists tend toward problem solving, humanities scholars towards knowledge acquisition and dissemination” (Jänicke). What happens when we collapse that binary? What if the goal of the humanities becomes solving problems and the goal of computer scientists is to acquire and disseminate knowledge? How does our ability to understand art and data morph into something new when we treat art like data and data like art?
  • “In distant reading and cultural analytics the fundamental issues of digital humanities are present: the basic decisions about what can be measured (parameterized), counted, sorted, and displayed are interpretative acts that shape the outcomes of the research projects. The research results should be read in relation to those decisions, not as statements of self-evident fact about the corpus under investigation” (UCLA). In the same way that art is created to answer a call for increasingly difficult interpretive puzzles, a product of the analytical environment into which it was born, so is the analysis itself. You cannot create data or answers or knowledge or solutions in a vacuum. The decisions I make during this project will impact the results. The fact that I am using digital techniques to perform my analysis doesn’t absolve me of responsibility for what I produce. 

I designed a project that I felt could incorporate aspects of both of these quotes. I want to solve a problem and acquire knowledge, and at the same time, understand that whatever I produce will be inextricably tied to my own research decisions. This project also had to include aspects of TEI and R, as well as some of the other tools and skills I’ve learned about thus far. My plan was this:

  1. Find copies of Marx’s first three volumes of Capital and systematically scrape them from the web. (I chose to exclude the fourth volume as it was not available online in the same format as the other three volumes.)
  2. Encode the scraped texts (saved as a plain text file from the CSV file generated by the scraper) as TEI texts.
  3. Use those TEI documents to do some topic modeling or sentiment analysis in R, and put them into Voyant (because why not?).

Capital seemed like a good choice because I have an interest in Marxist theory, so I felt I would intuitively understand some of the results from my experiments, and it would be fun to learn about Marx’s most important work using new techniques that–as far as I know–haven’t been applied to this work before. Capital is also highly structured with parts, chapters, sections, and a variety of textual elements including tables, quotes, foreign phrases, and footnotes, which would give me plenty of opportunities to learn the options available in the TEI schema. Also, May 5 was Marx’s 200th birthday, and it just seemed fitting.

I coded the first chapter by hand so that I understood the structure: why some elements can be used together while others cannot, how best to divide the text using the <div> tag, how to label tags, and how to transform character entities into TEI-approved forms (e.g. &amp; into &#38;).

Using an initial HTML version of Chapter 1, I gradually worked my way through finding and replacing HTML elements with TEI elements.
Final structure of TEI encoded chapter
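
To give a sense of what that find-and-replace looked like, here is a minimal sketch in R using stringr. The tag mappings are just a couple of illustrative cases rather than a full HTML-to-TEI crosswalk, and the snippet of “chapter” is invented.

  library(stringr)

  # a tiny invented fragment of the HTML chapter
  html_chapter <- '<h3>Section 1</h3><p>Commodities &amp; Money</p><p>See <i>Das Kapital</i>.</p>'

  tei_chapter <- str_replace_all(html_chapter, c(
    "<h3>"  = "<head>",               # HTML headings become TEI <head> elements
    "</h3>" = "</head>",
    "<i>"   = '<hi rend="italic">',   # italics become <hi> with a rend attribute
    "</i>"  = "</hi>",
    "&amp;" = "&#38;"                 # named entity to numeric character reference
  ))

  tei_chapter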

Once I completed the first chapter, I felt I had a sufficient understanding of the elements to know what to look for in a TEI document created using an online converter tool. For this step, I used OxGarage to convert the Word document versions of the volumes into P5 TEI. I then cleaned up the documents in order to make them valid. This step required very little effort. Most of my edits pertained to fixing mistakes in the Word documents, like a broken link to a footnote or an incorrect, invisible numbered list before some paragraphs.

This whole process took me about a week to complete. I summarized above but the actual steps looked like this: Continue reading “Weeks 15, 16, and 17: TEI Trials and Errors”

Weeks 13 and 14: There’s a GUI for That

For the last couple of weeks, I have been working on IRB documents for my research project involving student interviews. The good thing about the IRB process is that it really made me plan out my project so that I now know every detail of how I will do each step.

Part of my project will almost definitely involve topic modeling and sentiment analysis. However, when I wrote my first draft of the IRB Approval Form Responses, I realized I didn’t actually know very much about topic modeling or sentiment analysis, and what little knowledge I did have wasn’t going to cut it for this review process. So I sat down to try to read about the process of topic modeling and how it can be used.

I don’t know if you’ve read my About page or any of my other blog posts, but let me reiterate that I do not have a computer science or a statistics background. I consider myself fairly capable when it comes to math. I’ve always had a good sense for math and logic in the same way that although I’m not a cartographer, I am usually pretty confident in my ability to point out the cardinal directions. However, the articles I was reading and most of the blog posts sounded like this to me: 

No matter how many articles I read or reread, I just wasn’t getting how the whole thing worked. Where do you do topic modeling or sentiment analysis on a computer? Is it code? Is it a program? Do I need to download something? How should I format the files I want to analyze? What kind of code is used in topic modeling?

Learning how to do this stuff on your own is like trying to bake a cake for the first time in your life, and your only directions are in Russian, and you’ve also never eaten cake before. I have a general idea of what I hope to end up with, but I don’t even know where to flip in this cookbook to find a cake recipe.

I found myself reading the same sentence over and over again, so I did what I always do when I don’t understand how to do something and reading isn’t helping. I got on Youtube and watched some videos. (You can find everything I’ve watched and read (that I comprehended) on my reading list page.) They weren’t all helpful, but just watching someone type out the code and show a result was helpful. So I watched a bunch of these videos and after finding one that was particularly clear and useful (it was actually on sentiment analysis, but that’s beside the point), I downloaded all of the things the Youtuber had in the description section of his video. I quickly realized that many of the programs have been updated since the video, and don’t look the same or don’t have the same features anymore, so I couldn’t follow along with his tutorial as I planned. I was kind of back to square one–or square two, since I at least had a better idea of what kind of information I could get out of doing data analysis like this.

Next step? Complain about how hard this is! Feeling like I’d hit only dead ends, I explained my predicament to Ben, who sent me an article I’d previously given up on. The thing about a lot of these tutorial articles is that they start by telling you to go back a step if you don’t already know how to use the command line or BASH or R or whatever thing they’re going to use throughout the tutorial. And that makes a lot of sense to me. I wouldn’t suggest to you that the best way to learn how to bake is by using a recipe in a Russian cookbook, and if I did, it would be cruel of me not to tell you to brush up on your Russian first. But if you keep going backward further and further away from the thing you want to do, you end up watching videos about how computers work (like this) instead of writing an IRB document. And I’m certainly not arguing that learning how computers work is a bad thing or that I shouldn’t spend my time learning the basics of computer science, but I’m also on a schedule. There are only so many skills I can learn in a week or a month or a year. But I had a lead on one. Ben suggested this article, which was much clearer to me once I had watched all of the YouTube videos, and out of all the tutorial blogs, if Ben said this was a good way to start, I could trust it would get me somewhere. That article linked to another that I could understand, and that one to another (see reading list). Eventually, I found the GUI Topic Modeling Tool, which is a GUI for MALLET, and then I really got on a roll.

Let me say briefly, I understand the trepidation around GUIs. If you don’t know how something works (e.g. what the program does and how it processes data), you might take all of the results at face value, as if they were concrete and official and not the social constructs that they actually are. It’s like how most of us know vaccines prevent diseases and we get them, but few of us know how to make them. Generally, this works out. We don’t all need to know how to make a vaccine, but very few of us are mixing up our own vaccines at home and claiming they cure anything. Digital scholars who don’t know how the statistical model behind their data works aren’t going to accidentally give people mercury poisoning with a homemade polio vaccine, but they could confuse their audience with claims that might not be substantiated in the data. So if you don’t love the GUI Topic Modeling Tool, I get it.

The thing about GUIs to me is that they can be like the Google Translate for your Russian cookbook. Sometimes (maybe often) you’re going to end up with some total nonsense, but if it’s all you’ve got to get started, that’s what you’re going to use. The MALLET GUI, available here, is really, really useful. It allowed me to work backward, so that I could tweak variables in a format I understand and compare the results. I know what the number of topics should do, but what does the number of iterations mean? How does it change the results from a data set I know super well? How does changing the number of topic words printed impact my understanding of the results? I don’t need to know how to tell the computer to change that variable yet. I need to understand what the variable is in the first place.

I ended up with some data that I could play around with and visualize in Power BI. Miriam Posner’s blog post about interpreting this data was super helpful at this step. Now I feel more prepared to read all of those blogs about the statistical model behind topic modeling. I have a better grasp of the variables and results. I’m also going to try running MALLET from the command line. To return to that damn cake metaphor for the 100th time in this post, I know what all of the ingredients are. I’ve tasted some cake. Next, I have to learn how to set the oven and turn on the mixer. (Do you hate me yet?)
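
For future me (and anyone else headed from the GUI to the command line): the knobs I’ve been playing with in the GUI correspond to flags on MALLET’s train-topics command. This is only a rough sketch based on MALLET’s documentation; the directory and file names are placeholders, not my actual project files.

  # import a folder of plain-text files into MALLET's internal format
  bin/mallet import-dir --input interviews/ --output interviews.mallet \
      --keep-sequence --remove-stopwords

  # train a topic model; the GUI's "number of topics", "number of iterations",
  # and "number of topic words printed" map onto the flags below
  bin/mallet train-topics --input interviews.mallet \
      --num-topics 10 --num-iterations 1000 --num-top-words 20 \
      --output-topic-keys topic-keys.txt --output-doc-topics doc-topics.txt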

Best of all, the other benefit of using the MALLET GUI was being able to grasp enough of the conceptual ideas behind topic modeling and the data it generates so I could complete my IRB documents. Now I just need to submit them!

Weeks 11 and 12: My Favorite Tools

For the last two weeks, I have been trying out online tools that I may use for my research project and in this post, I’m going to discuss my two favorites.

Trint

When I interviewed for this post-bac position, I talked a lot about my frustration with the lack of transcription tools available online while I was working on the Digital Life Stories Archive for Regina Martin. My struggles with voice-to-text software were a learning experience that involved pirating old versions of Premiere in order to use its now-abandoned transcription tool (only to find out it no longer functioned, even on older versions), transcribing an hour-long interview by hand, which took me nearly six hours, and spending multiple hours online searching for any tool, free or not, that would cut down on transcription time. We laughed during my interview about the fact that my expectations were too high because I knew Apple and Google had created voice-to-text programs for Siri and Google Assistant, and I expected there to be a “magical” solution to this transcription issue on Regina’s project.

Well, laugh no longer, because there is a magical solution out there called Trint. This is an online program that you have to pay for (the rate is $12/hour right now). I conducted a practice interview with my brother, Spencer, that I fed into Trint, and honestly, I held off trying it because I was so sure I would be disappointed in the results. Well, I’m writing this now to tell you I most certainly was not disappointed. Trint transcribed a half-hour interview in about 2 minutes. The audio was decent but not high-quality by any means because we conducted the interview over Zoom and Spencer’s internet was a little spotty. The results were incredible. I didn’t think it would function half as well as it did.

I don’t mean to suggest that Trint was perfect; certainly, there were some words that Trint mixed up. For example, Spencer and I have midwestern accents and we tend to slur words together, so depending on the context, we might say “a”, “I”, and “uh” the exact same way, so a sentence like “And uh, I was a little embarrassed” might come out like “And I I was uh little embarrassed.” This isn’t a hard fix. As you listen to the interview, Trint darkens the words being played, and you can easily edit the text to match the intended words by just clicking and editing the transcript like a Word document.

Another aspect that takes longer to edit is punctuation, as Trint only puts in periods at hard stops in the conversation. In order to better textually represent what’s being said, sometimes it’s necessary to put in additional punctuation. For example, Trint might write “You know I was at work and the dude well he so. He’s not nice.” Without punctuation we might understand the speaker’s intended meaning of these sentences, but punctuation would better capture the speaker’s actual phrasing, so that sentence becomes “You know, I was at work and the dude, well, he–so… he’s not nice.”

I will say that there were a number of difficult or unlikely phrases that Trint did accurately capture. Trint caught Spencer’s use of “sorta” and my use of “gotcha”, automatically capitalized “Black Lives Matter”, and cut out filler noises like “uh” and “um.” And aside from a few situations with homophones like “by” and “buy”, it was mostly correct.

I don’t think I can say enough positive things about Trint, so I’ll just say if you have an interview project or any audio or video that you want transcribed, try Trint out.

Webscraper.io

The second application I’m going to talk about is webscraper.io. This is an incredibly useful, simple tool. It appears complex at first (for whatever reason, the actual web scraper is the very last tab in the tool), but the videos provided on the webscraper.io website (though quiet and oddly monotone) are very easy to follow and tell you everything you need to know about the tool in order to scrape websites.

I tend to resist watching video tutorials because I find them boring – as I’m sure many people do – and I enjoy figuring things out by playing with the tools on my own. However, web scraping, and webscraper.io in particular, uses its own vocabulary, and without any context I totally failed at my first scrape (I wouldn’t even call it a scrape, more like random clicking and nothing happening). But once I gave in and watched the short video, I completed a web scrape of The Denisonian website in just a few minutes.

As I moved on to more complicated, slightly newer-looking websites, I found that having a little bit of knowledge of HTML was helpful, because some of the newer websites disabled the point-and-click feature in the program and I had to edit the JSON code for the web scrape (the sitemap) by hand. This was both frustrating and rewarding, but I’m very glad I was presented with that challenge because I feel I have a much more in-depth understanding of what web scraping is because of it.
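
For anyone wondering what “editing the JSON code by hand” actually looks like: a webscraper.io sitemap is a JSON object with a start URL and a list of selectors. The example below is a hypothetical sketch of that shape, not my actual Denisonian sitemap.

  {
    "_id": "student-newspaper-articles",
    "startUrl": ["https://example-student-paper.com/news"],
    "selectors": [
      {
        "id": "headline",
        "parentSelectors": ["_root"],
        "type": "SelectorText",
        "selector": "h2.entry-title a",
        "multiple": true
      }
    ]
  }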

The Takeaways

Both of these applications took time – one in terms of waiting around for the application to be developed, the other in terms of my ability to learn a new program. But what I get out of both of these experiences is an appreciation for the fluid learning skills that I’ve gained through a liberal arts education.

Right now, to me digital scholarship is another branch or level of understanding and interpreting the world, and it’s one that often frustrates the hell out of me. I spent weeks searching for a program that didn’t exist only to find the best version of it possible years later (thanks for the recommendation, Ben). I spent hours – probably longer than I would’ve spent had I just copied and pasted the data I wanted – teaching myself how to scrape different websites. But it’s also incredibly rewarding and gratifying. I stood up in triumph in the computer lab at OWU when my web scraping code worked after the 30th try. I called my mom to tell her about Trint because I was so excited it existed at the same time I was planning to take on another interview project.

I can’t say the digital liberal arts alone have instilled perseverance in me. I’ve always been stubborn. But they have transformed that stubbornness into something productive: the ability to try new methods, learn new ways, and, maybe best of all, know when to quit (e.g. don’t illegally download an old version of Premiere that will crash your 13-year-old computer).

Weeks 8, 9, and 10: How do You Start a Research Project?

It’s been a little while since I last posted, but the delay comes with a payoff: I finally decided on a topic for my independent research project. This is going to be kind of a long post. So, buckle up!

As part of this position, I get to dedicate some of my time to an independent project of my choosing (so long as it involves digital scholarship). I debated taking on a dozen different projects with topics like:

  • Memes
  • Capitalism
  • Karl Marx/Das Kapital
  • #Activism
  • Twitter Bots
  • Thomas Hardy’s Jude the Obscure
  • Thomas Pynchon’s The Crying of Lot 49
  • The myth of the American dream
  • Utopias
  • Free Speech
  • Portrayals of panopticons in video games

And, I considered making all kinds of weird digital projects involving these subjects, like:

  • flash games
  • Twitter bots
  • visuals not unlike The Knotted Line
  • an essay consisting entirely of memes
  • any other jarring combination of maps and graphs that would cause you to pause and think before interpreting the data

The one topic that really grabbed me was free speech on campus. Free speech has been an interest of mine since I was a student at Denison, where I protested speakers and groups invited to table on campus (details I won’t go into here). When I found myself talking about free speech every evening and weekend with my friends and family, I knew I had found a topic that could sustain a year’s worth of inquiry. Below, I’m going to outline how I turned a vague idea into a research project.

The idea for this project came up during the Ohio Five board meeting while provosts and board members discussed their concerns about students’ interpretation of free speech policy on campus and the seemingly escalating events that challenge free speech policy on the campuses. It felt like something was missing from the conversation: student voices, but also an acknowledgement of the post-structuralist point of view that many students–especially those advocating for stricter policies–have embraced and taken as truth (which certainly isn’t a criticism coming from me, the “how many times can she reference Foucault in an hour” student).

So for me, this conversation generated a pile of questions that I couldn’t immediately, internally resolve with Marxist theory as my friends and family would tell you I am wont to do. I ordered the book suggested during the meeting, Free Speech on Campus by Erwin Chemerinsky and Howard Gillman, and after some discussion with Ben, I decided to sketch out a rough research project involving student interviews.

I started with initial questions that I had for students, like:

  • Do you feel the administration does a good job making you feel safe to express yourself on campus?
  • Do you feel like there’s a group of students who don’t agree with your political opinions on campus? What is your perception of the divide of political opinions on campus?
  • Are there any policies that you would change on your campus in order to better reflect your views on free speech?

From there, I started reading everything about free speech and interviewing that I could get my hands on (see reading list). However, this isn’t exactly a digital project yet. It’s a sociology/psychology/communications project, but it doesn’t incorporate digital scholarship outside of the fact that I would be using a digital recording to analyze the students’ views. So that’s when I brought in the idea of text encoding. There are a lot of words that students (and everyone else) use to talk about free speech that are coded with different meanings depending on who uses them. For example, the phrases “political correctness” or “PC culture” are rarely defined by the people who use them. Heidi Kitrosser explains,

“When [journalists] reference “political correctness,” it often is unclear whether they mean to reference formal restrictions or informal pressures, let alone the subset of either type that they have in mind. Even when reports single out particular practices, important details frequently are excluded. We saw, for instance, several commentators refer to “trigger warnings” without specifying whether they mean voluntary warnings by faculty, warnings suggested or encouraged by a school’s administration, or administratively mandated warnings. There is even less clarity as to the meaning of “safe spaces.””*

So I wondered whether there was a way I could encode the transcriptions from the interviews to reveal the disconnect between students and the words that they use to talk about free speech. That’s when I came across this very helpful paper by two Berkeley students who were using interviews to visualize information like word frequency and word count. After talking with Jacob Heil, the Digital Scholarship Librarian and Director of CoRE at the College of Wooster, I found this was actually a pretty doable project.

However, as I mentioned earlier, I want to get weird with the data and my analysis or visualization of it. I want to do something unexpected. So, I thought about creating a Twitter bot, and it’s still a consideration. I’m drawn to the uncanny valley aspect of a Twitter bot, and though I don’t know exactly how a bot might take shape as a result of this project, it seems like an interesting way to reveal new insights into the thoughts of students. I’m also on the fence about making something similar to the Knotted Line, though I am certainly not an artist of any caliber, so I will have to find a new way of creating this type of visual. Regardless of the form, I think making these challenging visuals actually results in a better conversation with the audience, because the audience is forced to spend time with the visual to interpret it, rather than being shown statistics or graphs that reveal the answer they want to find.

The questions about this type of visualization become: can I interweave the student narratives to reveal something new about them? Can I uncover assumptions students hold? Is one of them right and the other wrong? What is the relationship between the post-structuralist points of view these students hold and their arguments for changing campus policy? This isn’t an argument between red and blue; it’s much more nuanced than that (as are many arguments). Where do these students overlap? Where do they separate? How can I visualize this without imposing my own interpretation on students’ beliefs, or is that possible?

I don’t know the answers to these questions and of course, there are a lot more logistical and pragmatic issues to take into account for this project as well. For example: What if no one volunteers to be interviewed? What if I only get students with very similar points of view? What if I don’t have enough time? I have to go through five IRB processes if I’m going to do interviews at all of the campuses like I hoped. Do I even have time for that?

For now, I am simply maintaining a list of all of these questions so that as I move forward, I don’t lose sight of the constraining factors.

Finally, I had to actually formulate a research question that was both narrow enough to fit within the scope of this project and my 10 month(ish) deadline, and broad enough that it could be tweaked to better fit the project should it begin to take a new turn as I’m interviewing students. To formulate this question, I first read papers on qualitative research techniques and interview projects (see reading list again). Then, I compiled a list of all of the questions that arose as I was reading about free speech. I used these notes to write research questions that could allow for the possibility to dig into the theoretical questions as I interviewed students. Below are the two research questions I landed on and a series of 5 sub-questions related to the main two.

(1) How do students think about (talk about/perceive/conceptualize/interact with) free speech on campus? (2) When students disagree on aspects of free speech, are there differences in the ways they talk about free speech and the language that they use?

  1. What words do students use when talking about free speech? How often do they use them?
  2. What do students think the “other side(s)” don’t understand about their POV?
  3. What influenced students to think about free speech in the way that they do?
  4. Are students’ views grounded in different schools of thought/theories/ways of understanding the world?
  5. How do students’ perceptions of campus climate impact their views, or vice versa?

This isn’t a final list, and it certainly isn’t a list of interview questions, but it’s definitely a start. From here, I am going to continue to meet with campus leaders to discuss my project plans and continue to read about free speech and interview practices. Soon, I hope to begin the IRB process and teach myself how to encode text.

*Citation on Reading List under Kitrosser

Week 7: Exceptions to Open Access

Over the last couple of weeks, I have been highly interested in open access and debates surrounding its merits and shortcomings. Serendipitously, I was invited to sit in on Dr. Amy Margaris’s Anthropology Seminar, Culture, Contact and Colonialism, during which we would discuss the limitations of open access systems for publishing scholarly work regarding traditional/indigenous knowledge. Prior to attending the class, I read a number of articles centered on this debate including “Opening Archives: Respectful Repatriation” by Kimberly Christen and “Protecting Traditional Knowledge and Expanding Access to Scientific Data: Juxtaposing Intellectual Property Agendas via “Some Rights Reserved” Model” by Eric C. Kansa, et al. (Full citations can be found on my reading list).

The main argument against open access from traditional knowledge advocates goes like this: by arguing that “information wants to be free”, open access advocates fail to take into account whether the owners of traditional knowledge wish to share the information taken from them. Said more eloquently, “At one side, open-knowledge advocates seek greater freedom for finding, distributing, using, and reusing information. On the other hand, traditional-knowledge rights advocates seek to protect certain forms of knowledge from appropriation and exploitation and seek recognition for communal and culturally situated notions of heritage and intellectual property” (Kansa, et al.).

What I didn’t know prior to reading these materials was that traditional knowledge was and still is largely considered part of the public domain. I had visited museums filled with stolen objects (everything from art to architecture to furniture to actual human remains), so I had a frame of reference regarding the scale of the issue when it came to physical objects. I hadn’t yet considered the digital form these objects and the intangible aspects of traditional knowledge had taken. Digital objects are entirely different from their physical counterparts because digital objects can be stolen repeatedly, infinitely reappropriated, and can be easily taken out of context. Without strict licensing agreements or copyright protections, we have very little control over how digital objects are used, and even then, we still may not have as much control as we hoped. Naturally then, making all traditional knowledge part of the public domain has some serious consequences, namely that it perpetuates colonization.

Traditional Knowledge advocates argue in favor of some guidelines and restrictions when posting traditional knowledge online. Kansa et al. argue in favor of creating new Creative Commons licenses that include terms (which I’ll paraphrase) such as “user must maintain the cultural integrity of the object” and “user must provide a native translation of the object.” Though Creative Commons licenses aren’t perfect, they are a solid model upon which traditional knowledge restrictions and licenses can be built. Christen makes a case for the Mukurtu project, an online hosting platform for traditional knowledge collections that allows for greater indigenous control over the visibility of and access to materials, a more equitable visualization of traditional knowledge side by side with “expert” or “academic” knowledge, and the ability to create unique, culturally informed systems of organizing knowledge.

If we decided to take up all of these practices, would we be in direct opposition to the principles of open access? Personally, I don’t think so. Take my analogy* below:

Imagine you have a recipe that has been passed down for generations in your family and though it isn’t currently used to generate a profit, it has great emotional value within your family. You may not want someone to come to your house, read the recipe, and post it online where anyone could access it, and start making your great, great, great aunt’s dish for a profit, right? Not all information belongs in the public domain. Your private photographs are not part of the public domain. Your family’s recipe is not part of the public domain. Your diary isn’t public domain. None of these things are automatically assumed to be part of the public domain just because they consist of information, and even if someone asked you if they could read your diary or flip through your photo albums, that doesn’t give them the right to post all of it online. You should have control over what gets posted and how it’s presented.

What needs to change is the assumption that every single piece of data a researcher collects is part of the public domain. Just because a researcher may have collected it, doesn’t mean it belongs to them–regardless of whether they had consent to collect the information. This is certainly still in line with the principles of open access.

The goal of open access is to prevent exploitation and unjust gatekeeping of information. As stated by the Budapest Open Access Initiative:

“There are many degrees and kinds of wider and easier access to this literature [specifically, peer-reviewed journal articles]. By “open access” to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.”

This critique is, at its core, anti-capitalist. Open access advocates don’t argue that literally all information should be posted online (certainly, some information is private, as I made clear above), but rather that the work scholars produce should not be hoarded by private publishers for the purpose of making information a scarce commodity. There are limitations on open access (such as copyright, which should be used to maintain the integrity of the work and ensure proper citation), and traditional knowledge should be understood as a form of information that has limiting factors. At the same time, the goal of traditional knowledge advocates is to prevent exploitation and appropriation of culturally significant materials. It is, at its core, an anti-colonial argument. So essentially, both argue against imperialism, against dominating Western forces that prevent the human flourishing scholarly work is meant to produce.

As a scholarly community, it should be our goal to bring both sides of this debate together, so that the fewest number of people are exploited by research and publishing activities. This means reducing barriers, particularly financial barriers to access in some situations, and raising some barriers in others, particularly in cases where the information is culturally sensitive.

Moving forward, digital scholars, researchers, and archivists alike need to empower Indigenous systems of understanding and sharing information. Indigenous populations should be able to:

  • Dictate the terms of use for the traditional knowledge and objects they share. They should have the right to control what goes online, who sees it, who gets to use it, and how the person accessing it gets to use it (see the sketch after this list for one illustration).
  • Benefit from the collection being online.
  • Control how the information is modeled and framed (e.g., the Smithsonian’s top-down model for organizing information doesn’t apply to all knowledge; other ontologies need to be made possible online).
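
To make that first bullet a little more concrete, here is a minimal, entirely hypothetical sketch in Python of what community-set terms of use might look like if a digital collection encoded them directly. Every field name and the can_view() helper are my own inventions for illustration; they do not come from any existing platform or metadata standard.

```python
# A minimal, hypothetical sketch of community-set terms of use for a digital
# collection item. The field names and the can_view() helper are invented for
# illustration; they do not come from any real platform or metadata standard.

from dataclasses import dataclass


@dataclass
class TermsOfUse:
    """Conditions the originating community attaches to an item it shares."""
    publicly_visible: bool        # may the item appear online at all?
    allowed_audiences: set[str]   # e.g. {"community_member", "approved_researcher"}
    commercial_use: bool = False  # may others profit from the item?
    notes: str = ""               # any protocol the community wants displayed


@dataclass
class CollectionItem:
    title: str
    contributed_by: str           # the community that shared the item
    terms: TermsOfUse             # the community's terms, not the researcher's


def can_view(item: CollectionItem, viewer_role: str) -> bool:
    """Allow access only when the community's own terms permit this viewer."""
    if not item.terms.publicly_visible:
        return False
    return viewer_role in item.terms.allowed_audiences


# Usage: the default is no access unless the community has said otherwise.
item = CollectionItem(
    title="Oral history recording",
    contributed_by="(originating community)",
    terms=TermsOfUse(
        publicly_visible=True,
        allowed_audiences={"community_member", "approved_researcher"},
        notes="Listening only; no redistribution.",
    ),
)

print(can_view(item, "community_member"))   # True
print(can_view(item, "anonymous_visitor"))  # False
```

The design choice the sketch is meant to show is that the default answer is “no access” unless the community’s own terms say otherwise, which is the inverse of the assumption I critiqued above.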

At the same time, digital scholars, researchers, and archivists should try to make research and scholarship more widely available whenever possible: by publishing in open access journals, depositing work in open repositories, encouraging students and the public to use open access journals, and taking advantage of Creative Commons licensing.

The academic publishing status quo falls short of both traditional knowledge standards and open access standards. We can all do better.

 

*To be clear, I am not at all trying to imply that private photos, recipes, diaries, etc. have the same cultural value or significance as traditional knowledge. My point is simply that some knowledge is personal and private, not public.

Week 6: Shared Ideals

By now, six weeks into my new position, I’ve read a lot of arguments both in favor of and opposed to the digital humanities, and beneath the surface of almost every single one of them I’ve seen the same underlying theme: anti-capitalism, or anti-corporatization. Most of the articles I have read have in one way or another made the case that the academy should not be run as a business. The most interesting thing is the way this plays out in opposing arguments. Let’s take a look at some of the (oversimplified) arguments that have been made about digital scholarship.

  • We should all publish on open access platforms because knowledge should be shared.
  • If we all publish only on open access platforms, there will be no peer review, and there will be a death of expertise, making it easier for the general public to be manipulated by propaganda.
  • Libraries shouldn’t be just houses for information, they should be centers for generating new knowledge.
  • Digital humanities will save the humanities from its crisis.
  • Digital humanities will not save the humanities from its crisis.
  • We need more cross-disciplinary research and research centers.
  • Digital work is rarely counted toward tenure and often not recognized as research.
  • The higher-ups in higher education only like DH because it brings in funding and it’s something they can sell to incoming students.
  • Teaching students to use computers is exactly what the corporations want you to do!
  • Using blogs or other more creative forms of assessment instead of papers is better for student learning than asking students to write essays and dissertations.

The disagreements here are not about whether digital scholarship is the best way to make money for the academy and continue its expansion. They’re about whether digital scholarship is capable of dismantling capitalist practices and systems within the academy, or whether it will further corporatize it. It’s not about whether scholars think it is good or bad that there is a paywall around scholarship. It’s about whether the paywall is necessary to maintain the level of scholarship taking place in the academy right now. It’s not about whether the humanities should be saved (regardless of how realistic this outlook is). It’s about how best to do it. It’s not about whether there should be more tenured positions for faculty. It’s about whether junior faculty members are doing an unfair amount of labor in order to gain tenure. (Again, I’m well aware that these are oversimplifications, but go with me).

The point I’m making is that for all of this contention around digital scholarship and digital humanities, what everyone actually seems to be concerned about is the seemingly inevitable corporatization and cannibalization of the university as we know it. We don’t fear computers and their ability to make graphs out of word counts (at least most of us don’t). We fear that using a tool created by a warmongering, exploitative, power-hungry corporation will open the floodgates for even more exploitative, warmongering practices in our universities.

I’m going to take a deeper dive into one of the major ongoing arguments in the DH community: should we incorporate digital scholarship into our curricula?

Some professors are torn between, on the one hand, giving their students a unique but possibly untested learning experience using digital tools, and, on the other, making sure that the work students do in class is valued by future employers. Professors care about their students’ futures. That’s why they teach. To ignore the necessity of creating an environment that produces capable workers is to do their students a disservice. They have to prepare their students for the working world, even if it doesn’t operate the way they wish they could run their classrooms.

Other professors argue that this innovative digital work and assessment is valuable to employers, that as the world becomes increasingly digital, it is relevant to give students non-traditional assignments involving aspects of digital scholarship. This argument is similarly centered on what will best prepare students to get jobs, but it also holds that this isn’t the only function of digital scholarship: that it is fulfilling and fun, and that it inspires a passion for learning in students more than lectures and papers do.

Still other professors believe that digital work and unique assessment practices are not productive at all, that they not only fail to teach students the skills and learning outcomes necessary for a basic understanding of the field of study, but also fail to build skills that are marketable to employers.

As a recent undergraduate, I would say that all of this misses the point. I really enjoyed my college experience. I enjoyed writing papers and making videos and blog entries. I enjoyed classes where we had lectures and classes where we had discussions. I also enjoyed writing poetry using tweets from Twitter bots, making pottery, giving presentations, and running social media platforms for our creative writing lecture series (the Beck Lecture Series). I liked having a mix of all of these kinds of work. It wasn’t about the type of work I did as a student so much as it was about feeling that I could engage with learning in new ways, which sometimes meant writing a fifteen-page paper. What mattered was that these different types of learning were valued for their difference and their breadth. There is no reason why the liberal arts shouldn’t embrace digital scholarship. It embodies everything the liberal arts is about. It’s cross-disciplinary. Making a video is as much about camera angles as it is about storytelling. Understanding poetic meter is as important as understanding algorithms. Knowing how to teach yourself new skills is not just about where to find information but also about how to interpret it. To exclude either print or digital information from these endeavors would be a huge mistake. The liberal arts needs both digital and “traditional” forms of learning.

Whether the work is digital or not is not the point. The point is that no matter what we do or how we structure our curricula, students ultimately have to be able to point to the work they did as undergraduates in order to get a job or get into grad school, and that is the problem with education. What we all really want, not just for ourselves but for students as well, is the ability to learn for learning’s sake alone. We want to pursue our passions, because that’s what the liberal arts is really about: inspiring a lifelong love of learning (a phrase I heard 5,000 times at Denison). The problem is that the current system doesn’t allow everyone to pursue what they love.

I’m not arguing that I have a solution, nor do I expect every single article and book I read to reference the Communist Manifesto when talking about ideal learning, teaching, and researching conditions. What I am arguing is that I see a great amount of agreement beneath the contention in the DH community, and that gives me hope: if all of these smart people are working on these problems, we might actually get somewhere.