OTW Guest Post: Smitha Milli

From Fanlore
Interviews by Fans
Title: OTW Guest Post: Smitha Milli
Interviewer: Claudia Rebaza
Interviewee: Smitha Milli
Date(s): January 8, 2017
Medium: online
Fandom(s):
External Links: OTW Guest Post: Smitha Milli – Organization for Transformative Works, Archived version

OTW Guest Post: Smitha Milli is a 2017 interview done as part of a series. See OTW Guest Post.

"Smitha Milli is a 4th-year undergraduate at UC Berkeley whose research interests lie in artificial intelligence and cognitive science. Today, Smitha talks about her research using natural language processing to reveal patterns in fanfiction texts, the results of which are available online."

Some Excerpts

Could you explain how your study was done and what you found?

The goal of this work was to share what fanfiction has to offer to the fields of natural language processing, computational social science, and digital humanities. Towards this end, we collected a large dataset of fanfiction from fanfiction.net that consists of about 6 million stories written by around 1 million authors. To characterize the interaction between authors and readers, we analyzed the network structure of the community. We found that 52% of the authors in our dataset had reviewed another author’s story. Of these authors, each had reviewed on average 13 other stories. We did exploratory data analysis to investigate the content of these reviews. In particular, we ran a statistical model called “latent Dirichlet allocation” to extract the different topics underlying the reviews. Probably unsurprisingly to most of you, most of the reviews consisted of positive author encouragement (“please update!!!”) or emotional reactions to the story (“aww cute”).

We also investigated differences between fanfiction and canon. Specifically, we compared ten canons present in the Gutenberg corpus to their fanfiction counterparts. (We used canons from Gutenberg, so that we would have access to the text of the original stories. The canons we looked at were Les Miserables, Sherlock Holmes, Pride and Prejudice, Peter Pan, Alice in Wonderland, Anne of Green Gables, Jane Eyre, Little Women, The Scarlet Pimpernel, and the Secret Garden).

In both fanfiction and canon we found that female characters were mentioned less frequently than male characters. However, we did find that fanfiction had a slight, but very statistically significant, increase in the frequency of female character mentions. In fanfiction 42.4% of character mentions were female, while in the canons 40.1% of character mentions were female. We also analyzed how the number of times specific characters were mentioned differs between canon and fanfiction. For example, in Pride and Prejudice fanfiction, Mr. Darcy receives a large increase in mentions, while nearly every other character is mentioned less often.
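The mention statistic above amounts to counting occurrences of each named character in the text and aggregating by gender. A minimal sketch, with an invented character roster and snippet (a real analysis would use the full character lists and texts of each canon, plus coreference handling for pronouns and name variants):

```python
import re

# Hypothetical character-to-gender roster for illustration; the study used
# the actual characters of ten Gutenberg canons and their fanfiction.
characters = {
    "Elizabeth": "F", "Darcy": "M", "Jane": "F", "Bingley": "M",
}

text = ("Elizabeth smiled at Darcy. Darcy bowed. Jane watched Bingley, "
        "and Elizabeth laughed.")

# Count exact-name mentions of each character
counts = {name: len(re.findall(rf"\b{name}\b", text)) for name in characters}

total = sum(counts.values())
female = sum(n for name, n in counts.items() if characters[name] == "F")
print(f"female share of mentions: {female / total:.1%}")
```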

In addition to analyzing the fanfiction itself, the fact that the dataset had reader reviews on a chapter-by-chapter basis allowed us to pose a new, challenging NLP task about predicting reader sentiment towards characters. The goal of the task is to create an algorithm that when given any character in a story can predict whether a reader will like the character or not. To create labeled data for this task, we had annotators on Mechanical Turk label sentences in reader reviews as containing positive or negative sentiment towards a character. We trained a simple machine learning model to classify characters as positive or negative based on the text of the fanfiction. Our simple model finds plausible features. For example, it picks up on the fact that characters that “hiss”, “sneer”, or “shove” tend to be disliked.

Despite this, the model does not achieve high performance on the task. We believe this is because a much higher-level abstraction of characters is needed to understand whether a character will be disliked.
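The kind of feature the baseline model picked up on, that characters who "hiss", "sneer", or "shove" tend to be disliked, can be illustrated with a simple verb-scoring sketch. The lexicon, weights, story text, and character names below are all invented for illustration; the actual model learned its weights from the Mechanical Turk annotations rather than using a fixed list.

```python
import re

# Toy verb lexicon echoing the features the model reportedly learned.
# Weights are invented: negative verbs suggest a disliked character.
verb_weights = {"hissed": -1, "sneered": -1, "shoved": -1,
                "smiled": 1, "laughed": 1}

def character_score(text, name):
    """Sum sentiment weights of the word immediately following each
    mention of the character (a crude stand-in for verbs they govern)."""
    score = 0
    for word in re.findall(rf"\b{name}\b\s+(\w+)", text):
        score += verb_weights.get(word.lower(), 0)
    return score

# Hypothetical snippet with invented character behavior
story = "Snape sneered at Harry. Harry laughed. Snape shoved past him."
print(character_score(story, "Snape"))  # negative: likely disliked
print(character_score(story, "Harry"))  # positive: likely liked
```

A real classifier would extract such verb features automatically (e.g. via dependency parsing) and learn their weights from the labeled review sentences, which is roughly what the interview describes.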

What about your research findings has inspired you the most?

I actually find it inspiring how poorly our baseline model did on the predicting reader sentiment task. We have a long way to go before computers can come even close to the story understanding that humans do naturally.
