Fandom statistics is the activity in which fans try to make sense of fandom trends quantitatively. Such quantitative analyses have been part of fandom since before the internet became broadly accessible in the 1990s, but the mass movement of fandom into online spaces helped accelerate the process as data collection and processing became easier.
Types of Data Collected
One of the main types of data collected by fan statisticians concerns fandom demographics: fans' ages, countries of origin, genders, sexual orientations, and similar characteristics. This information is usually gathered through fan surveys. Examples of this kind of data include Fan Fiction Demographics in 2010: Age, Sex, Country, a 2011 analysis of demographics on Fanfiction.net, and Science, y'all, Melannen's compilation of polls on slasher sexuality.
If fan demographics are about the people, then the other kind of data generated is about what they produce: fanworks. This kind of data includes things like numbers of fics produced about different fandoms; proportions of slash to het, femslash, or gen; how much fic is AU; how much is explicit; changes and trends over time; and many other ideas.
Fanwork Searches and Sampling
Some fandom analysts collect data about fanworks by performing searches on fanwork archives such as Archive of Our Own (AO3) or Fanfiction.net. Fanwork metadata can then be gathered from the search results. For instance, data collected from an AO3 search can indicate the frequency within the archive of different relationship categories, different fandoms, different time periods, or, more generally, of any search based on archive tags and other metadata. The AO3 search results can also easily indicate the range and the median values for dates of publication, word count, number of kudos, and other metadata. On AO3, the "Sort and Filter" sidebar can also be used to rapidly collect data about the top subcategories within a given tag -- e.g., the top relationships and their frequency. Fanfiction.net organizes fanworks differently from AO3 and currently has more limited search capabilities -- e.g., one cannot search for fanworks by relationship or relationship category (such as F/F or Gen), and searching based on date of publication is limited.
Web-based searches can be automated with a script: a script that fetches the HTML files and scrapes the relevant data can gather data more quickly than a person working by hand. If, in the future, archives such as AO3 provide an API to allow for database queries without fetching the HTML, that will be a more direct method of collecting the same sort of data.
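As a rough illustration of the scraping step, the sketch below parses a small, invented HTML fragment with Python's standard `html.parser` module and tallies works per relationship category. The markup, class names, and the `WorkParser` helper are all hypothetical; real archive pages are structured differently, so a working script would have to be adapted to the actual HTML it downloads.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical markup for illustration only; not the real structure of any archive.
SAMPLE_PAGE = """
<ol class="works">
  <li class="work"><span class="category">F/F</span><span class="words">1200</span></li>
  <li class="work"><span class="category">M/M</span><span class="words">5400</span></li>
  <li class="work"><span class="category">M/M</span><span class="words">800</span></li>
</ol>
"""


class WorkParser(HTMLParser):
    """Collects (category, word_count) pairs from the hypothetical markup above."""

    def __init__(self):
        super().__init__()
        self.works = []      # finished (category, words) tuples
        self._field = None   # class of the <span> currently being read
        self._current = {}   # fields gathered for the work in progress

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "work":
            self._current = {}
        elif tag == "span" and cls in ("category", "words"):
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans of interest.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            self.works.append(
                (self._current["category"], int(self._current["words"]))
            )
            self._current = {}


parser = WorkParser()
parser.feed(SAMPLE_PAGE)

# Frequency of each relationship category in the parsed results.
category_counts = Counter(category for category, _ in parser.works)
print(category_counts)
```

In practice the page text would come from downloading each page of search results rather than from an inline string; the counting step stays the same.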
This form of data collection reflects the metadata of the fanworks rather than their actual content. If content exists in a particular fanwork but is not tagged, it will not be counted via this methodology. Tagging norms may also differ between different communities; if one fandom frequently uses a particular tag (e.g., "Angst") and another fandom does not, it is impossible to tell solely from this method of data collection whether that reflects a difference in the amount of angsty content between the two fandoms, or a difference in the tendency of each fandom to apply that tag to fanworks.
A related method involves human classification and counting of a limited set of fanworks. This method may be appropriate when the categories of interest are not directly searchable via an archive -- for instance, to classify relationships featured in fanworks on Fanfiction.net, one can read the fanworks or the fanwork summaries and record the relationships involved by hand. This can also be a useful method on a platform such as Tumblr, which does not make it easy to count the number of posts returned by a given tag search. Additionally, human classification of fanworks based on their content rather than their tags, according to a consistent standard, can address the limitations of the metadata-only search methods described above.
Because this method of classification is relatively time- and labor-intensive, analysts may choose a small subset of relevant fanworks to analyze. The method for choosing this sample can influence the results. For instance, selecting a sample that consists of the most recent 100 fanworks posted to AO3 may lead to an overrepresentation of certain fanwork categories -- e.g., influenced by the most popular current fandoms, or by challenges currently underway.
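The difference between a "most recent" sample and a random sample can be sketched in a few lines. The work IDs below are invented placeholders, and the sample size of 100 simply mirrors the example in the text.

```python
import random

# Hypothetical work IDs, ordered from oldest to newest.
work_ids = list(range(1, 10001))

# Convenience sample: the newest 100 works only. This is the kind of sample
# that can overrepresent whatever is popular at the moment of collection.
most_recent = work_ids[-100:]

# Simple random sample: every work has the same chance of being selected.
rng = random.Random(42)  # fixed seed so the draw can be repeated
random_sample = rng.sample(work_ids, 100)

print(min(most_recent), max(most_recent))
print(min(random_sample), max(random_sample))
```

A random draw spreads the sample across the whole time range, although it still only describes the archive it was drawn from.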
Many analyses of fanworks simply compare the number of works in different categories; for instance, in looking at the prevalence of M/M slash on AO3, an analysis might look at the number of works tagged M/M vs. those tagged with other relationship categories.
Normalization means dividing by a total amount; this can make very different categories more comparable to one another.
For some comparisons, looking at raw frequency data may be insufficient. For example, in determining which fandoms have the most femslash, if one simply looks for the fandoms with the highest numbers of F/F works, one will mostly end up with the most popular fandoms overall. If one instead normalizes for each fandom and looks at the percentage of works that are labeled F/F within that fandom, then one gets a clearer picture of the relative popularity of femslash within a given fandom.
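A minimal sketch of this comparison, using invented counts for three placeholder fandoms, shows how the ranking can change once each F/F count is divided by the fandom's total. The fandom names, the numbers, and the `ff_share` helper are assumptions made for illustration.

```python
# Invented counts for illustration only: (F/F works, total works) per fandom.
fandom_counts = {
    "Fandom A": (2000, 100000),
    "Fandom B": (900, 3000),
    "Fandom C": (150, 1000),
}


def ff_share(ff_works, total_works):
    """Percentage of a fandom's works that are labeled F/F."""
    return 100.0 * ff_works / total_works


# Ranking by raw F/F count favors the largest fandom overall.
by_raw = sorted(fandom_counts, key=lambda f: fandom_counts[f][0], reverse=True)

# Ranking by percentage shows where F/F is most common within the fandom.
by_share = sorted(
    fandom_counts, key=lambda f: ff_share(*fandom_counts[f]), reverse=True
)

print(by_raw)
print(by_share)
```

With these made-up numbers, the largest fandom has the most F/F works in absolute terms but the smallest F/F percentage.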
Similarly, when looking at how fanwork trends have changed over time, one needs to account for the change in the overall volume of fanworks posted over time. For example, in comparing the popularity on AO3 of a movie that came out in 2013 vs. a movie that came out in 2011, one must account for the fact that far more fanworks were posted on AO3 in 2013 than in 2011. One can normalize to attempt to factor out the overall popularity of AO3, e.g., by dividing the number of fanworks posted in a given fandom within a given time frame by the overall number of fanworks posted on AO3 during that same time frame.
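The division described here can be sketched as follows. All of the counts are invented for illustration and do not reflect actual AO3 posting volumes.

```python
# Invented numbers: works posted in each movie's fandom during its release
# year, and works posted across the whole archive in that same year.
fandom_works = {2011: 500, 2013: 1200}
archive_works = {2011: 100000, 2013: 400000}

# Share of the archive's yearly output that belongs to each fandom.
share = {year: fandom_works[year] / archive_works[year] for year in fandom_works}

for year in sorted(share):
    print(year, fandom_works[year], f"{100 * share[year]:.2f}%")
```

In this made-up case the 2013 fandom has more works in raw terms, yet it accounts for a smaller share of that year's archive-wide output than the 2011 fandom did in its year.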
To answer a question like, "How much does the amount of femslash in a fandom relate to the number of female characters in the source media?" one can measure the number of F/F fanworks in different fandoms, count up the number of female characters in each of those fandoms, and then do a correlation test to see how closely these numbers are related. A correlation coefficient close to 1 or -1 means that they are closely related, while a value near 0 means there is little linear relationship between them. A correlation doesn't reveal the causal relationship between the two factors, but it can at least indicate whether there is any relationship at all between them.
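One common correlation measure is the Pearson correlation coefficient; the text does not say which test a given analysis used, so treating it as the choice here is an assumption. The sketch below computes it by hand for invented data, and `pearson_r` is an illustrative helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented data: one entry per fandom.
female_characters = [2, 3, 5, 8, 10]
ff_works = [40, 55, 120, 150, 230]

r = pearson_r(female_characters, ff_works)
print(round(r, 3))
```

The result always falls between -1 and 1. With only five invented data points, the value here illustrates the calculation and nothing more.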
Comparison of two groups
To find out if two groups are significantly different from one another along a given dimension, a t-test is often appropriate. For instance, if one wants to compare whether the fanfic produced by the Sherlock fandom tends to have a significantly higher word count than that produced by the Supernatural fandom, one can take a sample of fanfics from each fandom, look at the word count for each, and compare the set of Sherlock fanfic word counts to Supernatural fanfic word counts to find out whether there is any significant difference between the two sets of numbers.
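The sketch below computes a t statistic for two samples of word counts. It uses Welch's variant of the t-test, which does not assume that the two groups have equal variance; the text does not specify a variant, so this choice is an assumption. The word counts are invented and `welch_t` is an illustrative helper name.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    # Sample variance of each group divided by its size.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented word counts for a small sample of fics from each fandom.
sherlock_words = [3200, 15000, 780, 4100, 9600, 2500, 6100, 1200]
supernatural_words = [1800, 2400, 5200, 900, 3100, 7000, 1500, 2200]

t_stat, dof = welch_t(sherlock_words, supernatural_words)
print(round(t_stat, 3), round(dof, 1))
```

Turning the t statistic into a p-value requires the t distribution, which the Python standard library does not provide. A statistics package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) can perform the whole test.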
Most of these statistical analyses include their own descriptions of potential pitfalls in their methods, but many of the difficulties are common to all of them. First, it is impossible for the data gathered to be representative of all of fandom, or even of one group within it. Many surveys are limited to friends of the original poster, though they may be spread around to a wider audience. Analyses of fanworks on any given website are just analyses of one website, not all of fandom. Further, fandom is constantly growing and changing as an environment, and by the time a significant analysis is completed, the situation may have changed again.
Data collection and analysis methods have changed with the types of data and tools available.
- Fall of 2000: The Fan Fiction Universe: Some Statistical Comparisons, compiled by Mary Ellen Curtin
- June 2000-November 2003: some FanFiction.Net stats compiled by Mary Ellen Curtin: Fanfiction.net Statistics Tables, part one; WBM link; WBM link, part two, commentary Fanfiction.net Statistics; WBM link
- 2004: Young, Female, Single…? A Study of Demographics and Writing-/Reading-Habits of Fanfiction Writers and Readers
- The Broads Who Blog: Gender and Fannish Expression in the Buffyverse Fandom by Claudia Rebaza, results and analysis of 1,541 respondents to an online survey.
- A June 2007 poll on FanLib's forum asked members their gender. 19% replied they were male, 81% replied they were female out of 52 respondents.
- a survey and analysis done by Katherine Morrissey at fandom then/now: an ongoing and participatory research project. (a detailed look at media fandom in 2008)
- Fan Fiction Statistics - FFN Research: Fan Fiction Demographics in 2010: Age, Sex, Country, Archived version (2010)
- Fan Fiction Statistics - FFN Research: FanFiction.Net Member Statistics, Archived version (July 2010)
- Fan Fiction Statistics - FFN Research: Most Popular Categories, Archived version (July 2010)
- Fan Fiction Statistics - FFN Research: FanFiction.Net story totals, Archived version (July 2010)
- Fan Fiction Statistics - FFN Research: Research 101, Archived version (December 2010)
- Science, y'all. ("So: Are slashers straight?")
- Fan Fiction Statistics - FFN Research: FanFiction.Net Fandoms: Story and Traffic Statistics, Archived version (January 2011)
- Fanfiction Study; Webcite, compiled by Amaya Ramiel
- OTW Community Survey, which includes data on language use, time spent in fandom, and a closing section comparing activities by fans who self-identify as creating particular types of fanworks.
- destinationtoast popularized comparative analyses of fandom trends using scraped data from various sites. Most of these analyses revolve around AO3 as the way the archive uses tags makes it the easiest source for data generation; however, there have been some analyses that include FF.net and Tumblr.
- Following destinationtoast's analysis, centrumlumina completed a similar survey of AO3 fanworks (see Why M/M?), followed by a survey of AO3 users to establish the demographics of readers and creators on the site. This survey received over 10,000 responses, and has now published results on demographics, Archived version, site use, Archived version, and fanwork preferences, Archived version. Further results, Archived version are still being published, and include notes for future researchers, Archived version.
- AO3 Ship Stats 2015 continuing the AO3 Ship Stats project by centrumlumina.
- The 100 most popular ships in 2015 on AO3: what place for women?
- TV Fandom Sizes on FFN; archive link (July 2016)
- Tumblr Fandometrics' Year in Review 2016 (December 2016)
- AO3 Ship Stats 2016 continuing the AO3 Ship Stats project by centrumlumina.
- Five Tropes Fanfic Readers Love (And One They Hate) - results and analysis of 7,610 respondents to an online survey on tropes.
- Tumblr Fandometrics' Year in Review 2017 (December 2017)
- Fandometrics in Depth: Shipping was a series of posts listing the most reblogged ships on Tumblr by year.
- AO3 Ship Stats 2017 continuing the AO3 Ship Stats project by centrumlumina.
- The Fansplaining Shipping Survey results and analysis of 17,391 respondents to an online survey.
Some Fandom Stat Analysis Sources
- Fandom then/now: an ongoing and participatory research project. (2008)
- Fan Fiction Demographics in 2010: Age, Sex, Country (2010)
- AO3 Ship Stats Masterpost (2013)
- AO3 Census: Masterpost (2013)
- Destination: Toast!
- ToastyStats: Fandom statistical analyses
- fannish age survey: results!, Archived version (2014)