Fandom Statistics

From Fanlore
Related terms: Meta, Fan Survey
See also: Counters and Kudos

Fandom statistics is the activity in which fans try to make sense of fandom trends quantitatively. Such quantitative analyses have been part of fandom since before the internet became broadly accessible in the 1990s, but the mass movement of fandom into online spaces helped accelerate the process as data collection and processing became easier.[citation needed]


Types of Data Collected

Fan Demographics

One of the main types of data collected by fan statisticians concerns fandom demographics: how old fans are, where they come from, and their gender, sexual orientation, and similar characteristics. This information is usually gathered through fan surveys. Examples of this kind of data include Fan Fiction Demographics in 2010: Age, Sex, Country, a 2011 analysis of demographics on, and Science, y'all, Melannen's compilation of polls on slasher sexuality.


Fanworks

If fan demographics are about the people, then the other kind of data collected is about what they produce: fanworks. This kind of data includes the number of fics produced in different fandoms; the proportions of slash to het, femslash, or gen; how much fic is AU; how much is explicit; changes and trends over time; and many other measures.

Collection Methods



Fanwork Searches and Sampling

Archive Searches

Some fandom analysts collect data about fanworks by performing searches on fanwork archives such as Archive of Our Own (AO3) or Fanwork metadata can then be gathered from the search results. For instance, data collected from an AO3 search can indicate the frequency within the archive of different relationship categories, different fandoms, different time periods, or, more generally, of any search based on archive tags and other metadata. AO3 search results can also easily indicate the range and the median values for dates of publication, word count, number of kudos, and other metadata. On AO3, the "Sort and Filter" sidebar can also be used to rapidly collect data about the top subcategories within a given tag -- e.g., the top relationships and their frequency. organizes fanworks differently from AO3 and currently has more limited search capabilities -- e.g., one cannot search for fanworks by relationship or relationship category (e.g., F/F or Gen), and searching based on date of publication is limited.

Web-based searches can be automated with a script: a script that fetches the HTML pages and scrapes the relevant data gathers it more quickly than a person collecting it by hand. If, in the future, archives such as AO3 provide an API that allows database queries without fetching the HTML, that will be a more direct method of collecting the same sort of data.
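
The scraping step can be sketched with Python's standard library alone. This is only an illustration: the HTML below is an invented stand-in for a saved search-results page, and the `work` and `tag` class names are assumptions, not any archive's actual markup. A real script would also have to fetch the pages, and should respect the archive's terms of service and rate limits.

```python
from collections import Counter
from html.parser import HTMLParser

# Invented sample markup; real archive pages are structured differently.
SAMPLE_HTML = """
<ol>
  <li class="work"><a class="tag">Angst</a><a class="tag">Fluff</a></li>
  <li class="work"><a class="tag">Angst</a></li>
  <li class="work"><a class="tag">Alternate Universe</a></li>
</ol>
"""


class TagCounter(HTMLParser):
    """Counts works and tag occurrences in one results page."""

    def __init__(self):
        super().__init__()
        self.work_count = 0
        self.tag_counts = Counter()
        self._in_tag = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "work" in classes:
            self.work_count += 1      # one list item per work (assumed)
        elif tag == "a" and "tag" in classes:
            self._in_tag = True       # the next text node is a tag name

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_tag = False

    def handle_data(self, data):
        if self._in_tag and data.strip():
            self.tag_counts[data.strip()] += 1


def count_tags(html):
    """Return (number of works, Counter of tag names) for one page."""
    parser = TagCounter()
    parser.feed(html)
    return parser.work_count, parser.tag_counts


works, tags = count_tags(SAMPLE_HTML)
print(works, tags.most_common())
```

Dividing each tag count by the number of works gives the frequency of that tag within the search results.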

This form of data collection reflects the metadata of the fanworks rather than their actual content. If content exists in a particular fanwork but is not tagged, it will not be counted via this methodology. Tagging norms may also differ between different communities; if one fandom frequently uses a particular tag (e.g., "Angst") and another fandom does not, it is impossible to tell solely from this method of data collection whether that reflects a difference in the amount of angsty content between the two fandoms, or a difference in the tendency of each fandom to apply that tag to fanworks.

Online Sampling

A related method involves a person classifying and counting a limited set of fanworks by hand. This method may be appropriate when the categories of interest are not directly searchable via an archive -- for instance, to classify relationships featured in fanworks on, one can read the fanworks or the fanwork summaries and record the relationships involved by hand. This can also be a useful method on a platform such as Tumblr, which does not make it easy to count the number of posts returned by a given tag search. Additionally, human classification of fanworks based on their content rather than their tags, according to a consistent standard, can address the limitations of the metadata-only search methods described above.

Because this method of classification is relatively time- and labor-intensive, analysts may choose a small subset of relevant fanworks to analyze. The method for choosing this sample can influence the results. For instance, selecting a sample that consists of the most recent 100 fanworks posted to AO3 may lead to an overrepresentation of certain fanwork categories -- e.g., influenced by the most popular current fandoms, or by challenges currently underway.
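
The effect of the sampling method can be illustrated with a small simulation. Everything here is hypothetical: the catalogue, the fandom names, and the assumption that the newest 100 works all come from one currently popular fandom. The sketch compares a "most recent 100" sample with a simple random sample of the same size.

```python
import random

# Hypothetical catalogue of (work_id, fandom), ordered oldest to newest.
# By construction, the newest 100 works all belong to "Fandom C".
works = [(i, "Fandom A" if i % 2 else "Fandom B") for i in range(900)]
works += [(i, "Fandom C") for i in range(900, 1000)]


def share(sample, fandom):
    """Fraction of the sample that belongs to the given fandom."""
    return sum(1 for _, f in sample if f == fandom) / len(sample)


most_recent = works[-100:]               # convenience sample
rng = random.Random(42)                  # fixed seed so the draw is repeatable
random_sample = rng.sample(works, 100)   # simple random sample, no repeats

print("Fandom C share, most recent:", share(most_recent, "Fandom C"))
print("Fandom C share, random:     ", share(random_sample, "Fandom C"))
print("Fandom C share, all works:  ", share(works, "Fandom C"))
```

In this constructed case the most recent sample consists entirely of Fandom C, although Fandom C makes up only a tenth of the catalogue.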

Processing/Analysis Methods

Fanwork analyses

Relative frequency

Many analyses of fanworks simply compare the number of works in different categories; for instance, in looking at the prevalence of M/M slash on AO3, an analysis might look at the number of works tagged M/M vs. those tagged with other relationship categories.


Normalization

Normalization means dividing a count by a relevant total; this can make very different categories more comparable to one another.

For some comparisons, looking at raw frequency data may be insufficient. For example, in determining which fandoms have the most femslash, if one simply looks for the fandoms with the highest numbers of F/F works, one will mostly end up with the most popular fandoms overall. If one instead normalizes for each fandom and looks at the percentage of works that are labeled F/F within that fandom, then one gets a clearer picture of the relative popularity of femslash within a given fandom.
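
A minimal sketch of this comparison, using invented counts and placeholder fandom names, shows how the ranking can flip once each fandom's F/F count is divided by its total number of works.

```python
# Invented counts for illustration: fandom -> (F/F works, total works).
counts = {
    "Big Fandom": (5000, 200000),
    "Mid Fandom": (1200, 8000),
    "Small Fandom": (300, 600),
}


def ff_share(fandom):
    """F/F works as a fraction of all works in the fandom."""
    ff, total = counts[fandom]
    return ff / total


# Ranking by raw count favors the largest fandom overall.
by_raw = sorted(counts, key=lambda f: counts[f][0], reverse=True)
# Ranking by normalized share shows where F/F is relatively most common.
by_share = sorted(counts, key=ff_share, reverse=True)

print("By raw count:", by_raw)
print("By share:    ", by_share)
for fandom in by_share:
    print(f"{fandom}: {counts[fandom][0]} F/F works, {ff_share(fandom):.1%}")
```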

Similarly, when looking at how fanwork trends have changed over time, one needs to account for the change in the overall volume of fanworks posted over time. E.g., in comparing the popularity on AO3 of a movie that came out in 2013 vs. a movie that came out in 2011, one must account for the fact that far more fanworks were posted on AO3 in 2013 than in 2011. One can normalize to attempt to factor out the overall popularity of AO3, e.g., by dividing the number of fanworks posted in a given fandom within a given time frame by the overall number of fanworks posted on AO3 during that same time frame.
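
The same idea applied over time, again with invented numbers: each movie's fanwork count in its release year is divided by the archive-wide total for that year. The movie labels and all figures are placeholders, not real AO3 data.

```python
# Invented numbers: works posted for each movie in its release year,
# and works posted archive-wide in that same year.
movie_works = {"Movie (2011)": (2011, 400), "Movie (2013)": (2013, 900)}
archive_works = {2011: 100000, 2013: 450000}


def share_of_archive(movie):
    """Movie's works as a fraction of all works posted that year."""
    year, works = movie_works[movie]
    return works / archive_works[year]


for movie, (year, works) in movie_works.items():
    print(f"{movie}: {works} works, {share_of_archive(movie):.2%} of {year} total")
```

With these placeholder figures, the 2013 movie has more works in absolute terms but a smaller share of that year's archive activity.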


Correlation

To answer a question like, "How much does the amount of femslash in a fandom relate to the number of female characters in the source media?" one can measure the number of F/F fanworks in different fandoms, count up the number of female characters in each of those fandoms, and then do a correlation test to see how closely these numbers are related. A correlation coefficient close to 1 or -1 means that they are closely related; a value near 0 means there is little linear relationship. That doesn't reveal the causal relationship between the two factors, but it can at least indicate whether there is any relationship at all between them.
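
One common choice for such a test is the Pearson correlation coefficient; the text above does not say which coefficient analysts use, so this is an assumption. The sketch computes it from scratch, and the per-fandom numbers are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Invented data: one entry per fandom.
female_characters = [2, 3, 5, 8, 10]
ff_works = [40, 90, 110, 300, 280]

r = pearson_r(female_characters, ff_works)
print(f"r = {r:.2f}")
```

Because raw F/F counts also grow with fandom size, one might correlate the normalized F/F share instead of the raw count.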

Comparison of two groups

To find out if two groups are significantly different from one another along a given dimension, a t-test is often appropriate. For instance, if one wants to compare whether the fanfic produced by the Sherlock fandom tends to have a significantly higher word count than that produced by the Supernatural fandom, one can take a sample of fanfics from each fandom, look at the word count for each, and compare the set of Sherlock fanfic word counts to Supernatural fanfic word counts to find out whether there is any significant difference between the two sets of numbers.
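
A minimal sketch of such a comparison, assuming Welch's variant of the t-test (which does not require the two groups to have equal variance). The word counts are invented placeholders, not real samples from either fandom. The p-value lookup is omitted; a statistics library such as SciPy's `scipy.stats.ttest_ind` with `equal_var=False` reports it directly.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard error of each sample mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented word counts for two samples of fanfics.
sample_a = [1200, 3400, 800, 15000, 2600, 4100, 950, 7200]
sample_b = [900, 2100, 1500, 5200, 700, 3300, 1100, 2800]

t, df = welch_t(sample_a, sample_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large absolute value of t relative to the t distribution with `df` degrees of freedom suggests a significant difference between the two sets of word counts.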


Most of these statistical analyses include their own descriptions of potential pitfalls in their methods, but many of these pitfalls are shared across analyses. Firstly, it is impossible for the data gathered to be representative of all of fandom, or even of one group. Many surveys are limited to friends of the original poster, though they may be spread around to a wider audience. Analyses of fanworks on any given website are just analyses of one website, not all of fandom. Further, fandom is constantly growing and changing as an environment, and by the time a significant analysis is completed, the situation may have changed again.


Data collection and analysis methods have changed with the types of data and tools available.

  • destinationtoast popularized comparative analyses of fandom trends using data scraped from various sites. Most of these analyses revolve around AO3, because the way the archive uses tags makes it the easiest source of data; however, some analyses also include and Tumblr.[1]

Some Fandom Stat Analysis Sources