What data sources do you use?

Each year we scrape data from tens of millions of unique consumers in the US alone, and from tens of millions more across numerous other markets. This generates hundreds of millions of data points, collected in a completely randomized manner. Our data comes from forums, comment sections on blogs, news, e-commerce, and video sites, and other types of long-form (deep) engagement platforms. We focus on these sources because our aim is to decode the culture around a topic: we want to identify the universe of related topics that surround it, as well as the depth of the relationships between them.

NOTE: In non-English-speaking markets we collect data in the local language, and our anthropologists analyze it in that language. This avoids the pitfalls of machine-generated translation and errors in interpretation, since accurate interpretation requires an understanding of the cultural and political history of the country in question.

Ethnography (the study of a culture and its language to decode meaning) requires a minimum level of data quality.

When we scrape data, we look for platforms that satisfy two quality criteria, both of which are needed for real ethnographic analysis.

  1. Some form of anonymity: Platforms that allow users to post under a pseudonym give them a feeling of anonymity, which leads to more honest discussion on topics ranging from people's deep-rooted underlying beliefs to their opinions and attitudes on matters both public and private.
  2. Long-form discourse: Platforms that enable real discussion rather than encouraging users to hit 'like' or 'share'. Having to use words and compose sentences to articulate one's feelings and opinions allows us not only to understand what people are saying, but also to get at why they're saying it. This is why we don't scrape platforms like Facebook, Instagram or Twitter. They satisfy neither of these criteria and therefore deliver low-quality data from an ethnographic standpoint.