A cautionary tale about buying into statistics

I was going through my Google Reader feeds again and a series of articles on a few different sites caught my attention. They listed a couple of statistical “facts”, which is always something I hate seeing online because of how easily statistics can be manipulated to support any claim you want. I have a Bachelor’s degree in Business Economics, and a couple of my courses dealt with statistics and how different real statistics are from what you read in the media. That’s why I think it’s about time someone went head to head with all these false numbers once and for all.

The two statistical stories that led me to write this both have close connections to tablets, which is why I’m writing this on NBT. One was about Android overtaking other platforms in the OS race, and the other was about the return rates of the iPad and the Galaxy Tab.

Know what statistics you’re looking at

The first thing you have to be aware of is what statistics you’re actually looking at. The first story I saw was about Android allegedly taking over the smartphone market. The problem with this “story” is that the statistics only list shipments in Q4 2010. They don’t include shipments for all of 2010, and they certainly don’t include numbers showing which platform is dominant when you look at all devices still in use. I would dare say that Android won’t become the dominant smartphone OS in the world until it surpasses the other platforms in number of devices in use. Looking at only Q4 2010 gives you an extremely limited view of the market, not a statistic that is useful in any meaningful way. If you had the same type of statistic for Q2 or Q3, for instance, Apple would do much better, because that’s when the iPhone gets updated every year. Even topping the others in a single quarter doesn’t make a platform dominant, however.
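To make the point concrete, here is a minimal Python sketch with completely made-up numbers (they are not Canalys figures, and “Platform A” and “Platform B” are hypothetical). It also assumes, for simplicity, that no devices are ever retired, so devices in use are just earlier devices plus this year’s shipments. The same data crowns a different leader depending on which slice you report.

```python
# All numbers are hypothetical, in millions of units. They are NOT real market data.
quarterly_shipments = {
    "Platform A": [8, 9, 12, 33],    # Q1..Q4: weak most of the year, huge Q4
    "Platform B": [16, 15, 20, 16],  # steady all year
}
# Hypothetical devices already in use at the start of the year.
# Simplifying assumption: nothing is ever retired.
prior_installed_base = {"Platform A": 20, "Platform B": 60}


def leader(totals: dict[str, int]) -> str:
    """Return the platform with the largest total."""
    return max(totals, key=totals.get)


q4_only = {p: q[-1] for p, q in quarterly_shipments.items()}
full_year = {p: sum(q) for p, q in quarterly_shipments.items()}
installed_base = {p: prior_installed_base[p] + full_year[p] for p in full_year}

for label, totals in [("Q4 only", q4_only), ("Full year", full_year), ("Devices in use", installed_base)]:
    print(f"{label:15} leader: {leader(totals)}  {totals}")
```

With these invented numbers, Platform A “wins” Q4 while Platform B leads both the full year and the devices in use.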

Another common deceptive tactic is to play down information about which group of people you’re looking at. One way to do this is to limit the geographical area included in the data set, for instance by looking at the US smartphone market rather than the worldwide smartphone market. Nokia is big in Europe but not so much in the US, so if Nokia wanted to claim it’s still the biggest (in terms of Q4 2010 shipments), all it would need to do is release data for Europe only (assuming Nokia is still the biggest there; I don’t have any data on that). Nokia could also release data showing devices still in use, but then you’d have the issue of which formula decides what percentage of all devices ever sold are still in use. It could use an averaged guesstimate, or survey a certain percentage of the population, but then you’d have a whole new set of problems, such as how representative the sample is, bias, etc. There are a ridiculous number of formulas and figures in the statistical world for testing how accurate a sample is, but since mainstream consumers don’t know what they mean, they never make it into the media. That means that if you ask 10 friends what smartphone they have and write a blog post about the results, nobody can tell whether it’s any less accurate than the statistics you find on reputable sites, simply because the figures describing the accuracy of the results are missing from both.
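One of those accuracy figures is the confidence interval. As a minimal sketch, the Python below computes a 95% Wilson score interval for a proportion, using hypothetical survey results (4 of 10 friends vs 400 of 1000 people owning the same platform). Keep in mind that it only measures random sampling error: it says nothing about whether your friends, or anyone else’s sample, are representative in the first place.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical surveys: the same 40% share, very different sample sizes.
for owners, asked in [(4, 10), (400, 1000)]:
    low, high = wilson_interval(owners, asked)
    print(f"{owners}/{asked} own it: true share plausibly {low:.0%} to {high:.0%}")
```

With 10 people the interval spans roughly 17% to 69%; with 1000 it narrows to roughly 37% to 43%. That is exactly the kind of figure that tends to be left out of the headlines.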

http://imgs.xkcd.com/comics/consecutive_vowels.png

Then you have to look at what the statistics company actually measured. If you look at the smartphone statistics sheet, it actually lists company names, not OSes. Google is synonymous with Android, as that’s the only smartphone OS it has (although the sheet lists two others I haven’t heard of at the bottom). The same goes for Apple and iOS, and for RIM and BlackBerry OS. But what about Nokia and Microsoft? In the original article they write, and I quote, “(..)devices running Nokia’s Symbian platform trailed slightly at 31.0 million worldwide(..)”. The problem with this is that “Symbian” isn’t an OS as much as it is a series of OSes. Symbian S60v3 and v5, Symbian^3: do they all count? What about S40, a “dumbphone” OS that I wouldn’t include under “smartphones” but that I would certainly include under “devices running Nokia’s Symbian”? Then you have Maemo, which isn’t Symbian but is a smartphone OS that Nokia uses. Microsoft now focuses solely on Windows Phone 7, but you can still get Windows Mobile phones in certain markets, especially China, and that latter point goes for Android as well. Official numbers vs actual numbers, in other words.

You also have to look at what defines a phone. An iPod touch can make phone calls with Skype and send various types of messages, and some people buy it exclusively for that use. You need Wi-Fi, but between devices like the MiFi and bad cell phone coverage in certain areas, Wi-Fi might actually be easier to get than cell phone coverage for some people. Europeans would say that a phone has a SIM card, but CDMA users would certainly disagree. Even if it’s a stretch to say the iPod touch, with its 10 million Q4 shipments, should count, all anyone needs to do to include it is to change “smartphone shipments” to “OS shipments”. Suddenly both the iPad and the iPod touch add significant numbers to Apple’s totals, while the others gain very little: companies like Nokia don’t have PMPs or tablets, and Android has them but to a much lower degree. That alone would change the results completely, just by switching from “smartphone shipments” to “OS shipments”, a difference many people wouldn’t think twice about. There’s also the Galaxy Tab, which is a tablet that can make phone calls. It shipped a lot of units last quarter, so does it count? If it does (and it isn’t included), Android had an even bigger lead than the numbers show. And what about that peculiar Windows XP phone we heard about a while ago? Even if its numbers are insignificant, it’s a good example of how undefined borders can affect results.
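Here is a minimal Python sketch of how much the label matters. The companies and unit counts are hypothetical (not real Q4 figures); the only thing that changes between the two rankings is which device categories get counted.

```python
# Hypothetical unit counts in millions. These are NOT real shipment figures.
devices = {
    "Company A": {"phone": 30, "tablet": 2, "media_player": 1},
    "Company B": {"phone": 25, "tablet": 7, "media_player": 10},
    "Company C": {"phone": 28, "tablet": 0, "media_player": 0},
}


def ranking(categories: list[str]) -> list[str]:
    """Rank companies by total units across the chosen device categories."""
    totals = {
        company: sum(counts[c] for c in categories)
        for company, counts in devices.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


smartphone_shipments = ranking(["phone"])
os_shipments = ranking(["phone", "tablet", "media_player"])
print("Counting phones only: ", smartphone_shipments)
print("Counting every device:", os_shipments)
```

Company A tops the “smartphone” list and Company B tops the “OS” list, from the exact same underlying numbers.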

[Comic mouseover text: “The contents of any one panel are dependent on the contents of every panel including itself. The graph of panel dependencies is complete and bidirectional, and each node has a loop. The mouseover text has two hundred and forty-two characters.”]

Lastly, the word “shipment” bugged me in this report. Shipping to whom? Retailers? Customers? For all we know, every electronics store in the world is filled to the brim with unsold Nokia and Apple devices, and only Android, Microsoft and RIM actually sold much in Q4. If I make a new smartphone and OS, ship 150 million of them to various retailers, and not a single one sells, will my company then be splattered across the news as having overtaken every other smartphone platform and manufacturer on the market combined? Or do they mean sales? Who knows…

Knowing where the data comes from

Knowing where the data comes from is as important as knowing what data you’re looking at. In the case of the smartphone shipments, the statistics come from an independent company: at the top of the original article they state “-Canalys reveals smart phone market exceeded 100 million units in Q4 2010”. Who knows whether those people know what they’re doing, or where they got the data? From the manufacturers? Retailers? Did they just Google around a bit and slap everything together? Basically, we’re trusting this company to have done a thorough, accurate job. Some companies even keep sales figures secret, so getting a truly accurate number from anyone seems unlikely.

There’s also the problem of media coverage. Mainstream users don’t know how to read statistics, and neither do most bloggers and journalists. The second story I mentioned at the beginning, about Galaxy Tab return numbers vs the iPad, is proof of that. Instead of telling you directly why I banged my head against the wall when I went through my Google Reader feeds, I’ll post a screenshot:

At least two of them managed to agree on the number… Engadget at least provided a bit more information beyond citing a company called ITG Investment Research: apparently the numbers are based on data from 6000 US retailers. What does that mean? Well, assuming at least one of the blogs above has the right number, it means that the Galaxy Tab has a 13-16% return rate at 6000 selected retailers in the United States. It doesn’t say anything about the actual percentage of returns in the US, let alone the world. And who are these 6000 retailers? Do they include Apple Stores? I’d think most iPads are sold through those. Different return policies would certainly affect the result as well: do all stores in the US even accept returns? For how long? Is there a restocking fee? Here in Norway, you have the legal right to return any item bought outside a normal physical store (online, by phone, from door-to-door salespeople, etc.) without paying a fee, aside from any shipping, as a counter to impulse buys. Numbers over here would certainly be different, then, simply because return policies are more alike across the board (though some stores offer better policies than the law demands). The Galaxy Tab also requires a cellphone plan in the US, which would certainly make it more of a hassle than many people realized.
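A minimal Python sketch of why the retailer selection matters: the two channels and their numbers below are hypothetical (not ITG’s data), and the device is assumed to be identical in both, with only the return policy differing. The pooled return rate you end up reporting depends entirely on which retailers land in your sample.

```python
# Hypothetical retail channels selling the same device. NOT real data.
channels = {
    "lenient_policy": {"sold": 1000, "returned": 200},  # e.g. long window, no restocking fee
    "strict_policy": {"sold": 1000, "returned": 50},    # e.g. short window, restocking fee
}


def pooled_return_rate(selected: list[str]) -> float:
    """Total returns divided by total units sold, pooled over the selected channels."""
    sold = sum(channels[name]["sold"] for name in selected)
    returned = sum(channels[name]["returned"] for name in selected)
    return returned / sold


for sample in (["lenient_policy"], ["strict_policy"], ["lenient_policy", "strict_policy"]):
    print(sample, f"{pooled_return_rate(sample):.1%}")
```

Same device, three different “return rates” (20%, 5% and 12.5%), purely from the choice of stores.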

There’s also the tiny issue of statistics making themselves come true. If you release data saying that many people return the Galaxy Tab, you may nudge otherwise happy users who have some doubts about the device into doing the same. This is very common in politics, where statistical data pointing to one party over another might in fact influence people to vote for that party. Group mentality, in a way.

The bottom line

I could go on and on about how to poke holes in statistics all day, but I think you get my point by now. You can twist data into saying more or less anything you want, as long as you restrict parts of the data set and make sure to present it in a way that makes whatever you want to look good, look good. As someone who has spent more time than I’d like poking around SPSS (a statistics program), looking at data sets and how small tweaks can change the outcome, I can only advise you to ignore any statistic you see in the media and not give it a second thought. There is some truth to any statistic, but it’s often not what people first think. I’ll leave you with a joke from “The Curious Incident of the Dog in the Night-Time”, a book written from the perspective of a boy with Asperger’s syndrome. Though it’s a joke, it pretty much sums up my thoughts on statistics by showing that there are several ways of looking at things; it just depends on how specific you are.

There are three men on a train. One of them is an economist and one of them is a logician and one of them is a mathematician. And they have just crossed the border into Scotland and they see a brown cow standing in a field from the window of the train.

And the economist says, “Look, the cows in Scotland are brown.”

And the logician says, “No. There are cows in Scotland of which one at least is brown.”

And the mathematician says, “No. There is at least one cow in Scotland, of which one side appears to be brown.”

Pocketables does not accept targeted advertising, phony guest posts, paid reviews, etc. Help us keep it this way with support on Patreon!

Andreas Ødegård

Andreas Ødegård is more interested in aftermarket (and user-created) software and hardware than in chasing the latest gadgets. His day job as a teacher keeps him interested in education tech and takes up most of his time.