Hi, I’m Nicolas, the resident Geek at Reflection, in charge of turning shedloads of data into gold. Ok, ok, I’m just a Chief Data Scientist working on modelling the App industry 🙂
Here at Reflection, we are often asked:
“How can your estimates be SOOO accurate??”
To which we humbly respond:
“I don’t know – ask Nic!” (or “Thanks to our great partners!” for the more PC version…)
So here is my attempt to explain some of the inner workings behind Reflection’s estimates.
First of all, all of what we do is possible because the App stores (iOS App Store, Google Play, etc.) publish rankings of the most popular and top-grossing apps. This helps users find interesting apps, but it also gives us a proxy to estimate how many downloads and how much revenue all the apps are making. Because our valiant army of Developers kindly share their sales data with us, we have a sample of data points from which we can infer all the rest.
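To make the "proxy" idea concrete, here is a toy sketch of the rank-to-downloads inference (this is NOT our actual model, and the sample numbers are made up for illustration): take a handful of (rank, downloads) points shared by developers, fit a power law in log-log space, and read off an estimate for any rank.

```python
import math

# Hypothetical sample: (rank, daily downloads) pairs shared by partner
# developers. The values are invented; they just follow a rough power law.
sample = [(1, 100_000), (5, 30_000), (20, 9_000), (100, 2_200), (500, 500)]

# Fit downloads ~ C * rank^(-alpha) with ordinary least squares in
# log-log space, where a power law becomes a straight line.
xs = [math.log(rank) for rank, _ in sample]
ys = [math.log(dl) for _, dl in sample]
n = len(sample)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def estimate_downloads(rank):
    """Estimate daily downloads for any chart rank, observed or not."""
    return math.exp(intercept + slope * math.log(rank))

# Interpolate a rank we never observed in the sample.
print(round(estimate_downloads(50)))
```

The real model has far more moving parts (country, category, store, time of day...), but the core trick is the same: a few trusted data points pin down the curve, and the curve fills in everything else.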
Now a sample of data can (and does) have gaps. If you consider all the 155 countries, the 43 different categories, and the free, paid and grossing lists on different App stores, gaps are going to happen. So we lean heavily on historical data to give us at least a baseline of what the numbers should be.
So what is our accuracy? It's hard to give a meaningful number, because the market roughly follows a power law (downloads or revenue vs. rank), so it's very unequal.
In that respect, an average error value doesn't really make sense. Think about calculating the world's average salary: a few people earning billions skew the average up, while far more people are earning pennies, so it doesn't really paint the right picture (for more on this see: https://en.wikipedia.org/wiki/Power_law#Lack_of_well-defined_average_value). Even a relative error is misleading: at the bottom of the paid list, apps are only getting a few downloads, so if we estimated 3 but the real number was 1, we would have a 200% error when clearly the difference is not significant (and note that in a power law there are A LOT of apps with only a handful of daily downloads, so they would totally skew the average relative error).
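You can see the "salary problem" in two lines of Python. This is a toy simulation (a Pareto distribution standing in for the app market, parameters chosen purely for illustration), not real market data:

```python
import random
from statistics import mean, median

random.seed(0)

# Draw 100,000 fake "daily downloads" values from a Pareto (power-law)
# distribution. An alpha close to 1 gives an extremely heavy tail,
# qualitatively like the app market.
downloads = [random.paretovariate(1.1) for _ in range(100_000)]

print(f"mean:   {mean(downloads):10.2f}")
print(f"median: {median(downloads):10.2f}")
# The mean lands far above the median: a handful of huge hits drag it up,
# so "the average app" is a misleading summary of a power-law market.
```

The same effect poisons an average error metric: a few apps at the extremes dominate the number, and it stops telling you anything about a typical app.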
But you still want a freaking number, don't you?? We get it, so here are a few numbers from a validation test we did with a big developer (Electronic Arts) that has many apps spanning a good range of popularity. We estimated all their apps: the percentage error was usually well within 30% on individual apps, but when summing up all the apps' downloads we ended up being only 4% off! So that should give you an idea of what to expect: individual numbers (for 1 app for 1 day) can be quite uncertain, it's just an estimate after all. But what really matters is that if you combine or average our estimates over a week or a month, which is usually the sensible thing to do, we get quite accurate results. We might have had luck on our side to get to 96% accuracy, but you get the idea. In statistics that is called having good ACCURACY, but lower PRECISION.
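Why does summing help so much? If the daily estimates are noisy but unbiased, the errors largely cancel out in the total. Here is a toy simulation of that effect (invented numbers, not the EA validation data): 30 daily estimates, each with up to ~30% noise, summed into a monthly figure.

```python
import random

random.seed(42)

# Hypothetical "true" daily downloads for one app over a month.
true_daily = [random.randint(800, 1200) for _ in range(30)]

# Unbiased estimates: each day is off by up to +/-30%, centered on the truth.
estimates = [t * random.uniform(0.7, 1.3) for t in true_daily]

daily_errors = [abs(e - t) / t for e, t in zip(estimates, true_daily)]
monthly_error = abs(sum(estimates) - sum(true_daily)) / sum(true_daily)

print(f"worst daily error:   {100 * max(daily_errors):.1f}%")
print(f"monthly total error: {100 * monthly_error:.1f}%")
# Individual days can be well off, but because the noise is unbiased it
# mostly cancels in the sum, so the monthly total is far more accurate.
```

That cancellation is exactly the good-accuracy/lower-precision pattern: each point wobbles, but the wobbles aren't systematically in one direction, so aggregates come out tight.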
The lack of precision comes from the App store ranking formula, which doesn't just consider downloads: it's a measure of "popularity", whatever that means. The rankings are also updated a few times a day, which doesn't necessarily coincide with the times the sales data are released.
So we are working hard on feeding more inputs into the model: reviews, updates, whether the apps are featured, Facebook likes, Google Trends, you name it, we ingest it. Then we try to find out whether TensorFlow can beat the good old random forest (FYI, those are two popular machine learning techniques) and fun things like that. At Reflection we also like our queries snappy, so we experiment with different data stores: BigQuery, Cassandra, Spark SQL, etc.
But what is even more critical for good accuracy and broader coverage of the market is growing our army of Devs! Join us!