"There are three types of lies: lies, damned lies, and statistics" - Samuel Clemens (Mark Twain) referencing a quote attributed to Benjamin Disraeli.
Lately, i have been seeing a lot of talk about per capita stats (especially crime rates). Let's just say i'm not a fan (if you didn't gather that from the title of the post). Why?
- Stat normalization is decent in theory: it is used to compare populations of different sizes. However, there are inherent problems with it. A big one is that it tends to scale up (extrapolate) data in smaller populations and slightly suppress it in larger ones. True, variability is lower with larger populations.
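A rough sketch of the variability point, assuming a simple binomial model (each person independently has the same small chance of showing up in the count). The 0.001 rate and both population sizes are made-up numbers for illustration, not real data.

```python
import math

def rate_sd_per_100k(true_rate, population):
    """Standard deviation of an observed per-100K rate under a simple binomial model."""
    sd_of_proportion = math.sqrt(true_rate * (1 - true_rate) / population)
    return sd_of_proportion * 100_000

TRUE_RATE = 0.001  # hypothetical: 100 incidents per 100K in both places

small_sd = rate_sd_per_100k(TRUE_RATE, 1_000)    # hypothetical small town
large_sd = rate_sd_per_100k(TRUE_RATE, 100_000)  # hypothetical city

print(f"pop 1,000:   100 per 100K, give or take about {small_sd:.0f}")
print(f"pop 100,000: 100 per 100K, give or take about {large_sd:.0f}")
```

Under these assumptions the small town's per-100K figure swings about ten times wider than the city's, even though the underlying rate is identical.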
Per capita is a rate stat for "normies" to massively oversimplify data. It is not really that useful and can be misleading: it compares data in a very questionable way while stripping out context. To me, "per 100K" makes the data look like it has 4 or more leading zeros (after the decimal point). i have always thought aggregate data was more meaningful. i don't need per capita for the data to make sense.
Example: if i (a population of one) had $1, and it was adjusted per 100,000, that would be treated as $100,000. Where did the extra money come from? Nowhere, because it never existed. That is the issue with smaller populations: the adjustment extrapolates/creates data (which may not exactly exist) at a common rate. Yes, that is an extreme example.
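The arithmetic behind that example, as a minimal sketch. The function name `per_100k` and the town/city numbers are hypothetical, picked only to show the scaling.

```python
def per_100k(count, population):
    """Scale a raw count to a rate per 100,000 people."""
    return count * 100_000 / population

# The $1 example: one person, one dollar.
print(per_100k(1, 1))

# Hypothetical places, for illustration only.
small_town = per_100k(1, 500)            # 1 incident among 500 people
big_city = per_100k(1_000, 1_000_000)    # 1,000 incidents among 1,000,000 people
print(small_town, big_city)
```

With these made-up numbers, a single incident in the small town produces a higher per-100K figure than a thousand incidents in the city.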
A big problem with using per capita (per 100K) is that it assumes the data scales linearly with population. Kind of like the ECON 101 supply and demand curves being 45° straight lines (in reality that is rare). With stuff like crime, the rates generally decrease as population increases. A population increase adds far more people who don't commit crimes than it adds crimes.
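To show what the linear assumption does, here is a toy model, not a fitted one. It assumes crimes grow as `k * population ** beta`; the values of `k` and `beta` are invented for illustration. With `beta = 1.0` the count scales linearly, and with `beta = 0.9` it grows slower than the population.

```python
def per_100k(count, population):
    """Scale a raw count to a rate per 100,000 people."""
    return count * 100_000 / population

def expected_crimes(population, k, beta):
    """Hypothetical model: crimes = k * population ** beta."""
    return k * population ** beta

populations = [10_000, 100_000, 1_000_000]
K = 0.05  # made-up constant, not an estimate from any data

# beta = 1.0: linear scaling, which is what a per-100K comparison assumes.
linear_rates = [per_100k(expected_crimes(p, K, 1.0), p) for p in populations]
# beta = 0.9: crimes grow slower than population.
sublinear_rates = [per_100k(expected_crimes(p, K, 0.9), p) for p in populations]

print("linear:   ", [round(r, 1) for r in linear_rates])
print("sublinear:", [round(r, 1) for r in sublinear_rates])
```

In the linear case the per-100K rate is the same at every population size. In the sublinear case the rate falls as population grows, so part of any gap between a small place and a large one would come from size alone.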
i suspect that total crimes committed and population increase have a very low (if any) positive correlation.
- i view most rate stats as borderline useless (outside interest rate type stuff). Per capita is a "franken"-stat created from other stats, then adjusted. Why would you want to use a stat that would increase variability? i want my stats more accurate, not less.
- To be fair, most of my collegiate experience with data was observational (a computer lab "usage" group project) or time series data (multivariate regressions, usually linear - sometimes i wish i had kept my Intro to Econometrics textbook). i haven't used it in the real world (as much of the analysis was dependent on computer programs like SPSS - Statistical Package for the Social Sciences).
i heard you can do some of this type of stuff in Excel, but it's a bit messier. In an ECON 400 level class, i didn't (opted for SPSS) - maybe that's why i had the highest score on the final by 40%.
i still look at standard deviations, pretending everything is normally distributed.