Testing for dodgy numbers using FastStats and Benford’s Law

Post by phonedata

"Naturally occurring" numbers usually have a non-uniform distribution of first digits, with the lower digits relatively more common: more 1's than 2's, more 2's than 3's, and so on.

When first heard, that seems a strange idea. But it is sort of obvious when you think of prices in the supermarket or house numbers on streets. Try running that one through your head as you're walking home.

This observation is known as Benford’s Law (see here for an explanation).
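
For reference, the usual statement of Benford's Law gives the expected share of values whose first digit is d:

```latex
P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9
```

That works out at roughly 30.1% for a leading 1 (log10 2 ≈ 0.301), 17.6% for a 2, and only about 4.6% for a 9. The nine shares add up to exactly 1, because the sum of log10((d + 1)/d) telescopes to log10 10.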

Obviously it doesn't apply to uniformly random numbers or to structured numbers like SIC codes or telephone numbers, which have their own special distributions.
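
A quick way to see why evenly spread numbers are different: take every integer from 1 to 9999 (an illustrative range, picked because it covers whole orders of magnitude) and count the leading digits. Each digit from 1 to 9 leads exactly 1,111 of them (1 + 10 + 100 + 1,000), so the distribution is flat rather than skewed towards 1. The helper name is just my own, for illustration.

```python
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first digit of a positive integer."""
    return int(str(n)[0])


# Every integer from 1 to 9999, standing in for numbers spread evenly
# across a range that covers whole orders of magnitude.
counts = Counter(leading_digit(n) for n in range(1, 10000))

for digit in range(1, 10):
    print(digit, counts[digit])  # each digit appears 1111 times
```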

It is easy enough to test the numbers in any FastStats variable. I’ve used an expression to pick out the first digit, analysed the first-digit frequencies in a cube, and then dragged the distribution onto a Charting window to visualise it.
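
For anyone wanting to try the same idea outside FastStats, here is a plain-Python sketch of the equivalent steps. It is not FastStats' expression language, and the function names are hypothetical ones of my own: pull out the first significant digit, tally the digits (the cube step), and set the observed shares next to Benford's expected ones (what the chart shows). It assumes the values arrive as ordinary Python ints or floats, and it skips zeros, which have no leading digit.

```python
import math
from collections import Counter
from collections.abc import Iterable
from typing import Optional


def first_digit(value: float) -> Optional[int]:
    """Leading significant digit of value, or None for 0, NaN and infinity."""
    # repr() gives the shortest round-tripping form, so 0.3 stays "0.3";
    # stripping leading zeros and the decimal point exposes the first digit.
    text = repr(abs(value)).lstrip("0.")
    if text and text[0].isdigit():
        return int(text[0])
    return None


def first_digit_table(values: Iterable[float]) -> dict[int, tuple[int, float, float]]:
    """Map each digit 1-9 to (count, observed share, Benford expected share)."""
    digits = [d for d in map(first_digit, values) if d is not None]
    counts = Counter(digits)
    n = len(digits)
    table = {}
    for d in range(1, 10):
        observed = counts[d] / n if n else 0.0
        expected = math.log10(1 + 1 / d)
        table[d] = (counts[d], observed, expected)
    return table


if __name__ == "__main__":
    # Made-up sample values, purely for illustration.
    sample = [123.4, 1870, 0.052, 19, 2.5, 310, -1.1, 0]
    for d, (count, observed, expected) in first_digit_table(sample).items():
        print(f"{d}  {count:3d}  {observed:6.3f}  {expected:6.3f}")
```

Going through repr() rather than exact floating-point arithmetic means a value typed in as 0.3 counts as a 3, even though the stored binary value is fractionally below 0.3.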

Benford’s Law is used as a method of detecting accounting fraud. If you're just making up the numbers, then you won't easily get this distribution. The effect was even used as "evidence" against the Greek government's suspect accounts by the EU auditors.
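
To put a number on "you won't easily get this distribution", one standard option is a chi-square goodness-of-fit test of the nine first-digit counts against Benford's proportions. That is my choice for illustration, not necessarily what the auditors did. A minimal sketch, assuming you already have the counts for digits 1 to 9 (the tallies below are hypothetical):

```python
import math

# Chi-square critical value for 8 degrees of freedom (9 digits - 1)
# at the 5% significance level.
CHI2_CRITICAL_8DF_5PCT = 15.507


def benford_chi_square(counts: list[int]) -> float:
    """Chi-square statistic; counts[i] is the tally for first digit i + 1."""
    if len(counts) != 9 or sum(counts) == 0:
        raise ValueError("expected nine counts (digits 1-9) with a non-zero total")
    n = sum(counts)
    statistic = 0.0
    for d, observed in enumerate(counts, start=1):
        expected = n * math.log10(1 + 1 / d)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical tallies: one roughly Benford-shaped, one suspiciously flat.
benford_like = [301, 176, 125, 97, 79, 67, 58, 51, 46]
flat = [100] * 9

for name, counts in [("benford_like", benford_like), ("flat", flat)]:
    stat = benford_chi_square(counts)
    verdict = "worth a closer look" if stat > CHI2_CRITICAL_8DF_5PCT else "no flag"
    print(f"{name}: chi-square = {stat:.1f} -> {verdict}")
```

Here the flat, "made-up looking" tallies are flagged and the Benford-shaped ones are not. Bear in mind that with very large variables almost any small deviation becomes statistically significant, so treat a flag as a prompt to look closer rather than proof of fraud.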

Further reading suggests that it is quite possible (although relatively rare) for naturally occurring sets of numbers not to follow Benford's Law, so it isn’t really a “law” at all. Used carefully, though, it is a useful fraud flag: a quick test for dodgy numbers.
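
One way to use it "carefully" is to look at how far the digit shares stray from Benford's proportions, rather than leaning on a significance test alone. Below is a sketch using the mean absolute deviation (MAD), with first-digit cut-offs of 0.006 / 0.012 / 0.015 that are commonly attributed to Mark Nigrini's work on digital analysis. Treat those thresholds as rules of thumb, and the function names as hypothetical.

```python
import math

# First-digit MAD bands commonly attributed to Nigrini (rules of thumb only).
MAD_BANDS = [
    (0.006, "close conformity"),
    (0.012, "acceptable conformity"),
    (0.015, "marginally acceptable conformity"),
]


def benford_mad(counts: list[int]) -> float:
    """Mean absolute deviation between observed and Benford first-digit shares."""
    n = sum(counts)
    if len(counts) != 9 or n == 0:
        raise ValueError("expected nine counts (digits 1-9) with a non-zero total")
    deviations = (
        abs(observed / n - math.log10(1 + 1 / d))
        for d, observed in enumerate(counts, start=1)
    )
    return sum(deviations) / 9


def mad_flag(mad: float) -> str:
    """Translate a MAD value into a conformity label."""
    for limit, label in MAD_BANDS:
        if mad <= limit:
            return label
    return "nonconformity: worth a closer look"


print(mad_flag(benford_mad([301, 176, 125, 97, 79, 67, 58, 51, 46])))  # close conformity
print(mad_flag(benford_mad([100] * 9)))  # nonconformity: worth a closer look
```

For a given pattern of deviations, MAD stays the same whether the variable has a thousand records or a million, which makes it a gentler first screen on big customer files.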