It’s dark every night when I leave work. I know what that
means: it’s December and the college football season is almost over. We still
have some football left, notably the conference championship games, the bowls,
and the college football playoff, but the end is in sight.
Last year, that meant I stopped posting for 9 months. But
this year, this year it means I’m shifting into political mode. After all – the
2016 presidential election is a mere 328 days from now. I’ll be building out a
polling model just like I did in 2012 (here’s hoping I can achieve the same
result), but this far from Election Day the polls have only limited predictive
power, and other elements like endorsements and experience matter as well.
That being said, I have built a model, but it doesn’t rely on polls.
Instead, it relies on odds-maker consensus. Since July 20th, I’ve
been capturing betting data from different odds-makers on each of the declared
candidates regarding a) winning their party’s nomination and b) becoming president.
I can’t think of a better day than today, the day the
Republican candidates are debating in Las Vegas, to make my first 2016 election
post.
The process for developing these percentages has been:
- Capture the odds at which one could bet on a candidate (to win the nomination or the presidency) from 26 different odds-makers
- Average those odds to get a consensus
- Translate those odds into an implied likelihood of winning
- Normalize the odds across all candidates such that they add up to 100% (removing the house edge)
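For anyone who wants to see the arithmetic, here is a minimal sketch of those four steps in Python. It is not my actual pipeline. It assumes the odds are quoted in decimal format (where odds of d imply a win probability of 1/d), and the candidate names, the numbers, and the function name `consensus_probabilities` are all made up for illustration.

```python
from statistics import mean


def consensus_probabilities(odds_by_candidate):
    """Turn {candidate: [decimal odds, one per odds-maker]} into win probabilities that sum to 1."""
    # Average each candidate's odds across odds-makers to get a consensus.
    consensus = {name: mean(odds) for name, odds in odds_by_candidate.items()}
    # Assumption: decimal odds, so odds of d imply a win probability of 1/d.
    implied = {name: 1.0 / d for name, d in consensus.items()}
    # The implied probabilities sum to more than 1 because of the house edge,
    # so rescale them to sum to exactly 1.
    total = sum(implied.values())
    return {name: p / total for name, p in implied.items()}


# Hypothetical odds for three candidates from three odds-makers (not real data).
example_odds = {
    "Candidate A": [1.8, 1.9, 1.7],
    "Candidate B": [3.5, 3.4, 3.6],
    "Candidate C": [4.0, 4.2, 3.8],
}

probabilities = consensus_probabilities(example_odds)
for name, p in sorted(probabilities.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.1%}")
```

With fractional odds (say 5/1) the conversion step would change, but the averaging and normalizing would stay the same.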
Repeating that process every few days has let me build the
following graphs, tracking each candidate’s fortunes over time. I’m
experimenting with two different ways to present the data. The difference
should be very clear. In this top graph I put everyone together. It has the advantage
of letting you directly compare one candidate’s fortune to another’s, but there
is a glut of candidates at the bottom who are obscured.
My second graph, shown below, plots every candidate on their
own little graph. Each graph is identical in both axis values and scale. This
data visualization technique is called small multiples, and visualizing many
different trends is one of its best uses.
This technique has the advantage of giving each candidate their own little space (very socialist) and making each line much easier to parse. Conversely, it's harder to understand what's happening between candidates, and each candidate's line is shrunk a little horizontally. For example, TRUMP's line over the last few weeks looks like just a squiggle down below, but in the graph above you can see exactly what's been going on with TRUMP.
Both graphs (obviously) tell basically the same story, but do so with emphasis on different things. I think I slightly favor the second presentation, but I'd be interested to hear what people think.
This is just the first post of many on the 2016 election. In the coming weeks and months I expect I'll make or publish:
- Democratic primary model
- Primary election night county-based forecast
- Primary polling-based model
- All sorts of general election stuff
Happy presidential election cycle!
P.S. I know the fonts and colors aren't the same between the two graphs. It's bugging me too. But I'm jumping back and forth between two pieces of software as I sort this out, and I decided it was more important to get started than get every color/font/detail spot on.