Representative polls of voter intention are our best tool for forecasting how an electorate will vote in a free and fair election. On the 8th of June 2017, the United Kingdom will go to the polls once more to elect members of parliament, with the Conservative party seen as the favourite to win and the opposition Labour party (as of three days before polling day) seen to be gaining momentum quickly.

This election is unusual in many ways, but one glaring feature is the widespread distrust of polling forecasts from all sides of the political spectrum. This is unsurprising, given that the majority of polls failed to correctly forecast the 2015 general election or the 2016 referendum on the UK's membership of the European Union.

But are polls really that bad? After all, we know voting intention polls are imperfect; they have small sample sizes, may be biased by methodology, experience short-term swings driven by current events and struggle to reach certain demographics. Pollsters typically claim a margin of error of ±3 percentage points, based on internal validation of their methods. Fortunately we have another way of quantifying just how bad the polls are at forecasting election results – looking at past elections. To assess this, I collected published polls on voting intention for the two major parties (Labour and Conservatives) between the announcement of a general election and polling day, going back to 1992. To quantify the error, we can take a snapshot of the polls and compare the forecast against the outcome of the election.

[Figure: polling forecasts vs. actual results in UK general elections since 1992]

We often look at averages of polls, because any individual poll will be subject to noise: no sample is ever perfectly representative, and each pollster's methodology introduces its own biases. And because opinion changes with time, we tend to look at the most recent polls. Here we have two alternatives: an average of all the polls conducted during the campaign period, weighted for recency[1], and an average of the last seven polls before polling day, as favoured by UK newspapers.
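To make the recency weighting concrete, here is a minimal Python sketch of the idea; the poll figures and the half-life of the weighting are invented for illustration, not taken from the actual dataset.

```python
import numpy as np

# Hypothetical polls: (days before polling day, Conservative %, Labour %)
polls = np.array([
    (30, 45.0, 32.0),
    (14, 44.0, 35.0),
    ( 7, 43.0, 36.5),
    ( 1, 42.0, 38.0),
])

days_out = polls[:, 0]
half_life = 10.0                          # assumed: a poll 10 days older counts half as much
weights = 0.5 ** (days_out / half_life)   # exponential decay with poll age

weighted_avg = (polls[:, 1:] * weights[:, None]).sum(axis=0) / weights.sum()
recent_avg = polls[np.argsort(days_out)[:3], 1:].mean(axis=0)   # plain mean of the 3 most recent polls

print(f"Recency-weighted: Con {weighted_avg[0]:.1f}%, Lab {weighted_avg[1]:.1f}%")
print(f"Recent-poll mean: Con {recent_avg[0]:.1f}%, Lab {recent_avg[1]:.1f}%")
```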

The mean error between polling averages and election results for this period is between 6 and 7 percentage points, depending on how you average polls. Clearly, this is not a very good forecasting model when elections over this period were decided by an average vote difference of 7 points between the two major parties. In other words, a typical post-1992 UK election is, on average, ‘too close to call’ based on polling forecasts with the error margin being as large as the point difference between the parties.

What does this tell us about the 2017 election? Well, the polls currently stand like this:

[Figure: 2017 campaign polling averages]

With a weighted average of polls since the election was called on 19th April, the Conservatives hold an ample margin of 18 percentage points. However, recent weeks have seen Labour clawing back some territory, with the last seven-poll average putting them 7 points behind the Conservatives. Based on the error rates self-reported by individual pollsters, or on the longer campaign average, this places the Tory party in safe territory. However, if we take the recent polls together with the historical accuracy of polling forecasts, a polling-error upset becomes plausible.

[Figure: modelled distribution of Conservative and Labour vote shares]

Taking the 60-odd polls conducted in the last month, we can model voter preference for the Labour or Conservative party as a normal distribution, which approximates the data fairly well[2]. We can then ask what is the probability that a larger share of the electorate votes for the underdog than for the favourite, i.e. a forecast upset. For the model derived exclusively from 2017 polling data, we can expect a 5% chance of an upset, which places the Conservatives in a secure position. However, the polling error rate is not well reflected in the internal variance of the polling sample, so we can adjust the error rate by the expected range seen in the historical data, i.e. ±7 percentage points. This simulation gives us a probability of a forecast upset of 11%, or to put it another way, if pollsters have been making the same kinds of errors they have made since 1992, there is an 11% chance they have wrongly forecast the Conservative party as securing more votes than the Labour party on the 8th June.
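As a rough sketch of that calculation, the upset probability is just the mass of a normal distribution that falls below zero; the two spreads below are illustrative stand-ins, not the fitted values from my data.

```python
from scipy.stats import norm

lead = 7.0          # Conservative lead over Labour in the recent poll average, points

sigma_2017 = 4.3    # illustrative spread implied by the 2017 polls alone
sigma_hist = 5.7    # illustrative spread once inflated towards the ~±7 point historical error

for label, sigma in [("2017 polls only", sigma_2017), ("historical error", sigma_hist)]:
    p_upset = norm.cdf(0, loc=lead, scale=sigma)   # P(true lead < 0), i.e. Labour ahead
    print(f"{label}: P(upset) = {p_upset:.0%}")
```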

However, it should be noted this is not the whole story. UK general elections are not decided by a single party capturing a plurality of votes, but by forming a majority in Westminster through a byzantine system of local seat elections. The First Past The Post system results in a winner-takes-all outcome for individual constituencies. If one candidate secures more votes than any other single candidate, they can, and often do, win the constituency with less than 50% of votes, meaning a majority of votes in that constituency did not contribute to the overall outcome of the election. Much has been written about First Past The Post, but suffice to say for our discussion that this system makes translating voter intention polls into parliamentary seats a notoriously tricky problem. Second, UK governments can be formed by political coalitions when one party does not hold an absolute majority in the House of Commons, as happened in 2010 when a Conservative and Liberal Democrat coalition assumed power. In this scenario it is not the overall majority of parliamentary seats that matters, but the proportion of seats controlled by each party. Both of these complications mean a forecast based purely on the share of the vote captured by each party is an insufficient model of the UK general election, and one with a large error rate based on historical performance to boot.

What should we make of this? As it stands, the Conservatives retain a comfortable lead, but the error margin is much larger than you might guess by looking at 2017 polls alone. While it might be tempting to keep watching the worm of public opinion swing up and down on daily poll trackers, remember to take it with a pinch of salt.

—————————————————————————————————

[1] Polls are weighted with an exponential function, giving larger weights to polls conducted closer to polling day. For the simple seven-poll average, no weighting was applied.

[2] The Gaussian model explains >99% of the variance in the data (R² = 0.9994). This model includes all publicly available polls carried out between the day the election was called (19th April 2017) and the time of writing (5th June 2017). The model does not weight samples by recency, nor by accuracy of the pollsters, both of which would be sensible additions for future models.

Polling data were obtained from UK Polling Report and Opinion Bee. There is a never-ending stream of UK election poll trackers, but I recommend checking out the Financial Times, Economist or Guardian for their sound methodology and excellent visualisations.


The excellent folk at the FiveThirtyEight Politics Podcast have been running an interesting exercise, where members of the public write in to say what topics surrounding the recent US Presidential election they have been discussing around their kitchen tables, and what reforms they would like to see made to the electoral system.

One comment that caught my attention was the following:

“Number one, why is election day not a national holiday where everyone should be able to go out and vote, and number two, [I propose] offering a $1000 tax credit when you prove that you voted.”

This is a fascinating idea for a multitude of reasons. Increasing participation in the electoral process has been much debated in the United States; rates of participation are typically around 50% of the electorate, low for the OECD club of developed countries. For comparison, Australia maintains exceptionally high participation rates, at 91% in the last federal election, largely attributed to a policy of compulsory voting. Eligible voters who do not cast a ballot face a AUD$20 fine, which despite being a relatively small amount of money ($15 USD or £12 GBP) is enough to drive high participation rates.

However, plenty of arguments have been made against compulsory voting, from the ethical (is it democratic to force citizens to vote?) to the practical (does compulsory voting increase the rate of protest or erroneous votes?). For a variety of such reasons other democracies have been reluctant to follow the Australian model of compulsory voting. Which is why the suggestion above is interesting, as it offers the carrot instead of the stick, so to speak. A $1000 tax credit is a very attractive proposition, and would surely draw many voters who would otherwise stay away from the polling booth on election day. But could the US, or indeed any country, bankroll such a massive effort to bring voters to the polls?

Let’s look at the numbers. The US had 251,107,000 eligible voters in the 2016 presidential election. The final number of participating voters is still unknown due to late counting in some states, but from the majority of states we can estimate a turnout rate of 59%. We have no way of knowing how many of the 41% who stayed home would have been attracted to vote if a tax break had been on offer, or indeed how many of the already-voters would claim their tax break. But if we assume a financial worst-case scenario, where all eligible voters turn out and all claim their allotted $1000 tax break, that would be a $251 billion deduction from the national coffers. For comparison, that is an amount roughly equivalent to the entire yearly budget of the US Department of the Treasury, or about half the budget of the Department of Defense.

But what about the first part of the listener’s suggestion? If everyone in the US stopped working for a day, would we see a significant cut to US economic productivity? Once more, let’s look at the numbers. The combined revenue of income and payroll tax for the current period stands at $2.91 trillion, or 81% of all US government revenue. If we take out a slice corresponding to a single working day in an election year, it would represent a $7.97 billion loss to the federal government. While significant, it pales in comparison to the expense of providing a $1000 tax break to every voting citizen.

So the combined cost of this carrot-before-the-stick exercise would be in the region of $260 billion dollars. That is, suffice to say, a lot of money – it is roughly equivalent to the entire GDP of Chile or Pakistan. But is that a lot of money for the US government? The total US government revenue for the current fiscal year is estimated to be around $3.6 trillion, so our voter turnout programme would cost 7% of all revenue the government receives, or 1.4% of GDP. The current GDP growth rate of the US stands at a healthy 2.2%, so knocking it down by 1.4% would not automatically trigger a recession, but would significantly slow down the recovery from the 2008 financial crisis.
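For the sceptical, the arithmetic above is easy to reproduce; the only figure I have added myself is a rough $18 trillion for US nominal GDP.

```python
eligible_voters = 251_107_000          # eligible voters in 2016 (figure quoted above)
tax_credit = 1_000                     # proposed credit per voter, USD

credit_cost = eligible_voters * tax_credit      # worst case: everyone votes and claims
income_payroll_revenue = 2.91e12                # annual income + payroll tax revenue, USD
one_day_loss = income_payroll_revenue / 365     # revenue attributable to a single day

total_cost = credit_cost + one_day_loss
total_revenue = 3.6e12                          # total federal revenue, USD
gdp = 18e12                                     # rough US nominal GDP, USD (my assumption)

print(f"Tax credit cost:    ${credit_cost / 1e9:.0f} bn")
print(f"One day of revenue: ${one_day_loss / 1e9:.2f} bn")
print(f"Total: ${total_cost / 1e9:.0f} bn = {total_cost / total_revenue:.1%} of revenue, "
      f"{total_cost / gdp:.1%} of GDP")
```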

Obviously, lost tax revenue is not directly convertible to GDP, and the cost of any such programme of voting enticements would be spread over the four years between elections, with special provisions for a newly instituted holiday. It is nevertheless a gargantuan amount of money, so keep that in mind next time you decide to give everyone a thousand dollars.

—————————————————————————————————

US Department yearly budgets are released by the Congressional Budget Office. Nominal GDP values per country, including the US, are from World Bank figures for 2015. Annual GDP growth figures are from 2016 estimates also from the World Bank.

Ah, 2016 – the electoral year that keeps on giving. From the UK EU membership referendum, a narrow miss for the Pirate Party in Iceland’s parliament, an Austrian presidential election postponed thanks to faulty glue, and over 100 other elections and referenda worldwide. And lest we forget, the ongoing US presidential election, now just five days away.

The race for the White House is currently contested between the Democratic candidate Hillary Clinton and the Republican candidate Donald Trump, with polls showing a tightening race in the final stretch. While the US is trapped in feverish speculation and much hand-wringing at the latest poll results, the rest of the world watches silently as the next Western hegemon is chosen. But perhaps not so silently, as a few news outlets have taken to asking citizens outside the US about their opinion on the current electoral process. Given the significant international repercussions any electoral result will have, it poses an interesting question – if you are a citizen of the world, which candidate would you vote for?

Now, many of these are not scientific polls, but what polls analyst Nate Silver of FiveThirtyEight has taken to calling “clickers”, in that you put a straw poll on the internet and have visitors click on their preferred candidate. It should be immediately obvious that there are flaws with this approach, including a self-selection bias among respondents, no accounting for demographically balanced sampling and the opportunity for abuse by users casting votes multiple times. At best, “clickers” represent a minority of internet-using, English-speaking people who happened to come across your straw poll. At worst, they are fundamentally flawed exercises that tell us nothing useful.

One interesting exception is a recent poll conducted by WIN/Gallup International, using scientific polling methodology (Technical note: WIN/Gallup International is not affiliated with the better-known Gallup organisation, and I have not formed an opinion on the reliability of their work, but the methodological notes provided in the report check out, and are broadly in line with best practice in scientific polling). This poll asked people in 45 countries the following question:

“If you were to vote in the American election for President, who would you vote for?”

And the results are quite fascinating.

One popular interpretation of the unscientific “clicker” polls is that the world leans Democrat, while Russia is the lonesome supporter of the Republican candidate. Does this bear out in scientific polling? The short answer is yes: every other country surveyed gives Clinton an advantage except Russia, where 33% of respondents would vote for Trump compared to 10% who would support Clinton. What these stories ignore, however, is that a clear majority of Russians surveyed did not side one way or the other, with 57% declaring “Don’t know/no response”. So the majority of Russians surveyed would not vote for Donald Trump, but instead have no defined opinion or declined to comment.

The second narrative that has emerged from worldwide polls is the overwhelming support for the Democratic candidate across the globe. Indeed, the average across the 45 nations surveyed gives Clinton a +50% lead, while current estimates give her a measly +3.1% lead in the US popular vote. It might shock some readers that there is such a discrepancy between popular opinion in the US and the rest of the world, but this discrepancy is far from homogeneous.

If we want to bridge the divide, we can then ask: what country from the 45 surveyed most resembles the US today? Let us look at the map:

[Figure: world map of Clinton vs. Trump preference by country]

Surprisingly, the country most similar in popular opinion to the US right now is China. With 53% favouring Clinton and 44% favouring Trump, that gives Clinton a 9% lead, still larger than her actual lead in US polls. China also has the lowest share of undecided respondents (3%) of the 45 countries surveyed, even lower than the US electorate, currently at 5%. Let us reflect on the fact that people literally on the other side of the world are more likely to have made up their minds about these two candidates than the citizens who actually have to vote less than a week from now.

We can take this further and inspect polling not country by country, but state by state. One of the advantages of the unrelenting scrutiny of the electorate we see in US presidential elections is that we get continuous polling from each individual state on a near-daily basis. For this step, we will take the weighted polling average for each state as provided by FiveThirtyEight and compare it with the individual polling results for non-US countries (Technical note: polling adjusted for recency, likely voter ratio, omitted third parties, and pollster house effects. Full methodology here).

[Figure: US states matched to their most similar country]

The countries that are most similar in presidential candidate leaning to US states are far and away Russia (Trump +23%) and China (Clinton +9%). These two alone account for the majority of states, 44 out of 50 (plus DC). This translates fairly well to the individual state polls – a strong preference for Trump in the Midwest and South, and lukewarm but widespread support for Clinton elsewhere. A handful of other states are most similar to four other countries (Lebanon, India, Bulgaria and Slovenia), all Democrat-leaning.
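The matching exercise itself is just nearest-neighbour matching on the Clinton-minus-Trump margin; a sketch with placeholder margins (not the actual FiveThirtyEight or WIN/Gallup numbers) looks something like this:

```python
# Clinton-minus-Trump margin, percentage points (illustrative placeholder values)
country_margins = {"Russia": -23, "China": 9, "Lebanon": 15, "India": 22,
                   "Bulgaria": 28, "Slovenia": 34}
state_margins = {"Texas": -8, "Ohio": -1, "Pennsylvania": 4, "California": 23}

def closest_country(state_margin, countries):
    """Return the country whose margin is nearest to the state's margin."""
    return min(countries, key=lambda c: abs(countries[c] - state_margin))

for state, margin in state_margins.items():
    print(f"{state:>12}: most similar to {closest_country(margin, country_margins)}")
```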

What does this exercise tell us? First, it shows how US political races divide opinion worldwide. While most of the 45 countries surveyed were Democrat-leaning, there is significant heterogeneity in the level of endorsement for Hillary Clinton, with the strongest support in Western Europe and Latin America and weaker support in Africa and Asia (albeit from a limited sample of countries). Second, the story of “Russia supports Donald Trump” does not check out. A larger group of respondents in Russia expressed no opinion about the election than supporters of Clinton or Trump combined. Finally, the state-wise analysis shows that worldwide opinions are not good models of regional US political leanings, with the extreme countries in the survey (Russia: strong Republican, China: lukewarm Democrat) being most similar to individual state polls.

If you are in the US, perhaps it is time to consider a vacation in a political soul mate across the globe?

—————————————————————————————————

Survey of opinions on US election in 45 countries carried out by WIN/Gallup International and released here. US national and state poll averages from FiveThirtyEight aggregates. Number of undecided voters aggregated by the Huffington Post. All polling indices current as of 3rd November 2016.

Referenda – love them or hate them, they are a mark of modern democracies. In the next few weeks, the United Kingdom will vote on whether to leave or remain in the European Union. It’s a historic vote, with significant repercussions not just for the UK, but also for the future of the European project.

Of course, when the stakes are high, the prognosticators get called in. Much has been made of the very tight race in the polls, with newspapers often lauding the results of the latest poll as the final say in the debate.

But any observer with a passing interest in statistics will know this is a misguided conclusion. Just because a poll is more recent doesn’t mean it’s more accurate. Polls vary in their quality and representativeness, and experience some natural variation, which is why seasoned observers will follow the trend in a population of polls instead.

As a quantitatively-minded person, I take particular offence at the way polls are being displayed in summary fashion, particularly this heinous graphic at the Telegraph:

[Figure: the Telegraph's EU referendum poll graphic]

Polling, as it turns out, is a fantastic example of how easy it is to misguide, misdirect or downright lie with statistics. So let us look at the different messages we can glean from the EU referendum polls. As our source, we will take every leave/remain poll result from the comprehensive tracker made available by the Financial Times, going back to 2010.

First, we want to try and go beyond just parroting what the latest poll tells us. Using the most basic summary metric, the arithmetic mean, we can get an idea of what a population of polls tells us about public opinion over a six-year span. We can see the remain camp leads across polls by 2 percentage points. Good news for Europhiles.

[Figure: mean of all leave/remain polls since 2010]

However, this is a misleading conclusion. Firstly, we are ignoring how much opinion varies from one poll to the other – this could be due to biases of particular polling companies, of the method used, or simply noise in the sampling process. Secondly, it’s opinion now and not four years ago that decides a referendum, so we may want to apply a weighting scheme where older polls count less than more recent polls.

If we include such weighting, and add the standard deviation of opinion across polls, the story becomes a bit more muddled – neither camp has any clear advantage above the variance in the data.

[Figure: recency-weighted poll average with standard deviation]
Public opinion, of course, changes with time. A traditional way of displaying poll results is to show answers to the same polling question across an arbitrary timespan. This has the advantage of revealing any strong trends in time, but also leads us to over-emphasise the most recent results, be it because they represent the will of the people or because of spurious trends. With this approach, we can tell a very different story – the leave camp appears to lead in the most recent polls.

[Figure: leave/remain polling over time]

Taking a different approach, we can also look at polling as an additive exercise. If we take the difference in responses to leave and remain at each poll, it gives us a net plus-or-minus indicator of public opinion. We can then plot these differences as a cumulative sum over time, to estimate whether a given camp gains ‘momentum’ over a sustained period of time (this approach has garnered significant favour amongst news outlets covering the US presidential primaries):

[Figure: cumulative leave-minus-remain poll differences]

In this case the leave camp not only comes out on top in recent polls, but is also shown as having gained considerable momentum over the last few months, usually taken as indicative of a step change in popular opinion. A very different story from our original poll average.
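For the curious, the cumulative ‘momentum’ treatment boils down to a running sum of per-poll differences; here is a sketch with made-up numbers rather than the FT tracker data.

```python
import numpy as np

# Illustrative poll results as (leave %, remain %), ordered oldest to newest
polls = np.array([(40, 44), (42, 43), (45, 42), (43, 44), (46, 41)])

differences = polls[:, 0] - polls[:, 1]     # leave minus remain at each poll
momentum = np.cumsum(differences)           # running total over time

print("Per-poll lead:", differences)
print("Cumulative momentum:", momentum)
```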

The problem with these past approaches is that they fail to encapsulate uncertainty in a meaningful way. We can take yet another treatment of the data, ask what our population of polls shows across all samples, and fit a mathematical model that allows us to describe uncertainty.

[Figure: histogram of poll outcomes with Gaussian fit]

Here, we see a histogram of individual poll outcomes and a simple Gaussian model of the responses, across six years of polling. We see that while there is significant spread in responses, overall the remain camp has an advantage, but still sits within the confidence zone of the leave camp; in other words, it’s pretty close. But what this, and most polling trackers, often fail to acknowledge is the large number of undecided voters who could swing the referendum either way. On average, 17% of those polled were undecided, with 12% still undecided in the last month. If we include the uncertainty of undecided voters in our simple model, we can see a vast widening of our confidence margin:

[Figure: Gaussian model including undecided voters]

And no significant advantage to either the leave or remain camps.
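One way to sketch how folding in the undecided voters widens the confidence band; the figures below are illustrative, not the fitted values from the FT tracker, and treating half the undecided share as extra spread is just one reasonable choice.

```python
import numpy as np
from scipy.stats import norm

remain_mean, leave_mean = 44.0, 42.0   # illustrative poll averages, %
poll_sd = 3.0                          # spread of individual polls, percentage points
undecided = 12.0                       # undecided share in the last month, %

# Treat the undecided share as extra uncertainty on each camp's true share:
# in the worst case they all break one way, so fold half of it into the spread.
wide_sd = np.sqrt(poll_sd**2 + (undecided / 2) ** 2)

for label, sd in [("polls only", poll_sd), ("with undecided", wide_sd)]:
    # Probability that remain's true share exceeds leave's, given the lead and spread
    p_remain_ahead = 1 - norm.cdf(0, loc=remain_mean - leave_mean, scale=np.sqrt(2) * sd)
    print(f"{label}: P(remain ahead) = {p_remain_ahead:.0%}")
```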

What this exercise demonstrates is that data literacy goes beyond being sceptical of statistics in the news. Interpretation is not just dependent on knowing what you are being shown, but also on understanding that different data-crunching approaches will favour different interpretations. We should be mindful of not just what data is shown, but how it is presented – data visualisation plays a large role in guiding us towards an interpretation.

—————————————————————————————————

Poll trackers have been made available by the BBC, Economist, Telegraph and Financial Times.

Analysing outcome likelihoods in the real world is a risky business. But if all else fails, you can always rely on the one interest group that has a consistent stake in accurate outcome prediction – betting companies. OddsChecker currently has best odds for a leave vote at 11/4 (27% likely) and stay at 1/3 (75% likely). Make of that what you will.
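For reference, converting fractional odds into an implied probability is a one-liner (and note the two figures add up to more than 100%, which is the bookmaker's margin at work):

```python
def implied_probability(numerator, denominator):
    """Implied probability of fractional odds numerator/denominator."""
    return denominator / (numerator + denominator)

print(f"Leave at 11/4:  {implied_probability(11, 4):.0%}")   # ~27%
print(f"Remain at 1/3:  {implied_probability(1, 3):.0%}")    # 75%
```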

On the 26th May 2016, the United Kingdom introduced a blanket ban on new psychoactive substances, widely described in the media as ‘legal highs’. Well, legal no more – under the Psychoactive Substances Act, production, distribution, or sale are now criminal offences. This is a troubling development for science-based policy and law enforcement.

What are, or rather were, legal highs? These are synthetic compounds that produce similar psychotropic effects to illegal drugs such as marijuana or cocaine, but have been designed to possess a different enough chemical structure to bypass existing UK legislation regulating their use and sale. They have seen increasing popularity, with the UK being the largest market for legal highs in the EU. Crucially, their role in causing harm – potential for addiction, long-term health consequences and associated risks – is poorly understood.

Under the recently introduced legislation, such substances are now automatically illegal. The new Act defines them as:

“any substance which—

(a) is capable of producing a psychoactive effect in a person who consumes it, and (b) is not an exempted substance. […] a substance produces a psychoactive effect in a person if, by stimulating or depressing the person’s central nervous system, it affects the person’s mental functioning or emotional state.”

In summary, that is any substance that alters ‘mental functioning or emotional state’ and is not already regulated through some other legislation. This is a problematic definition, as it is far too broad to be useful, and it ignores the critical aspect that practitioners look for in drug legislation – the potential for causing harm. This contradiction manifests in two important ways: establishing whether a substance is psychoactive, and the lack of evidence-based steering for legislation.

Establishing psychotropic status

Figures in law enforcement and the scientific community have already voiced their concerns, stating that such a ban is unenforceable. What do they mean by this? We first have to look at the way we define whether a substance produces a psychoactive effect. There is currently no predictive method to establish a causal relationship between the molecular structure of a substance and its capacity to induce changes to cognition or emotion. The only route to establish whether a given substance has a psychotropic effect is to conduct human clinical trials, querying the participants’ subjective experience. Perversely, with the introduction of a blanket ban it is precisely this type of clinical trial that becomes difficult or impossible to conduct.

It is worth pointing out that, for example, coffee and alcohol would fall under this legislation, were it not for the fact that they are already regulated by existing laws. These are substances that as a society we consider acceptable, either because their risks and health effects are considered mild enough (coffee), or because we accept a culture of consumption as a reasonable precedent, despite the fact that alcohol is far more dangerous than other regulated drugs.

Lack of evidence in legislation

The argument for the regulation of drugs is usually constructed around the concept of harm – harm to the self, due to addictive behaviour and long-term health risks, or harm to others, due to altered behaviour or financing of criminal drug enterprises. The troubling development in the UK courts is that we have little evidence on the risk of harm for most of the substances that fall under this new legislation.

While some of the substances tackled may indeed pose serious risk of harm, others may not, and this lack of evidence creates a scenario where individuals may be criminally prosecuted for dealing in a substance that has unproven capacity for harm. This is moving from evidence-based policy, where the objective is harm reduction, to morality-based policy, stating that inducing altered cognitive or emotional states is inherently immoral and therefore illegal.

This approach is not good enough, for two reasons. First, because we live in a multicultural society where judgements based on plurality-defined morals and traditions can exclude and stigmatise minorities, as we have already seen with the psychotropic khat and the Somali community. Second, the objective of such legislation should be first and foremost harm reduction, not criminalising users. It is often the poor and vulnerable who are most at risk of both substance abuse and criminal convictions, creating a system for marginalisation of sectors of the population. As a society we may decide this is acceptable in the name of harm reduction, when the evidence is available. When there is no evidence, such choices become even more difficult.

Then there is the question of criminality. One argument in favour of the bill is that it will force so-called ‘head shops’ to close, thereby closing the legal loophole for providers of psychoactive substances. But critics have raised concerns that this will simply drive the supply underground, putting users in the hands of criminal gangs, and placing the narcotics trade even further away from the arm of the law. In addition, the past two decades of collective experience in substance abuse have shown that criminalisation is significantly less effective than clinical intervention in preventing narcotics trafficking and undermining the criminal enterprises that sustain it.

There is no question that unregulated substances may pose a significant risk to health, particularly amongst vulnerable users. But ultimately, it is the poor and disadvantaged who will likely suffer from the newly introduced legislation, and a lack of evidence-based policy is unlikely to reduce harm, and will not help those in the direst need.

—————————————————————————————————

A summary of the Psychoactive Substances Act 2016 is available here, as well as the full text.

Ah, it’s spring; the flowers are out, the sunshine pokes through the clouds and the smell of democracy is in the air. No, I am not talking about the Ghanaian or Philippine general elections, which are indeed taking place this year, or the whopping 8 elections and 1 referendum taking place in the United Kingdom. I am of course talking of star-spangled democracy with a capital D – the 58th United States presidential election.

As discussion of the presidential candidates heats up in the media, the casual reader might be forgiven for thinking this campaign has been going on for years. In fact, much has been written about the perceived length and strain of the presidential campaign, with many commentators labelling it as ‘absurd’.

This piqued my interest and made me wonder if the US election race is truly an interminable soporific, or whether it could be attributable to perception alone. After all, the US is a world leader in most international matters, and the election of the president is the high point of the political calendar, so extensive news coverage is to be expected.

To find out, we will need to compare the length of the US election to equivalent elections in other countries. I have chosen to do this with the other 33 OECD nations, the rich-world club of industrialised economies that are most similar to the US in terms of economic prosperity and demographic makeup.

Now, defining an election campaign is a tricky business. Some countries, like Canada and Australia, limit the duration of a political campaign by law, but most don’t. Others, like Slovakia, limit the funds available to each party for campaigning, while others have free-for-alls. Ultimately, we are interested in the number of days viable candidates spend campaigning for their election in a major race in a given OECD country, be it a general, presidential or parliamentary election.

We will therefore have to make some rules. When no legal limit is imposed, we will take the number of days between the first announcement of a major candidacy (so no Monster Raving Loony Party) and voting day. This works well for most countries except the Scandinavian nations, where candidates typically don’t launch their campaigns individually; rather, their party does. In such cases, we will take the last date for valid submission of candidacy as the starting point, and voting day as the end. In all cases, when legal limits are not present, we will use the last major election as indicative.

The United States is, unsurprisingly, an oddball case here. We can divide the US presidential campaign into two halves. The time between a major candidate announcing their intention to run for office and the party national convention can be considered the first leg of the race. At the convention each party selects a candidate, and the second leg runs from nomination to election day. A few other countries conduct this kind of out-of-season political campaigning, most notably Italy, so we have included those as well.

For the 2016 presidential race, Hillary Clinton was the first major candidate to declare, on 12th April 2015, ahead of the Democratic convention in late July 2016. The race then goes on until Election Day on the 8th November. How does that compare to the other OECD nations?

[Figure: campaign duration by OECD country]

By a wide margin, the United States remains the undisputed leader of absurdly long election races. South Korea and Italy are notable outliers, both countries that engage in similar pre-campaign campaigning; but at a mere 241 and 190 campaign days in their last electoral cycles, they are dwarfed by the gigantic 576 days of US presidential campaigning. At the other end of the scale, Japan imposes a restrictive maximum of 12 days of political campaigning, with heavy fines for any infractions inching over the two-week line.
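The 576-day figure for the US is simply the gap between Clinton's announcement and election day, which is easy to verify:

```python
from datetime import date

announcement = date(2015, 4, 12)   # Clinton declares her candidacy
election_day = date(2016, 11, 8)

print((election_day - announcement).days)   # 576 days
```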

Regulating campaign time is a comparatively crude method by which countries limit political opinion-making. Regulating party financing is arguably a much more effective way of doing this, and much has been written about the successes and failures of financial approaches. Alas, the US election remains, as of now, a wide-open field for the financial backing of political campaigns. And while that continues to be the case, we are likely to continue enjoying the marathon races that make up the US presidential election.

—————————————————————————————————

While researching this article, I discovered the curious practise of electoral silence, where countries impose a complete ban on political advertising or campaigning on the day, or the day immediately before, an election. These range from a reasonable 24-hour ban in Bulgaria, to the ludicrous 15-day ban on polling in Italy. In Spain, this day preceding an election is delightfully called “reflection day”. Perhaps we should all reflect on that.

Talking to voters about LGBT issues changes bias

Ran one of the many headlines this week. “Hang on”, I thought, “this seems awfully familiar…”

If it feels familiar, it’s because we have heard this story before. Specifically, in late 2014, when Michael LaCour and Donald Green published a flashy article in Science claiming a short conversation with a door-to-door gay canvasser was enough to change people’s opinion towards gay marriage, and that the change persisted nine months later. The study received wide coverage, both for its large implications for political campaigning and reducing discriminatory bias, and for the feel-good factor of the power of human interaction. “At last”, exclaimed the campaigners, “here is science proving human connection can break down prejudice”.

Alas, it was not meant to be. In May the following year, Broockman and Kalla, two graduate students at the University of California, Berkeley, published a comprehensive critique of LaCour’s paper. A thorough statistical analysis of the original data, as well as follow-up with the survey companies used for the study, revealed several irregularities which were hard to explain without accepting that the data had been falsified. Shortly afterwards, the original paper was retracted.

So far the story is unusual, but not surprising. Cheating in science happens on occasion, particularly in the glamorous, high-impact journals like Science. The story was a disappointment for LGBT campaigners, but a small victory for open science and the public availability of data. And that should have been the end of the story, except for those two graduate students back in Berkeley. Broockman and Kalla were deeply interested in the LaCour paper in the first place (they did write a 27-page analysis of it, after all), not because they set out to debunk it, but rather because they were planning their own follow-up study on canvassing techniques and opinion shifting.

That follow-up study is now published, and much to everyone’s surprise it seems to broadly agree with the trend in the (allegedly fraudulent) original paper. This new study shows that a 10-minute conversation with a canvasser can change people’s opinion about transgender issues in a similarly long-lasting way. But importantly, and here is where the study differs from LaCour’s interpretation, this effect is seen even if the canvassers are not transgender themselves, changing the focus of why the effect occurs. The original study was all about humanising an issue – by exposing voters to real, flesh-and-bone gay activists, their opinion can be shifted by linking the issue to the person at their door, or so the logic ran. With their latest findings, Broockman suggests instead that the success of canvassing relies on a particular perspective-taking technique, and not on exposing prejudiced individuals to a gay or transgender canvasser to humanise them.

Many of the positive design features in the new study came about from the LaCour debacle. Recruitment techniques, canvassing methods and the perspective-taking approach were all informed by the ensuing debate, fine-tuned over the course of the fallout, and it could be argued that this new study wouldn’t have been as successful without it.

This story is therefore a tale of good science. By making the data for LaCour’s original study public, the irregularities were exposed and a better study came out of it. We found out the original interpretation was likely inaccurate, and now have a better idea of what works when changing people’s opinions (whether the interpretation of the original data has any validity in light of the alleged fraud remains an open question).

Most tellingly, Broockman and his colleagues remain committed to open science, having seen the benefit of inspecting other scientists’ data. They have published all of the data and code for their study, in case any amateur sleuth wishes to take a magnifying glass and scrutinise their findings – and, as we have learned, that can be a very good thing indeed.

—————————————————————————————————

Broockman & Kalla (2016) was published in Sciencealong with accompanying commentary. Some great coverage of the story has featured in NPR, FiveThirtyEight, and Wired.

On Wednesday the news broke that an artificial intelligence had finally cracked one of the most complex board games out there, Go. The announcement and subsequent paper from Google’s DeepMind showcase their brainchild, nicknamed AlphaGo. This computer program is not only capable of playing Go, but also defeated the European Go champion, Fan Hui, back in October 2015. It is currently scheduled for a match in March against Lee Sedol, arguably the reigning world champion of the game.

If this story feels familiar, it’s because it is – IBM’s Deep Blue famously beat chess grandmaster Garry Kasparov back in 1997, launching artificial intelligence (AI) into the public consciousness and starting a great pursuit of computer programs that could beat humans at every conceivable game of skill. The team at DeepMind announced early last year that they had developed another AI which could beat humans at a whole swathe of classic arcade video games, from Pong to Space Invaders.

The game of Go itself has a deep appeal to the mathematically minded, from baseball-like in-depth player statistics to convoluted mathematical models for player rankings, so it is no surprise that it would appeal as a challenge for designers of artificial intelligence. But what caught my attention amongst the media flurry was this statement by Demis Hassabis, CEO of Google DeepMind, in an interview with Nature:

“Go is a very ancient game, it is probably the most complex game humans play, it has more configurations on the board than there are atoms in the universe […]”.

These kinds of statements are often hard to wrap your head around. From intuition, it does not make any sense: I can sit at my desk with a standard 19×19 Go board and the 361 stones it takes to fill it, and very laboriously make every possible combination on the board. This way I haven’t used up all the atoms in the universe, only those in the single Go set I have in front of me.

Dealing with very large numbers such as the number of atoms in the universe is notoriously difficult to do intuitively. Let us begin by looking at the actual numbers. What is the number of possible configurations on a Go board? For the standard 19×19 board, there are 361 positions where the two players can place their stones. In Go, any given position can be either empty, a black stone or a white stone, meaning any of the 361 positions can be in one of 3 states. Thus, the number of possible board configurations is 3^361, or to put it in more standard notation, about 1×10^172. Of those, only about 1×10^170 are legal positions, in other words positions not violating the rules of Go. Surprisingly, the exact number of legal positions has been calculated as:

208168199381979984699478633344862770286522453884530548425639456820927419612738015378525648451698519643907259916015628128546089888314427129715319317557736620397247064840935

Which is, I am sure you will agree, a rather large number indeed. Now, onto the universe. The usual number bandied about is 1×10^82 atoms in the universe. In truth, any such estimate is made with some very strong assumptions. The typical approach an astrophysicist might take when thinking about this problem is to start with the number of stars in the universe. Computer simulations put that number at around 1×10^23. Next we need to know how much stuff is in a star – based on the universe observable from Earth, each star weighs an average of 1×10^35 grams. Next, and perhaps the most precise estimate in this equation, each gram of matter contains about 1×10^24 protons. Finally, we can put all those numbers together:

10^23 × 10^35 × 10^24 = 10^82

Planets and other planetary bodies don’t make it into the calculation since stars are substantially more massive. So ultimately it is a very rough, back of an envelope estimate, which is probably in the correct region, give or take a few orders of magnitude.
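Both figures are easy to check with Python's arbitrary-precision integers; this snippet just reproduces the two estimates quoted above.

```python
configurations = 3 ** 361                 # each of the 361 points: empty, black or white
atoms = 10**23 * 10**35 * 10**24          # stars x grams per star x protons per gram

print(len(str(configurations)) - 1)       # order of magnitude: 172
print(len(str(atoms)) - 1)                # order of magnitude: 82
```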

Going back to the number of Go board positions, we can see that there is a massive difference between the two estimates. How can this be? The first way to conceptualise this is to think of not using a single Go board to make every combination, but to have many Go boards sitting side by side, each with a different combination of stones laid on top. As we established before, we would need 1×10^172 individual boards to make every possible combination. How much room would that take? Laying them side by side in single file, the line of Go boards would measure 4×10^168 km. For comparison, the diameter of the observable universe is about 1×10^33 km.

Another helpful way to think about the problem is to consider how long it would take a person to set a single board to display every possible combination of stones. Let us assume a reasonably competent person can shift the stones to any new configuration in 20 seconds. Even better, let us assume we have a robot that can do the same, with the added advantage that it does not eat, sleep, or make mistakes. Starting with a blank board, how long would it take to complete every possible board combination? Using the values above, it would take 1×10^163 years, or expressed fully:

1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

That is not only a very long time; it is many times longer than the age of the universe, which is 13.8 billion (10^9) years old.
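The back-of-the-envelope figures above can be reproduced in a few lines; the ~45 cm board width is my own assumption, and the enumeration time uses the ~10^170 legal positions rather than all configurations.

```python
boards = 3 ** 361                          # every configuration, legal or not
legal = 10 ** 170                          # approximate number of legal positions

line_km = boards * 450 // 1_000_000        # assumed ~450 mm per board, converted to km
years = legal * 20 // (365 * 24 * 3600)    # 20 seconds to set up each legal position

print(f"Line of boards: ~10^{len(str(line_km)) - 1} km (observable universe: ~10^33 km)")
print(f"Enumeration time: ~10^{len(str(years)) - 1} years (age of universe: ~1.4e10 years)")
```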

This astounding number of board combinations is often cited as one of the reasons Go is a particularly fiendish game; while it is true that Go is very complex, we as humans constantly engage in games that have fantastical numbers of possibilities, from poker (10^6 possible hands) to chess (10^10^50 possible games). Even humble Connect Four has 10^13 legal playing positions, and most 10-year-olds seem to manage just fine. We are certainly capable of dealing with exponential complexity, just not very good at thinking about it.

—————————————————————————————————

The AlphaGo story was widely reported across the general press, and more extensively in science reporting.

Women ‘don’t understand’ fracking due to lack of education, industry chief claims

Or so ran the headline in The Independent newspaper today.

If you live in the US or the UK, it’s likely you have come across hydraulic fracturing or ‘fracking’ in the media. This relatively new approach to extracting fossil fuels has generated plenty of controversy, both for its potential to restore fuel security and drive down prices, and for rising concerns regarding its environmental impact.

Now Professor Averil MacDonald, chair of science engagement at the University of Reading, has given a statement to The Times saying that women disagree with fracking because they “don’t understand” the science and instead rely on their gut reaction.

Ah, where to begin… I have three main complaints about this statement: that it contravenes a responsibility in scientific engagement, that it harms dialogue, and finally that it focuses on the wrong point. I will deal with these in turn.

1. Duty to scientific engagement

Firstly, the idea of a professor of science engagement accusing women of effectively making the ‘wrong choice’ because they are uneducated is both short-sighted and potentially derelict of a scientific duty to put evidence first.

Professor MacDonald’s opinions are clearly partisan – she has recently been appointed chair of UK Onshore Oil and Gas, an industry group – and she makes no pretence otherwise. That is not a bad thing; scientists, like other citizens, can take sides on public issues, and it’s part of what makes a democracy vibrant. However, her responsibility as chair of scientific engagement is to provide a balanced account that puts evidence first, as is the role of all scientific engagement with the public. Whether she states these opinions in her capacity as UKOOG chair is irrelevant – her academic responsibility extends to all public engagement activities, a point sorely missing in discussions of the recent Tim Hunt affair.

For someone who emphasises the role of ‘facts’ in determining the role of fracking in the UK, it is rather curious that any evidence on scientific education or decision making in women is conspicuously absent from her argument.

2. Harming dialogue

At a fundamental level, what could possibly be accomplished by making these public statements? If you are a woman who disagrees with fracking, you will likely feel patronised and potentially be less likely to listen to arguments from the pro-fracking camp, given that instead of communicating evidence they are now resorting to unnecessarily gendered attacks. If you are a woman who agrees with fracking, you will likely also feel patronised, because the argument made is that you have somehow subverted your automatic emotional response (you are, after all, “naturally protective of your children” unlike men, according to MacDonald) and have risen to accept facts. And if you are a man, you are perhaps, like me, wondering what all of this has to do with fracking in the first place.

Fracking is a complex, multi-faceted issue that draws in considerations of economic development, energy security, fuel efficiency, environmental impact and disruption of rural communities, among others. It is certainly not an issue that can be reduced to “women don’t know better because they are poorly educated”. First, suggesting that someone cannot make an informed decision because of a lack of formal education is belittling and elitist, and dismisses the important democratic contribution of opinions from all sectors of society. Second, while women are under-represented in science and engineering, things are changing fast – many young women today will have an excellent scientific education and will be able to digest the complex evidence for and against fracking.

In short, these kinds of reductionist attacks will do more to alienate the kind of people that both sides need to appeal to: people who do want to be convinced by evidence, but who feel pushed away by sweeping generalisations and unfair portrayals in the media.

3. Focusing on the wrong point

This may strike readers as obvious, but talking about why one group or another is for or against an issue is not addressing the issue. If we are to have a constructive dialogue between those wishing to improve the financial and energy development of this country, and those wishing to minimise the environmental impact and risks to human health, we need common ground, and that common ground is provided by evidence. Focusing on evidence allows both sides to present their cases in a rational framework and, most importantly, allows for compromise to be reached over facts that are agreed by both sides. Attacking the education or decision-making skills of your opponent most certainly does not fall under this category.

——————————————————————————————

The Times (paywall) published the original interview with Professor MacDonald, which has also been covered by The Independent, Telegraph, Guardian and others.

I had a recent conversation that went something like this:

“Hey, you know about brains. What about Boltzmann brains? Can they exist?”

If, like me, you are not versed in statistical physics, you might be as surprised as I was. Boltzmann brains? Are we talking about 19th century physicist Ludwig Boltzmann’s actual brain, or some crazy AI project?

As it turns out, neither. Boltzmann brains are a logical argument that speaks deeply to cosmology, stochastic systems and our intuitions about the universe – and very little to actual brains.

The argument goes like this: you have a universe around you governed by the laws of thermodynamics. Specifically, the second law stipulates that in a closed system, any physical process leads to an increase in entropy; in other words, disorder amongst the component atoms inside the system. The universe is a closed system (i.e. it is finite) and therefore tends to increase in entropy, so its natural resting state is thermodynamic equilibrium, in other words, high entropy. However, the world that surrounds us is not like that; we observe order everywhere, from stars and galaxies to the presence of life, which indicates low entropy in the system. So we are left with a sticky situation: thermodynamics predicts high entropy, and we observe low entropy.

One solution would be to theorise the universe as mostly being in a state of high entropy, which occasionally fluctuates into a low entropy state and gives rise to our reality, before being quickly ushered back into chaos. This is where Boltzmann comes in – while he saw the universe as a fundamentally statistical system, fluctuating between states of varying entropy, the idea of our perceptible reality being a statistical fluke appeared to him as nonsense.

The problem is, every additional step of complexity requires an increasingly rare statistical fluke of low entropy to give rise to it. So a bit of dust clumping together is more likely than a planet forming. Similarly, a star or a galaxy are increasingly more improbable scenarios in our thermodynamically balanced universe. Taking this argument to the extreme, if you wish to create a scenario where a series of unlikely stochastic processes from a maximally entropic starting point leads to a sentient being, it would be vastly more probable that a) such sentience would be made up from the minimum number of components necessary, for example a lonesome brain without a body, and b) such sentience would exist for the smallest amount of time possible. It therefore follows that in this universe any sentient being is far more likely to be a single, floating brain immersed in a chaotic universe, flickering in and out of existence by sheer randomness, rather than the complex pastiche of order we see in our low entropy universe.

Boltzmann brains are therefore a reductio ad absurdum of the argument for our observable universe being a statistical fluke from thermal equilibrium. More fundamentally, however, it’s also a paradox – the second law of thermodynamics holds in virtually every conceivable scenario covered by classical physics, and yet our universe is ordered and low-entropic.

In essence, Boltzmann brains can’t exist – they are in fact a tool in cosmological theory for arguing that a theory is flawed if it predicts their presence. And as I promised, they have little to do with actual brains. But they do speak of some fundamental concepts about the way we think about the physical universe. Firstly, and probably most appealing to Ludwig Boltzmann, that entropy is fundamentally statistical and our universe follows its rules. Secondly, that our current understanding of the universe is not reconcilable with fundamental thermodynamics, at least not without considering some more exotic theories about how the universe works, which is what physics is all about. And finally, it allows us to consider how unlikely sentience is, in the ‘high-entropy stochastic process’ sense; it is perhaps the most unlikely of all the observable things we see in the universe.

So cheers to you, you vanishingly unlikely statistical fluke atop a high-entropy universe.

——————————————————————————————

An awful lot has been written about Boltzmann brains, including a great series of articles for Discover Magazine, and in the New York Times. There’s even a video, if that’s more your thing. Sadly, I could not find any literature on Boltzmann’s actual brain.