A Facebook note by De La Salle University professor Antonio P. Contreras has taken the social media site by storm. As of press time, the note has been shared 14,832 times. Contreras argues that there are deliberate attempts to rig the 2016 Philippine elections against vice presidential candidate Bongbong Marcos. In response, John Carlo Punongbayan, a PhD candidate at the UP School of Economics, published a note on his Facebook account challenging the accuracy of Contreras' allegations.
Weighing in on the Leni-Bongbong data debate: a growth rate perspective
The so-called statistical “analysis” of David Yap (Ateneo) and Antonio Contreras (La Salle) purporting to show evidence of systemic cheating in the VP race has inspired a torrent of responses and counter-analyses from many friends.
Some of the choicest responses include:
Adrian R. Mendoza (UPSE PhD candidate), “On spurious correlations and biased statistic(ian)s”
Peter Julian Cayton (UPSS asst. prof; ANU PhD student), “On the question of the Leni-Bongbong gap” and “Waves of transmission rates in the 2016 Philippine election cycle”
Arnold Lau (Columbia MA student), “Did Leni cheat? We don’t know, but trendlines are insufficient evidence”
Many of these pieces focus on Yap & Contreras’ poor appreciation of time series data and its nuances, like: the importance of “detrending” the data by using the technique called “first differencing”; the law of large numbers and regression to the mean; and the inappropriate use of ordinary least squares (OLS) regression on time series data that fail cointegration tests.
To many non-statisticians this jargon can easily seem abstruse and daunting. But the collective message so far, put simply, is that the analysis by Yap & Contreras is founded on misguided assumptions (e.g., a supposed randomness of the transmission of election results) and erroneous methods (e.g., inappropriate linear regressions), and these have all led to conclusions that are faulty and non sequitur. Such poor data analysis, however, has not prevented Yap & Contreras from gaining attention in many circles that see their analysis as sufficient proof of cheating (and good ammunition for anti-Marcos bashers).
Let me offer my own take on the matter, which focuses on the growth rates of Leni and Bongbong’s respective votes over time and how correlated they are. This is largely inspired by my days in NEDA (2014-2015), where my boss Sec. Balisacan trained me to become familiar with computing GDP growth rates almost to an uncomfortable degree. I would like to think that such a degree of closeness with the data has its benefits.
(The following analysis comes from the real-time data shared with me by Edj Robleza and collated through the diligent efforts of Earl Anthony Villacarlos Bautista via Google Docs. In the interest of transparency I invite everyone to look at this common dataset and see the numbers for themselves. The link to the full dataset is here.)
The analysis is quite simple, really. We begin with the basics. Figure 1 below shows the cumulative votes garnered by Leni and Bongbong over time, from 5:45 PM of May 9 to 10:45 tonight, May 11. Even from this graph alone it is apparent that while their cumulative votes are closely tied together, many of Bongbong’s votes came in earlier while many of Leni’s votes came in later. In the graph this is shown by the fact that early on the red line can be found on top of the blue line, but later on this is reversed.
(Unfortunately the public spreadsheet does not contain a provincial/regional breakdown of Leni and Bongbong’s votes, which could ideally be correlated with the national data to see which places were responsible for the influx of votes at any given time.)
The next Figure 2 shows the incremental/marginal votes that came in for Leni and Bongbong at each successive period reported. It provides a second way of looking at exactly the same dataset. Again, notice that these two trends are highly correlated (as in Figure 1), and a huge spike of votes came in at 6:13 PM of May 9, just over an hour after most polling precincts nationwide closed.
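The step from Figure 1 to Figure 2 amounts to a first difference: each incremental count is the cumulative count at one report minus the cumulative count at the report before it. A minimal Python sketch of that step follows. The numbers are made up for illustration and are not the actual tallies, and the helper name `first_difference` is hypothetical.

```python
def first_difference(cumulative):
    """Return the increments between successive cumulative reports."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Illustrative numbers only; these are NOT the real vote counts.
leni_cumulative = [100, 250, 450, 700]
bongbong_cumulative = [120, 290, 480, 710]

leni_incremental = first_difference(leni_cumulative)
bongbong_incremental = first_difference(bongbong_cumulative)

print(leni_incremental)      # increments between successive reports
print(bongbong_incremental)
```

Note that the differenced series is one element shorter than the cumulative series, since the first report has no earlier report to compare against.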
What we want to highlight here are the growth rates of these incremental votes, which we present in Figure 3 below. The bottom line is that a very high correlation of the growth rates between Leni and Bongbong’s votes (as seen below) means that when Bongbong’s votes increase/decrease in one period, Leni’s votes would tend to mirror this very closely. Consistency in this pattern, in turn, would tend to debunk rather than support any allegation of cheating on Leni’s part.
Let me expound a bit further. Recall that Yap & Contreras used linear regression fits to show that after some fateful hour (3 AM allegedly) the tide turned magically in favor of Leni, and for every added percent of precincts counted Leni “suspiciously” earned more than 40,000 votes in a seemingly systematic fashion. However, Figure 3 below shows that ever since the first votes came in on May 9, the growth of Bongbong’s votes was mirrored very closely by the growth of Leni’s votes. Whatever advantage Leni has derived thus far into the race came from the accumulation of very small gaps between these otherwise highly correlated growth rates.
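For readers who want to reproduce this kind of growth rate series, here is a rough Python sketch. It assumes the growth rate is the ordinary period-on-period percent change of the incremental votes, which this note does not spell out. The data are invented for illustration, and `growth_rates` is a hypothetical helper name.

```python
def growth_rates(series):
    """Period-on-period growth: (current - previous) / previous.

    Assumes no previous value is zero; a zero would need special handling.
    """
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]


# Illustrative incremental votes only; these are NOT the real counts.
leni_incremental = [150, 200, 250, 200, 300]
bongbong_incremental = [170, 190, 230, 190, 270]

leni_growth = growth_rates(leni_incremental)
bongbong_growth = growth_rates(bongbong_incremental)

for period, (lg, bg) in enumerate(zip(leni_growth, bongbong_growth), start=1):
    print(f"period {period}: Leni {lg:+.1%}, Bongbong {bg:+.1%}")
```

In this toy data the two growth series rise and fall together in most periods, which is the kind of co-movement the correlation is meant to capture.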
The growth rate trends in Figure 3 can seem noisy and hard to process, and one way we can improve their presentation is by taking the moving average (MA) of the growth rates over time. This is another way of revealing core/long-term trends in time series data. Below we show the noticeably smoother 3-period MAs...
the 5-period MAs...
and the 7-period MAs...
The moving average technique would tend to improve the correlation of the trends, and indeed for the 3-, 5-, and 7-period MAs the correlation coefficients between the smoothed growth rates of Leni’s and Bongbong’s votes are 0.82, 0.95, and 0.97, respectively (the coefficient ranges from -1 to 1, and a value closer to 1 denotes a stronger positive correlation). Again there does not seem to be any evidence to suggest that at any point of the race Leni’s votes grew disproportionately vis-à-vis the growth of Bongbong’s votes.
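The smoothing and correlation steps can be sketched in Python as follows. This assumes a simple (unweighted) moving average and the Pearson correlation coefficient; the note does not state which variants were actually used. The growth rates are invented for illustration, and the helper names `moving_average` and `pearson` are hypothetical.

```python
from math import sqrt


def moving_average(series, window):
    """Simple moving average; yields len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def pearson(xs, ys):
    """Pearson correlation coefficient, which lies between -1 and 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Illustrative growth rates only; these are NOT derived from the real data.
leni_growth = [0.30, -0.10, 0.25, -0.20, 0.40, -0.05, 0.15, -0.25, 0.35]
bongbong_growth = [0.20, -0.05, 0.30, -0.25, 0.30, 0.05, 0.10, -0.15, 0.25]

results = {}
for window in (3, 5, 7):
    results[window] = pearson(moving_average(leni_growth, window),
                              moving_average(bongbong_growth, window))
    print(f"{window}-period MA correlation: {results[window]:.2f}")
```

Wider windows leave fewer points to correlate (here only three for the 7-period MA), so coefficients computed on short smoothed series should be read with some caution.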
In fact the opposite trend seems to be happening in the very recent data points today (May 11), where in the past few hours Bongbong’s growing votes have not been mirrored by a similar growth in Leni’s votes. This could be a manifestation of the dominance of pro-Bongbong votes in the final batch of votes to be counted (e.g., absentee votes and those in far-flung areas in Mindanao). Put another way, Bongbong seems to be gaining (and Leni losing) in the past few hours of May 11. All we can hope is that the million-plus votes that have yet to be counted favor Leni instead of Bongbong.
In conclusion, we agree that the Yap-Contreras appraisal of linear trends of the gap between Leni and Bongbong’s votes is an erroneous exercise in time series analysis that cannot in any way support the conclusions of cheating/manipulation they are trumpeting. There are a number of techniques to analyze the VP vote data more appropriately and soundly, and our chosen method (a simple examination of growth rates, their moving averages, and their correlations) would tend to argue away from (and not toward) conclusions of cheating in the VP race.
A final word: Just as data can be used to reveal the truth, it can also be used to parade lies. Let’s just hope that the vast majority of the readers of the Yap-Contreras analysis will be able to separate the wheat from the chaff.
*The author is currently a PhD student at the UP School of Economics, where he also graduated summa cum laude and valedictorian in 2009. His professional experience spans the Philippine Center for Economic Development, the Securities and Exchange Commission, the World Bank Office in Manila, the Asia Foundation, the Energy Policy and Development Program, and the National Economic and Development Authority, where he served as Head Executive Assistant from 2014 to 2015 during the term of former Sec. Arsenio M. Balisacan. The views expressed here do not necessarily reflect the views of the author’s affiliations.