EDITOR'S NOTE: A Facebook note by De La Salle University (DLSU) professor Antonio P. Contreras has taken the social media site by storm. As of press time, the note has been shared 14,832 times. Contreras argues that there are deliberate attempts to rig the 2016 Philippine elections against vice presidential candidate Bongbong Marcos. In response to Contreras, Mariel Kho Fang, a DLSU alumna, published a note on her Facebook account to assail the accuracy of Contreras’ allegations.
Thanks for your patience, as well as your fervor and passion to give geeks and nerds some love + stand by what is right.
First things first - please see the attached database file and the newest “analysis” provided by Mr. David Yap.
While many have already spoken up against Sir David Yap and Sir Antonio Contreras (please see the exhaustive list below), I promised to give my take on things, so that is what this post will be about. I wrote this with several collaborators from different walks of life, but all of us share an enthusiasm and respect for Math, reaching out to people and making things easy to understand. We stand for academic integrity, transparency and preserving the integrity of this election.
We believe that Math/Statistics should be used to illuminate: it should help people understand, not be used to deceive or fool them. Statistics by itself is not an absolute science; there are margins of error, and ultimately it cannot establish a single absolute truth. However, it is a tool that can help us analyze the world fairly and with reason.
We tried to make this as understandable and as easy to read as possible, so that everyone can appreciate and understand what we are saying.
Now - *cracks knuckles* bear with us. And if things are unclear - please feel free to reach out.
=== end of introductory section (there is a summary below) ===
To understand the issue at hand we summarized their claims into two:
Contreras & Yap’s Claims:
First, they noted that if the vote gap of Robredo and Marcos were plotted in a graph, for every one percent increase in transmission, there is a “constancy in increments in vote gaps.” Basically, they are worried that the graphs look too linear or too “clean,” concluding fraud, simply because it’s “too incredible to happen.”
Furthermore, they describe the sudden reversal in the pattern of the vote gap between Robredo and Marcos as problematic: it rises for some time in favor of BBM and then reverses until Leni gains the lead. They attempt to create more suspicion by claiming that this does not happen when comparing Robredo with other Vice Presidential candidates.
Notable Observations on their points:
1. They do not give us a basis of comparison for what a “legitimate” victory by Robredo would look like statistically. Based on their type of analysis, is it even possible for Robredo to earn a victory or to come close without cheating? It is unclear on their end.
2. Antonio Contreras in particular notes that “it is not the linearity that is bothersome but the constancy of increments.” And in the same post he claims that “the increments were almost uniform leading to the linear correlation.” This illustrates a lack of understanding of the criticisms based on linearity, because “uniform increments” and linearity are the same concept: a quantity that grows by the same amount at every step traces a straight line. It is like the difference between saying “black surfaces cars drive on” and “roads”.
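To make the point about “uniform increments” concrete, here is a minimal sketch. The vote-gap numbers are invented for illustration and are not taken from the Contreras & Yap dataset; the helper names `increments` and `is_linear` are our own.

```python
def increments(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


def is_linear(xs, ys, tol=1e-9):
    """True if every point lies on the line through the first two points."""
    slope = (ys[1] - ys[0]) / (xs[1] - xs[0])
    return all(abs(y - (ys[0] + slope * (x - xs[0]))) <= tol
               for x, y in zip(xs, ys))


# Hypothetical vote gaps at each 1% step of transmission (made-up numbers).
transmission = [1, 2, 3, 4, 5, 6]
uniform_gap = [10_000 + 2_500 * (t - 1) for t in transmission]  # same step every time
uneven_gap = [10_000, 12_000, 16_500, 17_000, 21_000, 30_000]   # steps vary

uniform_steps = increments(uniform_gap)  # every entry is 2500
uneven_steps = increments(uneven_gap)    # entries differ

print(is_linear(transmission, uniform_gap))  # constant increments: on a line
print(is_linear(transmission, uneven_gap))   # varying increments: not on a line
```

A series with constant increments is a straight line, and a straight line has constant increments. Objecting to one while accepting the other is objecting to the same thing twice.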
Analysis of Claims:
Question: Does a linear relationship prove cheating?
On Linear Trends
We performed the same analysis done by Contreras & Yap on other candidates, and we see that these candidates show the same linear trend as the Bongbong-Robredo graph. The attached photos on this post show the graphs for Escudero-Cayetano (Graph EC is the first PICTURE) and Drilon-TesdaMan (Graph DV is the second PICTURE).
As mentioned above, Contreras & Yap draw suspicion from (1) the linear shape of the “vote gap” graph and (2) the reversal seen in the Marcos-Robredo (MR) graph, from upward to downward.
However, what do the EC and DV graphs tell us? (1) Plotting the difference between any two candidates’ vote results typically results in a linear graph. (2) More importantly, whenever the two candidates exhibit a change in rank (e.g. a little before 70% transmission, Drilon became #1 in the senatorial race, overtaking Villanueva), this linear graph will exhibit a change in direction from upward to downward. In fact, the Drilon-TesdaMan graph looks strikingly similar to the BBM-Robredo graph.
So anytime there is a candidate that overtakes another candidate in any election, this type of trend can be drawn. Using the Contreras & Yap analysis, as long as the initial leader loses the lead, ANY ELECTION can be described as stained with cheating.
Why is this so? This is because, contrary to what Contreras & Yap claim, the data set is, in fact, NOT random. In statistics, the data would be considered “random” if (1) all regions transmitted votes at exactly the same pace (i.e. at any point in time, all regions have the same transmittal rate, e.g. at 7pm all regions had transmitted exactly 20% of their votes) and (2) in each precinct, transmitted votes are drawn randomly.
Several people have already proven that this is not the case:
(1) (The most Facebook friendly) - VP Election Results data visualized by Reina Reyes, Mark Ruiz and Aly Yap
(2) Philippine Elections 2016 - VP Race Computer Simulation by Hyubs Ursua
The posts above show that there was a trend in vote transmission: some regions (Region I, Region II, NCR) were able to send in their votes earlier than others (Region V, VisMin in general), presumably due to faster Internet connections or other factors. In fact, the regions that were able to transmit faster favored certain candidates: Marcos, Cayetano, and Villanueva in our MR, EC, and DV examples, respectively. This resulted in the sharp change in direction of the lines.
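The effect of transmission order can be sketched with a toy tally. Everything below is hypothetical: the batch sizes, the 60-40 splits, and the 40/60 division between fast and slow areas are invented to illustrate the mechanism, not estimated from the actual count.

```python
def running_gap(batches):
    """Cumulative (votes for A minus votes for B) after each transmitted batch."""
    gap, history = 0, []
    for votes_a, votes_b in batches:
        gap += votes_a - votes_b
        history.append(gap)
    return history


# Hypothetical setup: 100 batches of 1,000 votes each (all numbers invented).
# The first 40 batches come from fast-transmitting areas favoring candidate A 60-40.
# The remaining 60 come from slower areas favoring candidate B 60-40.
early = [(600, 400)] * 40
late = [(400, 600)] * 60
gap = running_gap(early + late)

peak_step = gap.index(max(gap)) + 1  # batch at which A's lead is largest
first_b_lead = next(i + 1 for i, g in enumerate(gap) if g < 0)  # B overtakes A
final_gap = gap[-1]                  # negative means B wins overall

print(peak_step, first_b_lead, final_gap)
```

With no cheating anywhere in this toy count, A's lead climbs in a straight line while the fast areas report, then falls in a straight line once the slow areas come in, and B ends up ahead. The shape comes entirely from who reports first.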
Conclusion: The pattern is simply a manifestation of what slow (lagged) counting does. It does not prove cheating. The pattern only shows that the votes counted fastest are often from the same areas and thus create a temporary lead. This does not imply cheating; it implies slower counting in other areas. By their logic, any “come-from-behind” victory is cheating, unless you want to go to the full extreme and say that Tesdaman, Cayetano, etc. were all cheated. Hahaha. But even then, this does not prove that anomalies happened. Cheating was not proven; note, however, that its absence was not proven either.
Assessment of Mathematical Methodology:
First, as shown above, their underlying assumption that the transmission of votes is random is already false.
Second, let’s assess their methods of analyzing the data:
Let’s point out the extremely, extremely obvious. They should not be making conclusions and running regressions on 5 data points, 29 data points, or even 34 (still a stretch).
[Though unsound, for argument's sake, we replicated their “analysis”. But again, honestly, this is wrong.]
(Woohoo! We can fit a line perfectly with 2 points!)
But seriously, the reason why we asked them to #FreetheData was so we could first assess whether they could even work with the data they had. The simple answer is that they should not have made conclusions or observations based on 5 data points (or 29, or 34). You need larger sample sizes to make sure that things are not mere coincidences.
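The quip about two points can be checked directly. This is a minimal sketch of ordinary least squares written from scratch; the points are arbitrary numbers we made up, and `fit_line` and `r_squared` are our own helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Share of the variation in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Any two points with different x values give a "perfect" fit.
r2_two = r_squared([3, 8], [17, -2])

# Add one more arbitrary point and the fit can fall apart.
r2_three = r_squared([1, 2, 3], [2, 9, 4])

print(round(r2_two, 6), round(r2_three, 3))
```

A perfect fit on two points says nothing about the data, only about how few points there are. The fewer the points, the easier it is for a line to look convincing by accident.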
Funny exchange I had with a co-enthusiast:
“Five points is too few. That's like just 5 votes. If you happened to pick badly from the whole Philippines and the 5 voters you grabbed were all for Seneres, you would end up electing a dead president.”
To learn more about why small base sizes are not reliable, here is a great read: https://web.facebook.com/arnoldlin…/posts/10154256505469673…
Conclusion: The base size is too small to draw conclusions from, especially a conclusion as grave as an allegation of cheating.
The next criticism is about why he is even regressing this time series. No matter how long your time series is, you can't just run a regression and make predictions out of that. This is a little more technical to explain. This link explains it more thoroughly for those who are interested: [https://web.facebook.com/jpcvergara/posts/10208083954290243]
They say that the relationships are linear. He can draw a line on the data, but this is largely JUST BECAUSE of the time element. He did not take out the factor of time before assessing the relationships between the variables. Linearity of the data does not automatically mean linearity of the variables being analyzed. There can be an underlying cause of the trend between two variables, and in this case that cause is time.
Conclusion: In this case, time, and not foul play, is the more likely reason why things are linear.
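Here is a small sketch of how time alone can manufacture a strong relationship. The two series of per-step counts are arbitrary digits we picked and have nothing to do with each other or with the election data. Differencing is used here only as one common, simple way to take the time trend out; it is not presented as the method anyone in this exchange used.

```python
from itertools import accumulate
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-step counts for two unrelated tallies (arbitrary digits).
steps_a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
steps_b = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]

# Running totals: both can only go up as time passes.
total_a = list(accumulate(steps_a))
total_b = list(accumulate(steps_b))

levels_corr = correlation(total_a, total_b)  # running totals: dominated by time
steps_corr = correlation(steps_a, steps_b)   # per-step changes: time removed

print(round(levels_corr, 2), round(steps_corr, 2))
```

The running totals look almost perfectly related, while the step-by-step changes show almost no relationship. The strong "relationship" in the totals is just both series growing as time passes.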
Third, let’s assess the inherent limitations of any statistical tool.
Math is a powerful tool. It can be used to educate or to mislead. Statistics is a science of assessing probabilities. At best, he proved a correlation, not causation. Again, the point is simple: showing a statistical relationship does not, by itself, INDICATE that there is cheating.
Conclusion: Statistics does not represent absolute truth and should not be used as such. No damning conclusions can be drawn from here, especially at this point.
EVEN IF we assume that their methods are correct (and clearly they are not!), the “suspicious pattern” they raise is actually present in other candidate pairs, too. It is therefore not an isolated case; it is simply a pattern produced by the rate of transmission.
We reject the conclusion of Contreras & Yap of electoral cheating, because, despite having legitimate but very limited data, they had (1) questionable analysis and (2) unsound interpretation of their questionable analysis. This does not bring us anywhere near concluding that there is electoral fraud. As a result, they have sown unjust doubt on the election results (as there are those who relied solely on their academic credentials and believed everything at once, without a second thought).
A well-loved professor from DLSU used to tell us this: if you torture the data enough, it will sing. What occurred here was waterboarding of the data until a conclusion was forced.
We urge everyone to be vigilant in discerning truth from conspiracy on social media, even when a post shows seemingly correct data analysis.
I thank Sir David and Dr. Contreras for releasing their dataset and analysis (thank you everyone for making that happen!). It has allowed us to pursue the truth together. The two of them did the right thing in the end, and it allowed us to assess their claims. We don’t have any ill intentions; we only wanted to see the merits of the allegations, since these are not petty allegations.
Again, I don't dispute that Sir David Yap made an “extensive analysis.” I applaud his efforts. But upon review of their efforts and assumptions, I can now say that there were lapses on multiple fronts. Both their camp and ours believe in a free and fair democracy; we are all on the same side. We just have different points of view. We have considered their side fairly, and we hope they can properly and respectfully consider ours as well.
For more info, someone collated everyone who has spoken out against the claims of Sir Yap and Sir Contreras: https://web.facebook.com/notes/jesus-lemuel-martin/vp-race-statistical-brouhaha-megamix/10156897807450052?__mref=message&_rdr