Misleading statistics in the world of mass information
Tuesday, January 16, 2018
We live in a perpetual state of misinformation. In 2016, Oxford Dictionaries selected “post-truth” as its word of the year, an adjective “relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief.” In 2017, Collins Dictionary chose “fake news” as its word of the year.
With society’s need and capacity for constant innovation and new discoveries, the scientific world has witnessed a significant increase in published research over the past 30 years, growing at roughly nine per cent per year. The advent of the internet has boosted the availability of information and, in its wake, a variety of online journals have emerged, some willing to publish just about any research. The increasing use of social media to spread information, combined with a genuine misunderstanding of statistics, creates the perfect recipe for mass misinformation.
In the digital age, not a day seems to go by without someone using debatable numbers to support even more debatable conclusions. Research, it seems, has become a race to publish, in which quality takes a back seat to recognition and discovery. The misleading statistics and false results published by academic sources and the media are often celebrated as “discoveries.”
Shared from person to person, with no apparent source, erroneous statistics manage to make it onto everybody's Facebook and Twitter feeds, and even slip into conversations. Yet, nobody seems to know where these statistics come from, or whether they are actually true. Even reputable articles, found both online and in print, use an impressive number of figures to seemingly validate their claims, yet fail to cite any respected source or expert.
Shane Terrillon, U3 Science, studies mathematics and statistics. He cautions that people tend to easily misinterpret statistical results. One common mistake is the assumption that correlation between data sets necessarily implies causation.
"People definitely pick and choose the statistics that they use for their articles to make their argument a bit more [...] credible," Terrillon said.
Terrillon takes numbers in the media with a grain of salt, and is wary of sharing them. But not everyone is as cautious, and misleading articles spread across social media feeds daily.
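Terrillon's caution about correlation and causation is easy to demonstrate. The sketch below uses made-up variables rather than any data cited in this article: two quantities that merely share an upward trend over time end up almost perfectly correlated, even though neither one causes the other.

```python
# A minimal sketch of correlation without causation, using hypothetical data:
# two unrelated quantities that both drift upward over time.
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(2000, 2020)

# Causally unrelated series that each grow with time (plus a little noise).
ice_cream_sales = 100 + 5 * (years - 2000) + rng.normal(0, 3, len(years))
phone_subscriptions = 50 + 4 * (years - 2000) + rng.normal(0, 3, len(years))

# The shared time trend alone produces a correlation close to 1.
r = np.corrcoef(ice_cream_sales, phone_subscriptions)[0, 1]
print(f"Pearson correlation: {r:.2f}")  # typically above 0.95
```

A headline built on such a number ("phone subscriptions drive ice cream sales") would be exactly the kind of misreading Terrillon describes.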
As of August 2017, 67 per cent of Americans get at least some of their news from social media platforms, according to a survey conducted by the Pew Research Center. That figure is up from 65 per cent a year prior, with growth driven particularly by Americans aged 50 and over. The three platforms respondents most often reported using to source news were Twitter, Reddit, and Facebook.
Despite its popularity, social media might not be the best place to get accurate information. In the lead-up to the 2016 U.S. presidential election, The Wall Street Journal compared sources shared by users who reported being either “very conservatively aligned” or “very liberal.” It compiled these sources into “blue feeds” and “red feeds” to illustrate the discrepancies in the news offered to different Facebook users. Considering how many people increasingly use social media as a source of worldwide news, having selectively chosen facts and statistics in circulation exacerbates the misinformation plaguing our society. These tailored feeds create an “echo chamber” effect, wherein social media users only see content from friends and third-party sources that align with their own views.
This issue becomes even more troublesome when established media outlets also include dubious statistics in their reporting. In April 2017, Fox News came under fire for tweeting a graph that incorrectly suggested President Trump had a much lower unemployment rate after his first 100 days in office than most other presidents over the past 14 years. What the graph failed to show is that Obama’s early rate reflected the recession inherited from the previous administration, while Trump inherited the recovery of the Obama administration.
The primary cause behind the publication and spread of misleading statistics is an inadequate approach to the discipline. Russell Steele, associate professor in the Department of Mathematics and Statistics, thinks that misleading statistics are not wrong per se but are simply overstated.
“[The problem] has less to do with the results themselves not being correct, but rather the conclusions that you make on the basis of the evidence not being correct,” Steele said. “It really comes down to two things [...] generally people do not receive enough training in statistics from statisticians [...and thus] they don’t understand the limitations of statistical techniques.”
Understanding the limits of statistical investigation is key to the discipline itself.
“Statistics is a discipline [that], when it's functioning properly, [...] should be inherently skeptical," Steele explained.
Fellow statistician Yi Yang, assistant professor at McGill, echoes Steele’s point of view.
“Sometimes the researchers themselves, when they use some statistic, because they’re not trained in statistics [...] do not fully understand the meaning of the tools that they use,” Yang said. “They make some much stronger conclusion [...] when it should not be the case.”
Misleading data produced by these inadequate approaches to statistics are then widely disseminated through publication. John Ioannidis, a professor of disease prevention at the Stanford University School of Medicine, has claimed that more than half of published research findings are false. He blames confirmation bias for much of the problem.
Many methodologists are also pointing the finger at the p-value. The p-value measures the probability of obtaining a result at least as extreme as the one observed, assuming there is no real effect. If the p-value is very small, the observed result would be unlikely to arise by chance alone, which researchers commonly take as evidence of a genuine finding. It is a very popular statistic used to determine the relevance of results, particularly in the biomedical field.
But, as Regina Nuzzo noted in a Nature article, the p-value is far from the most reliable statistic. Its inventor, Ronald Fisher, actually intended it as an informal first check, not a final verdict, to flag results worth further examination.
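One way to see why Fisher's caution matters is a simple simulation; the numbers below are hypothetical and only illustrate the general behaviour of the statistic, not any particular study. When two groups are drawn from the same distribution, there is no real effect to find, yet roughly five per cent of comparisons still fall below the conventional 0.05 threshold by chance alone.

```python
# A hedged illustration of why a small p-value is a starting point, not a
# verdict: both groups come from the SAME distribution, so every
# "significant" result counted here is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments = 1000
false_positives = 0

for _ in range(n_experiments):
    group_a = rng.normal(0, 1, 30)   # no true difference between the groups
    group_b = rng.normal(0, 1, 30)
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < 0.05:               # the conventional significance cutoff
        false_positives += 1

print(f"'Significant' results with no real effect: {false_positives}/{n_experiments}")
```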
"Yang suggests that industry professionals and researchers alike rely more on statisticians to ensure their methods are appropriate, rather than clinging onto the belief of a one-size-fits-all statistic."
In 2016, researchers from the Stanford University School of Medicine examined millions of research abstracts published between 1995 and 2015, and found that mentions of p-values more than doubled over the period, despite mounting evidence against the statistic's reliability. Considering that the research was all conducted in the biomedical field, this raises some concerns; not everybody has to care about p-values and confidence intervals, but everyone cherishes their health and well-being.
The habit of relying on a single, debatable statistic to decide whether a hypothesis is plausible is far from unique to the scientific world. Overreliance on simplified measures of confidence, much like the p-value in scientific research, was one of the main factors that led to the financial market crash of 2008. Following the crisis, the world discovered that many banks and investment firms had been underestimating the risk of certain investments because of inadequate statistical assumptions. Investors were overconfident in the “normal” distribution of probabilities, which assigns vanishingly small likelihood to days that deviate sharply from a typical trading cycle, and thus underestimated the chance of large financial losses.
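The gap between those assumptions and reality is easy to quantify. The sketch below uses illustrative figures rather than actual market data: a normal model treats a five-standard-deviation daily loss as a roughly once-in-millions-of-days event, while a heavier-tailed model of the kind many risk analysts now prefer makes the same loss vastly more likely.

```python
# Comparing tail risk under a normal model and a heavier-tailed model
# (a Student's t rescaled to the same variance). Illustrative only.
import numpy as np
from scipy import stats

loss_in_std_devs = 5                      # a "5-sigma" down day
df = 3                                    # low degrees of freedom = fat tails
unit_var_scale = np.sqrt((df - 2) / df)   # rescale the t model to variance 1

p_normal = stats.norm.sf(loss_in_std_devs)
p_heavy = stats.t.sf(loss_in_std_devs, df=df, scale=unit_var_scale)

print(f"Normal model:       about 1 trading day in {1 / p_normal:,.0f}")
print(f"Heavy-tailed model: about 1 trading day in {1 / p_heavy:,.0f}")
```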
For such reasons, Yang suggests that industry professionals and researchers alike rely more on statisticians to ensure their methods are appropriate, rather than clinging onto the belief of a one-size-fits-all statistic. Similarly, journalists and others responsible for the dissemination of information should also seek the opinion of statisticians to evaluate the validity of their claims when citing research found on the Internet.
But even when consulting an accountable statistician, another problem can surface: a reluctance to challenge desired findings.
“If [researchers] get the answer [they] want, then [they] stop,” Steele said. “But if [they] don’t get the answer [they] want, [they] keep going.”
Too often, researchers will challenge and re-evaluate their methods only when the results are not in accordance with their hypothesis. Then, they will use alternate methods, review their data sample, and look for faults in the way the study was conducted—all in order to get a different conclusion. However, if the results align with the prediction, scientists may neglect to investigate any further.
Research data always carry the potential for mistakes, no matter how many attempts it takes to acquire the results. Regardless of the outcome, the methodology and the drive to explore every possible flaw should be the same for statisticians and researchers alike.
“[For] good statistical analyses, the [best] way to support your conclusion, [...is] once you have your answer, you try to break it,” Steele said. “[Researchers should] be more skeptical [...] and always challenge their results.”
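One generic way to act on that advice, though not a method attributed to Steele, is to ask how easily chance alone could reproduce a finding. In the hypothetical example below, the group labels are shuffled thousands of times to see how often a difference at least as large as the observed one appears when no real effect is present.

```python
# "Trying to break" a result with a permutation check, using made-up numbers:
# shuffle the group labels and count how often chance matches the observed gap.
import numpy as np

rng = np.random.default_rng(2)
treated = np.array([5.1, 6.3, 5.8, 7.0, 6.1, 5.9])   # hypothetical measurements
control = np.array([5.0, 5.4, 5.2, 6.0, 5.6, 5.3])
observed_gap = treated.mean() - control.mean()

pooled = np.concatenate([treated, control])
n_shuffles, at_least_as_large = 10_000, 0
for _ in range(n_shuffles):
    rng.shuffle(pooled)
    fake_gap = pooled[:len(treated)].mean() - pooled[len(treated):].mean()
    if fake_gap >= observed_gap:
        at_least_as_large += 1

print(f"Chance matched the observed gap in {at_least_as_large / n_shuffles:.1%} of shuffles")
```

If shuffled data reproduce the result often, the finding has not survived the attempt to break it.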
Researchers aren’t the only culprits for the proliferation of misleading statistics. People share dubious articles because their lack of understanding of the numbers renders them unable to recognize the critical flaws in the “facts” presented. Sharing articles without understanding or fact-checking them can lend credibility to an otherwise baseless claim.
“If my friends or someone [I know] shared a stat [...] I would [be more likely to] believe it, because they’re someone that I trust [...] so their judgement I would [also] trust [...] instead of some random source,” Coco Wang, U1 Science, said.
Both Yang and Steele believe that statistics should be better integrated into the educational system. Without skepticism of the facts and figures in popular news articles, misinformation can spread rapidly via social media. If people better understood statistics, they would be more prone to challenge the information they read.
"Especially when you go through [the final years of high school], that’s your best chance [to educate people about statistics],"Steele said. “If people are learning about statistics in university, [for many of them] it’s probably too late [to change the way they think about statistics].”
Yang also believes that more elements of data science should be added to academic curriculums.
“People are realizing the importance of data science [...and] in different programs they started to add some flavour of the training of data science into statistics,” Yang said.
In the meantime, what is one to do while waiting for the world to catch up to the realization that numbers play an increasing role in our data-driven lives? Inspecting every cited research source in the morning newspaper, while attempting to chase down a cup of coffee, hardly sounds like much fun, or like something anyone would actually do.
The situation is especially tricky considering how many people rely on social media to stay up to date with current news. In August 2017, Facebook attempted to resolve the issue with an algorithm designed to spot fake news. Using fact checkers, it tagged articles with suspicious content as potentially fake. But four months later, Facebook terminated the initiative due to poor results. Moreover, algorithms in general remain vulnerable to hacking and manipulation, which can render them useless. Such algorithms would not necessarily be able to prevent studies with misleading results from spreading on the web and social media.
Steele suggests taking a more philosophical response to the issue of misleading statistics:
“[People in the scientific world should be] more comfortable with saying ‘I think,’ instead of ‘I know,’” Steele said.
He believes that if researchers start being more skeptical, journalists would be more likely to follow suit when reporting their studies, as would the general public when looking for information.
After all, perhaps the problem of misleading statistics really is just a fear of not knowing. If the general public accepts that an admission of uncertainty is sometimes preferable to blind and incorrect assertions, then there will be no need to look for numbers that aren't really there.