Wikipedia talk:Combining sources

When are additional calculations made upon existing data or statistics considered original research rather than trivial calculations?

Since 7 June 2015, this article has included a paragraph under the heading "Trivially simple interpretations" that says:

At the same time there may be cases when an interpretation may only seem trivial. A notable case, which involved much debate in wikipedia, is combining data from various statistical tables. The main caution is that different source[s] may use different criteria in creating tables; they may not always be compatible, so that the combined table may be misleading.

Unfortunately, this paragraph neither cites nor links to the notable case or the debate that followed. Consequently, it is difficult to understand exactly what the author means by this cautionary statement, or to assess when the line into original research is crossed. Is the author suggesting that statistical tables should not be combined and no calculations performed, or that statistics from the same source can be combined because the calculations won't produce misleading results?

The reason for my concern is that, in an article about crime in New Zealand, an editor has added tables of crime statistics generated from New Zealand Police data, with crime rates calculated using population estimates from Statistics New Zealand. Both the crime data and the population data come from Open Data tables published on Statistics New Zealand's NZ.Stat website. Later in the article there is a second statistical table that uses a different source of crime statistics, a Police report. That report contains the same crime statistics as the first table, but also includes a table of the population estimates that Police used to calculate the crime rates in their report.

Comparing the Police population estimates with the Statistics New Zealand population estimates reveals that they are different! The difference is about 1% and shows up only in the second decimal place of the crime rates in the second table, whereas the Police published crime rates to only one decimal place in their report. Most, but not all, of the crime rates published in both tables in the Wikipedia article can be obtained from the Police report. The crime rates that are not in the Police report can be calculated from the raw data in the Police report alone. Additionally, after searching the various Statistics New Zealand websites, I found an archived static web page containing details about calculating crime rates and a link to a separate table of population estimates that should be used. Because the Statistics New Zealand website was archived and rebuilt after being destroyed during the 2016 Kaikoura earthquake, this information and the open data (NZ.Stat) tables now reside on different websites. However, when viewed from the archives, it is apparent that there were originally separate calendar-year and fiscal-year population estimates that went with the crime tables.
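To illustrate the size of the effect, here is a rough sketch in Python. All of the figures are made up for illustration and are not taken from the Police report or from NZ.Stat, and the "per 1,000 population" basis is also an assumption. It only shows how two population estimates that differ by about 1% can give the same crime rate at one decimal place and different rates at two.

```python
def crime_rate(offences: int, population: int, per: int = 1_000) -> float:
    """Offences per `per` head of population (the basis is an assumption)."""
    return offences / population * per

# Hypothetical figures, chosen only to illustrate the rounding effect.
offences = 5_400
population_a = 4_500_000   # hypothetical estimate from one source
population_b = 4_545_000   # hypothetical estimate about 1% higher

rate_a = crime_rate(offences, population_a)
rate_b = crime_rate(offences, population_b)

# At one decimal place the two rates agree; at two they do not.
same_at_one_dp = round(rate_a, 1) == round(rate_b, 1)
same_at_two_dp = round(rate_a, 2) == round(rate_b, 2)

print(f"{rate_a:.1f} vs {rate_b:.1f}")
print(f"{rate_a:.2f} vs {rate_b:.2f}")
```

So the arithmetic itself is routine; the choice of denominator is what changes the published figure.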

Clearly, combining the open data sources about crime and population using the Statistics New Zealand data tables is original research because the results differ from the official report produced by Police, even though the calculations would seem trivial and the data come from the same source. But does making these same seemingly trivial calculations constitute original research when they are made using the population estimates that are advised to be used for those calculations, even though they come from different sources? - 210.86.82.145 (talk) 03:18, 10 February 2019 (UTC)

Combining different statistical collections

A further complication arises when two statistical collections cover different time periods and measure similar things, but count them in slightly different ways. This is where things can get really murky. How should editors deal with situations where one statistical collection only provides data for part of a period and a second statistical collection covers the rest of the period? When is it original research to combine the statistical collections? Clearly the calculations required will not be routine, as they will require more than simple arithmetic. However, if a recommended methodology is followed, can the originality be overcome?

For example: more recent crime victimisation statistics that could be included in the same Crime in New Zealand article discussed above are not compatible with the historic recorded crime statistics collection that was retired after 2014. However, the Police helpfully publish some methodologies that explain how to present crime trends across the transition period. The new statistical collection is published as a monthly open data source and the historic collection was an annual collection. The collections are therefore not only counted differently, they also cover different time periods and have different scopes, so simply putting the two sets of numbers side by side would be misleading. Yet if the methodology is followed properly, the two statistical collections can be usefully combined to show underlying trends in crime. Unfortunately, nobody is publishing source material that presents the available data in this way. With the increasing availability of open data, this is going to present a problem for Wikipedia editors who want to present meaningful information in articles. While all the data sources are available, and information about the recommended ways to combine them can be cited and verified, the calculations themselves usually haven't been performed and published in a citeable source. While the calculations are not trivial, they can be verified from the source material. So is including the results of these calculations in a Wikipedia article legitimate research for an article, or is it something original? - 210.86.82.145 (talk) 03:18, 10 February 2019 (UTC)