Re: st: editing
From: Phil Schumm <[email protected]>
To: <[email protected]>
Subject: Re: st: editing
Date: Fri, 12 Jul 2013 11:32:27 -0500
On Jul 12, 2013, at 10:39 AM, Donald Spady <[email protected]> wrote:
> Send it back. Tell them to clean up their product. I have had the same experience and one can spend hours and hours cleaning and checking and documenting. Often you don't know what the 'right' entry is, and before you analyze the data you have to find out. The only benefit of such a data set is that every student should have one of these as an exercise. It is a great way to learn about checking for inconsistencies (logical and otherwise), and for learning how a data set can be damaged, so that they will not do so in the future.
Agreed, though I would just like to point out one caveat (which is not inconsistent with anything you have written above). Sending a dataset back for cleaning can often result in substantial errors being introduced, since the person assigned to clean it is often ill-equipped for the task (e.g., attempts to do things manually rather than programmatically, fails to follow proper standards for reproducibility, or is hurried or otherwise less careful). Thus, in cases where I am partially responsible for the product, I sometimes explicitly request that the person providing me the data not attempt to manipulate them. This is only possible, of course, when you have both the time and the willingness to take this on.
Of course, the ideal situation is one in which a proper database is established at the beginning of a project, together with appropriate methods for extracting the data. Moreover, it is also true that in many cases where the data are poorly managed, they are also not worth analyzing (e.g., because the study was conducted so poorly). However, there are a few situations in which some data cleaning/reorganization is worth doing, such as with emergent collaborations among a large group of people (all of whom have collected and stored their data in different ways and with different levels of expertise).
But as you and Peter have pointed out, this can take a *long* time, and is only worth considering if the payoff is great enough.
-- Phil