Re: st: keeping an inventory of dropped observations
Friedrich Hueblet had an interesting suggestion. Here are some
further comments.
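There are two problems with your code. The simpler one is the need to
remove typos: reason_for_drop must be created as a string variable, and
the final -tabulate- should condition on mark_for_drop, not on
reason_for_drop: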
gen mark_for_drop = 0
gen reason_for_drop = ""
replace mark_for_drop = 1 if eodlymph == 99
replace reason_for_drop = "missing lymph" if eodlymph == 99
...
drop if eodlymph == 99
...
tab mark_for_drop reason_for_drop if mark_for_drop == 1
The more fundamental one is that you can't have it both ways. Once
observations have been -drop-ped, they are not available to -tabulate-
(or do anything else with). You are not going to see "missing lymph" in
your table because you already -drop-ped those observations.
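A minimal sketch of that point, on made-up toy data (only eodlymph and
the 99 code come from your post; the id variable and the values are
invented for illustration). The -tabulate- before the -drop- shows the
reason; the -count- afterwards finds nothing.

```stata
* toy data: values are invented, only eodlymph == 99 is from the thread
clear
input id eodlymph
1 0
2 99
3 1
4 99
end

gen reason_for_drop = ""
replace reason_for_drop = "missing lymph" if eodlymph == 99

* tabulate first, while the observations still exist
tab reason_for_drop

drop if eodlymph == 99

* after the -drop-, there is nothing left to report
count if reason_for_drop == "missing lymph"
```

So any table of reasons has to be produced before the corresponding
-drop-, not at the end of the do-file.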
You are here combining two good ideas that don't really combine. The
first is to document data management by a -log- or -cmdlog- file. It is
perhaps worth underlining that you can annotate this as desired with
comments:
* problem: missing lymph if eodlymph == 99
drop if eodlymph == 99
In general, you would need also to document the use of -keep-,
-reshape-, -contract-, etc.
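As a sketch of the first idea, assuming a log file called cleaning.log
(a hypothetical name) and the same kind of invented toy data: the
comment, the -count- and Stata's own report of how many observations
were deleted all end up in the log.

```stata
* toy data: values are invented, only eodlymph == 99 is from the thread
clear
input id eodlymph
1 0
2 99
3 1
4 99
end

* "cleaning" is a hypothetical file name
log using cleaning, text replace

* problem: missing lymph if eodlymph == 99
count if eodlymph == 99
drop if eodlymph == 99

log close
```

The -count- is optional, since -drop- reports the number of observations
deleted, but it puts the number next to the comment that explains it.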
The second is to record, within a dataset, which observations are not
included in a particular analysis. The only easy way to do that is not
to -drop- them but to mark them in some way, and you are most of the way
there. The -mark- command offers one technique, but it is just as easy
to reinvent it, borrowing programmers' standard techniques. See also, in
due course, a Tip from Ben Jann in Stata Journal 7(2).
gen byte touse = 1
gen problem = ""
replace touse = 0 if eodlymph == 99
replace problem = problem + "missing lymph; " if eodlymph == 99
...
<analysis> if touse
...
tab problem if !touse
Note the use of
replace problem = problem + "<reason>; " if ...
which might be advisable if observations could be problematic for more
than one reason. Of course, you may need to use abbreviations, codes,
etc.
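Here is a sketch of how the reasons accumulate when an observation fails
more than one check. The second check (a missing age) and the variable
age are hypothetical, as are all the data values; only eodlymph == 99 is
from your post.

```stata
* toy data: age and all values are invented for illustration
clear
input id eodlymph age
1 0 54
2 99 61
3 1 .
4 99 .
end

gen byte touse = 1
gen problem = ""

* first check, as in the thread
replace touse = 0 if eodlymph == 99
replace problem = problem + "missing lymph; " if eodlymph == 99

* second, hypothetical check
replace touse = 0 if missing(age)
replace problem = problem + "missing age; " if missing(age)

* inventory of excluded observations, by combination of reasons
tab problem if !touse

* any analysis is then restricted to the marked sample
summarize age if touse
```

Observation 4 fails both checks, so its problem string carries both
reasons, and nothing has been removed from the dataset.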
Another possibility is the use of -notes-, but I don't think that is
what you are looking for really.
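For completeness, a sketch of what the -notes- route might look like,
again on invented toy data. The wording of the note is only an example.
The note is saved with the dataset, but it records a count and a reason,
not which observations were removed.

```stata
* toy data: values are invented, only eodlymph == 99 is from the thread
clear
input id eodlymph
1 0
2 99
3 1
4 99
end

* store the count in a note before the observations disappear
count if eodlymph == 99
note: dropped `r(N)' observations with eodlymph == 99 (missing lymph)
drop if eodlymph == 99

* list the notes attached to the dataset
notes
```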
Nick
Michael McCulloch wrote:
While cleaning a dataset, I'm periodically dropping observations that
meet certain criteria, for example:
drop if eodlymph==99
Since this occurs very often within a long do-file, I'd like to keep
an inventory of dropped observations & my reason for doing so. Aside
from manually searching through my log file, is there a more elegant
way than what I suggest below, to do this?
For example:
gen mark_for_drop=0
gen reason_for_drop=.
replace mark_for_drop=1 if eodlymph==99
replace reason_for_drop="missing lymph" if eodlymph==99
...
drop if eodlymph==99
...
tab mark_for_drop reason_for_drop if reason_for_drop==1