Tim,
"It depends" is usually a safe answer. You might want to keep them for
example if you run several analyses with different variables and have
reasons not to want identical samples across them. Or if you wanted to
impute the missing values, of course. Or simply because it's easier to
deal with the data when every subject has the same number of
observations. After all, the missings were in the data all along.
Other than that, I don't own the book you mention and can only assume the
same is true for at least some other members of Statalist, too. Also, the URL
in your post contains two typos. So unfortunately I can't provide a more
specific answer.
Generally speaking, in many cases Stata simply "ignores" missing values in
analyses and therefore they do not affect the results (see below). To better
understand your specific problem it would be helpful if you could provide
more details, like what analysis in particular they perform in section 9.6
and an excerpt of the relevant lines from your log file.
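One guess, not having seen section 9.6: _n and _N count rows whether or not
fev is missing, so after -reshape long- every subject has 17 rows, while in
fevlong a subject presumably has only as many rows as measurements. A minimal
sketch of how one could check this (the names nrows and nmeas are made up for
illustration, and j(month) simply names the new variable directly instead of
renaming _j afterwards):

```stata
use "C:\downloads\fevwide.dta", clear
reshape long fev, i(id) j(month)

* _N is the number of rows per subject, missing or not
bysort id (month): generate nrows = _N

* count() only counts the non-missing values of fev per subject
by id: egen nmeas = count(fev)

* compare the two; rows with missing fev are included in nrows only
summarize nrows nmeas
```

If nrows is 17 throughout while nmeas varies, anything built on _N or _n
would differ between the two files until the missing rows are dropped.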
Best,
Sven-Oliver
-------example: summary statistics reshaped vs. original online data---------
. use "C:\downloads\fevwide.dta", clear
(Repeated measurements of FEV for three groups, coded wide)
. reshape long fev, i(id)
(note: j = 0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       57   ->     969
Number of variables                  19   ->       4
j variable (17 values)                    ->   _j
xij variables:
                    fev0 fev3 ... fev48   ->   fev
-----------------------------------------------------------------------------
. rename _j month
. d, s
Contains data
  obs:           969    Repeated measurements of FEV for three groups, coded wide
 vars:             4
 size:        20,349 (99.9% of memory free)
Sorted by:  id  month
     Note:  dataset has changed since last saved
. sum fev
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |       663    42.59765    18.51655      10.12     110.81
. bysort grp: sum fev
--------------------------------------------------------------------------------
-> grp = 1
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |       459    47.64026    18.58769      14.28     110.81
--------------------------------------------------------------------------------
-> grp = 2
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |       146    28.86562    9.250675      10.12      65.02
--------------------------------------------------------------------------------
-> grp = 3
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |        58    37.25828    16.47463      16.59       81.8
. use "C:\downloads\fevlong.dta", clear
(Repeated measurements of FEV for three groups, coded long)
. d, s
Contains data from C:\downloads\fevlong.dta
  obs:           663    Repeated measurements of FEV for three groups, coded long
 vars:             4    20 Apr 2002 21:43
 size:        14,586 (99.9% of memory free)
Sorted by:  id  month
*** #obs in long data set = #non-missing in reshaped wide data set!
. sum fev
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |       663    42.59765    18.51655      10.12     110.81
. bysort grp: sum fev
--------------------------------------------------------------------------------
-> grp = 1
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |       459    47.64026    18.58769      14.28     110.81
--------------------------------------------------------------------------------
-> grp = 2
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |       146    28.86562    9.250675      10.12      65.02
--------------------------------------------------------------------------------
-> grp = 3
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |        58    37.25828    16.47463      16.59       81.8
*** ==> statistics are identical regardless of whether missings are dropped!
-------end example---------
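The log above suggests that the reshaped data differ from fevlong only by the
rows with missing fev (969 rows, 663 of them non-missing). A rough sketch of
how that could be confirmed, assuming both files use the same variable names
(id, month, fev, grp) and the same paths as in the log:

```stata
use "C:\downloads\fevwide.dta", clear
reshape long fev, i(id) j(month)

* rows where fev is missing (969 - 663 = 306 going by the log above)
count if missing(fev)

* drop them and check how many rows remain
drop if missing(fev)
count

* compare the result with the long file, observation by observation
sort id month
cf _all using "C:\downloads\fevlong.dta"
```

-cf- compares values row by row, so both data sets have to be in the same
sort order; whether it reports any differences here is untested.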
> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Tim
> Sent: Sunday, 20 July 2008 06:57
> To: [email protected]
> Subject: st: drop 'em OR it depends
>
> New semester starts in about a week.
> One thing I had difficulty with last semester was getting the data
> provided into the form needed for the analysis. I could get reshape to
> work, but had to look it up every time, and it still took several
> attempts every time.
> So I've been looking again at Hills and De Stavola, "A short
> introduction to Stata for biostatistics", chapter 9. (files at net from
> http://ww.stata.com.data/hs/; net get book)
> In section 9.5 they cover reshape.
> In section 9.6 they cover _N and _n.
> The examples in section 9.6 use the fevlong dataset. When I tried using
> fevwide reshaped to long, I did not get the results in the book. Only
> after dropping missing observations did it work.
>
> So my question is, should dropping missing obs be normal practice after
> reshaping from wide to long, or does it depend on what I want to do
> with the long dataset?
> And if I don't drop 'em always, when do I keep them?
>
> Tim
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/