st: RE: RE: RE: RE: RE: RE: RE: RE: RE: summarize conditions within subjects in panel data
From: Nick Cox
Date: Tue, 30 Nov 2010 10:48:19 +0000
That is possible as Dimitriy pointed out, but it wouldn't make the problem soluble in one line, as the missings would just be reversed in time.
Nick
Dimitriy Masterov
=================
I think you can construct a "fake" hour variable easily:
gsort id -hour;
bys id: gen hour2=_n;
bysort id hour2: do your thing
Hoffman, George
===============
Thanks again.
If only -by- and -bysort- could take a reverse modifier (like -gsort id -hour-).
Nick Cox
=========
You could package this, but at root Stata needs to look at _all_ the values for a panel before it can decide that _all_ are missing. Hence I think there isn't a one-line solution, except trivially if you write a program to do it.
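As a rough illustration of what "write a program to do it" might look like, here is a minimal sketch. The program name -cumbelow- and its options (id(), time(), generate(), cutoff()) are hypothetical and not from this thread, and the all-missing check uses -egen, count()- rather than the sorting approach quoted below. The toy data are invented for the example.

```stata
* Hypothetical wrapper: running count of observations below a cutoff,
* set to missing for panels in which the variable is missing throughout
capture program drop cumbelow
program define cumbelow
    version 11
    syntax varname(numeric), ID(varname) TIME(varname) GENerate(name) [ CUToff(real 50) ]
    confirm new variable `generate'
    tempvar nonmiss
    quietly {
        * cumulative count within panel, in time order; missing values add 0
        bysort `id' (`time') : gen `generate' = sum(`varlist' < `cutoff')
        * number of nonmissing values in each panel
        by `id' : egen `nonmiss' = count(`varlist')
        * no nonmissing values at all: the count is undefined
        replace `generate' = . if `nonmiss' == 0
    }
end

* Invented toy data: id 2 is missing in every hour
clear
input id hour varx
1 1 40
1 2 .
1 3 30
2 1 .
2 2 .
end

cumbelow varx, id(id) time(hour) generate(mysum)
list, sepby(id)
```

The call is then a single line, but the program still makes two passes over each panel, which is the point made above.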
Nick Cox
========
This came up in a different form a few days ago. See my post on 24 Nov
<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1011/date/article-968.html>
bysort id (hour) : gen mysum = sum(varx < 50)
bysort id (varx) : replace mysum = . if missing(varx[1]) & missing(varx[_N])
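A small worked example may make the two lines above easier to follow. The data below are invented for illustration; the two -bysort- commands are the ones quoted above. The final -sort- is only there to restore time order, since the second command re-sorts each panel by varx.

```stata
* Invented toy data: id 1 has one missing hour, id 2 is missing throughout
clear
input id hour varx
1 1 40
1 2 .
1 3 60
1 4 30
2 1 .
2 2 .
2 3 .
end

* Running count of hours with varx < 50; a missing varx adds 0
bysort id (hour) : gen mysum = sum(varx < 50)

* Missings sort last within id, so if the first value is missing, all are
bysort id (varx) : replace mysum = . if missing(varx[1]) & missing(varx[_N])

* Put the data back in time order before inspecting
sort id hour
list, sepby(id)
```

For id 1 the count still advances past the missing hour, while for id 2 every value of mysum is set to missing.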
Hoffman, George
===============
Not quite.
The problem is with missing values.
bysort id (hour) : gen mysum = sum(varx < 50)
The function sum(varx < 50) adds 0 whenever varx is missing.
But if varx is missing for the entirety of the hours in a given id, I'd like mysum = sum(varx < 50) to be missing.
If I add -if varx < .- to the end of the -bysort- command, then the sum is missing if varx is missing in the last hour.
This is a generic issue that I've been thinking wrongly about for years, and need correction!
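The two behaviours described in this message can be reproduced with a short sketch. The data and the variable names sum_all and sum_if are invented for illustration.

```stata
* Invented toy data: varx is missing only in the last hour
clear
input id hour varx
1 1 40
1 2 30
1 3 .
end

* Without a qualifier: the missing hour adds 0, so the count carries forward
bysort id (hour) : gen sum_all = sum(varx < 50)

* With -if varx < .-: excluded observations are left missing
bysort id (hour) : gen sum_if = sum(varx < 50) if varx < .

list
```

In the last hour sum_all is 2 while sum_if is missing, which is the problem with the -if- qualifier noted above.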
Hoffman, George
===============
This works. Thanks!
Nick Cox
========
bysort id (hour) : gen mysum = sum(varx < 50)
Hoffman, George
===============
id: integers 1, 2, ..., 200
hour: integers 1, 2, ..., 48
varx: continuous, 0-100 and missing
Nick Cox
========
We need to know more about how -hour- is defined and measured. Is it a time since some zero, or a duration? Show a segment of your data for one subject.
Hoffman, George
===============
I've got a panel dataset (xt) uniquely identified by subject (id) and time (hour), sorted by id hour.
I'd like to generate a variable that counts the cumulative (within id, across hour) number of hours that a variable is less than 50.
My code so far:
gen varxl50 = varx < 50 if varx < .
bysort id (hour) : gen varxl50sum = sum(varxl50)
I'm running into problems because of missing values, I think.
Does this code look right?
Is there a more succinct way to code this?