Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: RE: RE: RE: RE: RE: RE: RE: RE: RE: summarize conditions within subjects in panel data
From: "Hoffman, George" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: RE: RE: RE: RE: RE: RE: RE: RE: RE: RE: summarize conditions within subjects in panel data
Date: Tue, 30 Nov 2010 10:59:49 -0600
I've settled on a two-line solution.
In the process, I've discovered that my favorite user-written command, -defv-, also accepts -bysort-!
Example:
defv bysort id (hour): varx50sum = sum(varx<50)
defv bysort id (varx) : varx50sum = . if missing(varx[1]) & missing(varx[_N])
STB-51  dm50.1  Update to defv (help defv if installed), J. R. Gleason
        9/99 p. 2; STB Reprints Vol 9, pp. 14-15
        updated to Stata 6 and improved
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Tuesday, November 30, 2010 4:48 AM
To: '[email protected]'
Subject: st: RE: RE: RE: RE: RE: RE: RE: RE: RE: summarize conditions within subjects in panel data
That is possible, as Dimitriy pointed out, but it wouldn't make the problem soluble in one line: the missing values would just be reversed in time.
Nick
[email protected]
Dimitriy Masterov
=================
I think you can construct a "fake" hour variable easily:
gsort id -hour
gen long order = _n
bysort id (order): gen hour2 = _n
bysort id (hour2): do your thing
Hoffman, George
===============
Thanks again.
If only -by- and -bysort- could take a reverse modifier, as -gsort- does (gsort id -hour).
Nick Cox
=========
You could package this, but at root Stata needs to look at _all_ the values for a panel before it can decide that _all_ are missing. Hence I think there isn't a one-line solution, except trivially if you write a program to do it.
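Nick's point that a one-liner exists only if you write a program can be sketched as a small wrapper around the two-line -bysort- solution given elsewhere in this thread. The program name -cumlt- and its option names are hypothetical; the sketch assumes a numeric variable, defaults the cutoff to 50, and restores the original sort order at the end.

```stata
* Hypothetical wrapper -cumlt- (name and options are illustrative only):
* running count of times var < cutoff within each panel, set to missing
* for panels where var is missing throughout
capture program drop cumlt
program define cumlt
    version 11
    syntax newvarname, Panel(varname) Time(varname) Var(varname numeric) [Cutoff(real 50)]
    tempvar order
    * remember the current order so it can be restored afterwards
    gen long `order' = _n
    * running count in time order; a missing var adds 0 (missing > any number)
    bysort `panel' (`time') : gen `varlist' = sum(`var' < `cutoff')
    * sorted on var, missings come last, so a missing first value means all missing
    bysort `panel' (`var') : replace `varlist' = . if missing(`var'[1]) & missing(`var'[_N])
    sort `order'
end
```

With the variable names in this thread, a call would look like: cumlt varx50sum, panel(id) time(hour) var(varx)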
Nick Cox
========
This came up in a different form a few days ago. See my post on 24 Nov
<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1011/date/article-968.html>
bysort id (hour) : gen mysum = sum(varx < 50)
bysort id (varx) : replace mysum = . if missing(varx[1]) & missing(varx[_N])
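A minimal check of these two lines on a hypothetical toy panel (id 1 partly observed, id 2 missing throughout). The second line relies on Stata sorting missing values after all numbers: once the data are sorted on varx within id, a missing first value can only mean that every value in that panel is missing.

```stata
* Hypothetical toy data: id 1 is partly observed, id 2 is missing throughout
clear
input id hour varx
1 1 60
1 2 40
1 3 .
1 4 30
2 1 .
2 2 .
end

* Running count of hours with varx < 50; a missing varx adds 0
bysort id (hour) : gen mysum = sum(varx < 50)

* Sorted on varx, missings come last: missing at both ends means all are missing
bysort id (varx) : replace mysum = . if missing(varx[1]) & missing(varx[_N])

* Put the data back in time order and inspect
sort id hour
list, sepby(id)
```

For id 1, mysum runs 0, 1, 1, 2 over the four hours; for id 2 it is missing in both hours.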
Hoffman, George
===============
Not quite.
The problem is with missing values.
bysort id (hour) : gen mysum = sum(varx < 50)
The expression varx<50 evaluates to 0 (false) when varx is missing, because missing is larger than any number in Stata, so sum(varx<50) just carries the running total forward.
But if varx is missing for every hour of a given id, I'd like mysum to be missing.
If I add -if varx < .- to the end of the -bysort- command, then mysum is missing in every hour where varx is missing, so its final value is missing whenever varx is missing in the last hour.
This is a generic issue that I've been thinking about wrongly for years, and I need correcting!
Hoffman, George
===============
This works. Thanks!
Nick Cox
========
bysort id (hour) : gen mysum = sum(varx < 50)
Hoffman, George
===============
id: integers 1, 2, ..., 200
hour: integers 1, 2, ..., 48
varx: continuous, 0-100, with some missing values
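For trying out the suggestions in this thread, a panel with the same shape can be simulated. This is a sketch on made-up data: the seed, the roughly 10% missing rate and the choice to make id 1 entirely missing are assumptions, not George's actual data.

```stata
* Hypothetical simulated panel shaped like the description above:
* 200 ids x 48 hours, varx uniform on [0, 100), about 10% missing,
* and id 1 missing in every hour to exercise the all-missing case
clear
set seed 2010
set obs 200
gen id = _n
expand 48
* rows within an id are identical at this point, so any order is fine
bysort id : gen hour = _n
gen double varx = 100 * runiform()
replace varx = . if runiform() < 0.10
replace varx = . if id == 1
xtset id hour
```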
Nick Cox
========
We need to know more about how -hour- is defined and measured. Is it a time since some zero, or a duration? Show a segment of your data for one subject.
Hoffman, George
===============
I've got a panel dataset (xt) uniquely identified by subject (id) and time (hour), sorted by id hour.
I'd like to generate a variable that counts the cumulative (within id, across hour) number of hours that a variable is less than 50.
My code so far:
gen varxl50 = varx <50 if varx <.
bysort patnum (hour): gen varxl50sum = sum(varxl50)
I think I'm running into problems because of missing values.
Does this code look right?
Is there a more succinct way to code this?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*