Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: binary indicator for differing subsets of variables [SEC=UNCLASSIFIED]
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: binary indicator for differing subsets of variables [SEC=UNCLASSIFIED]
Date: Wed, 7 Sep 2011 07:38:31 +0100
That's interestingly tricky. Here's one way to do it. Let's first
initialise a variable
gen myindicator = 0
Let's get the (string) suffixes 0806-0110 spelled out one by one to work with
unab LFS : LFS*
local LFS : subinstr local LFS "LFS" "", all
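As a quick check of what those two commands leave in the local macro, here is a minimal sketch on a throwaway dataset. The three LFS variables are made up for illustration, and -clear- wipes the data in memory, so try it in a fresh session rather than on your real data.

```stata
* toy data only: -clear- drops whatever is in memory
clear
set obs 1
foreach s in 0806 0906 1006 {
    gen byte LFS`s' = 1
}

* expand the wildcard into the full list of variable names
unab LFS : LFS*
display "`LFS'"

* strip the "LFS" prefix so that only the MMYY suffixes remain
local LFS : subinstr local LFS "LFS" "", all
display "`LFS'"
```

The first -display- shows LFS0806 LFS0906 LFS1006 and the second shows 0806 0906 1006. Note that LFS* would also pick up any other variable whose name happens to start with LFS.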
So we want to add in each LFS variable if and only if any of the
-date?- variables gives the corresponding date. This assumes six string
variables, date1 to date6, one for each month of the window:
qui foreach v of local LFS {
    replace myindicator = myindicator + LFS`v' if inlist("`v'", date1, date2, date3, date4, date5, date6)
}
replace myindicator = myindicator == 6
You could try looping over the -date?- instead, but I think the above
should work.
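For comparison, looping over the -date?- variables instead might look like the sketch below. It is only an illustration on made-up data: the two observations, the seven LFS variables and the six string variables date1 to date6 are all assumptions, and it gives the same answer as the -inlist()- version only if the six dates within each observation are distinct.

```stata
* toy data only: -clear- drops whatever is in memory
clear
input str4 date1 str4 date2 str4 date3 str4 date4 str4 date5 str4 date6 LFS0107 LFS0207 LFS0307 LFS0407 LFS0507 LFS0607 LFS0707
"0107" "0207" "0307" "0407" "0507" "0607" 1 1 1 1 1 1 0
"0207" "0307" "0407" "0507" "0607" "0707" 1 1 1 0 1 1 1
end

gen myindicator = 0
unab LFS : LFS*
local LFS : subinstr local LFS "LFS" "", all

* outer loop over the six date variables, inner loop over the MMYY suffixes
quietly forvalues j = 1/6 {
    foreach v of local LFS {
        replace myindicator = myindicator + LFS`v' if date`j' == "`v'"
    }
}

* 1 only if all six months in the window were spent in the labour force
replace myindicator = myindicator == 6
list date1 date6 myindicator
```

Here the first observation has six 1s in its window and the second has only five, so myindicator ends up 1 and 0 respectively. The nested loop makes more passes through the data than the -inlist()- version, which is one reason to prefer the latter.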
On the -inlist()- trick, see
http://www.stata.com/statalist/archive/2011-04/msg00618.html
and, more generally, for a survey of working rowwise, see
Cox, N. J. 2009. Speaking Stata: Rowwise. Stata Journal 9(1): 137-157 (pr0046),
which shows how to exploit functions, egen functions, and Mata for working
rowwise and introduces rowsort and rowranks.
The best single line of advice, however, is to -reshape- panel data like
yours to long, as most things are easier that way.
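To give a flavour of the long-layout route, here is a minimal sketch on made-up data. It assumes an identifier variable (here called id, which is hypothetical), string variables date1 and date2 holding the first and last month of the window as "MMYY", and that all years fall in 20YY. The names m_mmyy, m_date1, m_date2 and inwindow are invented for the example.

```stata
* toy data only: -clear- drops whatever is in memory
clear
input id str4 date1 str4 date2 LFS0107 LFS0207 LFS0307 LFS0407 LFS0507 LFS0607 LFS0707 LFS0807
1 "0107" "0607" 1 1 1 1 1 1 0 0
2 "0307" "0807" 1 1 1 1 0 1 1 1
end

* one observation per person and month; keep the suffix as a string
reshape long LFS, i(id) j(mmyy) string

* turn each "MMYY" string into a Stata monthly date (assumes years 20YY)
foreach v in mmyy date1 date2 {
    gen m_`v' = ym(2000 + real(substr(`v', 3, 2)), real(substr(`v', 1, 2)))
    format m_`v' %tm
}

* the total can only reach 6 if the window spans exactly six months
assert m_date2 - m_date1 == 5

* count months in the labour force inside each person's own window
bysort id: egen inwindow = total(LFS * inrange(m_mmyy, m_date1, m_date2))
gen byte myindicator = inwindow == 6
```

With these data id 1 gets myindicator 1 and id 2 gets 0, because LFS0507 is 0 inside the second window. Only the start and end of the window are needed here, not one date variable per month, and you can -reshape wide- afterwards if the wide layout is needed again.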
Nick
On Wed, Sep 7, 2011 at 5:04 AM, Fry, Jane <[email protected]> wrote:
> I'm a bit new to data manipulation using Stata and I have a query: I'd like to set up an indicator variable based on the sum of the values in a selection of other variables.
>
> So, in my dataset I have variables on individual characteristics (like birth month and year) and a series of binary variables on labour force status (in/out) for consecutive months and years from Aug 2006 - Jan 2010:
> LFS0806 LFS0906 LFS1006 ... LFS1109 LFS1209 LFS0110.
>
> I would like to create a binary indicator variable to show whether or not an individual is in the labour force for 6 consecutive months --
> e.g. LFS0107, ... , LFS0607=1.
> The tricky bit is that the 6 month window for each individual ends in the month when they turn 25 -- i.e. the window shifts according to birthday.
>
> I have set up an 'initial date' identifier variable (date1) that tells me when to begin the window and a 'final date' identifier variable (date2) that tells me when to end the window. So date1 and date2 are string variables of the form "MMYY".
>
> e.g. for the first observation, date1="0107" and date2="0607", so LFS0107 ... LFS0607 are relevant here.
> for the next observation, date1="0906" and date2="0307", so LFS0906 ... LFS0307 are relevant here.
>
> I think what I need to do is generate a new variable X=. and then replace its values (for each individual) with a 1 or 0 if the sum of the relevant LFS variables is 6.
> i.e. the sum of LFSMMYY to LFS(MM+6)YY = 6 (or each LFS is 1).
>
> Trouble is, I don't know how to do it. I thought something like an egen X = rowtotal("LFS"+date1 - "LFS"+date2) might work but I was wrong! Is there anyone who can help?
>
>