Re: st: Counting Number of Program Days
From: Phil Clayton <[email protected]>
To: [email protected]
Subject: Re: st: Counting Number of Program Days
Date: Tue, 1 Nov 2011 14:32:10 +1100
Hi Nikki,
I would do this by reshaping the data to long format.
Phil
. * enter data
. clear
. input str10 First str10 Last A10_01_10 A10_05_10 A10_010_10 A10_11_10
First Last A10_01_10 A10_05_10 A10_010~0 A10_11_10
1. "Jane" "Doe" 1 1 . .
2. "John" "Doe" . 1 0 1
3. end
.
. * clean up variable name/s
. * (alternatively you could clean these up after reshaping)
. rename A10_010_10 A10_10_10
. list, clean noobs
First Last A10_01~0 A10_05~0 A10_10~0 A10_11~0
Jane Doe 1 1 . .
John Doe . 1 0 1
.
. * reshape to long format
. reshape long A, i(First Last) j(datestr) string
(note: j = 10_01_10 10_05_10 10_10_10 10_11_10)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 2 -> 8
Number of variables 6 -> 4
j variable (4 values) -> datestr
xij variables:
A10_01_10 A10_05_10 ... A10_11_10 -> A
-----------------------------------------------------------------------------
. rename A attended
. replace attended=0 if missing(attended)
(3 real changes made)
. list, clean noobs
First Last datestr attended
Jane Doe 10_01_10 1
Jane Doe 10_05_10 1
Jane Doe 10_10_10 0
Jane Doe 10_11_10 0
John Doe 10_01_10 0
John Doe 10_05_10 1
John Doe 10_10_10 0
John Doe 10_11_10 1
.
. * calculate first and final attendance dates for each person
. gen date=date(datestr, "MD20Y")
. egen startdate=min(date) if attended, by(First Last)
(4 missing values generated)
. egen enddate=max(date) if attended, by(First Last)
(4 missing values generated)
. bysort First Last (startdate): replace startdate=startdate[1]
(4 real changes made)
. bysort First Last (enddate): replace enddate=enddate[1]
(4 real changes made)
. format %td date startdate enddate
. list, clean noobs
First Last datestr attended date startdate enddate
Jane Doe 10_01_10 1 01oct2010 01oct2010 05oct2010
Jane Doe 10_05_10 1 05oct2010 01oct2010 05oct2010
Jane Doe 10_11_10 0 11oct2010 01oct2010 05oct2010
Jane Doe 10_10_10 0 10oct2010 01oct2010 05oct2010
John Doe 10_05_10 1 05oct2010 05oct2010 11oct2010
John Doe 10_11_10 1 11oct2010 05oct2010 11oct2010
John Doe 10_10_10 0 10oct2010 05oct2010 11oct2010
John Doe 10_01_10 0 01oct2010 05oct2010 11oct2010
.
. * for each date, could that person have attended?
. gen byte couldattend=date>=startdate & date<=enddate
.
. * sum up the possible attendances per person
. egen maxpossible=total(couldattend), by(First Last)
.
. list, clean noobs
First Last datestr attended date startdate enddate coulda~d maxpos~e
Jane Doe 10_01_10 1 01oct2010 01oct2010 05oct2010 1 2
Jane Doe 10_05_10 1 05oct2010 01oct2010 05oct2010 1 2
Jane Doe 10_11_10 0 11oct2010 01oct2010 05oct2010 0 2
Jane Doe 10_10_10 0 10oct2010 01oct2010 05oct2010 0 2
John Doe 10_05_10 1 05oct2010 05oct2010 11oct2010 1 3
John Doe 10_11_10 1 11oct2010 05oct2010 11oct2010 1 3
John Doe 10_10_10 0 10oct2010 05oct2010 11oct2010 1 3
John Doe 10_01_10 0 01oct2010 05oct2010 11oct2010 0 3
.
. * or instead of the last egen you could just collapse the dataset
. collapse (sum) couldattend, by(First Last)
. list, clean noobs
First Last coulda~d
Jane Doe 2
John Doe 3
.
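
The steps in the log above can be condensed into a short do-file. This is only a sketch under a few assumptions: the variable names have already been cleaned (A10_10_10 rather than A10_010_10), missing attendance means "did not attend", and the egen min()/max() calls use cond() expressions in place of the if/bysort fill-down shown in the log. The variable rate is a hypothetical addition, not part of the original reply.

```stata
* example data, names already cleaned
clear
input str10 First str10 Last A10_01_10 A10_05_10 A10_10_10 A10_11_10
"Jane" "Doe" 1 1 . .
"John" "Doe" . 1 0 1
end

* one row per person per program day
reshape long A, i(First Last) j(datestr) string
rename A attended
replace attended = 0 if missing(attended)
gen date = date(datestr, "MD20Y")
format %td date

* first and last attended dates, filled in on every row of the person
* (min/max ignore the missing values produced by cond())
egen startdate = min(cond(attended == 1, date, .)), by(First Last)
egen enddate   = max(cond(attended == 1, date, .)), by(First Last)
format %td startdate enddate

* program days falling between first and last attendance
* (false for anyone who never attended, since date >= . is false)
gen byte couldattend = date >= startdate & date <= enddate

* one row per person: days attended, days possible, and their ratio
collapse (sum) attended couldattend, by(First Last)
gen rate = attended / couldattend
list, clean noobs
```

On the example data, John Doe should come out with 2 days attended out of 3 possible, which matches the figure given in the question.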
On 01/11/2011, at 1:45 PM, Nicole Johnson wrote:
> Hi all,
>
> I have a dataset that is basically set up like an attendance roll book. It has the person’s name and then each variable is a date that the program was held. The person has a 1 if they attended that day. It looks like this:
>
> First Last A10_01_10 A10_05_10 A10_010_10 A10_11_10
> Jane Doe 1 1 . .
> John Doe . 1 0 1
>
> The records go from October through June, but the program did not meet every day. As noted above, the variable names indicate the date. I was able to use a loop to extract the dates of first and last attendance, but I now need to calculate the total number of days the person ‘could’ have attended the program between their date of first attendance and date of last attendance. So in the above example I would be able to say that John Doe attended 2 out of 3 possible program days. Of course, since my dataset has many more dates, this is much harder! Any help is appreciated.
>
> I should mention that I used the following to calculate some additional variables that may be of use: string values for the date first attended that match the variable names, the corresponding date values, and the total number of program days.
>
> Any help is much appreciated – thank you!
> Nikki
>
> ***Macro to find first date of attendance and create string variable 'firstfound'
> local first 1
> gen firstfound = ""
> foreach v of varlist A10_01_2008-A06_20_2009 {
> replace firstfound = "`v'" if `v' == `first' & missing(firstfound)
> }
>
> ***Macro to find last date of attendance and create string variable 'lastfound'
> local last 1
> gen lastfound = ""
> foreach v of varlist A10_01_2008-A06_20_2009 {
> replace lastfound = "`v'" if `v' == `last'
> }
>
> ***Transforming string 'firstfound' into date value first_attend_0809
> . gen firstfound1=substr(firstfound, 2, 10)
> . generate first_attend_0809=date(firstfound1,"MDY")
> . format first_attend_0809 %td
>
> ***Transforming string 'lastfound' into date value last_attend_0809
> . gen lastfound1=substr(lastfound, 2, 10)
> . generate last_attend_0809=date(lastfound1,"MDY")
> . format last_attend_0809 %td
>
> local start firstfound
> gen days_possible = 0
> foreach v of varlist A10_01_2008-A06_20_2009 {
> replace days_possible = days_possible+1
> }
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/