Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: transform data from spell format into ordinary panel data
From: "Seed, Paul" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: transform data from spell format into ordinary panel data
Date: Fri, 16 Aug 2013 09:32:21 +0000
Dear Statalist,
Nick Cox is right as usual, but Darjusch Tafreschi
also appears to want a rectangular data set, with
entries for the months of unemployment as well.
One extra line is needed.
****************************
** Begin Stata code
****************************
clear
input panelid spellid t1 t2
1 1 502 503
1 2 504 604
2 1 502 555
2 2 556 600
2 3 601 604
3 1 550 553
end
gen nspell = t2 - t1 + 1
expand nspell
bysort panelid spellid : gen t = t1[1] + _n - 1
* fill in on person and month, so that every person has a row for every month
fillin panelid t
****************************
** End Stata code
****************************
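To see what the rectangular result looks like, here is a minimal sketch on the same example data. It assumes the goal is one row per person per month, so it fills in on panelid and t. The variable name employed is hypothetical and is not part of the original post; it relies on the _fillin indicator that -fillin- creates.

```stata
clear
input panelid spellid t1 t2
1 1 502 503
1 2 504 604
2 1 502 555
2 2 556 600
2 3 601 604
3 1 550 553
end

* one row per month within each spell
gen nspell = t2 - t1 + 1
expand nspell
bysort panelid spellid : gen t = t1[1] + _n - 1

* add a row for every person-month combination not yet present;
* the added rows have _fillin == 1 and missing spellid, t1, t2
fillin panelid t

* employed is a hypothetical name: 1 inside a spell, 0 in a gap
gen byte employed = !_fillin

sort t panelid
list t panelid spellid employed in 1/9, sepby(t)
```

Note that -fillin- only uses values of t that occur somewhere in the data, so a month in which nobody is employed would still be missing from the result.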
Date: Thu, 15 Aug 2013 16:55:12 +0100
From: Nick Cox <[email protected]>
Subject: Re: st: transform data from spell format into ordinary panel data
Stata is very, very good at these problems. Here is one way, and there
may be user-written programs or official commands that are even
quicker.
. input panelid spellid t1 t2
panelid spellid t1 t2
1. 1 1 502 503
2. 1 2 504 604
3. 2 1 502 555
4. 2 2 556 600
5. 2 3 601 604
6. 3 1 550 553
7. end
. gen nspell = t2 - t1 + 1
. expand nspell
(204 observations created)
. bysort panelid spellid : gen t = t1[1] + _n - 1
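Because -expand- copies whole observations, any spell-level variables are carried along to each month. The sketch below illustrates this with the income and hours values from the original question, entered here as plain numbers (an assumption about how they are stored), and then checks that each person-month appears only once.

```stata
clear
input panelid spellid t1 t2 income hrsperweek
1 1 502 503 3500 42
1 2 504 604 3900 42
2 1 502 555 2200 20
2 2 556 600 4000 42
2 3 601 604 4500 40
3 1 550 553 1500 60
end

gen nspell = t2 - t1 + 1
expand nspell
bysort panelid spellid : gen t = t1[1] + _n - 1

* stops with an error if any person-month is duplicated,
* which would indicate overlapping spells
isid panelid t

* declare the panel structure; months outside any spell are absent here
xtset panelid t
```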
Nick
[email protected]
On 15 August 2013 16:36, Darjusch Tafreschi <[email protected]> wrote:
> the title pretty much describes my problem:
>
> I have a data set that contains persons and their employment episodes in the following format, which I usually call "spell format" (I am not sure whether that is a common expression). It is structured as follows:
>
>
> Person-ID | Employment-Episode-ID | start | end | Income | sector? | hrsperweek ...
>
> Any person can have multiple employment spells, each with a start, an end, an income, hours worked per week and a number of other variables. Moreover, the durations of the employment spells can vary across and within persons.
>
> The date is not in a typical day-month-year format, but is a number that represents the time elapsed since 1970/01/01.
>
>
> It looks like this then:
>
> 1 1 502 503 3.500 € public sector 42 hrsperweek
> 1 2 504 604 3.900 € public sector 42 hrsperweek
>
> 2 1 502 555 2.200 € private sector 20 hrsperweek
> 2 2 556 600 4.000 € private sector 42 hrsperweek
> 2 3 601 604 4.500 € private sector 40 hrsperweek
>
> 3 1 550 553 1.500 € self-employed 60 hrsperweek
>
>
> As you can see, the spells do not necessarily cover the whole time period; there can be gaps in which a person was unemployed, studying or doing something else.
>
> I would like to transform this data into something like a standard balanced panel dataset which gives me the state for every person in every month over the whole period (in this example the period 502-604). In particular it should look like this:
>
> Month | Person-ID | Employment-Episode-ID | Income | sector? | hrsperweek ...
>
> In the end it should be a HUGE data file looking like this:
>
> 502 1 1 ...
> 502 2 1 ...
> 502 3 -
> 503 1 1 ...
> 503 2 1 ...
> 503 3 -
> 504 1 2 ...
> 504 2 1 ...
> 504 3 -
>
> and so on.
>
>
> I looked into Stata's survival capabilities, but am not sure if those are really helpful here.
>
> Can anyone tell me how to approach my problem?
Paul T Seed, Senior Lecturer in Medical Statistics,
Division of Women’s Health, King’s College London
Women’s Health Academic Centre, King's Health Partners
(+44) (0) 20 7188 3642.