Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: how to get the mean value for this panel data?
From
Hey Sky <[email protected]>
To
[email protected]
Subject
Re: st: how to get the mean value for this panel data?
Date
Wed, 16 Feb 2011 19:30:00 -0800 (PST)
Hey David
thanks for your answer.
I have test the code you suggested and it works perfect for the example data. I
will test it tomorrow for the real data and believe it would not have problem.
if there has, I will come back and ask for your help. thanks for your kind help.
Have a Nice Evening
Nan
from Montreal
----- Original Message ----
From: David Kantor <[email protected]>
To: [email protected]
Sent: Wed, February 16, 2011 10:09:29 PM
Subject: Re: st: how to get the mean value for this panel data?
At 09:36 PM 2/16/2011, Hey Sky wrote:
>Dear Stataers
>
>I have met a problem to calculate mean value in panel data. example data and
>detail of my question as follows.
>
>
>* example data
>clear
>input id jobid wage1 wage2 wage3
>1 1 1 2 .
>1 2 . 3 .
>2 1 1 . .
>2 2 . 2 2
>3 1 1 2 .
>3 2 . 3 .
>3 3 . . 4
>4 1 2 4 6
>5 1 2 3 .
>6 1 2 . 4
>end
>
>
>the wage1~3 represents the wages in each wave if the person works.
>for example,
>the id=1 person works in 1st and 2nd wave but not 3rd. he switches job in 2nd
>wave thus he has two wage for jobid=1, his first job. surely for
>id=2 person, he
>has two jobs, switches job in 2nd wave and keeps it till 3rd wave.
>for id=4, he
>keeps his job for all 3 waves and id=5 lost his only job in 3rd wave.
>
>
>now, I need to get a mean wage for this 3 waves panel data. if they have not
>switched job, use this job's wage in each wave to calculate the mean wage;
>otherwise, use the new job wage in a new wave.
>
>
>for example,for people who did not switch job, such as id=4, the mean wage
>should equal to (2+4+6)/3, which egen x=rowmean(wage1 wage2 wage3)
>can do. but
>how about id=1? his mean wage should equal to (1+3)/2; or id=2, mean
>wage=(1+2+2)/3; and id=3, mean wage=(1+3+4)/3. how to have a code to
>deal with
>all these situations automatically?
>
>
>I have thousands obs in my real data. thus it is impossible to do it by hand.
>thanks for any suggestions.
I would first reshape this to long. See -help reshape-
reshape long wage , i(id jobid) j(wave)
You would then have, say for id=1,
id jobid wave wage
1 1 1 1
1 1 2 2
1 1 3 .
1 2 1 .
1 2 2 3
1 2 3 .
Get rid of missings:
drop if mi(wage)
Now, in case there are multiple wage values for the same id and wave,
we want to select one preferred value.
Presuming that a higher jobid signifies a later time, and therefore,
a wage value that supersedes the others, we want to select the latest one:
sort id wave jobid
(Actually, I would do -assertky id wave jobid-; see -ssc desc assertky-.)
Note that wave comes before jobid; we are changing the sort order.
by id wave (jobid): keep if _n==_N
The dataset is now unique by id and wave, with the preferred values selected.
At this point either...
by id: egen meanwage = mean(wage)
or do a -collapse-.
(All suggested code is untested.)
HTH
--David
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/