Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: Identify Person in a rotating panel
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: st: RE: Identify Person in a rotating panel
Date: Fri, 4 Feb 2011 18:52:39 +0000
I haven't tried to understand the detail here. But I pick up the main idea as wanting to identify individuals by persistent attributes. My gut feeling is that it shouldn't be anywhere near as difficult as this code implies.
You can explore that by using -duplicates- on the variables concerned.
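As a rough illustration of that check, assuming the variables are named as in the original post (upper case; Stata is case-sensitive), something like the following shows how often each combination of attributes occurs. The variable name ncopies is made up for this sketch.

```stata
* How often does each combination of persistent attributes occur?
duplicates report BIRTHM BIRTHY SEX RACE

* For each observation, store the number of other observations
* sharing its combination (0 = unique, 1 = exactly one partner)
duplicates tag BIRTHM BIRTHY SEX RACE, generate(ncopies)
tabulate ncopies
```

If every person really appears exactly twice and the attributes identify people uniquely, ncopies should be 1 throughout; larger values point to combinations shared by more than one person.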
Or, to get to your problem directly, then something like
egen id = group(birthm birthy sex race)
is a start at getting identifiers. Then you have to see how many of those identifiers -- values of -id- -- occur more than twice.
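A sketch of that counting step, again assuming the upper-case variable names of the original post. The names nobs and first are invented here, and -long- is requested explicitly so that large identifiers are stored exactly.

```stata
* One identifier per distinct combination of persistent attributes
egen long id = group(BIRTHM BIRTHY SEX RACE)

* Number of observations sharing each identifier
bysort id: gen long nobs = _N

* Distribution of group sizes, counting each identifier once
bysort id: gen byte first = (_n == 1)
tabulate nobs if first

* Inspect identifiers that occur more than twice
list id YYYYMM BIRTHM BIRTHY SEX RACE if nobs > 2, sepby(id)
```

Identifiers with nobs greater than 2 cannot all belong to one person under the stated design, so those groups would need further attributes or the dates to separate them.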
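Your problem with -replace- is probably that the variable is born as a -float- and can't hold all large integers distinctly: a -float- stores integers exactly only up to 16,777,216 (2^24). Creating the variable as a -long- avoids that.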
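A small demonstration of the storage-type point, as a sketch on a one-observation scratch dataset (it clears the data in memory, so run it separately). The variable names asfloat and aslong are made up, and it assumes the default storage type has not been changed with -set type-.

```stata
* 16,777,217 = 2^24 + 1 is not exactly representable as a float
display %12.0f float(16777217)

clear
set obs 1
gen asfloat = 16777217        // stored as float by default
gen long aslong = 16777217    // long holds integers up to 2,147,483,620
format asfloat aslong %12.0f
list
```

In the posted code, replacing -gen test=.- with -gen long test = .- would be the corresponding change.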
Nick
[email protected]
Ulrich Brandt
I have written a little do-file in order to identify persons in a
panel dataset because unfortunately they have no unique ID. I know that
every person appears only twice, with a time gap of 6 months, so it's
a kind of "rotating panel". In my approach I am trying to identify
persons by comparing attributes which can't change over time, like race,
month and year of birth, sex etc., together with an observation gap
of 6 months. The subparts of my code work properly, but if I try to run
them together the results aren't plausible. I am using Stata 10.1 on
Windows XP (32 bit). For example, I have one person in the dataset:
YYYYMM  BIRTHM    BIRTHY  SEX   RACE
198103  February  1895    Male  White ex
198109  February  1895    Male  White ex
February, Male and White ex are only value labels. The underlying
values are in this case Male = 1, White ex = 1, February = 2:
YYYYMM  BIRTHM  BIRTHY  SEX  RACE
198103  2       1895    1    1
198109  2       1895    1    1
I used nested foreach loops to generate all combinations of
attributes and a while loop to generate the different dates with a
6-month gap. If two observations have these generated attributes, they
should get an ID called "test", assigned from an incrementing local
macro called "i".
Here is the code. The example posted covers just the period
1980-1982.
------------------------------------------------------------------------
gen test=.
local i = 1
levelsof RACE, local(value_race)
levelsof SEX, local(value_sex)
levelsof BIRTHY, local(value_birthy)
levelsof BIRTHM, local(value_birthm)
foreach race of local value_race {
    local race = `race'
    foreach sex of local value_sex {
        local sex = `sex'
        foreach birthy of local value_birthy {
            local birthy = `birthy'
            foreach birthm of local value_birthm {
                local birthm = `birthm'
                local year 1980
                while (`year' < 1982) {
                    forvalues month = 1/12 {
                        if (`month' <= 3) {
                            local datefirst "`year'0`month'"
                            local datelast "`year'0`month'+6"
                        }
                        if (`month' >= 4 & `month' <= 6) {
                            local datefirst "`year'0`month'"
                            local datelast "`year'0`month'+6"
                        }
                        if (`month' >= 7 & `month' <= 9) {
                            local datefirst "`year'0`month'"
                            local datehelp = `year'+1
                            local datelast "`datehelp'0`month'-6"
                        }
                        if (`month' >= 10 & `month' <= 12) {
                            local datefirst "`year'`month'"
                            local datehelp = `year'+1
                            local datelast "`datehelp'`month'-6"
                        }
                        if (`month' == 12) {
                            local year = (`year')+1
                        }
                        replace test=`i++' if (RACE == `race' & SEX == `sex' & BIRTHY == `birthy' & BIRTHM == `birthm') ///
                            & (YYYYMM == (`datefirst') | YYYYMM == (`datelast'))
                    }
                }
            }
        }
    }
}
The code generates combinations like
1118828--198111--198205 (for example White ex, Male, 1882, August, 198111, 198205)
1118828--198112--198206
1118829--198001--198007
1118829--198002--198008
and so on.
I have two problems when I use this code with my data.
First, I suppose that the macro "i" counts up every time the loop passes
through, even if there is no observation with this combination, because
the output contains very high ID numbers. But I want it to count up
only when such a combination exists.
Secondly, I know that there is a logical error in the -replace- line at
the end, but I haven't found my mistake. I want the two observations
from the same person to get the same ID. But with this code the macro
counts up and generates an ID for 1118828--198111 or 1118828--198205 or
1118828--198111--198205. My goal is that it generates an ID only for
1118828--198111--198205. In other words, it should count up and
generate an ID only when two observations with the same attributes exist.
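The behaviour described here can be sketched without looping over every combination. This is only an illustration under stated assumptions: YYYYMM is taken to be numeric (198103 for March 1981), the names mdate, grp and ispair are invented for the sketch, and groups with more than two observations are deliberately left without an ID because the attributes alone cannot say who belongs with whom.

```stata
* Convert YYYYMM (e.g. 198103) to a Stata monthly date, so gaps are in months
gen mdate = ym(floor(YYYYMM/100), mod(YYYYMM, 100))
format mdate %tm

* Candidate persons: one group per combination of persistent attributes
egen long grp = group(RACE SEX BIRTHY BIRTHM)

* A valid pair: exactly two observations in the group, 6 months apart
bysort grp (mdate): gen byte ispair = (_N == 2) & (mdate[2] - mdate[1] == 6)

* Consecutive IDs 1, 2, 3, ... assigned to valid pairs only
egen long test = group(grp) if ispair
```

Because -egen, group()- numbers only the combinations that actually occur, the IDs stay small, and both observations of a pair receive the same value. Observations with missing test are either unmatched or in ambiguous groups and would need a closer look.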
I hope that someone has some suggestions on how to fix my problem.
I hope the way I posted everything is right and understandable; if not,
please correct me. It's my first time posting on Statalist.