Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Identify Person in a rotating panel


From   Steven Samuels <[email protected]>
To   [email protected]
Subject   Re: st: Identify Person in a rotating panel
Date   Fri, 4 Feb 2011 22:45:20 -0500


I doubt that you have to write your own program. Try -reclink- by Michael Blasnik (-findit reclink-) or one of Stata's other user-contributed matching programs (e.g., -cem- or -psmatch2-). In many panel studies, the dates would not be _exactly_ six months apart, and a good program will allow for this.


Steve
[email protected]

On Feb 4, 2011, at 1:38 PM, Ulrich Brandt wrote:

Hello,

I have written a little do-file to identify persons in a panel dataset
because, unfortunately, they have no unique ID. I know that every person
appears only twice, with a time gap of 6 months, so it's a kind of
"rotating panel". My approach is to identify a person by comparing
attributes that cannot change over time, such as race, month and year of
birth, and sex, together with an observation gap of 6 months. The
subparts of my code work properly, but when I run them together the
results aren't plausible. I am using Stata 10.1 on Windows XP (32-bit).
For example, I have one person in the dataset:

YYYYMM  BIRTHM    BIRTHY  SEX   RACE
198103  February  1895    Male  White ex
198109  February  1895    Male  White ex

February, Male and White ex are only value labels. The underlying
values are in this case Male = 1, White ex = 1, February = 2:

YYYYMM  BIRTHM  BIRTHY  SEX  RACE
198103  2       1895    1    1
198109  2       1895    1    1

I used nested foreach loops to generate all combinations of attributes
and a while loop to generate the different dates with a 6-month time
gap. If two observations have these generated attributes, they should
get an ID called "test", assigned from an incrementing local macro
called "i".

Here is the code. The example posted generates dates only for the
period 1980-1982.

------------------------------------------------------------------------

gen test = .
local i = 1

levelsof RACE, local(value_race)
levelsof SEX, local(value_sex)
levelsof BIRTHY, local(value_birthy)
levelsof BIRTHM, local(value_birthm)

foreach race of local value_race {
    local race = `race'
    foreach sex of local value_sex {
        local sex = `sex'
        foreach birthy of local value_birthy {
            local birthy = `birthy'
            foreach birthm of local value_birthm {
                local birthm = `birthm'
                local year 1980
                while (`year' < 1982) {
                    forvalues month = 1/12 {
                        if (`month' <= 3) {
                            local datefirst "`year'0`month'"
                            local datelast "`year'0`month'+6"
                        }
                        if (`month' >= 4 & `month' <= 6) {
                            local datefirst "`year'0`month'"
                            local datelast "`year'0`month'+6"
                        }
                        if (`month' >= 7 & `month' <= 9) {
                            local datefirst "`year'0`month'"
                            local datehelp = `year' + 1
                            local datelast "`datehelp'0`month'-6"
                        }
                        if (`month' >= 10 & `month' <= 12) {
                            local datefirst "`year'`month'"
                            local datehelp = `year' + 1
                            local datelast "`datehelp'`month'-6"
                        }
                        if (`month' == 12) {
                            local year = (`year') + 1
                        }
                        replace test = `i++' if (RACE == `race' & SEX == `sex' & BIRTHY == `birthy' & BIRTHM == `birthm') ///
                            & (YYYYMM == (`datefirst') | YYYYMM == (`datelast'))
                    }
                }
            }
        }
    }
}
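
As an aside, the four month-range branches in the loop exist only to
handle the year boundary when six months are added to or subtracted from
YYYYMM. A minimal sketch of an alternative, assuming YYYYMM is stored as
a number such as 198103: convert it to a Stata monthly date with ym(),
after which a six-month gap is plain subtraction. The variable names
mdate and mdate_plus6 and the four example dates are made up for this
illustration.

```stata
* Sketch only: mdate and mdate_plus6 are hypothetical helper variables.
clear
input long YYYYMM
198103
198109
198111
198205
end

* Split YYYYMM into year and month, then build a Stata monthly date.
generate int mdate = ym(floor(YYYYMM/100), mod(YYYYMM, 100))
format mdate %tm

* Six months later is simply mdate + 6, also across a year boundary.
generate int mdate_plus6 = mdate + 6
format mdate_plus6 %tm

list YYYYMM mdate mdate_plus6, noobs
```

With dates in this form, 198111 and 198205 differ by exactly 6, so no
special case for months 7 to 12 is needed.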


My code above generates combinations like
1118828--198111--198205 (for example White ex, Male, 1882, August, 198111, 198205)
1118828--198112--198206
1118829--198001--198007
1118829--198002--198008
and so on.

I have two problems when I use this code with my data.
First, I suppose that the macro "i" counts up every time the loop
passes through, even if there is no observation with this combination,
because the output contains very high ID numbers. I want it to count up
only when such a combination exists.
Second, I know that there is a logical error in the "replace" line at
the end, but I have not found my mistake. I want the two observations
from the same person to get the same ID. But with this code the macro
counts up and generates an ID for 1118828--198111 alone, or for
1118828--198205 alone, or for 1118828--198111--198205. My goal is that
it generates an ID only for 1118828--198111--198205. In other words, it
should count up and generate an ID only when two observations with the
same attributes exist.
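
To make that goal concrete, here is a rough sketch of a data-driven
version that works from the observations themselves rather than looping
over every possible combination. It is an illustration under
assumptions, not a tested solution for the real data: the toy
observations and the names mdate, pair, starts and id are invented, and
it assumes that at most two observations share a given attribute
combination six months apart. If several persons share the same
attributes, further identifying variables would be needed.

```stata
* Sketch only: toy data and the names mdate, pair, starts, id are hypothetical.
clear
input long YYYYMM byte BIRTHM int BIRTHY byte SEX byte RACE
198103 2 1895 1 1
198109 2 1895 1 1
198111 8 1882 1 1
198205 8 1882 1 1
198104 5 1900 2 1
end

* Monthly date, so that a 6-month gap is a difference of 6.
generate int mdate = ym(floor(YYYYMM/100), mod(YYYYMM, 100))
format mdate %tm

* Within each attribute combination (ordered by date), flag an observation
* whose neighbour is exactly 6 months earlier or later.
bysort RACE SEX BIRTHY BIRTHM (mdate): generate byte pair = ///
    (mdate[_n+1] - mdate == 6) | (mdate - mdate[_n-1] == 6)

* Mark the earlier observation of each flagged pair.
bysort RACE SEX BIRTHY BIRTHM (mdate): generate byte starts = ///
    (mdate[_n+1] - mdate == 6)

* The counter advances only where a pair actually starts;
* unmatched observations keep a missing id.
generate long id = sum(starts) if pair

list YYYYMM BIRTHM BIRTHY SEX RACE id, noobs
```

Here both observations of a matched pair receive the same id, and the
single observation from 198104 stays missing because it has no partner.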

I hope that someone has suggestions on how to fix my problem.
I hope the way I posted everything is right and understandable; if not,
please correct me. It's my first time posting on Statalist.

Best regards

Ulrich Brandt

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

