Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: generate new distinct individual ids by organizational id and year
From: Erik Aadland <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: generate new distinct individual ids by organizational id and year
Date: Wed, 8 May 2013 07:22:51 +0000
Dear Jeph and Statalist.
Thank you for your suggested code. This is on the right track, but I still have some problems.
I tweaked the code a little bit, because max(ind_id) should be taken over the whole dataset rather than by org_id:
egen last_id = max(ind_id) // get the max id in the dataset
egen tagged = tag(org_id year) // tag for expanding
expand missing_employees if tagged, gen(isnew) // create new records for missing
clonevar new_ind_id = ind_id
bys org_id year isnew : replace new_ind_id = last_id+_n if isnew
I run this code with the following small example dataset:
year org_id ind_id missing_employees
2000 1 1 1
2000 1 2 1
2000 2 3 2
2000 2 3 2
2000 2 4 2
2000 3 5 0
2001 1 2 2
2001 1 7 2
2002 1 2 1
The desired result (in terms of new_ind_id) should be something like:
year org_id ind_id missing_employees new_ind_id
2000 1 1 1 1
2000 1 2 1 2
2000 1 . . 8
2000 2 3 2 3
2000 2 3 2 3
2000 2 4 2 4
2000 2 . . 9
2000 2 . . 10
2000 3 5 0 5
2001 1 2 2 2
2001 1 7 2 7
2001 1 . . 8
2001 1 . . 11
2002 1 2 1 2
2002 1 . . 8
When running the code I get the following result:
year org_id ind_id missing_employees last_id tagged isnew new_ind_id
2000 1 1 1 7 1 0 1
2000 1 2 1 7 0 0 2
2000 2 3 2 7 1 1 8
2000 2 3 2 7 0 0 3
2000 2 3 2 7 1 0 3
2000 2 4 2 7 0 0 4
2000 3 5 0 7 1 0 5
2001 1 2 2 7 1 1 8
2001 1 2 2 7 1 0 2
2001 1 7 2 7 0 0 7
2002 1 2 1 7 1 0 2
One problem is that no new record is created where missing_employees == 1 (org_id == 1 in 2000 and in 2002).
Another problem is that the code assigns the same new_ind_id == 8 to the new records of both org_id == 1 and org_id == 2.
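Both symptoms look consistent with how expand and _n behave, so the following is a minimal sketch of one possible fix, not necessarily what Jeph intended. expand n leaves n copies in total, the original included, so a tagged record with missing_employees == 1 gains no new row; requesting missing_employees + 1 copies adds exactly the missing number. Because _n restarts at 1 inside every org_id-year-isnew group, last_id + _n hands out 8 in each group; numbering the (org_id, slot) pairs with egen's group() function instead gives each organization its own ids, which are then reused across years. The variable names slot and slot_id are hypothetical, and the numbering differs from the hand-written target above: here org_id 1 receives 8 and 9, and org_id 2 receives 10 and 11.

```stata
* Sketch only: slot and slot_id are hypothetical helper names.
clear
input year org_id ind_id missing_employees
2000 1 1 1
2000 1 2 1
2000 2 3 2
2000 2 3 2
2000 2 4 2
2000 3 5 0
2001 1 2 2
2001 1 7 2
2002 1 2 1
end

egen last_id = max(ind_id)               // max id in the whole dataset
egen tagged  = tag(org_id year)          // one record per org_id-year
* expand n keeps n copies in total, so request one more than the number missing
expand missing_employees + 1 if tagged, gen(isnew)
replace ind_id = . if isnew              // added records have no observed id

* number the added records within each org_id-year: 1, 2, ...
bysort org_id year isnew: gen slot = _n if isnew
* one integer per (org_id, slot) pair: ids differ across organizations,
* while the k-th missing employee of an organization keeps its id every year
egen slot_id = group(org_id slot) if isnew

clonevar new_ind_id = ind_id
replace new_ind_id = last_id + slot_id if isnew

sort year org_id isnew new_ind_id
list year org_id ind_id missing_employees new_ind_id, sepby(year org_id)
```

If the number of missing employees falls in a later year, only the lower-numbered slots appear in that year, which would match the request to drop some of the reused ids.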
Any input on this problem would be greatly appreciated.
Sincerely,
Erik
> Date: Tue, 7 May 2013 09:39:13 -0400
> From: [email protected]
> To: [email protected]
> Subject: Re: st: generate new distinct individual ids by organizational id and year
>
> If I understand correctly,
>
> bys org_id : egen last_id = max(ind_id) // get the max id for each org
> egen tagged = tag(org_id year) // tag for expanding
> expand missing_employees if tagged, gen(isnew) // create new records for missing
> clonevar new_ind_id = ind_id
> bys org_id year isnew : replace new_ind_id = last_id+_n if isnew
>
> hth,
> J
>
>
>
>
>
> On 5/7/2013 4:18 AM, Erik Aadland wrote:
> > Dear Statalist.
> >
> > I have an unbalanced panel dataset containing organizations with distinct org_id, and employees within these organizations with distinct ind_id. Some employees are missing. I have calculated the total number of missing employees per org_id per year, called "missing_employees".
> >
> > Now, by each org_id and year I need to generate a set of distinct ind_ids corresponding to the number of missing employees. The new_ind_ids generated for a distinct org_id and year should preferably be reused in later years.
> >
> > In layman's terms, I need to generate new ind_id observations for each org_id and year so that I have an ind_id for all employees in each org_id and year.
> >
> > Example:
> >
> > year org_id ind_id missing_employees
> > 2000 1 1 1
> > 2000 1 2 1
> > 2000 2 3 2
> > 2000 2 3 2
> > 2000 2 4 2
> > 2000 3 5 0
> > 2001 1 2 2
> > 2001 1 7 2
> >
> > Given such a structure, I need to generate distinct new_ind_ids e.g. starting after the last distinct ind_id I have in my dataset (e.g. if last ind_id==100, new_ind_id starts at 101):
> >
> > year org_id ind_id missing_employees new_ind_ids
> > 2000 1 1 1
> > 2000 1 2 1
> > 2000 1 101
> > 2000 2 3 2
> > 2000 2 3 2
> > 2000 2 4 2
> > 2000 2 102
> > 2000 2 103
> > 2000 3 5 0
> > 2001 1 2 2
> > 2001 1 7 2
> > 2001 1 101
> > 2001 1 104
> >
> > Note that for org_id==1 in 2000, I generate 1 new_ind_id == 101 since one employee was missing. In 2001 I have 2 employees missing for org_id == 1. Since I have already generated new_ind_id == 101 for org_id==1, this can be used again in 2001 and only one new_ind_id needs to be generated; new_ind_id == 104.
> >
> > If the number of missing employees for a distinct org_id goes down from one year to the next, I would like to drop the corresponding number of new_ind_ids for that year (i.e. remove some of the reused new_ind_ids).
> >
> > Any input on these problems would be greatly appreciated.
> >
> > Kind regards,
> >
> > Erik.
> > *
> > * For searches and help try:
> > * http://www.stata.com/help.cgi?search
> > * http://www.stata.com/support/faqs/resources/statalist-faq/
> > * http://www.ats.ucla.edu/stat/stata/