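On further thought, I have to say
that the advice below is probably
part good and part bad.
-egen, group()- is often a good way
of generating unique identifiers, but
applied separately in different datasets
it won't in general lead to identifiers
that can be used in merging: the integers
it assigns depend on which combinations
of values occur in each dataset, so the
same household can get different numbers
in the two files.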
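
One less fragile route, sketched here under the
assumption that cl01-cl08 are integer-valued and
present in both datasets, is to build the same
string key directly from the variables with
-string()- and fixed formats. The name hhid and
the widths (two digits, three for cl06) are taken
from the do-file quoted below; the checks are my
own illustration, not a tested recipe. The code
assumes the default line delimiter, not -#delimit ;-.

```stata
* Sketch only: run the same lines in each dataset before merging.
* Assumes cl01-cl08 are numeric, nonmissing, non-negative integers.
foreach v of varlist cl01-cl08 {
    assert `v' == int(`v') & `v' >= 0 & !missing(`v')
}

* Zero-padded string key; widths follow the formats in the quoted do-file.
gen hhid = string(cl01, "%02.0f") + string(cl02, "%02.0f") + ///
           string(cl03, "%02.0f") + string(cl04, "%02.0f") + ///
           string(cl05, "%02.0f") + string(cl06, "%03.0f") + ///
           string(cl07, "%02.0f") + string(cl08, "%02.0f")

* Every key has length 17 (7 x 2 + 3) if the assumed widths are adequate.
assert length(hhid) == 17

* Stops with an error if hhid does not uniquely identify observations.
isid hhid
```

Because the key is computed from the values
themselves, the same household gets the same
hhid in either file, which is what a merge needs.
If -isid- fails, the variables do not identify
observations uniquely and the key needs rethinking
before any merge.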
Nick
> -----Original Message-----
> From: Nick Cox
> Sent: 08 December 2004 19:46
> To: [email protected]
> Subject: RE: st: RE: using the 'real' command
>
>
> So the beginning and end of the problem
> is the need for a unique identifier.
>
> In this situation, I would try
>
> egen id = group(cl0?), label
>
> as a somewhat lazier way to climb the mountain.
>
> It would seem that you need to do something
> similar in the other dataset.
>
> If your ids are simple integers, the numeric
> format before and the string format after
> don't sound like an issue. If your ids are not
> simple integers, you are probably going to
> get major problems by forcing them to be integers
> when they are really numbers with fractional parts.
>
> Specifically, I don't like the look of
>
> tostring ... , force
>
> As -tostring-'s putative parent (parthenogenesis
> is fun), I underline that -force- is an explicit
> signal that you know you could lose information
> when you do this. Any use of -force- prior to
> a -merge- is inviting trouble, as for a -merge-
> you really do want your identifiers to be correct and not
> mangled.
>
> Yet further: the -tostring- / -real()- / -tostring-
> sequence looks fairly weird, especially as -real()-
> itself can happily play havoc with stuff it
> doesn't understand.
>
> Nick
>
> [email protected]
> >
> > Thank you for your prompt reply. Yes, it appears I neglected
> > to put the delimiter ... it works fine now, but I still have
> > my original problem of forcing the display format to be %2.0f.
> >
> > Here is the context: I am working with a household survey on
> > the child labor force. I am trying to generate a unique id by
> > concatenating a few string variables. The reason I am creating
> > the unique id is so that I can merge the data set with another
> > survey on the labor force from the same country. The variables
> > (cl01 through cl08) originally came in numeric format (with a
> > display format of %2.0f). What I am trying to do is convert
> > these variables to string variables while keeping the display
> > format.
> >
> > Unfortunately, the display format that I get when I convert
> > to string is %9s.
> >
> > This is what my do-file looks like:
> >
> > tostring cl01-cl08, replace force;
> > gen id1=real(cl01); format id1 %02.0f;
> > gen id2=real(cl02); format id2 %02.0f;
> > gen id3=real(cl03); format id3 %02.0f;
> > gen id4=real(cl04); format id4 %02.0f;
> > gen id5=real(cl05); format id5 %02.0f;
> > gen id6=real(cl06); format id6 %03.0f;
> > gen id7=real(cl07); format id7 %02.0f;
> > gen id8=real(cl08); format id8 %02.0f;
> > tostring id1-id8, replace usedisplayformat;
> > egen hhid=concat(id1 id2 id3 id4 id5 id6 id7 id8);
> >
> > Any advice is appreciated.