
Re: st: "Wrong" result with encode / merge ?

From	"Austin Nichols" <[email protected]>
To	[email protected], [email protected]
Subject	Re: st: "Wrong" result with encode / merge ?
Date	Thu, 23 Nov 2006 08:04:22 -0500

Thomas Erdmann--
You should merge on id7temp, not id7. When you -encode- the string
id7temp, Stata assigns the integer codes 1, 2, 3, ... to the distinct
strings in alphabetical order within the dataset in memory, so the
same string can receive different codes in your two datasets (try
-label list- in each dataset to see the assignment).  If you want to
use a numeric id, you can generate one using a one-to-one mapping,
using -gen- and a loop over all possible characters, but it is more
straightforward to merge on the string var.
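
As a rough sketch of merging on the string key directly, the do-file
fragment below builds two toy datasets and match-merges them on
id7temp. The data values and the variables x and y are made up for
illustration, and only AT18679U comes from the original post. It uses
the merge syntax current when this thread was written; in Stata 11 and
later the same step would normally be written as
-merge 1:1 id7temp using ...-, which also assumes that the 7-character
key is unique within each dataset.

```stata
* Toy "using" dataset (hypothetical values)
clear
input str8 id y
"AT18679"  10
"DE00012X" 20
end
generate str7 id7temp = substr(id, 1, 7)   // 7-character string key
drop id
sort id7temp
tempfile second
save `second'

* Toy "master" dataset (hypothetical values)
clear
input str8 id x
"AT18679U" 1
"BE55555"  2
"DE00012"  3
end
generate str7 id7temp = substr(id, 1, 7)
sort id7temp

* Merge on the string key itself, not on an -encode-d copy
merge id7temp using `second'
tabulate _merge
list id id7temp x y _merge
```

Because the key is the string itself, a given id7temp means the same
thing in both files no matter which other ids each file contains.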

On 11/23/06, Thomas Erdmann <[email protected]> wrote:

Hi,

I have a dataset with ids that look like: AT18679U (two letters followed by
five digits, optionally followed by another letter)

In the two datasets I would like to merge, only the first 7 characters of
the id are equal, therefore I generated

generate id7temp=substr(id,1,7)
encode id7temp, gen(id7)
sort id7

and merged the two datasets by id7. When I quality checked the results there
were several mismatches, which don't seem to happen if I use the string id
and not the encoded one. Why is that?
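
The mismatch described above can be reproduced with a small made-up
example. The ids below are hypothetical (only AT18679 is taken from the
post), and the sketch simply runs -encode- on two toy datasets that do
not contain the same set of ids, then lists the codes each one gets.

```stata
* Dataset A: three distinct ids
clear
input str7 id7temp
"AT18679"
"BE55555"
"DE00012"
end
encode id7temp, gen(id7)
label list id7          // 1 AT18679, 2 BE55555, 3 DE00012

* Dataset B: the same ids except BE55555
clear
input str7 id7temp
"AT18679"
"DE00012"
end
encode id7temp, gen(id7)
label list id7          // 1 AT18679, 2 DE00012
```

Here code 2 stands for BE55555 in dataset A but for DE00012 in dataset
B, so a merge on id7 would pair observations that belong to different
ids.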
