Hi Suzy,
Stata only recognizes variables as being either string or numeric - even if
the data look like numbers, they could be stored as strings. Stata tries to
decide on how to store the data as it is imported; from what I can tell it
looks at the first record in each variable (column) as it comes in and
stores it as numeric only if all the characters are numbers. If a later
record in the variable has non-numeric characters in it, Stata will store
that value as . (missing), and it should alert you that variable x has
non-numeric characters. If you were lucky and the first record had
non-numeric characters in it, Stata will have stored the values in the
entire variable as strings. It seems likely that this is what happened in
your case; in other words, what you see as 1234 is actually stored as "1234".
When you destring that quantity you get 1234, as expected. However when
Stata comes across "V234", that doesn't resolve to a number after
destringing, so it puts a . (missing) value in that place, exactly as it
would have done on importing.
From the way your question is phrased, it looks as though rather than
generating new variables with destring, you -replace-d your originals (and
used the -force- option?). In that case the original values are gone from
the dataset, unfortunately; hopefully you still have the original source.
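For what it's worth, here is a minimal sketch of the safer route, where destring writes to a new variable and the original strings stay intact. The variable names var2 and var2_num and the four values are made up for illustration; they are not your data.

```stata
* hypothetical string variable holding a mix of numeric and non-numeric codes
clear
input str5 var2
"1234"
"V234"
"14-1"
"333"
end

* write the converted values to a new variable; var2 itself is left untouched
destring var2, generate(var2_num) force

* list the rows whose strings could not be converted to a number
list var2 var2_num if missing(var2_num)
```

Because var2 is still in the dataset, nothing is lost if the conversion turns out not to be what you wanted; you can simply drop var2_num and try something else.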
It might help if you submit a toy dataset to describe how your data look.
For example
Patient  var1  var2  var3
1001     1234  V234  med
1002     1233  1431  small
1003     65    14-1  small
1004     2.4   333   large
In this case, Stata would bring "Patient" in as a numeric variable, var2 and
var3 as string variables, and var1 as numeric.
Now, had the data looked like
Patient  var1  var2  var3
1001     1234  1234  med
1002     1233  V234  small
1003     65    14-1  small
1004     2.4   333   large
Stata would have decided that var2 should come in as a numeric variable, and
you would end up with
patient  var1  var2  var3
1001     1234  1234  med
1002     1233  .     small
1003     65    .     small
1004     2.4   333   large
(notice also that Stata will change Patient to patient, although it will
store "Patient" as a variable label)
Likewise, if you took the first example and processed it with
.destring, replace force
(which is pretty reckless, data integrity-wise), you'll end up with
patient  var1  var2  var3
1001     1234  .     .
1002     1233  1431  .
1003     65    .     .
1004     2.4   333   .
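If you want to see this for yourself, the first toy dataset can be typed in directly and run through the same command. This is only a sketch: the storage types (str4, str5) are my assumption about how the columns would come in, and the data are the made-up values from the example, not yours.

```stata
* first toy dataset: var2 and var3 entered as strings, patient and var1 numeric
clear
input patient var1 str4 var2 str5 var3
1001 1234 "V234" "med"
1002 1233 "1431" "small"
1003 65   "14-1" "small"
1004 2.4  "333"  "large"
end

* convert every string variable in place, forcing non-numeric values to missing
destring, replace force

* var2 now has missing values where "V234" and "14-1" were; var3 is all missing
list
```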
Is this along the lines of your situation?
-JW
-----Original Message-----
From: Suzy [mailto:[email protected]]
Sent: Friday, August 27, 2004 4:06 PM
To: [email protected]
Subject: Re: st: RE: destringing values led to Stata recoding them as
missing
I meant to say - would the restring option restore my datapoints?
Suzy wrote:
Hi John,
I don't know if this matters but I'm not starting with purely string
variables. I have variables that have datapoints of which some are
string and some are numeric. Also John, would the destring option
restore my original values as now the destringed values are missing....
Suzy
Wallace, John wrote:
Suzy
You might want to consider -encode- instead of destring. Presumably you're
starting with string variables. Encode will create a new variable with
values incrementing in (I believe) alphabetical order of the original
variable, plus it will make a value label corresponding to the original
string. This is useful if you need to be able to relate the new variable
value back to the original string.
e.g.
.encode var1, gen(code1)
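A slightly fuller sketch, assuming a hypothetical string variable dx that holds codes like the ones in your message (the names dx and dx_code are made up):

```stata
* hypothetical string variable with a mix of numeric-looking and other codes
clear
input str5 dx
"72200"
"V2389"
"4109-"
"23405"
end

* new numeric variable dx_code; the original strings become its value labels
encode dx, generate(dx_code)

* show the underlying integers rather than the labels
list dx dx_code, nolabel

* show the mapping from each integer back to the original string
label list dx_code
```

Note that the integers encode assigns are just sequence numbers, not the codes themselves, so "72200" does not become the number 72200. The value label is what ties each integer back to the original string.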
-JW
-----Original Message-----
From: Suzy [mailto:[email protected]]
Sent: Friday, August 27, 2004 2:45 PM
To: [email protected]
Subject: st: destringing values led to Stata recoding them as missing
Dear Statalisters;
I have seven variables of over 300,000 observations each. Within
each variable, I have over 2000 different values. These datapoints
represent specific codes - for example : (72200 = intervertebral
disc disorder). Within each of these seven variables, there are
datapoints (values) with dashes or letters (e.g., 4109- or V2389).
The majority of the values, though, are purely numeric (23405). I used
the destring option so that I could analyze the data, and Stata
treated all those datapoints that included dashes and letters as
missing. Now there is a period . where there used to be a value. I
have two questions:
1. Will the restring option restore the datapoints?
2. How can I successfully "destring" these values so that I can
include them in my analysis?
Any help and/or specific code would be very helpful as I am only
marginally competent with Stata basics.
Thank you!
Suzy