Hi Suzy,
first, it is worth stressing that you should consider very carefully the
point Daniel raised recently: although it is technically not hard to remove
the nonnumeric characters from your strings so that the destring command can
produce numbers (see below), you should be sure that this is really what you
want.
You should think carefully about whether your data are really numeric in the
sense of an interval or ratio (or at least ordinal) level of measurement. You
can, of course, run a regression with a right-hand-side variable in which
1002=diabetes, 1003=malaria, 1004=hernia, and so on, and Stata will give you
estimates for that regression, but the coefficients will not have a
meaningful interpretation.
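If the codes are categories rather than quantities, one option is to leave the strings alone and build a labeled categorical variable from them. The following is only a minimal sketch under that assumption: it retypes the var1 column from your listing, and var1_cat is a hypothetical name chosen for illustration.

```stata
* Sketch only: var1_cat is a hypothetical variable name.
clear
input patient str5 var1
1001 "1235-"
1002 "1233"
1003 "38568"
1004 "126"
end

* encode assigns the integers 1, 2, ... to the distinct strings in
* alphabetical order and attaches each original string as a value label,
* so the code "1235-" stays visible in the output.
encode var1, generate(var1_cat)

* The table shows the original codes, not the arbitrary integers.
tabulate var1_cat
```

The integers produced this way are only category identifiers, so they should enter a model as a set of dummy variables, not as a single continuous regressor.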
Those objections aside, you can use the following code to remove everything
that is not a digit from your string variables.
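#delimit ;
foreach varname of varlist xvar {;
  local i 1;
  /* loop over the observations */
  while `i' <= _N {;
    local digit 1;
    local tempstring "";
    /* walk through the string one character at a time */
    while `digit' <= length(`varname'[`i']) {;
      local s_digit = substr(`varname'[`i'], `digit', 1);
      /* keep only the characters 0 through 9 */
      if ("`s_digit'" >= "0" & "`s_digit'" <= "9") local tempstring = "`tempstring'`s_digit'";
      local digit = `digit' + 1;
    };
    replace `varname' = "`tempstring'" in `i';
    local i = `i' + 1;
  };
  /* the string now holds digits only, so destring can convert it */
  destring `varname', replace;
};
#delimit cr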
Please note that you have to write the names of all the variables that you
want to convert to numeric in place of xvar in the first line of the loop.
Please note further that the code will replace all the values in your
original variables with numeric ones.
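If you would rather keep the original strings, or if the loop over observations turns out to be slow, a variant that works on whole variables at once may be worth trying. This is a sketch and not a tested replacement: it assumes Stata 14 or later, where the ustrregexra() function is available, and the _num suffix is a hypothetical naming choice.

```stata
* Sketch only: assumes Stata 14+ for ustrregexra(); *_num names are hypothetical.
clear
input patient str5 var1 str6 var2 str6 var3
1001 "1235-" "V2347"  "456"
1002 "1233"  "143135" "E28950"
1003 "38568" "05476-" "89076"
1004 "126"   "333"    "v5678"
end

foreach v of varlist var1 var2 var3 {
    * Delete every character that is not 0-9, then convert the rest to a
    * number. The original string variable is left untouched.
    generate long `v'_num = real(ustrregexra(`v', "[^0-9]", ""))
}

list
```

A value without any digit becomes missing in the new variable, and the string variables remain available for checking the result.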
The program works as follows.
Original data (from your message):
. d
Contains data
  obs:             4
 vars:             4
 size:           100 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
patient         float  %9.0g
var1            str5   %9s
var2            str6   %9s
var3            str6   %9s
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. l
     +-----------------------------------+
     | patient    var1     var2     var3 |
     |-----------------------------------|
  1. |    1001   1235-    V2347      456 |
  2. |    1002    1233   143135   E28950 |
  3. |    1003   38568   05476-    89076 |
  4. |    1004     126      333    v5678 |
     +-----------------------------------+
After running the code (which may take some time in your case, with 300k
observations), the data will look as follows:
. d
Contains data
  obs:             4
 vars:             4
 size:            80 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
patient         float  %9.0g
var1            long   %10.0g
var2            long   %10.0g
var3            long   %10.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. l
     +----------------------------------+
     | patient    var1     var2    var3 |
     |----------------------------------|
  1. |    1001    1235     2347     456 |
  2. |    1002    1233   143135   28950 |
  3. |    1003   38568     5476   89076 |
  4. |    1004     126      333    5678 |
     +----------------------------------+
Best wishes
Christian.
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Suzy
Sent: 28 August 2004 05:26
To: [email protected]
Subject: Re: st: Re: destringing values led to Stata recoding them as missing
Dear Daniel,
I used the destring option because I wasn't able to analyze the data as
is - I would get error messages regarding not being able to analyze
string. These values are codes that represent disorders, so you are
correct. But since I am a fairly new user of Stata, I just figured that
it couldn't read those values because of the dashes or the alpha-numeric
since the datapoints that were only numbers were read and analyzed with
no problem.
Daniel Egan wrote:
>Hi Suzy,
>
>Just to be clear, are you sure you want to create numeric values? The usual
>reason for destringing a variable is that it IS a numeric variable that has
>typos which cause it to be regarded as text. Is this a continuous
>variable that does have a numeric (linear etc) relationship? If each of
>these string variables represents different disorders, you should have a good
>methodological reason for making them numeric. Otherwise, keep them in an
>"apples and oranges" arrangement of strings, i.e. diabetes (1003) is not
>"one more than" malaria (1002)...
>
>In essence, if you want to use each of these variables as categoricals, they
>are fine as is - as strings. You will be able to analyze them as strings, in
>a categorical or dummy variable sense.
>
>
>I may be way off here, but just wanted to make sure you knew you could
>analyze them as is.....
>
>Apologies if I am being obvious.
>
>Dan
>
>----- Original Message -----
>From: "Suzy" <[email protected]>
>To: <[email protected]>
>Sent: Friday, August 27, 2004 5:44 PM
>Subject: st: destringing values led to Stata recoding them as missing
>
>
>| Dear Statalisters;
>|
>| I have seven variables of over 300,000 observations each. Within each
>| variable, I have over 2000 different values. These datapoints
>| represent specific codes - for example : (72200 = intervertebral disc
>| disorder). Within each of these seven variables, there are datapoints
>| (values) with dashes or alphabets (Ie: 4109- or V2389). The majority
>| of the values though, are purely numeric (23405). I used the destring
>| option so that I could analyze the data and Stata treated all those
>| datapoints that included dashes and alphabets as missing. Now there is a
>| period . where there used to be a value. I have two questions:
>|
>| 1. Will the restring option restore the datapoints?
>|
>| 2. How can I successfully "destring" these values so that I can include
>| them in my analysis?
>|
>| Any help and/or specific code would be very helpful as I am only
>| marginally competent with Stata basics.
>|
>| Thank you!
>| Suzy
>|
>|
>| *
>| * For searches and help try:
>| * http://www.stata.com/support/faqs/res/findit.html
>| * http://www.stata.com/support/statalist/faq
>| * http://www.ats.ucla.edu/stat/stata/
>|