Use the -sieve()- function in Nick Cox's excellent -egenmore- package (from SSC; install with -ssc install egenmore-):
******************************
clear*
input str8 percent str20 words
"45%" "1234 sjkhdfjh kjdfk"
"45%" "1234 sjkhdfjh kjdfk"
"45%" "1234 sjkhdfjh kjdfk"
"45%" "1234 sjkhdfjh kjdfk"
"45%" "1234 sjkhdfjh kjdfk"
end
* keep only the digits 0-9 from each string
egen percentnum = sieve(percent), keep(numeric)
egen wordsnum = sieve(words), keep(numeric)
* convert the digit-only strings to numeric variables
destring percentnum wordsnum, replace
li, clean
su
******************************
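If you would rather not install anything, here is a rough sketch of the same idea using only built-in string functions. It assumes the values look like the ones described in the question below ("95%" and "106 patients"), that is, a percent sign after the score and the count as the first word of the sample size. The variable names Score and SampleSize come from the question; the three data rows and the new variable names score_num and n_patients are made up for illustration.

```stata
* Sketch only: the rows below are invented example data
clear
input str8 Score str15 SampleSize
"95%" "106 patients"
"88%" "42 patients"
"100%" "7 patients"
end
* strip every percent sign, then convert what is left to a number
generate double score_num = real(subinstr(Score, "%", "", .))
* take the first space-delimited word and convert it to a number
generate long n_patients = real(word(SampleSize, 1))
list, clean
```

-real()- returns missing for anything it cannot convert, so a quick -count if missing(score_num)- afterwards will show whether some rows do not follow the assumed pattern. -sieve()- is the more general tool when the digits can appear anywhere in the string.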
T
On Fri, Aug 21, 2009 at 4:04 PM, Taylor Cook wrote:
> I am working with CMS's Hospital Compare data for the first time. One
> of the sets lists the recommended treatment for a condition (e.g., aspirin
> for heart attack), the percent of patients with the condition that
> received the treatment (Score), and the total number of patients who
> presented with the condition (SampleSize).
> The variables I am interested in, Score and SampleSize, are both
> string variables and, here is the tricky part, CMS recorded the data
> with numeric and non-numeric symbols. For example, all of the scores
> are "95%" and the sample size is "106 patients." These percent symbols
> and the word "patient" have made it difficult to destring. Any
> suggestions would be greatly appreciated.
> Thanks,
> Taylor
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
To every ω-consistent recursive class κ of formulae there correspond
recursive class signs r, such that neither v Gen r nor Neg(v Gen r)
belongs to Flg(κ) (where v is the free variable of r).