st: RE: RE: identifying letters in a string variable


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: identifying letters in a string variable
Date   Thu, 1 Sep 2005 00:35:09 +0100

I didn't read the question carefully enough. 

What is asked for is something like 

egen N = sieve(strvar), char(0123456789)
egen S = sieve(strvar), keep(a) 
gen catvar = 2 
replace catvar = 1 if N == strvar
replace catvar = 3 if S == strvar 
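
As a check, here is a sketch of the same idea run on the example values from the original question. It assumes -egenmore- has been installed from SSC (ssc install egenmore). The variable name stringvar is taken from the question, and the last -replace- is my own assumption about how empty strings should be treated: without it they would end up as 3, because an empty string equals its own sieved version.

```stata
* Sketch only: assumes -egenmore- is installed (ssc install egenmore)
clear
input str10 stringvar
"1"
"12"
"id14"
"run"
"5K"
"SPRINT"
end

egen N = sieve(stringvar), keep(n)      // digits only
egen S = sieve(stringvar), keep(a)      // letters only
gen byte catvar = 2                     // default: a mixture
replace catvar = 1 if N == stringvar    // nothing but digits
replace catvar = 3 if S == stringvar    // nothing but letters
replace catvar = . if stringvar == ""   // assumption: leave empty strings unclassified
list stringvar catvar
drop N S
```

On these six values catvar should come out as 1, 1, 2, 3, 2, 3, which matches the table in the question. Note that a value made up only of spaces or punctuation would stay at 2 under this scheme.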

Nick 
[email protected] 

Nick Cox
 
> A loose test of whether a string variable has only 
> numbers is that -real(strvar)- is not missing, 
> always remembering the possibility of missing 
> values. 
> 
> real("")
> 
> and 
> 
> real(".") 
> 
> both return numeric missing. 
> 
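To see why the test is only a loose one, a few calls to -real()- can be tried directly. This is a small sketch using made-up strings. -real()- accepts anything that Stata can read as a number, including signs, decimal points and exponents, so it says more about "is a number" than about "contains only digits".

```stata
* Illustrative strings only; run from a do-file
display real("12")      // 12
display real("5K")      // . (cannot be read as a number)
display real("")        // .
display real(".")       // .
display real("-2.5")    // -2.5, although "-" and "." are not digits
display real("1e3")     // 1000, although "e" is a letter
```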
> In the -egenmore- package on SSC there is 
> a function -sieve()- which may help here. 
> 
> sieve(strvar) , { keep(classes) | char(chars) | omit(chars) } 
> selects characters from strvar according to a specified criterion 
> and generates a new string variable containing only those characters. 
> This may be done in three ways. First, characters are classified using
> the keywords alphabetic (any of a-z or A-Z), numeric (any of 0-9), 
> space or other. keep() specifies one or more of those classes: 
> keywords may be abbreviated by as little as one letter. Thus 
> keep(a n) 
> selects alphabetic and numeric characters and omits spaces and other 
> characters. Note that keywords must be separated by spaces. 
> Alternatively, char() specifies each character to be selected or
> omit() specifies each character to be omitted. Thus
> char(0123456789.) selects numeric characters and the stop
> (presumably as decimal point); omit(" ") strips spaces and
> omit(`"""') strips double quotes. (Stata 7 required.) 
> 
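The three kinds of option described in that help text can be sketched side by side. This assumes -egenmore- is installed from SSC; the data and the new variable names (alnum, digits, nospace) are invented purely for illustration.

```stata
* Sketch only: assumes -egenmore- is installed (ssc install egenmore)
clear
input str12 strvar
"id 14"
"1,250.5"
"5K run"
end

egen alnum   = sieve(strvar), keep(a n)          // letters and digits only
egen digits  = sieve(strvar), char(0123456789.)  // digits and the stop only
egen nospace = sieve(strvar), omit(" ")          // everything except spaces
list
```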
> So you could look at a string variable like this. 
> 
> egen N = sieve(strvar), keep(n) 
> capture assert N == strvar 
> if _rc { 
> 	// non-numeric characters present 
> 	egen S = sieve(strvar), keep(a) 
> 	capture assert S == strvar 
> 	if _rc { 
> 		// must be a mixture
> 		<code for this case> 
> 	} 
> 	else { 
> 		// must be all alphabetic
> 		<code for this case> 
> 	}
> 	drop S 
> }
> else { 
> 	// must be all numeric 
> 	<code for this case> 
> } 
> drop N 
> 
> Nick 
> [email protected] 
> 
> TEWODAJ MOGUES
> 
> > I looked through the string functions to try to find out which
> > values of a string variable have letters plus numbers, only
> > letters, and only numbers, but didn't come up with anything. E.g.
> > suppose I wanted to create a categorical variable that takes on 1
> > when stringvar has only numbers, 2 if a mix of numbers and letters,
> > and 3 if only letters:
> > 
> > stringvar catvar
> > 1           1
> > 12          1
> > id14        2
> > run         3
> > 5K          2
> > SPRINT      3 
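
For anyone who prefers not to install -egenmore-, the same classification can be sketched with the built-in regular expression function -regexm()-. This is an alternative technique, not what -sieve()- does internally. It assumes a version of Stata that provides -regexm()- and plain ASCII letters and digits. The variable name catvar2 is made up for the example.

```stata
* Sketch only: assumes regexm() is available and the data are plain ASCII
clear
input str10 stringvar
"1"
"12"
"id14"
"run"
"5K"
"SPRINT"
end

gen byte catvar2 = 2 if stringvar != ""                   // default: a mixture
replace catvar2 = 1 if regexm(stringvar, "^[0-9]+$")      // digits only
replace catvar2 = 3 if regexm(stringvar, "^[a-zA-Z]+$")   // letters only
list stringvar catvar2
```

As with the -sieve()- version, values containing spaces or punctuation stay at 2 and empty strings are left missing.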
