Re: st: Data management question
I agree with Nick -- who beat me to the punch on this very
recommendation. One small addition: -trim()- will not remove spaces
within a string, only those at the beginning and end, so Nick's
suggestion below will count embedded spaces as if they too were
"X"s. To make your count variable (which I call "n") robust to that
possibility, you could instead use:
gen n = length(subinstr(x, " ", "", .))
where "x" is your original string variable. Ideally, you could
"robustify" this approach further by using regular expression
functions, but I have not yet determined a clean way to remove _all_
possible whitespace characters (the [:space:] class in regex) -- at
best, -regexr()- replaces only the _first_ occurrence of a [:space:]
character, not all of them. Suggestions on this point
would be welcome.
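For readers on a newer Stata, here is a minimal sketch of the "remove all whitespace" idea. It assumes Stata 14 or later, where the Unicode regex function -ustrregexra()- replaces every match rather than only the first; that function is not part of the original thread. The three test values are made up for illustration.

```stata
* Sketch only: assumes Stata 14+ (ustrregexra() is not available in older releases)
clear
set obs 3
gen str12 x = "XXXX" in 1
replace x = " XX X " in 2
* observation 3 contains a tab, which subinstr(x, " ", "", .) would not remove
replace x = "X" + char(9) + "XX" in 3

* strip every run of whitespace, then count the characters that remain
gen n = length(ustrregexra(x, "\s+", ""))
list x n
```

In older releases, the -subinstr()- line above remains the safer choice, although it handles ordinary spaces only.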
Also, if you want a truly robust solution, you should decide whether
non-"X" letter characters should be counted or not. Again, clever use
of a regular expression ought to help with this, but personally I have
not figured out how to get Stata to deliver on this potential.
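If the decision is that only "X" should be counted, one possible approach that avoids regular expressions altogether is to compare the length of the string before and after deleting every "X". This is a sketch, not something proposed in the thread; the variable name "nX" and the test values are made up for illustration.

```stata
* Sketch only: counts "X" and ignores spaces and any other characters
clear
set obs 3
gen str12 x = "XXXX" in 1
replace x = "XX?aX" in 2
replace x = " X X " in 3

* total length minus the length once every "X" has been removed
gen nX = length(x) - length(subinstr(x, "X", "", .))
list x nX
```

Note that -subinstr()- is case-sensitive, so a lowercase "x" is not counted here; wrapping the variable in -upper()- would count both cases.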
Hope this helps,
Mike
On Jul 16, 2009, at 8:39 AM, Nick Cox wrote:
As a footnote, observe that if the issue is counting "X" in values of
-kisses- such as "XX", "XXXXX", etc. then
gen nkisses = length(kisses)
will be a more direct and efficient solution so long as no other
characters are observed. -length(trim(kisses))- will protect against
accidental leading and trailing spaces.
Nick
Susan Olivia wrote:
Thanks Tirthankar,
Using Nick Cox's command is way more efficient. My earlier
attempts were very inefficient.
Tirthankar Chakravarty wrote:
There are probably many ways of doing this, but here is one
way using the -noccur()- function from Nick Cox's -egenmore-
package (SSC; Nick Winter is credited as the author of
-noccur()-):
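clear*
set obs 100
// build test data: observation i holds a string of i "x" characters
g crosses = " "
local cross "x"
forv i = 1/100 {
qui: replace crosses = "`cross'" in `i'
local cross "`cross'x"
}
// ssc install egenmore, replace
// count the occurrences of "x" in each value of crosses
egen noccur = noccur(crosses), string("x")
su noccur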
On Thu, Jul 16, 2009 at 2:26 AM, Susan Olivia wrote:
I have a string variable (say, number of days) whose values
look like XXXX (the number of X's denotes the number of days).
I would like to create a numeric version of this variable
(i.e. 4 crosses = 4). Is there a way I can easily do
this in Stata? I tried the 'encode' and 'destring'
commands, but these commands didn't do what I was after.