Martin's guess at what Andrea means resembles mine. Let's spell out the
low-level logic implemented more generally in -egen, noccur()- (by Nick
Winter) and -egen, nss()- within -egenmore- from SSC.
By the way, I can't improve on the explanation in -egenmore-'s help:
"The inclusion of noccur() and nss(), two almost identical functions,
was an act of sheer inadvertence by the maintainer."
In the case of a single desired character, the low-level logic is very
simple:
    initialise: count <- 0
    loop from the start to the end of a string {
        look at each character
        if it is the desired character { count <- count + 1 }
    }
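As a hedged illustration, here is that loop in Stata for one string held in a local macro. The macro names s, n and count are my own choices for this sketch, not part of any package.

```stata
* Sketch: the loop above applied to a single string held in a local macro
local s "fou||rth| strin|g"
local count = 0
local n = length("`s'")
forvalues j = 1/`n' {
    * compare the j-th character with the desired character "|"
    if substr("`s'", `j', 1) == "|" local ++count
}
display `count'
```

The string here has four "|"s, so that is what should be displayed.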
What is nice is that in Stata this loop extends automatically to all the
observations of a variable, because each -replace- works on every
observation at once.
Thus in Martin's example, a low-level way to get the counts is
    gen numberofocc = 0
    qui forval j = 1/30 {
        replace numberofocc = numberofocc + (substr(mystr, `j', 1) == "|")
    }
Here 30 is large enough to reach the last character of the longest
string (mystr is str30). This is really _much less_ code than using
-egen-, once you count all the code that the call to -egen- runs behind
the scenes.
Note that
    local maxposs = real(substr("`: type mystr'", 4, .))
is the maximum possible length of mystr (e.g. 30 for a str30 variable),
while
    gen nchars = length(mystr)
    su nchars, meanonly
    local maxact = r(max)
returns the maximum actual length. See also the help for -extended_fcn-
for how to count occurrences within a macro.
Thus
    local mystr "SMCL makes cooler logs"
    local mystr : subinstr local mystr "o" "o", all count(local howmanyo)
    di `howmanyo'
is a way to count "o"s: the substitution leaves the string unchanged, but
count() records how many replacements were made. (Note that there is
nothing to stop you changing the "o"s to something else, which is the more
characteristic use of this construct. Nor is an explicit loop out of the
question.)
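Back in the variable case, here is a sketch that feeds the maximum actual length r(max) into the loop instead of the hard-coded 30, so the upper limit comes from the data. Martin's example strings are reused as test data; the variable names are the ones already used above.

```stata
clear
input str30 mystr
"first st|r|"
"se|cond str|ing"
"third string"
"fou||rth| strin|g"
end
* longest actual string, so the loop needs no hard-coded upper limit
gen nchars = length(mystr)
su nchars, meanonly
local maxact = r(max)
gen numberofocc = 0
qui forval j = 1/`maxact' {
    * add 1 wherever the j-th character is "|"
    replace numberofocc = numberofocc + (substr(mystr, `j', 1) == "|")
}
list mystr numberofocc, noobs
```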
Nick
Martin Weiss
I take "cell" to denote an observation of a -string- variable... Install
Nick's -egenmore- (-ssc inst egenmore-) and:
*************
clear*
input str30 mystr
"first st|r|"
"se|cond str|ing"
"third string"
"fou||rth| strin|g"
end
compress
egen numberofocc= /*
*/ noccur(mystr) , string(|)
list, noobs
*************
Andrea Rispoli
Is there a command that I can use to count the number of "|" in a cell?