In this case, we know that (a) the observations concerned form disjoint
sets and (b) each set is a minority of the data. With -if-, Stata tests
every observation each time, whereas -in- goes straight to the range
specified. Thus using -if- slows things down considerably compared with
the equivalent command with -in-, by what has been called Blasnik's Law.
What's more difficult is to suggest a general recipe for this,
particularly because in other problems (a) might not be satisfied. Thus
my suggestion is broad: use -in- not -if- where possible.
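As one possible reading of that suggestion, here is a minimal sketch
using the auto data from the question. It assumes that the sets are
disjoint, as in (a), and that no search text contains spaces. The
variables -obsno- and -done- and the local macros are names invented
here for illustration. Each pass moves the observations already recoded
to the end of the data, so that later passes look only at the shrinking
block 1/`last'. The repeated -sort- has its own cost, so whether this
beats plain -if- on 150,000 observations is something to time rather
than assume.

```stata
sysuse auto, clear

* Hypothetical helper variables (not in the original post)
generate long obsno = _n      // remembers the original sort order
generate byte done = 0        // 1 once an observation has been recoded

* Parallel lists: text to look for, and the word that replaces it
local find AMC Buick Cad. Chev. Dodge
local repl AMC Buick Cadillac Chevrolet Dodge
local k : word count `find'

local last = _N               // unresolved observations occupy 1/`last'

forvalues i = 1/`k' {
    if `last' == 0 continue, break      // nothing left to recode

    local old : word `i' of `find'
    local new : word `i' of `repl'

    * Flag matches, looking only at the unresolved block
    replace done = 1 if strpos(make, "`old'") in 1/`last'
    count if done in 1/`last'
    local n = r(N)

    if `n' > 0 {
        replace make = "`new'" if done in 1/`last'
        sort done obsno                 // recoded observations go last
        local last = `last' - `n'
    }
}

* Restore the original order and tidy up
sort obsno
drop obsno done
```

If (a) fails, so that one observation can match two search strings, the
first match wins here, which may differ from what a sequence of plain
-replace- commands would give.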
Nick
Friedrich Huebler wrote:
I am looking for a way to increase the speed of -replace-. I have a
long string variable consisting of several words that should be
reduced to a shorter string, depending on the text in each
observation. The problem can be reproduced with the auto data. Assume
that we want to replace the text in the variable "make" by a single
word. Assume further that the text we are looking for (e.g. "Chev.")
is not necessarily at the beginning of the string but that it can be
anywhere in the variable. My solution is shown below but it is slow
with more than 200 -replace- commands and about 150,000 observations.
Is there a faster solution?
sysuse auto
replace make = "AMC" if strpos(make,"AMC")>0
replace make = "Buick" if strpos(make,"Buick")>0
replace make = "Cadillac" if strpos(make,"Cad.")>0
replace make = "Chevrolet" if strpos(make,"Chev.")>0
replace make = "Dodge" if strpos(make,"Dodge")>0
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/