Statalist



st: RE: Increase speed of -replace-


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Increase speed of -replace-
Date   Thu, 8 May 2008 20:19:29 +0100

In this case, we know (a) the observations concerned make up disjoint
sets and (b) each subset is a minority of the data. Using -if- therefore
slows things down considerably compared with the equivalent command
using -in-: by what has been called Blasnik's Law, -if- tests its
condition on every observation, whereas -in- goes straight to the
observations specified.

What's more difficult is to suggest a general recipe for this,
particularly because in other problems (a) might not be satisfied. Thus
my suggestion is broad: use -in- not -if- where possible. 
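
As an illustration only, here is a minimal sketch of the -in- form for one
of the replacements in the question below. It assumes the matching
observations can be sorted into a single block at the end of the data;
the variable names hit and obsno are hypothetical helpers, not part of
the original problem.

```stata
sysuse auto, clear

* Remember the original order so it can be restored afterwards
generate long obsno = _n

* Flag the observations to change (one pass over the data)
generate byte hit = strpos(make, "Chev.") > 0

* Move the flagged observations into one block at the end of the data
sort hit obsno

* Work out where that block starts
count if hit
local n = r(N)
local first = _N - `n' + 1

* Replace within that block only, using -in- rather than -if-
if `n' > 0 {
    replace make = "Chevrolet" in `first'/l
}

* Restore the original order and drop the helper variables
sort obsno
drop hit obsno
```

The flagging and sorting steps have their own cost, so this is a sketch
of the syntax rather than a benchmark. Whether it pays off depends on
how often the same block of observations can be reused.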

Nick
[email protected] 

Friedrich Huebler wrote:

I am looking for a way to increase the speed of -replace-. I have a
long string variable consisting of several words that should be
reduced to a shorter string, depending on the text in each
observation. The problem can be reproduced with the auto data. Assume
that we want to replace the text in the variable "make" by a single
word. Assume further that the text we are looking for (e.g. "Chev.")
is not necessarily at the beginning of the string but that it can be
anywhere in the variable. My solution is shown below but it is slow
with more than 200 -replace- commands and about 150,000 observations.
Is there a faster solution?

sysuse auto
replace make = "AMC" if strpos(make,"AMC")>0
replace make = "Buick" if strpos(make,"Buick")>0
replace make = "Cadillac" if strpos(make,"Cad.")>0
replace make = "Chevrolet" if strpos(make,"Chev.")>0
replace make = "Dodge" if strpos(make,"Dodge")>0




