st: How to perform very simple manipulations in large data sets more efficiently
From: "Tiago V. Pereira" <[email protected]>
To: [email protected]
Subject: st: How to perform very simple manipulations in large data sets more efficiently
Date: Mon, 15 Aug 2011 13:57:23 -0300 (BRT)
I thank Stas and Nick for their helpful comments on my last query.
All the best
Tiago
--
Dear statalisters,
I have to perform some extremely simple tasks, but I am struggling with the low
efficiency of my naive implementations. Perhaps you have smarter ideas.
Here is an example:
Suppose I have two variables, X and Y.
I need to get the value of Y that is associated with the smallest value of X.
What I usually do is:
(1) simple approach 1
* ------ start --------
sum X, meanonly              // r(min) now holds the smallest X
keep if X==r(min)            // drops every other observation
local my_value = Y[1]
* ------ end --------
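A variant of approach 1 that avoids the destructive -keep- (so the data set stays intact between repetitions) might look like the sketch below; it assumes the minimum of X is unique, or that any Y tied at that minimum is acceptable:
* ------ start --------
sum X, meanonly
local xmin = r(min)          // cache it: the next summarize overwrites r()
sum Y if X == `xmin', meanonly
local my_value = r(min)      // Y at the smallest X (with ties, the smallest such Y)
* ------ end --------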
(2) simple approach 2
* ------ start --------
sort X                       // the smallest X ends up in observation 1
local my_value = Y[1]
* ------ end --------
These approaches are simple and work very well for small data sets. However, I
now have to repeat the procedure 10,000 times on data sets that range from
500k to 1,000k observations, so both approaches 1 and 2 become very slow.
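One direction that avoids both dropping observations and sorting is Mata, which can scan the two variables in a single pass. The following is only a sketch (it assumes X and Y are numeric variables currently in memory; minindex() fills i with the rows holding the smallest value of X):
* ------ start --------
mata:
    X = st_data(., "X")      // copy the variables into Mata vectors
    Y = st_data(., "Y")
    i = .                    // will receive the row(s) of the minimum of X
    w = .                    // will receive tie information
    minindex(X, 1, i, w)
    st_local("my_value", strofreal(Y[i[1]], "%20.0g"))
end
* ------ end --------
Inside a loop over many repetitions it would presumably pay to wrap this in a small Mata function so that the variables are read into Mata only once.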
If you have any tips, I will be very grateful.
All the best,
Tiago