Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Stas Kolenikov <skolenik@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: How to perform very simple manipulations in large data sets more efficiently |
Date | Fri, 12 Aug 2011 10:51:23 -0400 |
These procedures look pretty reasonable, but for extremely large data
processing you might indeed want a different solution. Sorting is an
O(_N log(_N)) operation, and if you can find an O(_N) operation (which
should be possible), that would help you. What exactly do you do with
`my_value' afterwards? And how exactly do you organize your workflow
with your 10K data sets?

On Fri, Aug 12, 2011 at 10:43 AM, Tiago V. Pereira <tiago.pereira@mbe.bio.br> wrote:
> Dear statalisters,
>
> I have to perform extremely simple tasks, but I am struggling with the low
> efficiency of my dummy implementations. Perhaps you might have smarter
> ideas.
>
> Here is an example:
>
> Suppose I have two variables, X and Y.
>
> I need to get the value of Y that is associated with the smallest value of X.
>
> What I usually do is:
>
> (1) simple approach 1
>
> */ ------ start --------
> sum X, meanonly
> keep if X==r(min)
> local my_value = Y[1]
> */ ------ end --------
>
> (2) simple approach 2
>
> */ ------ start --------
> sort X
> local my_value = Y[1]
> */ ------ end --------
>
> These approaches are simple, and work very well for small data sets. Now,
> I have to repeat that procedure 10k times, for data sets that range from
> 500k to 1000k observations. Hence, both procedures 1 and 2 become clearly
> slow.
>
> If you have any tips, I will be very grateful.
>
> All the best,
>
> Tiago
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
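
For illustration, one way the O(_N) idea mentioned in the reply might look is two passes with -summarize-, with no sorting and no dropping of observations. This is only a sketch, not a solution taken from the thread. It assumes X has at least one nonmissing value, and that with ties on the minimum the first such observation in the current order is the one wanted. The names obsno and xmin are temporary names introduced here, not part of the original question.

```stata
* Sketch only: X and Y are the variable names from the question;
* obsno and xmin are temporary names introduced for illustration.
tempvar obsno
tempname xmin

* Pass 1: minimum of X in one pass over the data, without sorting
summarize X, meanonly
scalar `xmin' = r(min)

* Pass 2: lowest observation number at which X equals that minimum
generate long `obsno' = _n
summarize `obsno' if X == `xmin', meanonly

* Read Y at that observation; the data are neither sorted nor dropped
local my_value = Y[r(min)]
display "`my_value'"
```

Because nothing is dropped, unlike approach 1 with -keep-, the data set stays intact in memory for whatever comes next. Whether this is faster in practice than -sort- for 500k to 1000k observations would need to be timed on the actual data.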