Sergey,
Thank you for your reply. My OWN simple code takes 40 minutes to run
on historical date ( around 10 years ) which sortrows takes minutes.
Is'nt that amazing ?
Thank you,
Ashim.
On Wed, Aug 27, 2008 at 8:33 AM, Sergiy Radyakin <[email protected]> wrote:
> Hello All,
>
> I have probably missed the first part of this thread, but is this what
> needs to be done?
>
> From this:
> x1 x2 x3
> 2 3 1
> 3 5 4
> 9 7 9
>
> Get this:
> x1 x2 x3
> 1 2 3
> 3 4 5
> 7 9 9
>
> For all I know, there is no linear complexity O(n) sorting
> algorithm(http://en.wikipedia.org/wiki/Sorting_algorithm)
>
> Here you can see how different algorithms actually perform:
> http://linux.wku.edu/~lamonml/algor/sort/sort.html
>
> For N observations and K variables, Nick suggested reshaping data to
> get NxK observations, sort them (perhaps on a complex key) and
> reshape it back. I would doubt that this strategy improves
> performance, because of the above considerations of nonlinearity of
> sorting with respect to N:
>
> "sorting K values N times" is much simplier than "sorting NxK values".
> Just how much easier - it depends. When K is rather small, the gain
> can be quite big. Consider e.g. K=2 :) -- a simple conditional swap --
> linear! (number of swaps in the worst case = N, number of comparisons:
> also N).
>
> Reshape itself works quite slow. Like really slow. Because it is
> implemented as an .ado program. A commentary by the author of rowsort
> notes that "It may be faster to reshape, sort within blocks, and
> reshape again". So I take that as a sign of even worse performance.
>
> Another issue to consider is memory requirements. How much memory does
> reshape require? (we are all so concerned about fitting our data into
> 2GB/1GB/700MB/... limit, but often forget that once it is there,
> working with it requires memory too!) In my experience reshape
> requires at least two times as much:
>
> . memory
> bytes
> --------------------------------------------------------------------
> Details of set memory usage
> overhead (pointers) 14,800 14.45%
> data 37,000 36.14%
> ----------------------------
> data + overhead 51,800 50.59%
> free 50,592 49.41%
> ----------------------------
> Total allocated 102,392 100.00%
> --------------------------------------------------------------------
> Other memory usage
> set maxvar usage 2,041,738
> set matsize usage 1,315,200
> programs, saved results, etc. 45,523
> ---------------
> Total 3,402,461
> -------------------------------------------------------
> Grand total 3,504,853
>
> . reshape long r p, i(id)
> (note: j = 1 2 3 4 5 6)
> no room to add more observations
> r(901)
>
> (sorry for mentioning this taboo topic on public, have you ever
> estimated how much memory does an XYZ command require? Even Stata
> people guesstimate here).
>
> So Ashim, don't get discouraged and write your own code. It might work
> better for your particular case.
>
> One last advice is that sometimes the problem can be resolved by a
> radically different solution. E.g. if your variables are, say age of
> children for a particular mother in a [e.g. DHS] survey, you might
> find a dataset already in the long-format and start working with it.
> The dataset description will let you know what is the ordering there.
> In particular, data may be ordered with extra (not available to you)
> information which may be lost during sorting.
>
> E.g.: the codebook may say that children are ordered by age: 6, 8, 8,
> 12. You might not know whether 2nd and 3rd children are twins, or were
> born in the same year, but the data providers might very well do!
> Changing their places (remember that Stata does not guarantee sort
> stability for observations, unless special option is used, and I am
> not sure how this works for sorting rows) will change their birth
> ranks - quite an important characteristic, if we believe a large body
> of literature on the topic.
>
> Hope this helps.
>
> Anyways, I don't immediately see how -sortrows- or -rowssort- will
> deal with this situatuion:
>
> motherid child_id1 child_id2 child_id3 age1 age2 age3 gender1 gender2 gender3
> .....
> and I might want to sort children by age keeping their ordering stable
> and moving their (string) ids and (numeric) gender dummies in sync.
>
> Regards, Sergiy Radyakin
>
>
>
>
>
>
>
> On 8/26/08, Nick Cox <[email protected]> wrote:
>> This generated various suggestions.
>>
>> My main thought is that you should never have to write your own sort
>> programs, bubble sort or other, in Stata.
>>
>> Even sorting rows can make use of Stata's own sort functionality -- as
>> in a combination of -reshape- and -sort-, as mentioned by Maarten Buis
>> -- or as in -rowsort or -sortrows-, as mentioned in part by Martin
>> Weiss.
>>
>> -rowsort- is on SSC and was written for Stata 8.
>>
>> -sortrows- is also on SSC and was written for Stata 9. It makes use of
>> Mata and in large part supersedes -rowsort-, so long as you have Stata 9
>> at least.
>>
>> Nick
>> [email protected]
>>
>> Ashim Kapoor
>>
>> I have a simple code to write.
>>
>> I have 6 variables v1, v2 , v3, v4 ,v5 , v6.
>>
>> I want to sort "according to each observation". That is , in the end
>> for each i I want , v1[i] < = v2[i] <=... v6[i].
>>
>> So I wrote a simple bubble for each row and then loop from 1 to _N.
>>
>> Now the problem is that this works but it's VERY VERY slow. Any
>> comments anyone ?
>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/