Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: programming loops efficiently
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: programming loops efficiently
Date: Thu, 15 Nov 2012 10:19:57 +0000
This code of mine
gen max = 1 if !missing(Vh01)
gen maxhr = Vh01 if !missing(Vh01)
forvalues x = 2/24 {
    local X : di %02.0f `x'
    replace max = `x' if Vh`X' > maxhr & !missing(Vh`X')
    replace maxhr = Vh`X' if Vh`X' > maxhr & !missing(Vh`X')
}
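will fail if -Vh01- is missing but some other -Vh??- is non-missing:
-maxhr- then starts out missing, and because numeric missing counts as
greater than any non-missing value, no Vh`X' > maxhr is ever true, so
-max- and -maxhr- stay missing.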
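The failure is easy to reproduce on a toy dataset. The sketch below is an illustration only: the values are made up, and just three variables (Vh01-Vh03) are used to keep it short.

```stata
* Toy data (hypothetical values): Vh01 is missing in the first row
clear
input Vh01 Vh02 Vh03
. 5 3
4 . 7
end

gen max = 1 if !missing(Vh01)
gen maxhr = Vh01 if !missing(Vh01)
forvalues x = 2/3 {
    * %02.0f turns 2 into 02, matching the leading zeros in the names
    local X : di %02.0f `x'
    replace max = `x' if Vh`X' > maxhr & !missing(Vh`X')
    replace maxhr = Vh`X' if Vh`X' > maxhr & !missing(Vh`X')
}

* Row 1: 5 > . is false because missing sorts above every number,
* so max and maxhr are never filled in for that row.
* Row 2: the missing Vh02 is skipped and Vh03 wins.
list
```

In the listing, the first row should show -max- and -maxhr- both missing even though Vh02 holds 5.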
Here's one simpler solution:
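gen maxhr = .
gen max = .
forvalues x = 1/24 {
    local X : di %02.0f `x'
    * max() ignores missing arguments unless all of them are missing
    replace maxhr = max(maxhr, Vh`X')
    * the !missing() guard stops . == . from matching, so max stays
    * missing when every value is missing; ties go to the last person
    replace max = `x' if Vh`X' == maxhr & !missing(Vh`X')
}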
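The original question also asked about ties for the maximum and about the second-largest value. What follows is a minimal sketch of one way to get both, not a definitive answer: the variable names -maxwho- and -second- are hypothetical, the data are made up, and "second largest" is taken to mean the largest value strictly below the maximum. On the real data the loop would run over all of the variables in the question (Vh01-Vh25 as described there).

```stata
* Toy data (hypothetical values): row 1 has a tie, row 3 is all missing
clear
input Vh01 Vh02 Vh03
8 3 8
. 5 2
. . .
end

egen maxhr = rowmax(Vh01-Vh03)

* maxwho: hypothetical name, lists every person at the maximum
gen maxwho = ""
* second: hypothetical name, largest value strictly below the maximum
gen second = .

forvalues x = 1/3 {
    local X : di %02.0f `x'
    * append this person's number wherever they attain the row maximum
    replace maxwho = maxwho + " `X'" if Vh`X' == maxhr & !missing(Vh`X')
    * missing Vh`X' never satisfies < maxhr, and max() ignores the
    * initial missing in second once a real candidate appears
    replace second = max(second, Vh`X') if Vh`X' < maxhr
}
replace maxwho = trim(maxwho)
list
```

With ties, -maxwho- holds a space-separated list such as "01 03", which can be split later if separate variables are wanted.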
Nick
On Wed, Nov 14, 2012 at 12:36 PM, Nick Cox <[email protected]> wrote:
> This code is very likely to bite. First off, in Elizabeth's problem it
> seems highly likely that many values are missing. Presumably, she has
> 25 variables because that many are needed in some cases, but in many
> cases several of those variables, especially the last few, will be
> missing.
>
> Thus the comparison
>
> ... if Vh`x' > maxhr
>
> will often just find the (last) missing value, as numeric missing
> counts as greater than any non-missing value, and each time the
> condition is true, the corresponding variables will be changed by a
> -replace-.
>
> My earlier post linked to a discussion (including a reference) that
> discusses better code for when missings are present.
>
> But Alex's code can be tweaked to avoid this problem.
>
> gen max = 1 if !missing(Vh01)
> gen maxhr = Vh01 if !missing(Vh01)
>
> forvalues x = 2/24 {
>     local X : di %02.0f `x'
>     replace max = `x' if Vh`X' > maxhr & !missing(Vh`X')
>     replace maxhr = Vh`X' if Vh`X' > maxhr & !missing(Vh`X')
> }
>
> Note that the -rename- can be avoided by using a format that insists
> on leading zeros for 1...9.
>
> Nick
>
> On Wed, Nov 14, 2012 at 10:21 AM, Alex Armand <[email protected]> wrote:
>
>> If you want a variable stating the person with max hours for each observation (row) then this code should produce what you need.
>>
>> For simplicity I would rename variable Vh01-Vh09 into Vh1-Vh9.
>> ________________________________
>> * This defines the variable that contains the person with maximum
>>
>> gen max = 1
>> gen maxhr = Vh1
>>
>> * Loop control for others
>>
>> forvalues x = 2/24 {
>>
>> replace max = `x' if Vh`x' > maxhr
>> replace maxhr = Vh`x' if Vh`x' > maxhr
>>
>> }
>> ________________________________
>>
>> Alex
>>
>>
>> On 14 Nov 2012, at 10:53, Breeze, Elizabeth wrote:
>>
>>> I am creating some variables and I am sure that my syntax is needlessly long where there is a repeat pattern to the commands.
>>>
>>> I have variables Vh01-Vh25 which give the number of hours worked by persons number 01-25 respectively.
>>> I want to find which person worked maximum hours and what that maximum was.
>>> -egen maxhr = rowmax(Vh01-Vh25)- gives me the maximum number of hours.
>>>
>>> Is there a quick way to find the person number with the max no. hours by taking advantage of those last two digits of Vh01-Vh25?
>>> I can do it with many lines of syntax but am sure there must be a neater way using some form of loop.
>>> Each record in the dataset concerns one interviewee and persons 01-25 are people who work for the interviewee
>>>
>>> A further complication is that more than one person may work that maximum number of hours.
>>>
>>> Also is there an equivalent to rowmax that gives the second to largest value in the series?
>>>
>>> Grateful for any tips. I am not a programmer