Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Advice on content of a do file
From: Sergiy Radyakin <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Advice on content of a do file
Date: Fri, 26 Jul 2013 14:12:47 -0400
On Fri, Jul 26, 2013 at 12:02 PM, Stas Kolenikov <[email protected]> wrote:
> Sergiy,
>
> both the original solution and your solution are hacks exploiting
> rather sophisticated Stata functionality and strong coupling. If for
Stas,
the only function I used is word() and it is a simple and
well-documented function.
As for strong coupling, your code is strongly coupled to
EXACTLY the same extent. If stability happens to be a
string variable, your code will also break down, and it will
break down with exactly the same error code and error
message as mine. As far as the user is concerned, the
two pieces of code are completely equivalent. Moreover,
if stability has a value out of range (e.g. 6), both
your code and mine will result in a missing value. The
word() function, however, requires only one pass
through the data (hence it runs faster, about two times
faster in this example), and the code is shorter.
1: 4.54 / 100000 = 0.0000
2: 2.40 / 100000 = 0.0000
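A comparison of this kind could be timed roughly as in the sketch below, using Stata's timer commands. The simulated data, the number of repetitions, the variable names sta_a and sta_b, and the assignment of timer 1 to the replace version and timer 2 to the word() version are all assumptions made for illustration; this is not necessarily the setup behind the numbers above.

```stata
* Simulated data: stability takes the values 1..4 (assumption for this sketch)
clear
set seed 12345
set obs 1000
generate stability = 1 + floor(4*runiform())

timer clear
forvalues i = 1/1000 {
    capture drop sta_a sta_b

    * Timer 1: generate plus four replace statements (five passes over the data)
    timer on 1
    quietly generate sta_a = .
    quietly replace sta_a = -0.001 if stability == 1
    quietly replace sta_a =  0.101 if stability == 2
    quietly replace sta_a =  0.191 if stability == 3
    quietly replace sta_a =  0.222 if stability == 4
    timer off 1

    * Timer 2: a single generate using word()
    timer on 2
    quietly generate sta_b = real(word("-0.001 0.101 0.191 0.222", stability))
    timer off 2
}

* Each line reads: timer number: total seconds / number of calls = average
timer list

* Both approaches should produce identical values
assert sta_a == sta_b
```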
It also has the benefit that if the variable name changes,
there is only one place to change it in the code, while in
your case you'll need to revise each of the replace
statements.
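To illustrate the point about having a single place to change, here is a minimal sketch that extends the word() approach to all five ICECAP-A attributes, using the values from the UTILS matrix quoted later in this thread. The local macro names (names, stubs, vals1 to vals5) and the two-row demo dataset are hypothetical, introduced only for this example.

```stata
* Tiny demo dataset (hypothetical): one respondent at the top level, one at the bottom
clear
input stability attachment autonomy achievement enjoyment
4 4 4 4 4
1 1 1 1 1
end

* Variable names and index stubs are each listed exactly once
local names stability attachment autonomy achievement enjoyment
local stubs sta att aut ach enj

* One row of the UTILS matrix per attribute, for levels 1..4
local vals1 "-0.001 0.101 0.191 0.222"
local vals2 "-0.024 0.096 0.189 0.228"
local vals3 "0.006 0.084 0.156 0.188"
local vals4 "0.021 0.091 0.159 0.181"
local vals5 "-0.003 0.069 0.154 0.181"

forvalues k = 1/5 {
    local v : word `k' of `names'
    local s : word `k' of `stubs'
    * Out-of-range or missing levels yield a missing index value
    generate `s'_index = real(word("`vals`k''", `v'))
}

generate tariff = sta_index + att_index + aut_index + ach_index + enj_index
list
```

If a variable is named differently in a given dataset, only the names macro needs editing.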
As for the responsible developers, at least they wrote a
minimal description of how to use it... It could have been
worse: a piece of Fortran art. In any case, my understanding from
the original poster's description is that this is a code snippet
rather than a standalone final product. As such, it faces very
different requirements.
Best, Sergiy Radyakin
> whatever incompatibility reason it breaks down, the end user is left
> to guess what's going on. They still make a lot of assumptions about
> the user of the code and their data. A responsible developer would
> provide robust code that might:
>
> * check that stability variable exists, and is of the right type
> capture confirm numeric variable stability
> if _rc {
> display as err "Variable stability is not found, or is of wrong type"
> exit 7
> }
> * check that it has the right range
> capture assert inlist(stability,1,2,3,4) if !missing(stability)
> if _rc {
> display as err "The values of stability are expected to range from 1 to 4"
> exit 7
> }
> * and assign the new values to that variable, whatever solution is
> used for that.
>
> See also "How to be assertive" at
> http://www.stata-journal.com/sjpdf.html?articlenum=dm0003.
>
>
> -- Stas Kolenikov, PhD, PStat (ASA, SSC)
> -- Senior Survey Statistician, Abt SRBI
> -- Opinions stated in this email are mine only, and do not reflect the
> position of my employer
> -- http://stas.kolenikov.name
>
>
>
> On Fri, Jul 26, 2013 at 10:28 AM, Sergiy Radyakin
> <[email protected]> wrote:
>> the code works just fine, as in the following example:
>> sysuse auto,clear
>> matrix UTILS=(-0.001,0.101,0.191,0.222\/*
>>> */-0.024,0.096,0.189,0.228\/*
>>> */0.006, 0.084, 0.156, 0.188\/*
>>> */0.021, 0.091, 0.159, 0.181\ /*
>>> */ -0.003, 0.069, 0.154, 0.181)
>>
>> gen stability=_n
>> gen sta_index=UTILS[1,stability[_n]]
>>
>> However it relies on stability being a numeric variable. The error
>> message 509 is consistent with stability being a string variable. You
>> may want to investigate why this happened, and if it can be
>> -destring-'ed or otherwise converted to numeric codes.
>>
>> Note that Stas is right that there are other ways to write the same thing.
>> I would prefer here:
>>
>> generate sta_index = real(word("-0.001 0.101 0.191 0.222",stability))
>>
>> Best, Sergiy
>>
>>
>> On Fri, Jul 26, 2013 at 11:16 AM, Stas Kolenikov <[email protected]> wrote:
>>> This is a TERRIBLE way to write code. It relies too heavily on having
>>> things specified exactly as the developer thought, rather than
>>> adapting to the users' data. Instead, it should have been written as
>>>
>>> gen sta_index = .
>>> replace sta_index = -0.001 if stability == 1
>>> replace sta_index = 0.101 if stability == 2
>>> replace sta_index = 0.191 if stability == 3
>>> replace sta_index = 0.222 if stability == 4
>>>
>>> etc.
>>>
>>> It looks like the developers thought their code would work, but never
>>> tried it. And it did not, in practice.
>>>
>>>
>>> -- Stas Kolenikov, PhD, PStat (ASA, SSC)
>>> -- Senior Survey Statistician, Abt SRBI
>>> -- Opinions stated in this email are mine only, and do not reflect the
>>> position of my employer
>>> -- http://stas.kolenikov.name
>>>
>>>
>>>
>>> On Fri, Jul 26, 2013 at 1:12 AM, Peter King <[email protected]> wrote:
>>>> Hi All,
>>>>
>>>> I'm preparing to analyse survey responses to the ICECAP-A questionnaire. The
>>>> ICECAP-A developers have provided code to substitute into a Stata do file to
>>>> allow the calculation of values or tariffs for each respondent.
>>>>
>>>> The code is as follows:
>>>>
>>>> matrix UTILS=(-0.001,0.101,0.191,0.222\/*
>>>>
>>>> */-0.024,0.096,0.189,0.228\/*
>>>>
>>>> */0.006, 0.084, 0.156, 0.188\/*
>>>>
>>>> */0.021, 0.091, 0.159, 0.181\ /*
>>>>
>>>> */ -0.003, 0.069, 0.154, 0.181)
>>>>
>>>> gen sta_index=UTILS[1,stability[_n]]
>>>>
>>>> gen att_index=UTILS[2,attachment[_n]]
>>>>
>>>> gen aut_index=UTILS[3,autonomy[_n]]
>>>>
>>>> gen ach_index=UTILS[4,achievement[_n]]
>>>>
>>>> gen enj_index=UTILS[5,enjoyment[_n]]
>>>>
>>>> gen tariff=sta_index+att_index+aut_index+ach_index+enj_index
>>>>
>>>>
>>>> I have formatted my data as specified below:
>>>>
>>>> "This code, when substituted into a Stata do file, will allow calculation of
>>>> ICECAP-A tariffs for each respondent in a study, based on their answers to
>>>> the five classification questions. Statistical analyses can then be
>>>> conducted on these tariffs. Indeed they can be conducted on the five index
>>>> values also, to ascertain sensitivity of these to differences in factors.
>>>>
>>>>
>>>> "Data should be set up with one study participant per row. As specified on
>>>> the ICECAP-A questionnaire, coding should be such that the 'top' level (full
>>>> capability for an attribute) should take the value '4', down to the bottom
>>>> level (no capability) which should take the value '1'. NB this coding is the
>>>> opposite of that used in instruments such as the EQ-5D (where 1 is top
>>>> level). The five variables, containing a respondent's five ICECAP-A
>>>> responses should be named stability, attachment, autonomy, achievement,
>>>> enjoyment."
>>>>
>>>> When I run the do file with the specified content I get the following
>>>> reply/output:
>>>>
>>>> . do "C:\Documents and Settings\User\My Documents\FCSPRU Files\Current
>>>> projects\BIS 2006\Surve
>>>>> ys\Postal survey\Wave 2 2012\ICECAP Tariff.do"
>>>>
>>>> . matrix UTILS=(-0.001,0.101,0.191,0.222\/*
>>>>> */-0.024,0.096,0.189,0.228\/*
>>>>> */0.006, 0.084, 0.156, 0.188\/*
>>>>> */0.021, 0.091, 0.159, 0.181\ /*
>>>>> */ -0.003, 0.069, 0.154, 0.181)
>>>>
>>>> . gen sta_index=UTILS[1,stability[_n]]
>>>> matrix operators that return matrices not allowed in this context
>>>> r(509);
>>>>
>>>> end of do-file
>>>>
>>>> r(509);
>>>>
>>>> .
>>>> I am new to using do files and wonder whether I should be including other
>>>> information in the do file? Or is something else wrong in the code?
>>>>
>>>> Any suggestions from more experienced users will be greatly appreciated.
>>>>
>>>>
>>>> Many thanks,
>>>>
>>>> Peter King