Dear Nick,
On your 1., you're right: -genscore- is very sensitive to missing
values, and this is the main reason why I wrote this command. I must
improve the help file to explain that.
On your 2., you're right, BUT by default the *missing* option treats
only the system missing value "." as missing. So without this option you
obtain what you want, and a user who forgets the *missing* option cannot
run into any problem. I don't think this is a dangerous option, but I am
interested in advice from other users!
On your 3., that's right!
I will modify the help file following these remarks, and the ado file to
support -if- and -in-. Thank you, Nick, for your mail!
On the end of your mail (my poor English can produce bad
interpretations): my idea was to write a user-written Stata package
accessible from -egen- (I believe this is possible), not to propose that
StataCorp adopt it! I am sorry if I explained my idea badly.
Nevertheless, I learned a lot from your mail about what it takes to
adopt something in official Stata. What is a certification script?
Best,
Jean-Benoit
Nick Cox wrote:
-genscore- appears to be a variation on the existing
official -egen- functions -rowmean()- and -rowtotal()-
(-rmean()- and -rsum()- in Stata < 9).
So that anyone interested can see what I am talking
about, here is a simplified version of the program.
(I take responsibility for any bugs introduced into this
slimmed-down version.)
program genscore_simplified
    version 9.0
    syntax varlist [, SCore(namelist min=1 max=1) MEan MIssing(string)]
    // defaults: the new variable is called "score"; only "." means missing
    if "`score'" == "" local score score
    if "`missing'" == "" local missing .
    // refuse to overwrite an existing variable
    capture confirm new variable `score'
    if _rc {
        di as err "The variable {hi:`score'} already defined"
        exit 198
    }
    quietly {
        gen `score' = 0
        // one missing item is enough to make the whole score missing
        foreach v of local varlist {
            replace `score' = ///
                cond(`v' == `missing', ., `score' + `v')
        }
        // with -mean-, divide the sum by the number of variables
        if "`mean'" != "" {
            replace `score' = `score' / `: word count `varlist''
        }
    }
end
There are three main differences that I can see, apart
from cosmetic syntax details, compared with the official -egen-
functions:
1. -genscore- is ultra-sensitive to missing values.
A single missing value in one of the variables
processed is enough to produce a missing result.
In contrast, -egen, rowtotal()- and -egen, rowmean()-
are ultra-indulgent: they ignore missing arguments.
-rowmean()- returns missing if and only if all arguments
are missing; -rowtotal()- treats missing as 0.
This difference can clearly be important in terms
of what you want. However, it is -genscore-'s main
feature, yet the fact is not documented at all
in the help; nor is there a cross-reference to -egen-.
In the next revision, I suggest that this be made clear.
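
The contrast in point 1 can be seen on a small made-up dataset. The
following is only a sketch: the variable names and values are invented
for illustration, and the "strict" sum is written with plain arithmetic
rather than with -genscore- itself.

```stata
* Toy data: three items, some of them missing (values are made up)
clear
input id item1 item2 item3
1 1 2 3
2 1 . 3
3 . . .
end

* Official -egen- functions: missing items are skipped
egen total_egen = rowtotal(item1 item2 item3)
egen mean_egen  = rowmean(item1 item2 item3)

* Strict sum in the spirit of -genscore-: Stata arithmetic propagates
* missing values, so any missing item gives a missing result
generate total_strict = item1 + item2 + item3

list, noobs
```

For id 2 the -egen- results are computed from the two observed items,
while total_strict is missing. For id 3, where every item is missing,
-rowmean()- returns missing and -rowtotal()- returns 0 by default.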
2. There is an option -missing()- which allows you
to declare that in your data a particular value has
the meaning of missing.
Again in practice, there could be all sorts of reasons
why you import data which contain idiosyncratic codings
for missing data. By and large, it is best to map
those to missing using -mvdecode- as soon as possible.
Maintaining a particular coding which you and only you
know means missing is very dangerous. Forget that once
and you produce garbage results.
So, from one point of view, this option supports
dangerous Stata practices.
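
As a sketch of the -mvdecode- route recommended in point 2, assume a
hypothetical import in which the value 9 was used to code "no answer"
(the data and variable names below are invented for illustration):

```stata
* Hypothetical import in which 9 means "no answer"
clear
input id item1 item2 item3
1 1 2 3
2 1 9 3
3 9 2 2
end

* Map the special code to a missing value once, right after import.
* The extended missing value .a keeps it distinguishable from "."
mvdecode item1 item2 item3, mv(9=.a)

* From here on, ordinary missing-value handling applies everywhere
egen score_mean = rowmean(item1 item2 item3)
list, noobs
```

After this step no later command needs to be told that 9 is special,
which is the point of doing the mapping as early as possible.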
3. There is no support for -if- or -in-.
In addition, some help file examples are in terms of
a -genscore()- option, but the option is -score()-.
That's a typo for the next fix.
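
On point 3, one common way to add -if- and -in- is to declare them in
-syntax- and mark the sample with -marksample-. The program below is
only a sketch under that assumption: -genscore_ifin- is a hypothetical
name, not the SSC package, and the -missing()- option is left out to
keep it short.

```stata
* Sketch only: -genscore_ifin- is a hypothetical name
program genscore_ifin
    version 9.0
    syntax varlist(numeric) [if] [in] ///
        [, SCore(namelist min=1 max=1) MEan]
    if "`score'" == "" local score score
    confirm new variable `score'
    * novarlist: observations with missing items stay in the sample,
    * so they receive a missing score instead of being excluded
    marksample touse, novarlist
    quietly {
        generate `score' = 0 if `touse'
        foreach v of local varlist {
            * arithmetic propagates missing: one missing item, missing score
            replace `score' = `score' + `v' if `touse'
        }
        if "`mean'" != "" {
            replace `score' = `score' / `: word count `varlist'' if `touse'
        }
    }
end
```

A call such as -genscore_ifin item1 item2 item3 if id > 1, score(s) mean-
would then fill s only for the selected observations and leave it
missing elsewhere.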
Jean-Benoit offers his program for adoption in -egen-. That
is a StataCorp decision, but I can offer another user
perspective on the much more general question. These
comments go far beyond the immediate detail of this program.
StataCorp are increasingly going to get much, much pickier
about adoption of user-written software. There are two
main reasons for this:
* The existence of -net- (and features parasitic on it
like -ssc-) much reduces the need for official adoption
of user-written stuff. The whole point is that if it's
good and you like it, you can have it, and it should work
seamlessly with official Stata.
* Some users vastly underestimate how big a deal it is to
adopt something in official Stata. Suppose you wrote
a program and you did a good job. What next?
1. The code may be good, but is it up to StataCorp standards?
Unlikely!
2. The help may be good, but is it up to StataCorp standards?
Unlikely!
3. You did write a dialog, didn't you? (Most user-programmers,
me included, stop short of writing the dialog too.)
4. You did write a certification script, didn't you? (Same
story, more or less.)
5. Somebody has to write a manual entry. Perhaps just the
help file, rejigged, but often a much bigger deal. (You just
added some pages to a very fat series of manuals.)
6. Once this is in official Stata, and visible, it is something
else on which technical support may be sought.
7. Once this is in official Stata, it is something else that must
be maintained as the rest of Stata changes.
Also, from the total perspective, there are let's say 1000-odd
user-written Stata packages in the public domain. (That's an order of
magnitude figure. It's at least several hundred, but not I
think yet approaching 10,000.) Of course, no one, I presume, wants
StataCorp to adopt all of them. (Your wish is much more reasonable:
you just want StataCorp to adopt all of those interesting and useful
to you, but so does everybody else!)
(On -egen- functions alone, the number in the public domain
is I guess of the order of 100.)
Let's say StataCorp should be real picky and choose the best 100.
What would that mean? Probably setting aside all other work for 2
years and a few more manual volumes...
Nick
Jean-Benoit Hardouin wrote:
Thanks to Kit Baum, two new modules are available on SSC:
- -biplotvlab- is an improvement of Ken Higbee's code, presented
yesterday on Statalist, to draw a biplot with the variable labels.
The improvements are a gap between the text and the ends of the
arrows, and the ability to set characteristics of the text (color,
size...). Variable labels are displayed and, if one or several
variables have no label, the names of these variables are displayed
instead. For example, this module is a nice way to produce biplots
with temporary variables.
- -genscore- is a small module to easily create a new variable
containing the score computed as the sum or the mean of several
variables. It is possible to define a given value as a missing
value. I think that this module could be improved by integrating it
into -egen- (for the next version of this module?)
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/