Dear Statalist, and especially Nick Cox,
I think I have a solution to my problem. It is to loop over each level of
my input variable (`in') and calculate the mean of the thresholds between
categories:
. local low = `all_n_below_this_level' / `Total_N'
. local high = `all_n_incl_this_level' / `Total_N'
. replace `out' = (invnorm(`low') + invnorm(`high')) / 2 if `in' == `level'
With the highest and lowest levels as special cases:
. local lowest = 1 / (`Total_N'+ 1)
. local highest = `Total_N'/(`Total_N' + 1)
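Spelled out in full, the loop might look something like the do-file sketch
below. This is only a sketch with assumed names: -x- for the ordinal input,
-z- for the output, and one reading of the special cases, namely that they
replace the outer bounds before averaging.

* sketch only: assumed names x (ordinal input) and z (output);
* special cases read as replacing the outer bounds before averaging
gen z = .
quietly count if !missing(x)
local Total_N = r(N)
levelsof x, local(levels)
local nlev : word count `levels'
local lowest : word 1 of `levels'
local highest : word `nlev' of `levels'
foreach level of local levels {
    quietly count if x < `level'
    local low = r(N) / `Total_N'
    quietly count if x <= `level'
    local high = r(N) / `Total_N'
    * special cases: avoid invnorm(0) and invnorm(1) at the extremes
    if `level' == `lowest' local low = 1 / (`Total_N' + 1)
    if `level' == `highest' local high = `Total_N' / (`Total_N' + 1)
    quietly replace z = (invnorm(`low') + invnorm(`high')) / 2 if x == `level'
}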
The above solution (I think) would reproduce an interval scale if the
assumptions of normality are met. Nick Cox wrote:
> Note, however, that ridit(x) = 1 - ridit(-x), i.e. it
> satisfies a simple and desirable symmetry property.
That's nice, and I think that the ridit version would also be more exact in
estimating a z-score for a particular subject, minimizing the residual
between the latent variable and the estimated z-score. But, I suggest, it
would bias z slightly away from an interval scale.
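Just to illustrate the symmetry property, here is a quick check I would
expect to work, assuming the ridit() egen function mentioned above is
installed (the auto data and rep78 are only an example):
. * requires the user-written ridit() egen function mentioned above
. sysuse auto, clear
. egen r1 = ridit(rep78)
. gen negrep = -rep78
. egen r2 = ridit(negrep)
. gen check = r1 + r2
. summarize check
The sum r1 + r2 should equal 1 wherever rep78 is nonmissing.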
---------------------------------------------------------------------------
I realize that this list is more of a Q&A list than a discussion forum.
However, I can't help myself sometimes ...
If anyone would find it interesting to discuss transformation issues on this
list, feel free to comment. I, at least, will appreciate it.
----------------------------------------------------------------------------
Why transform data? I have found three major positions on this issue:
1) You shouldn't transform your data because the results might be impossible
to interpret. For example, what does r(health, sleep) = 0.4 mean when health
is log-transformed and sleep is inverted?
2) You should always transform your data if they deviate from the normal
distribution and you would like to use parametric statistics. If you don't,
you can't trust the results. You should use an algorithm that treats all
values alike and try different transformations according to a rule (like the
ladder of powers) to finally choose the one that makes your data "most"
normal (a small illustration follows after this list).
3) You should transform your data (by any means) if doing so makes them more
representative of the latent variable you are measuring. This would be the
main rationale behind polychoric correlations, rankit, and my own suggestion
above.
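As an aside, official Stata already automates the rule-based search in
position 2; something like the following (mpg is just a stand-in variable)
walks the ladder of powers and reports a normality test for each rung:
. * official -ladder-/-gladder-; mpg is only an example variable
. sysuse auto, clear
. ladder mpg
. gladder mpg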
I would like to hear your opinion on taking position 2 to the extreme by
implementing procedures from position 3. That way the data would be as
"normal" as they could possibly be, minimizing violations of normality.
I'm not sure if this makes me an ultracynic or just confused ....
> Attempts to find highly exact solutions to problems posed by
> highly inexact data always seem somewhat strained to me,
> but an ultracynic might claim that to be a definition of
> statistical science in general...
-------------------------------------------------------------------------
> The ridit transformation as implemented in -egen, ridit()-
> is just one that has worked well for me in some exploratory
> contexts. (Some references are in the help file for -distplot-.)
> I have used it as a transformation procedure rather than an attempt
> to estimate a latent quantity. So, you might want to modify it.
> Feel free to copy the code and hack away.
Thanks Nick, I will and I have. I'm new to Stata (1.5 months) but I have
already begun to love the programmability. And this list, and all the free
user-written programs (and it feels like you have written half of them ...
amazing). It is not unlikely that I will write one or two myself for the
community, when I feel I can make a difference.
Michael Ingre
-----------------
PhD-student
Department of Psychology
Stockholm University
National Institute for
Psychosocial Medicine