Statalist The Stata Listserver


st: RE: proportion (percentage) data transformation


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: proportion (percentage) data transformation
Date   Mon, 6 Feb 2006 16:18:55 -0000

The angular transformation 

arcsin(square root of p) 

is used more commonly than just the arcsine transformation
as far as I can tell. 

The angular looks arbitrary if not bizarre, but emerges 
out of a variance-stabilizing argument for the binomial, 
as I recall. 
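
A sketch of that argument, using the delta method. The symbols n (number of binomial trials) and \hat{p} (observed proportion) are introduced here for illustration and are not in the original post:

```latex
\[
\operatorname{Var}(\hat{p}) = \frac{p(1-p)}{n}, \qquad
g(p) = \arcsin\sqrt{p}, \qquad
g'(p) = \frac{1}{2\sqrt{p(1-p)}}
\]
\[
\operatorname{Var}\bigl(g(\hat{p})\bigr)
  \approx g'(p)^{2}\,\operatorname{Var}(\hat{p})
  = \frac{1}{4p(1-p)} \cdot \frac{p(1-p)}{n}
  = \frac{1}{4n}
\]
```

To first order, then, the variance of the transformed proportion does not depend on p, which is the sense in which the transformation stabilizes variance.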

For most purposes, you are probably better off with 
logit. It is true that logit 0 and logit 1 are not 
defined, but that doesn't trouble model-fitting software 
that uses logit as a link function, because the observed 
response is then never itself transformed. In any case, 
if you have data spikes at 0 or 1, you probably need a 
more complicated model accounting for bimodality or 
trimodality. The major argument, I think, is that as you 
complicate a model by adding more predictors, interactions, 
etc., you can stay quite happily with the same Stata 
commands, whereas an angular transformation would need 
more ad hoc thought. 
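
As a minimal sketch of the point about zeros and ones: glm with a binomial family and logit link will accept a proportion response that includes exact 0s and 1s. The data below are simulated purely for illustration, and the variable names y and x are hypothetical. The code assumes a Stata version that has rnormal() and vce(robust).

```stata
* simulated data for illustration only; y and x are hypothetical names
clear
set seed 2006
set obs 200
gen x = rnormal()
gen y = invlogit(0.5*x + rnormal())
* force a few exact zeros and ones to show that they are accepted
replace y = 0 in 1/5
replace y = 1 in 6/10
* logit is the link here, so logit(y) is never computed
glm y x, family(binomial) link(logit) vce(robust)
```

Adding predictors or interactions only changes the variable list. Robust standard errors are requested because y is a proportion, not a literal binomial count.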

The logit is a stronger transformation. Some graphical 
experiments with 

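clear
set obs 101
* p takes 101 equally spaced values from 0 to 1
range p 0 1
gen angular = asin(sqrt(p))
* logit(p) is missing at p = 0 and p = 1
gen logit   = logit(p) 
* angular against logit, then both against p
scatter angular logit 
scatter angular logit p 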

etc. will give you some feel for what is going on. Only 
in the far tails will logit behave in a qualitatively 
different manner from angular.  

Transformation isn't inescapable here. An alternative
is to model the response as following a beta distribution. 
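
A hedged sketch of what that could look like, assuming Stata 14 or later, where betareg is an official command (it was not available when this was written). The data are simulated for illustration and the names x, mu and y are hypothetical. Note that a beta model needs responses strictly between 0 and 1.

```stata
* simulated data for illustration only; x, mu and y are hypothetical names
clear
set seed 2006
set obs 200
gen x = rnormal()
* mean of the response on the logit scale
gen double mu = invlogit(0.5*x)
* beta draws with mean mu; 20 is an arbitrary precision value
gen double y = rbeta(20*mu, 20*(1 - mu))
* beta regression; requires 0 < y < 1
betareg y x
```

Exact zeros or ones in the response would have to be handled separately before a model like this could be fitted.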

Nick 
[email protected] 

sstww 
 
> To use proportion (percentage) data as a dependent
> variable in a regression, you would need to transform
> the data before doing regression for two purposes: to
> confine the projected value within 0-1 and to make the
> data distribution closer to normal. I have read a
> description on Stata Q&A about using logistic
> transformation for proportion (percentage) data
> (y=ln(x/(1-x))) and it seems to work fine. However,
> recently, I read about another highly recommended
> transformation method for percentage data, arcsine
> transformation: (y=sine(x)^-1). Can anyone tell me
> about the pros and cons of these two methods for
> transforming proportion (percentage) data, and which
> one should be used for what situation?
 
