Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: right-skewed proportion data
Date: Sat, 7 Jul 2012 12:08:14 +0100

On getting the number of variables in the dataset: I like -ds- too, but pushing the entire varlist through -ds- is over the top. The number of variables is directly accessible as c(k).

. sysuse auto
(1978 Automobile Data)

. di c(k)
12

On your question to Maarten: No; with -glm- and -link(logit)- the default family is binomial, and vice versa.

That may boggle the mind if you were raised on some dogma that discrete is discrete and continuous is continuous and ne'er the twain shall meet. (I can't do that in idiomatic German, or indeed any kind of German.) But we often cross this boundary. Just as -poisson- is often a good model for a continuous response that is non-negative, so binomial is a good first approximation model for continuous proportions.

Consider the variance-mean relationship for _any_ variable on [0, 1]. If that's the support then if the mean is 0 the variance is also 0, and if the mean is 1 the variance is also 0. (The mean can only be 0 or 1 if _all_ values are 0 or 1, and in each case the variance is thus 0.) It is not axiomatic that for _any_ continuous proportion the variance is greatest for a mean of 0.5, so far as I know, but this arm-waving shows that the binomial has qualitatively much of the right kind of behaviour.

Note also that for p near 0, (1 - p) is near 1, so a variance of p(1 - p) is close to a variance of p. So, for a mean proportion near 0, variance can be Poisson-like even though the variable is most definitely bounded at 0 and 1.

On Sat, Jul 7, 2012 at 11:02 AM, Jörg Eulenberger <j.eulenberger@web.de> wrote:
>
> Dear Francisco,
> Yes, I want to model it. I want to do a missing-data analysis. The dependent variable is right-skewed and the independent variables are different survey methods (online vs paper and pencil), controlling for gender, etc.
>
> Dear Maarten,
> thanks a lot. So, if I understand you properly, the correct family is gaussian?
>
> glm av uv uv, link(logit) vce(robust)
>
> On 07.07.2012 at 11:40, Francisco Rowe wrote:
>> What do you want to do?
>> Do you want to model it?
>>
>> Francisco.
>>
>> On 07/07/2012, at 4:50 PM, Jörg Eulenberger wrote:
>>
>>> Dear Statalisters,
>>>
>>> I have a problem with a right-skewed dependent variable. The range of this variable is 0-1 (proportion data, 0%-100%). The distribution looks like a Poisson, but the values are not discrete.
>>>
>>> I created the dependent variable by counting the missing values (item nonresponse) row-wise. Then I standardized this variable by the number of all possible variables (automatic filtering causes different counts of variables).
>>>
>>> *************
>>> ds
>>> local varnumber `: word count `r(varlist)''
>>> gen varnumber_without_filter_missing = `varnumber'-filter_missing
>>> gen item_non_response_percent = (100/varnumber_without_filter_missing)*number_item_non_response
>>> gen item_non_response_percent_r = item_non_response_percent /100 /* range 0-1 */
>>> ******************
>>>
>>> I found the article http://www.stata-journal.com/sjpdf.html?articlenum=st0147 for handling proportion data. But what is the right way to handle a right-skewed (proportion) dependent variable?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
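
For readers who want to try the approach suggested in the reply at the top of this thread, here is a minimal sketch, not the posters' own code. It assumes simulated data and hypothetical variable names (prop, online, female) in place of the original poster's variables, and it spells out both family(binomial) and link(logit) so that nothing depends on -glm- defaults.

```stata
* Sketch only: simulated data stand in for the poster's variables.
* prop, online and female are hypothetical names, not from the thread.
clear
set seed 2012
set obs 500
gen byte online = runiform() < 0.5
gen byte female = runiform() < 0.5

* A right-skewed proportion on (0, 1): most values lie near 0
gen double prop = rbeta(1, 8 + 4*online)

* Number of variables, read directly rather than via -ds-
display c(k)

* Binomial family with logit link for a continuous proportion;
* robust standard errors because the binomial variance is only
* a working approximation here
glm prop i.online i.female, family(binomial) link(logit) vce(robust)

* Average marginal effects on the proportion scale
margins, dydx(*)
```

-glm- will note that the response has noninteger values; that is expected when the binomial family is used for a continuous proportion in this way.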