Re: st: Dependent var is a proportion, with large spike in .95+
Here is an article I used for a spiked distribution. It is probably
not the same situation as yours, however.
Genetics. 2003 Mar;163(3):1169-75.
Mapping quantitative trait loci in the case of a spike in the phenotype
distribution.
Broman KW.
Department of Biostatistics, Johns Hopkins University, Baltimore,
Maryland 21205, USA. [email protected]
A common departure from the usual normality assumption in QTL mapping
concerns a spike in the phenotype distribution. For example, in
measurements of tumor mass, some individuals may exhibit no tumors; in
measurements of time to death after a bacterial infection, some
individuals may recover from the infection and fail to die. If an
appreciable portion of individuals share a common phenotype value
(generally either the minimum or the maximum observed phenotype), the
standard approach to QTL mapping can behave poorly. We describe several
alternative approaches for QTL mapping in the case of such a spike in
the phenotype distribution, including the use of a two-part parametric
model and a nonparametric approach based on the Kruskal-Wallis test.
The performance of the proposed procedures is assessed via computer
simulation. The procedures are further illustrated with data from an
intercross experiment to identify QTL contributing to variation in
survival of mice following infection with Listeria monocytogenes.
PMCID: PMC1462498
PMID: 12663553 [PubMed - indexed for MEDLINE]
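
To make the two-part idea from the abstract concrete, here is a rough
sketch in Python. It is not Broman's procedure and not a fitted
regression; it only shows the split. Part one is whether an observation
sits in the spike, and part two is the value of the observations
outside the spike, summarized on the logit scale. The function name,
the .95 cutoff, the group labels, and the toy data are all assumptions
made up for illustration.

```python
import math
from statistics import mean


def two_part_summary(values, groups, spike=0.95):
    """Summarize a spiked proportion in two parts, separately per group.

    Part 1: share of observations at or above the spike cutoff.
    Part 2: mean logit of the observations below the cutoff.
    Assumes every value below the cutoff lies strictly between 0 and 1.
    """
    summary = {}
    for g in sorted(set(groups)):
        ys = [y for y, label in zip(values, groups) if label == g]
        below = [y for y in ys if y < spike]
        share_in_spike = (len(ys) - len(below)) / len(ys)
        # The logit is undefined at 0 and 1, so it is applied only below the spike.
        mean_logit = mean(math.log(y / (1 - y)) for y in below) if below else None
        summary[g] = (share_in_spike, mean_logit)
    return summary


# Hypothetical toy data: proportions under two made-up test conditions.
values = [0.50, 0.50, 0.97, 1.00, 0.20, 0.99, 1.00, 0.98]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
result = two_part_summary(values, groups)
for g, (share, mean_logit) in result.items():
    print(g, share, mean_logit)
```

In a real analysis each part would get its own model with covariates
(for example, a logit or probit for spike membership and a separate
model for the values below the spike), so that a factor can affect the
chance of being in the spike and the level outside it differently.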
On Sep 3, 2008, at 3:22 PM, Dan Weitzenfeld wrote:
Hi Statalist,
I am trying to determine which testing factors drive a proportion
dependent variable, PercentNoise.
In searching the archives, I came across -betafit-, and a link to the
FAQ: "How do you fit a model when the dependent variable is a
proportion?" In that response, Allen McDowell and Nic Cox write, "In
practice, it is often helpful to look at the frequency distribution: a
marked spike at zero or one may well raise doubt about a single model
fitted to all data."
That describes my situation exactly: I have a marked spike in my
histogram at the top bin, roughly .95 - 1. I am wondering how to
account for this.
Does -betafit- take such a possibility into account?
Can someone briefly describe how I could use multiple models to fit
all the data, as implied in the FAQ response?
My fallback is setting a pass/fail bar and converting my proportions
to a binary, then using probit/logit. But the obvious drawback is
that I am throwing away information by collapsing the continuous
(albeit bounded) proportion variable to a binary.
Thanks in advance for any suggestions,
Dan
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*