Re: st: Clustered dataset question

From	Joseph Coveney <[email protected]>
To	Statalist <[email protected]>
Subject	Re: st: Clustered dataset question
Date	Sat, 20 Sep 2003 13:40:14 +0900
Dale Steele posted:

-------------------------------------------------------------------------------

I have a dataset containing information on 66 different subjects.  Each
subject is presented with a series of 36 stimuli  (including zero
magnitude) in random order.  Their response to each stimulus is coded as
0 - not detected, 1-detected.  Each subject is presumed to have a
"threshold" magnitude at which the stimulus can be detected.  My goal is
to estimate that threshold (and its variance) for each subject.  

I have been thinking of the threshold as the predicted stimulus magnitude
for which the probability of detection is 50%.  My naive initial approach
was to run 66 separate logistic regression models.  Is there a better
way?  Thanks!

--------------------------------------------------------------------------------

I believe that there is a body of psychometrics literature dealing with this 
kind of problem.  Searching with -findit- or Google for item response theory 
(IRT) or the Rasch model might provide an entry point.  

Of interest is the FAQ written by Jeroen Weesie on Stata Corp's website, 
www.stata.com/support/faqs/stat/rasch.html .  Quoting from that, "Another 
purpose of a Rasch analysis is to estimate the subject parameter eta. In the 
fixed-effects approach, the etas are commonly estimated by maximum likelihood 
conditional on the CML theta-estimates. For the random-effects case, the etas 
are commonly estimated by posterior means."  CML (conditional maximum 
likelihood) here refers to -xtlogit, fe-.  I believe that -gllamm- / 
-gllapred- will provide the corresponding random-effects estimator of each 
subject's threshold.
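
A minimal sketch of the random-effects route, under stated assumptions: it 
swaps in the official -melogit- command (Stata 13 or later) in place of 
-gllamm- / -gllapred-, and the variable names subject, magnitude and detected 
are hypothetical, not taken from Dale's dataset.  It fits a random-intercept 
logit and converts each subject's posterior-mean intercept into a 50% 
detection point.

```stata
* Hypothetical variable names: subject, magnitude, detected
* Random-intercept logit: one common slope, one intercept shift per subject
melogit detected magnitude || subject:

* Empirical Bayes (posterior mean) prediction of each subject's intercept shift
predict double u_hat, reffects

* Subject-specific 50% point: solve b0 + u_j + b1*x = 0 for x
generate double threshold_re = ///
    -(_b[detected:_cons] + u_hat) / _b[detected:magnitude]
```

The common slope is a modelling assumption of this sketch; subjects whose 
detection curves differ in steepness would need a random slope as well.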

There is also a body of literature in psychophysics dealing with assessing 
stimulus-detection thresholds; Stata's commands that allow estimating receiver 
operating characteristic (ROC) functions might also be of interest.  

The impetus to perform individual logistic regressions for each subject is in 
the same spirit that was expressed in Stata Corp's admonishment against 
unconditional fixed-effects ordered probit a couple of weeks ago on the list.  
They recommended avoiding unconditional fixed-effects nonlinear regression 
unless you feel comfortable with estimating each panel separately.
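
For reference, the separate-regressions approach that Dale describes might 
look like the sketch below.  With logit(p) = b0 + b1*x, the stimulus magnitude 
at which p = 0.5 is x = -b0/b1, and -nlcom- supplies a delta-method standard 
error for that ratio.  The variable names subject, magnitude and detected are 
hypothetical, and the sketch assumes a release of Stata that has -levelsof- 
and -nlcom-.

```stata
* Hypothetical variable names: subject, magnitude, detected
levelsof subject, local(ids)
foreach id of local ids {
    display as text _newline "Subject `id'"
    * capture: a subject whose responses are perfectly separated by
    * magnitude, or do not vary at all, makes -logit- exit with an error
    capture noisily logit detected magnitude if subject == `id'
    if _rc == 0 {
        * 50% detection point with a delta-method standard error
        nlcom (threshold: -_b[_cons] / _b[magnitude])
    }
}
```

With only 36 binary responses per subject, perfect separation is a real 
possibility, which is why the fit is wrapped in -capture-; such subjects get 
no threshold estimate from this approach.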

At the risk of getting trounced on the list twice in as many weeks for the  
same thing, I'll mention unconditional fixed-effects probit as an alternative 
in Dale's case.  In general, unconditional maximum likelihood estimators for 
fixed-effects nonlinear (and linear) models cannot provide consistent 
estimators for the subject-specific intercept terms, so these coefficients 
(and, in nonlinear models, other parameter estimates as well) will have at 
least some bias.  For this reason, fixed-effects logit, probit, ordered 
regression models and so on are avoided, in general.  But for some practical 
applications, the situation is not always so dismal as the received wisdom 
would lead us to believe--Prof. William Greene's website at New York 
University's Stern School is an excellent source of information on this topic.  
As an illustration, I've provided a quick-and-dirty Monte Carlo simulation of a 
probit-parameterized model of Dale's situation, with 70% intraclass correlation 
for the threshold latent variable.  If I've got things correctly specified (a 
big if), then the bias in individual-subject estimates of threshold is in the 
neighborhood of 5% with an unconditional fixed-effects probit model.  If this 
magnitude of bias is acceptable in practice for Dale's purposes, then 
unconditional nonlinear regression represents a viable alternative with this 
sample size.

In addition, there are approaches to ameliorate such bias, such as the 
jackknife (Hahn and Newey, 2003), which in my simulations with fixed-effects 
ordered probit works quite well when panel depths are at least five or eight, 
even using Professor Greene's challenging specification for the fixed-effects 
ordered probit model in his numerical study (Greene, 2002).  Note that these 
simulations take a long time when the sample size is in the hundreds--there is 
a method for improving efficiency in fixed-effects nonlinear regressions with 
dummy variables for individual subjects that is described in documents on 
Professor Greene's website, but I cannot find where Stata has implemented it.
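
As a sketch of the idea, and on the assumption that the delete-one-period form 
of the panel jackknife is the one meant here: let T be the panel depth (the 
number of observations per subject), \hat\theta the unconditional 
fixed-effects estimate from all T periods, and \hat\theta_{(t)} the estimate 
with period t left out for every subject.  The symbols are introduced here for 
illustration and are not from the original post.

```latex
\tilde{\theta}
  \;=\; T\,\hat{\theta}
  \;-\; \frac{T-1}{T}\sum_{t=1}^{T}\hat{\theta}_{(t)}
```

If the bias of \hat\theta is B/T to leading order, each \hat\theta_{(t)} has 
bias B/(T-1), so the combination removes the leading term: 
T(B/T) - (T-1)(B/(T-1)) = 0.  The cost is T additional fits of the model.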

Greene, William (2002 February), The behavior of the fixed effects estimator in 
nonlinear models. Unpublished; available on his website, 
www.stern.nyu.edu/~wgreene, as document EC-02-05Greene.pdf.

Hahn, Jinyong, and Newey, Whitney (2003 July), Jackknife and analytical bias 
reduction for nonlinear panel models.  Available at 
http://econ-www.mit.edu/faculty/?prof_id=wnewey&type=paper.

Joseph Coveney

--------------------------------------------------------------------------------

program define simsteele, rclass
    version 8.1
    * 66 subjects, each with a standard-normal subject-level threshold
    drop _all
    set obs 66
    generate byte subject = _n
    generate float subject_threshold = invnorm(uniform())
    * 36 stimuli per subject; the latent value loads 0.7 on the subject effect
    forvalues stimulus = 1/36 {
        generate float subject_stimulus_threshold`stimulus' = ///
          0.7 * subject_threshold + sqrt(1 - 0.7^2) * invnorm(uniform())
        generate byte detected`stimulus' = ///
          subject_stimulus_threshold`stimulus' > invnorm(`stimulus' / 37)
    }
    keep subject subject_threshold detected*
    reshape long detected, i(subject) j(stimulus)
    * Unconditional fixed-effects probit: dummies for stimulus and subject
    xi: probit detected i.stimulus i.subject
    predict float linear_predictor, xb
    * Subject-level estimate: mean linear predictor over the 36 stimuli
    by subject: egen float threshold_hat = mean(linear_predictor)
    * One row per subject (stimulus 2): regress estimate on simulated value
    regress threshold_hat subject_threshold if _Istimulus_2
    return scalar slope = _b[subject_threshold]
    return scalar intercept = _b[_cons]
end
*
clear
set more off
set seed 20030920
simulate "simsteele" slope = r(slope) intercept = r(intercept), ///
  reps(400)
summarize slope intercept, detail
exit

--------------------------------------------------------------------------------


