Re: st: Increasing variance of dependent variable, logit, inter-rater agreement
From: Steven Samuels <[email protected]>
To: [email protected]
Subject: Re: st: Increasing variance of dependent variable, logit, inter-rater agreement
Date: Sat, 28 Feb 2009 19:45:56 -0500
--
Anupit,
All this detail is welcome and clear. I don't really know how to
model all of this simultaneously, or even whether there would be any
benefit in doing so. I hope that others will read your description
and chime in.
Some thoughts: I've read the abstracts of the Feinstein-Cicchetti
articles, and I think that your original idea of predicting positive
agreement from a regression model is good. Be sure to use a flexible
model for age. I think that you need a model with more variability
than the logistic model assumes. Consider -hetprob-, which fits a
heteroskedastic probit model. If you have vehicles that were retested
over time, also consider longitudinal data methods (the commands with
the -xt- prefix). If the remote-sensing device was not recalibrated
between individual observations, you probably also have
non-independent errors for observations taken on the same device at
the same time. If you used different remote-sensing devices to retest
the same vehicle at the same age, then you can add random- or
fixed-effect terms for device to a predictive model. If you know of
environmental conditions that would have affected errors in the
remote sensing, be sure to add those as predictors. With so many
observations, you can afford to split your data, develop your model
on one part, and test it on the other.
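For reference, here is a sketch of the model that -hetprob- estimates. The notation is mine and does not come from the thread. A_i = 1 if the OBD II and RSD results for matched pair i agree. x_i holds the covariates for the mean, for example a flexible function of age. z_i holds the covariates allowed to change the error variance, for example age.

```latex
\Pr(A_i = 1 \mid x_i, z_i) = \Phi\!\left( \frac{x_i \beta}{\exp(z_i \gamma)} \right),
\qquad \sigma_i = \exp(z_i \gamma)
```

Here \Phi is the standard normal CDF, and z_i carries no constant term, which is what pins down the scale. With \gamma = 0 the model reduces to an ordinary probit, so testing \gamma = 0 checks whether the extra variability is needed. A positive age coefficient in \gamma would correspond to the noise growing with age that the original question describes.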
Best wishes,
Steve
On Feb 27, 2009, at 8:28 PM, Supnithadnaporn, Anupit wrote:
Dear Steven,
I appreciate your reply to my post. I am sorry if my explanation is
too long.
Thank you,
Anupit
* Please give more detail about what is being assessed. Is there a gold
standard, measured or latent, for what these technologies are trying
to agree upon?
The subject of my study is in-use vehicles. In some areas of the US,
regulations require vehicles to be tested for emissions. In the past,
the test instrument measured actual tailpipe emissions. The test is
typically performed at a commercial inspection station. If the
emissions exceed the threshold standard, the vehicle fails. The owner
of a failing vehicle has to repair it until it meets the standard;
otherwise he or she cannot renew the vehicle registration.
However, this tailpipe-test technology has been replaced by a new one,
the OBD II (on-board diagnostics) test. This test no longer measures
tailpipe emissions. Instead, it gives a fail result if there are error
codes relating to the vehicle's emission control components.
Although the two technologies measure different things, they share the
common goal of the regulation: to identify high-polluting vehicles.
* What is the first technology that measures characteristics and
arrives at a pass-fail? How does it make this decision? Was age one
of these characteristics?
The first technology is OBD II, which detects error codes and yields a
pass/fail result, a *nominal-level* measure. Having certain error codes
means that the vehicle is likely to emit pollution above the
standards. As a vehicle gets older, it is likely to pollute more.
Moreover, the OBD II unit, which is the vehicle's on-board computer,
is itself likely to malfunction. If the OBD II malfunctions, it can
give either a false-pass or a false-fail result.
* How was the cut point y2b arrived at?
Fortunately, the regulator has also set up several unobtrusive roadside
monitoring stations. This technology uses a remote-sensing device
(RSD) to measure actual tailpipe emissions from the many vehicles
driving past. This is the second technology in my analysis. It
measures actual tailpipe emissions, an *interval-level* measure. The
threshold, *the cut point of y2b*, is based on the EPA regulation set
for each combination of vehicle make, model year, and weight.
* You say that the variability of y2a increases with age. Is the
level of y2a related to age?
Yes. As a vehicle gets older, its emission level is likely to be higher
because of deterioration. Moreover, its emissions can vary greatly from
one RSD measurement to the next. This is what I am trying to take into
account in my analysis.
My data are pooled cross-sections covering 4 years.
My unit of analysis is a matched pair: a vehicle tested by OBD II and
measured on the road by RSD in the same annual testing cycle.
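Here is a minimal Python sketch of how one matched pair could be coded into the agreement outcome. It assumes, following the description above, that an RSD reading strictly above the cut point y2b counts as a fail. The function names, readings, and the cut point of 100 are hypothetical and chosen only for illustration. The real cut point varies by make, model year, and weight.

```python
from collections import Counter


def rsd_fails(reading: float, cut_point: float) -> bool:
    """Assumption: RSD 'fail' when the measured emission exceeds the cut point y2b."""
    return reading > cut_point


def classify_pair(obd_fail: bool, rsd_reading: float, cut_point: float) -> str:
    """Label one OBD II / RSD matched pair by its agreement cell (hypothetical labels)."""
    rsd_fail = rsd_fails(rsd_reading, cut_point)
    if obd_fail == rsd_fail:
        return "agree-fail" if obd_fail else "agree-pass"
    return "Fail-RSD, Pass-OBD" if rsd_fail else "Pass-RSD, Fail-OBD"


# Hypothetical pairs: (OBD II failed?, RSD reading, cut point for that vehicle type)
pairs = [
    (False, 120.0, 100.0),
    (False, 80.0, 100.0),
    (True, 150.0, 100.0),
    (True, 60.0, 100.0),
]

labels = [classify_pair(*p) for p in pairs]
counts = Counter(labels)

# Binary 0/1 agreement indicator: the dependent variable for a logit/probit model
agree_indicator = [int(label.startswith("agree")) for label in labels]
agree = sum(agree_indicator)

print(counts)
print(agree_indicator, agree / len(pairs))
```

The 0/1 indicator is what a regression for agreement, such as the probit models discussed in the reply, would take as its outcome. The four-way label keeps the direction of each disagreement for tabulation.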
My hypothesis is that OBD-RSD agreement is greater for older vehicle
fleets. My sample size is about 80,000 observations.
Of the total, 72% are classified as 'agree'.
Of the 28% in the 'disagree' group, around 90% are Fail-RSD, Pass-OBD.
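This is a quick back-of-the-envelope check of the cell sizes these figures imply. It assumes the total of about 80,000 and treats "around 90%" as exactly 0.90. The figures given do not say how the 72% 'agree' group splits into pass/pass and fail/fail, so it is left whole.

```python
# Figures from the post; "around 90%" is treated as exactly 0.90 (an assumption).
N_TOTAL = 80_000
SHARE_AGREE = 0.72
SHARE_DISAGREE = 0.28
SHARE_FAIL_RSD_WITHIN_DISAGREE = 0.90

# Convert within-group shares into shares of all matched pairs
fail_rsd_pass_obd = SHARE_DISAGREE * SHARE_FAIL_RSD_WITHIN_DISAGREE
pass_rsd_fail_obd = SHARE_DISAGREE * (1 - SHARE_FAIL_RSD_WITHIN_DISAGREE)

for label, share in [
    ("agree (pass/pass + fail/fail)", SHARE_AGREE),
    ("Fail-RSD, Pass-OBD", fail_rsd_pass_obd),
    ("Pass-RSD, Fail-OBD", pass_rsd_fail_obd),
]:
    print(f"{label}: {share:.3f} of pairs, about {share * N_TOTAL:,.0f} pairs")
```

This works out to roughly 20,160 Fail-RSD, Pass-OBD pairs and 2,240 Pass-RSD, Fail-OBD pairs. Nearly all of the disagreement therefore runs in one direction.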
In my early analysis, I split the vehicles into age groups from 3 to 9
years, obtained Kappa for each group, and compared them. However, I ran
into a problem with Kappa when the prevalence (the disagree cases in
each age group) is small.
Cicchetti DV, Feinstein AR. High agreement but low kappa: II. Resolving
the paradoxes. J Clin Epidemiol 1990;43:551-8.
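The following minimal Python sketch illustrates the paradox the Cicchetti and Feinstein paper addresses. For a 2x2 OBD II by RSD table, it computes Cohen's kappa alongside their proportions of positive and negative agreement, p_pos = 2a/(2a+b+c) and p_neg = 2d/(2d+b+c). The two tables are hypothetical and do not come from the data. Treating 'fail' as the positive category and the class name are my own choices.

```python
from dataclasses import dataclass


@dataclass
class AgreementTable:
    """Hypothetical 2x2 OBD II vs RSD table.

    a: both fail, b: OBD fail / RSD pass, c: OBD pass / RSD fail, d: both pass
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def observed_agreement(self) -> float:
        return (self.a + self.d) / self.n

    def expected_agreement(self) -> float:
        # Chance agreement computed from each rater's marginal totals
        obd_fail, obd_pass = self.a + self.b, self.c + self.d
        rsd_fail, rsd_pass = self.a + self.c, self.b + self.d
        return (obd_fail * rsd_fail + obd_pass * rsd_pass) / self.n ** 2

    def kappa(self) -> float:
        p_o, p_e = self.observed_agreement(), self.expected_agreement()
        return (p_o - p_e) / (1 - p_e)

    def positive_agreement(self) -> float:
        # p_pos: agreement on the 'fail' category
        return 2 * self.a / (2 * self.a + self.b + self.c)

    def negative_agreement(self) -> float:
        # p_neg: agreement on the 'pass' category
        return 2 * self.d / (2 * self.d + self.b + self.c)


# Two hypothetical tables with the same 90% observed agreement
balanced = AgreementTable(a=45, b=5, c=5, d=45)
skewed = AgreementTable(a=5, b=5, c=5, d=85)

for name, t in [("balanced", balanced), ("skewed", skewed)]:
    print(name, round(t.observed_agreement(), 3), round(t.kappa(), 3),
          round(t.positive_agreement(), 3), round(t.negative_agreement(), 3))
```

Both tables have 90% observed agreement. Kappa, however, is 0.8 for the balanced table and about 0.44 for the skewed one. p_pos (0.5 vs 0.9) and p_neg (about 0.94 vs 0.9) show which category the agreement comes from. The sketch does not guard against empty margins: an age group with no fails at all makes 2a+b+c = 0 and raises ZeroDivisionError.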
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*