The data I am referring to are panel data. The purpose of the analysis
is to detect possible errors. I have on average 50 observations on each
of 100 subjects.
On 5/3/06, Robert A Yaffee wrote:
> There are many types of outliers, depending upon
> whether you have time series or panel data.
> In time series, there are additive outliers, innovational
> outliers, outlier patches, for example. Some have worse
> effects than others. Adjacent outliers may smear or
> mask others. They may have good or bad leverage.
> One should have the choice of detecting, modeling, or
> replacing them depending upon their theoretical significance.
> What kind of analysis is being done here?
> RY
>
> Robert A. Yaffee, Ph.D.
> Research Professor
> Shirley M. Ehrenkranz
> School of Social Work
> New York University
>
> ----- Original Message -----
> From: n j cox
> Date: Tuesday, May 2, 2006 9:37 am
> Subject: Re: st: Detecting Outliers
>
> > The short answer is Yes, many of them.
> > A longer answer is more difficult to do well
> > given so little information.
> >
> > We have just had a thread on an overlapping
> > question. Look for "outliners" [sic] in
> > the archives.
> >
> > You don't quite say so, but these sound like
> > panel data. For concreteness, I guess 500
> > patients and 10 observations on each, one
> > for each year. My guesses have some
> > influence on my suggestions.
> >
> > What is an outlier in this context? Presumably
> > a patient who differs from many others; or
> > an observation that differs from the rest
> > of the patient's history. Both could make
> > sense, e.g. in the case of anorexic/bulimic
> > patients, or patients who had a really bad
> > year, say a fight with cancer or being
> > caught up in "Lost".
> >
> > First off, if a patient's height varies more than
> > trivially over 10 years, either there is something
> > going on, say growth for young people or some aging
> > effect, or there is an error in the data.
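The height check described above can be sketched in a few lines. This is only an illustration in Python using the standard library, not code from the thread: the record layout `(patient_id, year, height_cm)`, the function names, the 3 cm tolerance, and the sample values are all assumptions made up for the example.

```python
from collections import defaultdict


def height_range_by_patient(records):
    """Return {patient_id: max height - min height}.

    records is an iterable of (patient_id, year, height_cm) tuples;
    this layout is an assumption made for the sketch.
    """
    heights = defaultdict(list)
    for patient_id, _year, height_cm in records:
        if height_cm is not None:  # ignore missing measurements
            heights[patient_id].append(height_cm)
    return {pid: max(vals) - min(vals) for pid, vals in heights.items()}


def flag_large_height_range(records, tolerance_cm=3.0):
    """List patients whose height varies by more than tolerance_cm.

    The 3 cm default is an arbitrary illustrative threshold, not a
    recommendation; growth in young patients would exceed it legitimately.
    """
    ranges = height_range_by_patient(records)
    return sorted(pid for pid, spread in ranges.items() if spread > tolerance_cm)


# Made-up data: patient 2 has one value that looks like a keying error.
records = [
    (1, 1996, 170.0), (1, 1997, 170.5), (1, 1998, 169.8),
    (2, 1996, 162.0), (2, 1997, 126.0), (2, 1998, 162.5),
]
suspects = flag_large_height_range(records)
print(suspects)
```

A flagged patient is a candidate for inspection rather than a confirmed error, since growth or aging can also produce a large range.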
> >
> > Weight fluctuations would seem rather different
> > and everyone knows reasons for various kinds
> > of weight change even in adulthood. It would
> > seem a bit more difficult to pick up
> > on errors (meaning mistakes).
> >
> > There are lots of things you can do. You
> > could set up a loop to plot the time series
> > for each patient. For 500 patients that would
> > be a little tedious, but it is a direct
> > approach.
> >
> > You could try reductions, e.g.
> >
> > last height - first height
> > last weight - first weight
> > mean height over period
> > mean weight over period
> > some measure of variability of each
> >
> > and look for outliers on pairwise plots
> > of each. A scatterplot matrix often
> > shows errors even in data that have
> > supposedly been cleaned. Often
> > the cleaning is univariate, but a
> > weird data value can show up like
> > a run in fabric.
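As a rough sketch of the reductions listed above, the following Python (standard library only) collapses each patient's series to one row of summaries. The record layout `(patient_id, year, height, weight)`, the function name, the dictionary keys, and the sample values are assumptions for illustration; the sample standard deviation is just one possible "measure of variability".

```python
from collections import defaultdict
from statistics import mean, stdev


def patient_summaries(records):
    """Reduce each patient's history to per-patient summary values.

    records is an iterable of (patient_id, year, height, weight) tuples
    (an assumed layout). Returns {patient_id: {summary_name: value}}.
    """
    by_patient = defaultdict(list)
    for patient_id, year, height, weight in records:
        by_patient[patient_id].append((year, height, weight))

    summaries = {}
    for patient_id, rows in by_patient.items():
        rows.sort(key=lambda row: row[0])  # order by year so first/last are defined
        heights = [height for _, height, _ in rows]
        weights = [weight for _, _, weight in rows]
        summaries[patient_id] = {
            "height_change": heights[-1] - heights[0],  # last - first
            "weight_change": weights[-1] - weights[0],
            "height_mean": mean(heights),
            "weight_mean": mean(weights),
            # the sample SD needs at least two observations
            "height_sd": stdev(heights) if len(heights) > 1 else 0.0,
            "weight_sd": stdev(weights) if len(weights) > 1 else 0.0,
        }
    return summaries


# Made-up data, deliberately out of year order.
records = [
    (1, 2002, 170.0, 74.0), (1, 2000, 170.0, 70.0), (1, 2001, 170.0, 72.0),
    (2, 2000, 160.0, 55.0), (2, 2001, 161.0, 90.0), (2, 2002, 160.0, 56.0),
]
summaries = patient_summaries(records)
for patient_id, row in sorted(summaries.items()):
    print(patient_id, row)
```

Each summary becomes one variable per patient, so pairs of them can be plotted against each other; in Stata, a scatterplot matrix of such variables is available through `graph matrix`.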
> >
> > My prejudice is that no testing or
> > measuring approach beats graphics
> > for finding outliers.
> >
> > Nick
> >
> >
> > Raphael Fraser wrote:
> >
> > I have 10 years of data (5000 observations) on patients' heights and
> > weights. Is there any ado-file that could assist in locating possible
> > outliers?
> > *
> > * For searches and help try:
> > * http://www.stata.com/support/faqs/res/findit.html
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/
> >