Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: Re: st: Automatic fit of distribution


From   David Hoaglin <[email protected]>
To   [email protected]
Subject   Re: Re: st: Automatic fit of distribution
Date   Thu, 11 Jul 2013 22:46:30 -0400

Richard,

I wonder where we would go if you made a large change in the subject!
Your change may have started small, but it soon took in all of model
building.  I can't cover that broad topic in this thread.

Since I have a lot of exploratory data analysis in my background, I
put heavy emphasis on understanding what's going on in the data.  If
we don't examine the data (via graphical displays, not just
descriptive statistics), we risk missing unusual behavior, which may
be an important part of the story.

Stepwise regression is often used without careful thought about the
data.  Automation and absence of judgment are the problem.

Sometimes it's possible to set aside part of the data for validation,
and not look at that part until all the model-building is done.
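
As a minimal sketch of that hold-out idea (an illustration only, not a method prescribed in this thread): the function name `split_holdout`, the 30% fraction, and the seed below are all assumptions chosen for the example. The split is made once, with a fixed seed, before any graphs or models are produced, and the validation rows stay untouched until model building is finished.

```python
import random


def split_holdout(rows, holdout_fraction=0.3, seed=12345):
    """Split rows once into a model-building set and a validation set.

    holdout_fraction and seed are illustrative defaults, not recommendations.
    """
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")

    # A dedicated generator with a fixed seed makes the split reproducible.
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)

    # The first n_holdout shuffled indices form the validation set.
    n_holdout = int(round(len(rows) * holdout_fraction))
    holdout_idx = set(indices[:n_holdout])

    # Preserve the original row order within each part.
    building = [row for i, row in enumerate(rows) if i not in holdout_idx]
    validation = [row for i, row in enumerate(rows) if i in holdout_idx]
    return building, validation


# Stand-in data: 100 observation identifiers.
rows = list(range(100))
building, validation = split_holdout(rows)
```

All exploration and model building would then use `building` only; `validation` is examined a single time, at the end, to check whether the chosen model holds up.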

Enough for now.

David Hoaglin

On Thu, Jul 11, 2013 at 2:04 PM, Richard Williams
<[email protected]> wrote:
> Changing the subject slightly -- it is often recommended that you examine
> your data, e.g. do graphs or whatever, run various diagnostics. I am
> inclined to agree; indeed I always tell people to start with assorted
> descriptive statistics before launching into their high tech models.
> However, things like stepwise regression are widely condemned. Again I am
> inclined to agree, but I have a hard time explaining what exactly the
> difference is. In both cases, aren't you looking at the data first and using
> that information to guide your model building? By graphing the data first,
> couldn't that lead to over-fitting, and run the risk that analysis with
> different data would lead to different results? If, say, my visual
> examination or diagnostics have led me to add squared terms or even use a
> different statistical method, aren't my p values misleading? It seems like a
> lot of the cautions and concerns raised with stepwise could also be raised
> for approaches that are considered much more acceptable. My instincts go
> with the conventional wisdom but I am not sure how I would respond if
> pressed on these matters.

