Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Strange behaviour of -correlate- command
From: Zurab Sajaia <[email protected]>
To: statalist <[email protected]>
Subject: RE: st: Strange behaviour of -correlate- command
Date: Thu, 9 Dec 2010 19:50:53 -0500
You're absolutely right: my manual calculation used the mean of prod, i.e. it divided by n instead of (n-1). My bad, going home now :$.
Thanks a lot,
Zurab
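
For readers checking the numbers, here is a quick worked calculation. It is not part of the original messages, and it assumes the only difference between the two results is the divisor (n = 2419 observations; -correlate- divides by n-1, while the mean of prod divides by n). Rescaling the manual figure by n/(n-1) should then reproduce the -correlate- output:

```latex
\[
\widehat{\operatorname{cov}}_{n-1}(y, r)
  = \frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})(r_i-\bar{r})
  \approx \frac{2419}{2418}\times 1141.579
  \approx 1142.05
\]
```

This agrees with the 1142.05 reported by -correlate- to the precision displayed, which is consistent with the explanation given in the thread.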
----------------------------------------
> Subject: Re: st: Strange behaviour of -correlate- command
> From: [email protected]
> Date: Thu, 9 Dec 2010 16:30:54 -0800
> To: [email protected]
>
> If I recall correctly, Excel doesn't calculate the COVAR quite right. For some reason, it uses 1/n rather than 1/(n-1). That likely explains your odd results.
>
> --
> Nicholas J. Sanders, Ph.D.
> Postdoctoral Fellow
> Stanford Institute for Economic Policy Research
> 366 Galvez St, Room 228
> Stanford, CA 94305
>
> On Dec 9, 2010, at 4:23 PM, Zurab Sajaia wrote:
>
> > Dear all,
> >
> > I've run into a problem I can't explain so far: I seem to be getting wrong covariance estimates. The results differ depending on whether I use the -correlate- command or do the calculation manually. (I also exported the data to Excel and used the COVAR() function there, and Excel seems to be on my side.)
> > So I was wondering whether something is indeed wrong in Stata, or whether I'm doing it incorrectly (perhaps it's time to stop working and go home?)...
> >
> > So here's the deal: I've uploaded an example dataset to the web (30kb):
> >
> > . use http://www.adeptanalytics.org/download/temp/corr_bug.dta, clear
> >
> > . corr y r, c
> > (obs=2419)
> >
> >              |        y        r
> > -------------+------------------
> >            y |  2.8e+07
> >            r |  1142.05  .083368
> >
> >
> >
> > but if I do it manually:
> >
> > . summarize y, meanonly
> > . generate double y1 = y - r(mean)
> >
> > . summarize r, meanonly
> > . generate double r1 = r - r(mean)
> >
> > . generate double prod = y1 * r1
> >
> > . summarize prod
> >
> >     Variable |       Obs        Mean    Std. Dev.       Min        Max
> > -------------+--------------------------------------------------------
> >         prod |      2419    1141.579     2152.761  -53.76514   47015.59
> >
> >
> > I get the same result (1141.579) using Excel's COVAR() function.
> > Do you have any idea what might be happening here?
> >
> > Thanks,
> > Zurab
> >
> > *
> > * For searches and help try:
> > * http://www.stata.com/help.cgi?search
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/