Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Problem with Stata handling of large dataset
From: Scott Merryman <[email protected]>
To: [email protected]
Subject: Re: st: Problem with Stata handling of large dataset
Date: Mon, 5 Aug 2013 08:59:09 -0500

On Mon, Aug 5, 2013 at 8:42 AM, Palan, Stefan
([email protected]) <[email protected]> wrote:
> Hi everybody,
>
> I have noticed a problem with Stata (SE 12.1, 64 bit) when working with large datasets. When I type the following:
>
>
> ----------------------------------------------------------------------
> clear
> set obs 63000000
> gen long id=_n
> gen long y=int(id/5)
> gen long z=int((id-1000)/5)
> gen long yz=y-z
> sum yz
> ----------------------------------------------------------------------
>
>
> I get the following output:
>
>
> ----------------------------------------------------------------------
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           yz |  63000000         200    .0035635        199        200
> ----------------------------------------------------------------------
>
>
> Shouldn't the standard deviation be zero, and min equal max equal mean?
>
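No. The -int()- function truncates toward 0, so it rounds negative
values up rather than down. For id < 1000, (id-1000)/5 is negative, and
whenever id is not a multiple of 5 the truncation leaves z one higher
than rounding down would.

When id = 1:
  y  = int(id/5)        = int(.2)     = 0
  z  = int((id-1000)/5) = int(-199.8) = -199
  yz = y - z            = 199

When id = 20:
  y  = int(id/5)        = int(4)      = 4
  z  = int((id-1000)/5) = int(-196)   = -196
  yz = y - z            = 200

From id = 1000 onward both arguments are nonnegative and yz is always
200. So yz is 199 for the 800 values of id below 1000 that are not
multiples of 5 and 200 everywhere else, which accounts for the minimum
of 199 and the small nonzero standard deviation.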
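
A minimal sketch for checking this on a small scale, assuming 2,000 observations are enough to cover the affected range (only id < 1000 matters). The variable names z_floor and yz_floor are made up for illustration and do not appear in the original post. It counts the observations where yz is 199, then repeats the calculation with -floor()-, which rounds toward minus infinity instead of toward 0.

```stata
clear
set obs 2000
gen long id = _n
gen long y  = int(id/5)
gen long z  = int((id-1000)/5)
gen long yz = y - z

* yz is 199 only where id < 1000 and id is not a multiple of 5
count if yz == 199

* floor() rounds toward minus infinity, so negative values round down
* (z_floor and yz_floor are illustrative names)
gen long z_floor  = floor((id-1000)/5)
gen long yz_floor = y - z_floor
summarize yz_floor
```

The count should be 800, and yz_floor should be 200 in every observation, so its standard deviation is 0. In the full 63,000,000-observation dataset, 800 values of 199 among values of 200 give a standard deviation of about .0035635, matching the reported output.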
Scott