Re: st: Overriding a loop if 0 observations using tabstat
From
Robert Picard <[email protected]>
To
[email protected]
Subject
Re: st: Overriding a loop if 0 observations using tabstat
Date
Thu, 29 Apr 2010 10:40:47 -0400
As I woke up this morning I had another thought related to this: if
Stata had a -compact- command to remove the extra space, then
programmers who know what they are trading off (more cache hits
and cache lines that contain 100% data, against the extra overhead of
making space for new variables) would have an easy way to improve the
performance of their code when appropriate. I'm not sure I want to
risk changing the memory allocation mid-program, but I can see how I
could use a -compact- command.
Just a suggestion; please disregard if difficult to implement.
Robert
On Wed, Apr 28, 2010 at 5:27 PM, Vince Wiggins, StataCorp
<[email protected]> wrote:
> Stata y
> +------------+
> |1235678|1234|
> |1235678|1234|
> |1235678|1234|
> | ... | ...|
>
> Stata datasets usually are not stored this densely. Normally, there would be
> free space at the end of each record where more variables can be added.
>
> Stata y free space
> +-----------------------------
> |1235678|1234| ...
> |1235678|1234|
> |1235678|1234|
> | ... | ...| ...
>
> Moreover, you are likely to have even more free space at the end of each
> record if you have allocated more memory to Stata. This lets Stata add and
> drop variables quickly.
>
> So, with 10 MB allocated, RRK's data might look like
>
> Stata y free space
> +---------------------+
> |1235678|1234|12345678|
> |1235678|1234|12345678| (1)
> |1235678|1234|12345678|
> | ... | ...| ... |
>
> And, with 1000 MB allocated, it might look like
>
> Stata y free space
> +-------------------------------------
> |1235678|1234|123456789... ... ...
> |1235678|1234|123456789... ... ... (2)
> |1235678|1234|123456789... ... ...
> | ... | ...| ...
>
> With the dataset organized as in (1), each record is 20 characters wide,
> including free space, and so there is enough room to store all of the data,
> including free space, in the cache. With the dataset organized as in (2), that
> might not be true. Since we have 100,000 records and 8 MB of cache, if the
> records are wider than 8*2^20/100000 = 83.9 characters, then the entire data
> area will not fit into cache.
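
For anyone who wants to check the arithmetic in the quoted paragraph, here is a minimal sketch in Python. It assumes one character is one byte and that the whole 8 MB cache is available for the data area. The function and constant names are made up for illustration and are not part of Stata.

```python
# Sketch of the cache-fit arithmetic from the quoted message.
# Assumptions (not from Stata itself): 1 character = 1 byte, and the
# whole cache is available to hold the data area.

CACHE_BYTES = 8 * 2**20   # 8 MB of cache, as in the quoted example
N_RECORDS = 100_000       # number of records in the quoted example


def max_record_width(cache_bytes: int, n_records: int) -> float:
    """Widest record (in bytes) for which all records still fit in cache."""
    return cache_bytes / n_records


def fits_in_cache(record_width: int,
                  n_records: int = N_RECORDS,
                  cache_bytes: int = CACHE_BYTES) -> bool:
    """True if n_records records of record_width bytes fit in the cache."""
    return record_width * n_records <= cache_bytes


threshold = max_record_width(CACHE_BYTES, N_RECORDS)
print(round(threshold, 1))   # 83.9, matching 8*2^20/100000 in the quote
print(fits_in_cache(20))     # layout (1): 20-byte records -> True
print(fits_in_cache(84))     # any record wider than the threshold -> False
```

The 20-byte case corresponds to layout (1) in the quote. Layout (2) has no stated width, so 84 is only a stand-in for "wider than the threshold".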