SamL
> I am using the -contract- command to make a smaller version of a very very
> large file on a unix system. The file is so large that I may need to use
> virtual memory. The large file is also sorted by most of the variables
> that I will -contract- on, hence, cases with the same values are clustered
> together. My question is whether there is a way to make -contract- take
> advantage of this clustering. I anticipate that if this is possible only
> one pass of the data will be needed, whereas if it is not possible, I am
> not sure how many passes will be needed. As the file is over 7GB,
> contains more than 10 million cases, and may necessitate the use of
> virtual memory, any such savings would be substantial. Any assistance is
> greatly appreciated.
>
-contract- is really quite a simple command. To
understand this and any other answers better, you
should type
    which contract
to find out where contract.ado is on your system
and then use a text editor (Stata's own -doedit-
will do fine) to look at the code.
Specifically,
1. At the heart of -contract- is a -sort- on
the varlist supplied, and to the extent that
the data are already sorted, that will go faster,
but I doubt that memory use is affected.
2. There aren't any special tweakable options
to -contract- to affect memory use.
3. If you didn't need some of the features
of -contract-, you could write your own
slimmed down version, but my guess is that
the effect on memory will be slight.
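As a rough sketch of point 3, assuming only group
frequencies are wanted: the core of a frequency
contraction is a sort, a count within groups, and
keeping one row per group. The variables x1 and x2
and the toy data below are hypothetical stand-ins,
and this is not the code in contract.ado, which also
handles weights and other options.

```stata
* Toy data; x1 and x2 are hypothetical stand-ins for the real variables
clear
input x1 x2
1 1
1 1
1 2
2 1
2 1
2 1
end

* Sort on the grouping variables, as -contract- does on its varlist
sort x1 x2

* Count the observations in each group
by x1 x2: generate long _freq = _N

* Keep a single observation per group
by x1 x2: keep if _n == _N

list, noobs
```

On the real file you would drop the toy -input- block
and substitute your own variable names. Whether this
saves any memory over -contract- itself is something
to test rather than assume.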
To say more would, I guess, need more knowledge of
Stata's handling of memory with very large
files than I possess.
Nick