Hi Listers
I am cleaning data (that's all I seem to be doing) and I am a little
puzzled here. A while ago, I asked how to identify illegal entries when
a variable takes on values in batches (e.g. 11 to 19, 21 to 25, etc.). Nick
Cox pointed me to
. egen OK = eqany(cropcod2), values(110/120 220/227 330/334 440/446)
. list houscode cropcod2 if !OK
This has worked very well; however, today I tried
. egen OK = eqany(inpcode), values(500/505 599 601 1100/1111 /*
. */ 1200/1201 2100/2160 2200/2220 2299 2300/2302)
. list houscode inpcode if !OK
and I get an error message:
. egen OK = eqany(inpcode), values(500/505 599 601 1100/1111 /*
> */ 1200/1201 2100/2160 2200/2220 2299 2300/2302)
varlist not allowed
r(101);
Is anyone familiar with this, and is there a way around it?
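One possible workaround, a sketch using Stata's built-in -inrange()- function
instead of -egen, eqany()- (untested against this dataset, so treat it as an
outline rather than a verified fix):

. gen OK = inrange(inpcode,500,505) | inpcode==599 | inpcode==601 | /*
> */ inrange(inpcode,1100,1111) | inrange(inpcode,1200,1201) | /*
> */ inrange(inpcode,2100,2160) | inrange(inpcode,2200,2220) | /*
> */ inpcode==2299 | inrange(inpcode,2300,2302)
. list houscode inpcode if !OK

-inrange(z,a,b)- returns 1 when a <= z <= b, so each range in the values()
list becomes one term, and the singleton codes become simple equality tests.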
Roni
--- William Gould <[email protected]> wrote:
> "Salah Mahmud" <[email protected]>, following up on a thread, asked,
>
> > Is the "observation pointer" the only overhead as far as data storage
> > is concerned?
>
> to my posting that,
>
> > The size reported by -describe- is obtained by
> >
> >      1,692,789 * (  4  +  4 ) = 13,542,312
> >          |          |     |
> >       # of obs      |     plus 4
> >                     |
> >       width of data (1 float = 4 bytes)
> >
>
> No, the 4 bytes is not all, but it is the important amount and the answer to
> Salah's question really depends on how you define overhead.
>
> First off, what I said about the number reported by -describe- is exactly
> accurate: that is what -describe- reports. There is, however, more to a
> dataset than the variables and observations, such as variable names, variable
> labels, value labels, display formats, characteristics, etc.
>
> When -describe- reports the "size" of the data, it ignores all of that, but
> obviously all those things appear in the .dta dataset, so that will tend to
> make the .dta dataset size larger than the number reported by -describe-,
> while the extra 4 bytes per observation, which only gets added when the data
> is copied to memory, makes the .dta dataset smaller.
>
> Then there is overhead as I tend to think of it: the memory cost of
> maintaining the memory image of the data and all of its features. The 4 bytes
> per observation is an example of this, and almost every feature of the
> data -- each value label, each variable label (but not each variable
> name) -- also has the overhead of pointers that track each piece of
> information. This amounts to about 16 bytes per piece of information, and
> sometimes more.
>
> This overhead, however, does not usually add up to much because the number
> of pieces of information being tracked is on the order of the number of
> variables in the dataset, rather than the number of observations. It was,
> however, dealing with overhead like this that was the largest issue in
> producing Stata/SE, which could allow lots more variables.
>
> Anyway, the dataset label and each value label, variable label, and
> characteristic adds 16 bytes to the memory image in addition to the contents
> of the information piece itself. The date-and-time stamp likewise adds
> 16 bytes (plus the contents of the stamp itself).
>
> Really, the 4 bytes per observation is the important number.
>
> -- Bill
> [email protected]
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
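
Bill's arithmetic is easy to reproduce at the Stata prompt; this sketch just
reuses the observation count from his example:

. display %12.0fc 1692789 * 4         /* data width on disk: one float per obs   */
. display %12.0fc 1692789 * (4 + 4)   /* in-memory width: float + 4-byte pointer */

The second line gives 13,542,312, the figure -describe- reported above.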