Let me guess: You forgot a hard return after the call to -egen-, so the line
continues and Stata thinks that the following -drop- is part of the -egen-
call. At least that is what the -trace- suggests...
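
A minimal sketch of what I suspect is happening, using the auto dataset rather than your data (price, rep78 and the threshold of 5 are placeholders). In a do-file or program, a trailing /// (or a missing ; under -#delimit ;-) joins the next line to the current command, so -egen- receives the -drop- as if it were an option. That your program does this is an assumption on my part, since I have not seen the code:

```stata
* Assumed reproduction: the stray /// continues -egen- onto the next line
sysuse auto, clear
capture noisily egen nvalid = count(price), by(rep78) ///
    drop if nvalid < 5
display "return code from the fused command: " _rc

* Intended version: each command ends on its own line
capture drop nvalid
egen nvalid = count(price), by(rep78)
drop if nvalid < 5
```

Typed interactively, each command ends when you press Enter, which would explain why the same lines work there.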


I have been trying to implement Nick's suggestion of using -egen- to
remove unwanted ids. When I do it interactively in Stata it works
nicely. But when I run exactly the same code from within the program I
end up with error messages. See below. What may be the problem?

. use us_data_ret, clear

. desc

Contains data from us_data_ret.dta
   obs:    17,516,468
  vars:             5                          10 Nov 2009 07:02
  size:   525,494,040 (60.5% of memory free)
               storage  display     value
variable name   type   format      label      variable label
id              int    %8.0g                  ID
dscd            str6   %9s                    DSCD
date            float  %td
year            int    %8.0g
totalReturn     double %10.0g
Sorted by:  id  date

. egen nvalid = count(totalReturn), by(id)

. drop if nvalid < 156
(2257110 observations deleted)

When run through a program, Stata issues the following error:
option drop not allowed

-set trace on- refuses to work when placed directly after -egen- but
works when put before -egen-, and then it shows (just the final part of
the output):

     - capture noisily `vv' _g`fcn' `type' `dummy' = (`args') `if' `in' `cma' `byopt' `options'
     = capture noisily  _gcount float __000009 = (totalReturn)   , by(id) drop if nvalid < 156
------------------ begin _gcount  
       - version 6, missing
       - syntax newvarname =/exp [if] [in] [, BY(varlist)]
option drop not allowed
-------------------- end _gcount  
     - global EGEN_SVarname
     - global EGEN_Varname
     - if _rc { exit _rc }
------------------------- end egen  

Again: What is going on? Why can I issue the code lines separately but
not as part of a program? Does it have something to do with the size of
the dataset (-egen- takes a few seconds to finish)?

By the way, I have now implemented -meanonly- and it improves the
performance of my program by approximately 20 percent when run on a
small sample. That is great.
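
For reference, a minimal sketch of the -meanonly- pattern, shown on the auto dataset instead of my data (price stands in for totalReturn, and the macro names are only illustrative). With -meanonly-, -summarize- skips the variance calculation but still leaves r(N), r(mean), r(min) and r(max) behind:

```stata
* Illustrative only: auto dataset, price as a stand-in variable
sysuse auto, clear
summarize price, meanonly
local minPrice = r(min)
local maxPrice = r(max)
display "min = `minPrice', max = `maxPrice', N = " r(N)
```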


Quoting Nick Cox:

> Having looked again at the code, the problem appears to be
> identifying panels for which the number of non-missing values of
> -totalReturn- is at least a predefined value stored in a local macro
> -requiredEstimationPeriod-.
> That is
> egen nvalid = count(totalReturn), by(id)
> drop if nvalid < `requiredEstimationPeriod'
> Nick
> Nick Cox
> Martin answered the question here, but various secondary points   
> arise from looking at the code. Most are on style and most are of   
> some wider interest.
> 1. The loop consists of repeated -drop-ping of observations not   
> desired, working with the remaining subset and then a -restore- of   
> the original. It is difficult to say in general what is most   
> efficient and what most elegant but for a situation like that below   
> I'd normally just add an extra condition excluding the observations   
> not wanted, rather than repeatedly doing major surgery on the   
> dataset. However, others could equally point out that applying -if-   
> on a very large dataset can be time-consuming.
> 2. If only the minimum and maximum are needed from a -summarize- it   
> is best just to use a -meanonly- option. (The name -meanonly- is   
> misleading, as I've had occasion to remark before.)
> 3. Code like
> 	local `minDate' = r(min)
> 	<stuff> if <stuff> date >= ``minDate''
> looks legal but odd. You are probably using more levels of macros
> than you need. It's hard to tell because the code isn't completely
> self-contained (that's not a criticism; it wasn't necessary for your
> question).
> 4. Code in which you loop over the contents of a local macro and   
> change that macro within the loop can be tricky. Watch out!
> 5. The -if- condition in
> 	summarize totalReturn if totalReturn != .
> is unnecessary as -summarize- always ignores missings.
> 6. To get minimum and maximum dates in a panel, no looping is necessary as
> egen mindate = min(date), by(id)
> egen maxdate = max(date), by(id)
> will do it. Similarly it looks as if your main problem does not need
> any looping either, as it should yield to -egen- operations. Look
> at -egen, count()- in particular.
> 7. More generally, it is not always positive to know too many other   
> languages if they lead you to seek a Stata equivalent of other code   
> when there's a Stataish way to do it without any real programming.
> Nick
> Joachim Landström
> I have what I hope to be a minor problem that I nevertheless fail to find
> a solution to. Suppose that I have a local macro panelVar that contains
> ids. Based on a selection criterion I wish to remove some panel ids from
> panelVar. How do I do that? I use Stata/MP 10.1 in Windows XP 32-bit.
> More specifically, see the example below. Suppose the panel id is called id
> and the time series variable is date. Per id & date I have the actual content
> in the form of totalReturn (tDelta is 7):
> **** Begin Example ****
> local estimationPeriod = 3
> local requiredEstimationPeriod = `estimationPeriod' * floor( 365 / ``tDelta'' )
> levelsof id, local(panelVar)
> preserve
> quietly foreach i of local panelVar ///
> 	{
> 		restore, preserve
> 		drop if id != `i'
> 		summarize date if totalReturn != .
> 		local `minDate' = r(min)
> 		local `maxDate' = r(max)
> 		summarize totalReturn if totalReturn != . ///
> 					& date >= ``minDate'' & date <= ``maxDate''
> 		if  `r(N)' < `requiredEstimationPeriod' ///
> 			{
> 			***** Here I wish to update the local macro panelVar such that `i' is removed *********
> 			}
> 		else ///
> 			{
> 			}
> 	}
> **** End Example ****
Joachim Landström

