RE: st: RE: RE: RE: on counting and fractions in stata

From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: RE: RE: RE: on counting and fractions in stata
Date   Fri, 28 Oct 2011 18:32:35 +0100

Yes, I believe it is. 

As an over-arching comment, it really is best to use Stata terminology in a Stata forum, "variables" for "columns" and so forth. 

It is usually a bad idea in using Stata to store other stuff in the same variable as the main contents of a variable. But given that you started this way, and assuming names -column1- ... -column3-: 

gen work = . 

forval j = 1/3 {
	qui replace work = column`j' > column`j'[1] 
	su work in 2/L if column`j' < ., meanonly 
	di "column`j'" 
	di "total     " r(sum)
	di "mean      " r(mean)
	di "n         " r(N) 
}
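
For anyone who wants to try the loop, here is a minimal self-contained sketch. The numbers are made up purely for illustration; only the layout (threshold in observation 1, daily values from observation 2 on) and the names -column1- ... -column3- follow the thread.

```stata
* Hypothetical data: observation 1 of each variable holds the threshold,
* observations 2 onward hold the daily values (all numbers are invented).
clear
input column1 column2 column3
3.1 2.0 5.0
2.5 2.4 4.1
3.4 1.9 .
2.9 2.2 5.6
3.0 2.7 4.8
end

gen work = .

forval j = 1/3 {
	* 1 if the value exceeds that variable's own threshold, 0 otherwise
	qui replace work = column`j' > column`j'[1]
	* skip the threshold row (in 2/L) and any missing values
	su work in 2/L if column`j' < ., meanonly
	di "column`j': total = " r(sum) ", mean = " r(mean) ", n = " r(N)
}
```

With these invented values, -column1- has one value out of four above its threshold, and the missing value in -column3- is left out of both the count and the denominator.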

[email protected] 

[email protected]

thanks a lot for helping me resolve this problem. 

I have one quick question: do you believe it is possible to carry out this estimation in Stata at once for more than one series of observations?

I can arrange my data, for example, in the following way (in each column, the first value is the threshold, followed by the daily observations):

column1     column2     column3
threshold1  threshold2  threshold3
series 1    series 2    series 3

I guess I should first tell Stata to treat the first value in each column as the threshold; second, to compare it with the rest of the observations in that column only, and to report the number of cases in which the threshold is crossed as well as the percentage share of these cases; third, to move on to the next column. 

Do you believe this is doable in Stata language?

From: Nick Cox <[email protected]>

You can naturally also go 

gen byte above = series > 3.1 
su above, meanonly 
local mean = r(mean)
local total = r(N) 
local count = r(sum) 

and any extra conditions can be put on the -summarize-. 

In each case, "series > 3.1" would in careful code be "series > 3.1 & series < ." 
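
The reason for the extra condition is that Stata treats numeric missing as larger than any nonmissing number, so a missing value satisfies "series > 3.1". A small sketch with made-up data (the variable -naive- and the local macro names are just for illustration):

```stata
* Made-up data with one missing value
clear
input series
2.5
3.4
.
2.9
3.0
end

* Naive indicator: the missing value is flagged as above 3.1
gen byte naive = series > 3.1
su naive, meanonly
local naive_count = r(sum)

* Careful indicator: missing values are never flagged ...
gen byte above = series > 3.1 & series < .
* ... and are also left out of the denominator
su above if series < ., meanonly
local count = r(sum)
local total = r(N)
local mean = r(mean)

di "naive count   " `naive_count'
di "careful count " `count'
di "total         " `total'
di "fraction      " `mean'
```

Note that the indicator alone fixes only the numerator; the -if- on -summarize- is what keeps the missing observation out of the total.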

Nick Cox

If you have other conditions, the pattern is 

count if series > 3.1 & <condition> 
local count = r(N)
count if <condition>
local total = r(N)
local fraction = `count'/`total' 
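
As a sketch of that pattern with a concrete condition: the variable -year- and all the values below are hypothetical, and "series < ." is added so that missing values cannot inflate either count.

```stata
* Made-up data: -year- stands in for whatever <condition> refers to
clear
input series year
2.5 2010
3.4 2010
3.6 2011
2.9 2011
3.0 2011
3.3 2011
end

* numerator: above the threshold, nonmissing, and satisfying the condition
count if series > 3.1 & series < . & year == 2011
local count = r(N)

* denominator: all nonmissing observations satisfying the condition
count if series < . & year == 2011
local total = r(N)

local fraction = `count'/`total'
di "count " `count' "  total " `total' "  fraction " `fraction'
```

The point of the design is that the same condition appears in both -count- calls, so the fraction is always relative to the subset of interest.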

[email protected] 

Nick Cox

. count if series > 3.1 
. local count = r(N) 
. local total = _N 
. di `count' 
. di `total' 
. di `count'/`total' 

-count- is a much-overlooked command. People seemingly don't find it or think it too simple. See e.g. 

SJ-7-1  pr0029  . . . . . . . . . . . . . . .  Speaking Stata: Making it count
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q1/07   SJ 7(1):117--130                                 (no commands)
        discusses count used with a loop over observations
        or variables

SJ-7-4  dm0033  . . . . . . Speaking Stata: Counting groups, especially panels
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q4/07   SJ 7(4):571--581                                 (no commands)
        discusses how to count panels through reduction commands
        or through tabulation commands and how to overcome
        problems that do not yield easily to these approaches

[email protected] 

[email protected]

I face the following problem in Stata:

I have a series of daily observations on a given variable and I also have one estimate of this variable that serves as a threshold. 

I would like to estimate how many times this threshold is exceeded by comparing the value of this estimate with the available data on daily observations. Is it possible to estimate fractions in Stata in this particular way? 

Below is a simple example: 

series   estimate: 3.1

Given these data, I would like to get the following estimates (comments in brackets):

count: 1 (threshold is exceeded only once)
fraction: 1/4 = 0.25 (one out of 4 cases)

