Javier Escobal wrote:
> I have a data base that has the following form:
>
> id cluster X
> 1 1 0.5
> 2 1 0.7
> 3 1 0.4
> .. . .
> .. . .
> .. . .
> 100 3 0.6
> 101 3 0.6
> 102 3 0.8
> 103 3 0.2
>
> that is observations can be grouped in clusters (of different size). I
> am interested in constructing different statistics: for example for each
> observation "i" I need to capture the average and standard deviation of
> all observations that belong to the same cluster where "i" belongs
> excluding observation "i".
For the mean:
. sort cluster
. by cluster: gen sumx = sum(X)
. by cluster: replace sumx = sumx[_N] - X
. by cluster: gen meanx = sumx/(_N-1)
For the standard deviation the answer seems to be more difficult. At the
moment I can only think of a solution with a loop over the observations
within each cluster. There must be a better solution and I am sure that I
have overlooked something obvious. But anyway, you may use the following as a
starting point:
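gen temp = .
gen std = .
egen group = group(cluster) /* this might not be necessary */
sort group
local K = group[_N]
local last 0
forvalues k = 1/`K' {
    /* first and last observation numbers of the current cluster */
    local first = 1 + `last'
    quietly count if group == `k'
    local N = r(N)
    local last = `first' + (`N'-1)
    forvalues i = `first'/`last' {
        /* squared deviations from the leave-one-out mean, skipping the current observation */
        quietly replace temp = .
        quietly replace temp = (X - meanx[`i'])^2 if _n ~= `i' & group == `k'
        quietly replace temp = sum(temp)
        /* square root of the variance of the remaining N-1 observations */
        quietly replace std = sqrt(temp[_N]/(`N'-2)) if _n == `i'
    }
}
drop temp group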
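
A loop-free alternative might look like the following. This is only a sketch: it uses the identity that the sum of squared deviations around a mean equals the sum of squares minus (sum)^2/n, applied to the N-1 observations that remain after dropping observation "i". The variable names sx, sxx and sdx are made up for illustration, and the code assumes that X has no missing values.

```stata
sort cluster
* running sums of X and X^2 within each cluster, stored as doubles
by cluster: gen double sx  = sum(X)
by cluster: gen double sxx = sum(X^2)
* cluster totals minus the observation's own contribution
by cluster: replace sx  = sx[_N]  - X
by cluster: replace sxx = sxx[_N] - X^2
* SD of the other _N-1 observations; max(0, .) guards against tiny
* negative values from rounding; clusters with fewer than 3
* observations are left missing
by cluster: gen double sdx = sqrt(max(0, sxx - sx^2/(_N-1))/(_N-2)) if _N > 2
drop sx sxx
```

On a small test data set the result in sdx can be compared with std from the loop above; the two should agree up to rounding.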
regards
uli