Re: st: Segmenting a dataset
At 03:46 PM 5/17/2007, Morrison Hodges wrote:
I have a dataset of 10 variables and 5000 observations. I need to calculate
the median of each variable in groups of 30 observations, i.e., the median
of each variable in observations 1-30, then the median for 31-60, then
61-90, etc. I know I can get the median from the p50 value of -summarize-,
but I'm not sure how to obtain consecutive segments of 30 observations each
to perform -summarize- on. Can anyone help?
Thanks, Morry Hodges
Do you just want to see what the medians are? If so, just run:
summarize var1 var2 ... in 1/30, det
summarize var1 var2 ... in 31/60, det
etc.
You can do this in a loop, if you prefer:
forvalues j = 1(30)`=_N' {
summarize var1 var2 ... in `j'/`=min(`j'+29, _N)', det
}
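If the full -summarize, detail- output is more than you need, a variation on the loop can print only the medians. This is just a sketch: it assumes the variables are named var1 through var10 and are stored consecutively in the dataset, so substitute your own varlist.

```stata
// Assumption: the ten variables are named var1-var10 (hypothetical names).
forvalues j = 1(30)`=_N' {
    // Last observation of this segment, capped at the end of the data
    local last = min(`j' + 29, _N)
    foreach v of varlist var1-var10 {
        // -summarize, detail- leaves the median in r(p50)
        quietly summarize `v' in `j'/`last', detail
        display "`v', obs `j'-`last': median = " r(p50)
    }
}
```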
----
On the other hand, do you want the values deposited in the dataset?
If so, first create a "group" variable that numbers the segments:
gen int group = ceil(_n / 30)
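Before going further, it may be worth confirming that the segments came out the size you intended. A minimal check, assuming the variable group has just been created as above (seg_size is a name made up for this illustration):

```stata
// Record how many observations fall in each group.
// Note: -bysort- sorts the data by group as a side effect.
bysort group: gen int seg_size = _N

// Tabulate the segment sizes; anything other than 30
// (apart from a short final segment) signals an off-by-one.
tabulate seg_size

drop seg_size
```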
Now if you want the values deposited into the data as constants by group...
bysort group: egen med1 = median(var1)
and so on for the other variables.
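Rather than typing one -egen- line per variable, a loop could cover them all. Again a sketch, assuming the variables are named var1 through var10 and sit next to each other in the dataset; note that it names the new variables med_var1, med_var2, and so on, rather than med1, med2.

```stata
// Assumption: variables are var1-var10 (hypothetical names)
// and the variable group already exists.
foreach v of varlist var1-var10 {
    // Median of `v' within each group, stored as a constant by group
    egen med_`v' = median(`v'), by(group)
}
```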
If you want just a set of collapsed values...
collapse (median) med1 = var1 (median) med2 = var2 ... , by(group)
I hope this helps.
--David