Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: krishanu karmakar <krishkarmakar@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Trying to simulate sampling distribution of mean
Date: Wed, 30 Jan 2013 00:15:31 -0500

Thank you for all the help.

Krishanu

On Tue, Jan 29, 2013 at 9:03 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> The -simulate- call would need to be revised to pick up r(mean).
>
>
> On Wed, Jan 30, 2013 at 1:30 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> If you want to do it this way, you can simplify your program
>>
>> program ybar
>>     qui use big.dta, clear
>>     sample 60, count
>>     su age, meanonly
>> end
>>
>> I think that should still work. -syntax- does nothing for you.
>> -summarize- leaves r(mean) in its wake anyway. Taking a variable and
>> putting it in another and taking a saved result and putting it in
>> another can both be excised.
>>
>> Nick
>>
>>
>> On Wed, Jan 30, 2013 at 12:04 AM, krishanu karmakar
>> <krishkarmakar@gmail.com> wrote:
>>> Thank you Dr. Cox,
>>>
>>> I did a little bit more searching and with the help of your answer I
>>> modified my -ybar- program as follows
>>>
>>> -----------------------------
>>> program define ybar, rclass
>>>     syntax [,]
>>>     qui use big.dta, clear
>>>     sample 60, count
>>>     gen y1 = age
>>>     summ y1
>>>     return scalar my = r(mean)
>>> end
>>>
>>> local reps 5
>>> simulate rmy=r(my), saving(sdistmean`i', replace) nodots reps(`reps'): ybar
>>> -----------------------------------
>>> Yes, I should probably put the -use- command as an option to the
>>> -ybar- program to make it more generally usable. But, otherwise, it is
>>> now working as I wanted it to.
>>>
>>> Thank you again.
>>> Krishanu
>>>
>>>
>>> On Tue, Jan 29, 2013 at 6:51 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> Your program -ybar- does exactly the same thing every time, so
>>>> inevitably the results are the same. If you look again at the help for
>>>> -simulate- you will see that the example program -lnsim- includes its
>>>> own random variate generation. Conversely, you do use -sample 0.1- but
>>>> you use it outside your program.
>>>>
>>>> Otherwise put, -simulate- does not actually do stochastic simulation;
>>>> it is just a framework that runs and collates the results of a program
>>>> you write -- and that program must do the simulation.
>>>>
>>>> In your case, there is an easy way of getting random samples from your
>>>> dataset. Just chop the dataset into blocks randomly and summarize each
>>>> block.
>>>>
>>>> If you shuffle your data
>>>>
>>>> set seed 2803
>>>> gen random = runiform()
>>>> sort random
>>>>
>>>> and create blocks of size 100
>>>>
>>>> gen block = ceil(_n/100)
>>>>
>>>> then
>>>>
>>>> egen mean = mean(age), by(block)
>>>> egen tag = tag(block)
>>>> l mean if tag
>>>>
>>>> that will give you 1000 means each for blocks of size 100. For some
>>>> reason, it seems that you only want 5, and that means you can throw
>>>> 995 away.
>>>>
>>>> Nick
>>>>
>>>> On Tue, Jan 29, 2013 at 11:15 PM, krishanu karmakar
>>>> <krishkarmakar@gmail.com> wrote:
>>>>
>>>>> The following is my code
>>>>>
>>>>> ==== code start =====
>>>>>
>>>>> program define ybar, rclass
>>>>>     syntax [,]
>>>>>     replace y1 = y2
>>>>>     summarize y1
>>>>>     return scalar m_y = r(mean)
>>>>> end
>>>>>
>>>>>
>>>>> local reps 5
>>>>>
>>>>> quietly use big.dta, clear
>>>>> generate y2 = age
>>>>> sample 0.1
>>>>>
>>>>> quietly{
>>>>>     gen y1=.
>>>>>     simulate m_age=r(m_y), saving(meandata, replace) nodots reps(`reps'): ybar
>>>>> }
>>>>>
>>>>> ==== code ends =====
>>>>>
>>>>> What I am trying to do.
>>>>> I have a dataset named "big.dta" with 100,000 observations. The only
>>>>> variable in this dataset is "age".
>>>>>
>>>>> I want to first draw a sample of size 100 from this dataset and
>>>>> calculate the mean for the variable "age". I want to draw 5 such
>>>>> samples and store the mean of "age" from each sample as the variable
>>>>> "m_age" in a new dataset called "meandata". So this dataset will have
>>>>> 5 observations.
>>>>>
>>>>> My code is running, but wrongly.
>>>>> I am getting Stata to save the
>>>>> "meandata", but all the five observations (mean of age from 5
>>>>> different samples) are stored as equal in value. That means Stata is
>>>>> not drawing 5 different samples, but only one sample. Could anyone
>>>>> help by showing which line of my code I should change?
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/

--
Read it: http://www.stata.com/support/faqs/res/statalist.html
Specially Question 3.
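
The thread states that the -simulate- call must be revised to pick up r(mean) but never shows the revised call. A minimal sketch of how the simplified -ybar- and that call might fit together follows. It assumes big.dta (100,000 observations, one numeric variable age) is in the working directory. The sample size of 100 and the 5 replications come from the original question (the simplified program in the thread used 60). The seed value is arbitrary, reused from the block example earlier in the thread.

```stata
* Sketch only: assumes big.dta exists in the working directory
* with a numeric variable named age, as described in the thread.
capture program drop ybar
program ybar
    * Reload the full dataset so every call starts from all observations
    quietly use big.dta, clear
    * Draw a fresh random sample of 100 observations without replacement
    sample 100, count
    * meanonly suppresses the output and leaves the mean in r(mean)
    summarize age, meanonly
end

* Arbitrary seed, set only so that a rerun reproduces the same draws
set seed 2803

* Run ybar 5 times; each r(mean) becomes one observation of m_age
simulate m_age = r(mean), reps(5) nodots saving(meandata, replace): ybar

list m_age
```

After the call, the data in memory (and meandata.dta on disk) should hold 5 observations of m_age, one per sample. Because the sampling now happens inside -ybar-, each replication draws a different sample, which is the step the original code was missing.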