
Re: st: Taking random samples from data

From	"Ben Jann" <[email protected]>
To	[email protected]
Subject	Re: st: Taking random samples from data
Date	Fri, 1 Aug 2008 14:17:49 +0200

See

 . ssc describe gsample

You could type

 . gsample 600, cluster(id) wor

to draw 600 IDs without replacement (wor), keeping all observations of
each sampled ID.
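
If installing a user-written command from SSC is not an option, the same idea can be sketched with built-in commands: give each ID one random number, keep the 600 IDs with the smallest draws, and retain every observation of those IDs. This is only a minimal sketch. It assumes the identifier variable is named id (as in the quoted code) and that the data hold at least 600 distinct IDs. The helper variables first, u and pick and the seed value are arbitrary choices made for illustration.

```stata
* Sketch: sample 600 IDs without replacement, keeping all rows per sampled ID.
* Assumes the data are in memory and the identifier variable is named id.
set seed 12345                              // arbitrary seed, for reproducibility

bysort id: gen byte first = (_n == 1)       // flag one row per ID
gen double u = runiform() if first          // one random draw per ID, missing elsewhere
sort u                                      // missing values sort last
gen byte pick = (_n <= 600) & first         // the 600 IDs with the smallest draws

bysort id (pick): replace pick = pick[_N]   // copy the flag to every row of the ID
keep if pick
drop first u pick
```

Because each ID receives exactly one draw, no ID can be selected twice, so no duplicate-removal step is needed afterwards.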

ben

On Thu, Jul 31, 2008 at 5:38 PM, Song <[email protected]> wrote:
> Hi All,
>
> I have a question about taking random samples from my data. My dataset has
> around 12,500 user IDs with 200,000 observations in total, and I want to
> draw a random sample of around 500-600 users. The problem is that each
> user has multiple observations, and I want to keep all observations for
> each sampled user. Each ID has 4 to 21 observations. For example, if ID
> number 5 has 10 observations, I want to take all 10 observations whenever
> ID number 5 is included in the sample.
>
> I tried the following and ended up with 580 users and around 8,800
> observations. This method works, but I wonder if there is a better way to
> do this, because with this method I have to drop duplicated samples.
>
> gen idcnt=_N
> * Sampling with replacement: I do not know how to take cluster samples
> * without replacement.
> bsample 600, cluster(id)
> bysort id: egen idcount=count(id)
> compare idcount idcnt
> duplicates tag, gen(dup)
> drop if dup==1                /* To drop duplicated samples */
>
> I would greatly appreciate your help.
> Reo.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>