The thing is that I am trying to draw random samples from a data set,
and I have to use training and holdout data. But I have to draw these
samples using the same criteria with which the original data set was built.
To do this, in a first stage I have to draw a random sample from
the primary units (groups of housing units). In a
second stage I need to draw another random sample from the secondary
units (groups of households). Is the cluster
command useful for this?
Thanks again,
Sandra
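
One way the two-stage draw described above might be sketched, reusing the
uniform() flagging idea from the quoted reply below. Everything specific
here is an assumption, not part of the original survey design: psu and hh
are hypothetical names for the primary-unit and household identifiers, and
0.33 and 0.50 are purely illustrative sampling fractions. As far as I
know, -cluster- performs cluster analysis (grouping similar observations)
rather than sampling, so it is probably not the right tool here.

    set seed <seed>
    * Stage 1: one random number per primary unit, copied to all its rows
    bysort psu: generate double u1 = uniform() if _n == 1
    bysort psu (u1): replace u1 = u1[1]
    generate byte psu_sel = (u1 < 0.33)
    * Stage 2: one random number per household within its primary unit
    bysort psu hh: generate double u2 = uniform() if _n == 1
    bysort psu hh (u2): replace u2 = u2[1]
    * Flag households drawn from a primary unit that was itself drawn
    generate byte training = psu_sel & (u2 < 0.50)

Because each unit is selected independently, the realized fractions are
only approximately 0.33 and 0.50. Whether the holdout should be the
unselected households, the unselected primary units, or both is a design
choice this sketch does not settle.
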
sgsr100 wrote:
>
> Hi Joseph,
> And how can I guarantee that I am keeping the remaining 67% for
> holdout? What do you mean by
> <command> if training?
> With this, am I keeping the remaining observations?
> Thanks
>
> Joseph Coveney wrote:
> >
> > You can flag training samples by -generate byte training = (uniform() < X)-,
> > where X is the proportion (between 0 and 1) of the total dataset that you
> > want to use for training. So, for example, if you want to use about 33% for
> > training and keep the remaining 67% for holdout, then the commands would be:
> >
> > set seed <seed>
> > generate byte training = uniform() < 0.33
> > <command> if training
> >
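> > Since training is 1 for the flagged observations and 0 for all the
> > others, the holdout is simply the complement of the flag. A minimal
> > sketch, with <command> still standing for whatever command is used:
> >
> > * check the realized split (roughly 33% / 67%, not exact)
> > tabulate training
> > * run the same command on the holdout observations
> > <command> if !training
> >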
> > Joseph Coveney
> >
> > ----------------------------------------------------------------------------
> > sgsr100 wrote:
> >
> > >Hi,
> > >Do you know how I can get a randomly chosen training sample in Stata,
> > >and how I can keep the remaining observations as a holdout sample?
> > >Thanks,
> > >Sandra
> >
> > *
> > * For searches and help try:
> > * http://www.stata.com/support/faqs/res/findit.html
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/