Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: Re: st: creating random groups of observations
From
Clyde B Schechter <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: Re: st: creating random groups of observations
Date
Thu, 6 Dec 2012 19:24:20 +0000
Luca Campanelli wants to randomly assort 4000 words into 1000 groups of 4 words each, and he wants to ensure that each group has a satisfactory mix of long and short words. He doesn't specify exactly what criterion defines a satisfactory mix, so it is hard to be concrete. But here are a few thoughts.
First, depending on the frequency distribution of long and short words (and even on what is meant by long and short in this context), it may not even be possible. For example, if there are only 100 "short" words in the data set, then at most 100 of the 1000 groups can contain a short word, so clearly the goal cannot be achieved.
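A quick count can rule out the hopeless cases before any grouping is attempted. The following is only a sketch in Python, under assumptions that are mine and not Luca's: "short" means a length at or below some cutoff, and a satisfactory mix means at least a given number of short and of long words per group. The function name and parameters are hypothetical.

```python
def mix_is_possible(lengths, n_groups, cutoff, min_short=1, min_long=1):
    """Check whether there are enough short and long words to go around.

    Assumption (not from the original post): a word is "short" if its
    length is <= cutoff, and every group needs at least min_short short
    words and at least min_long long words.
    """
    n_short = sum(1 for n in lengths if n <= cutoff)
    n_long = len(lengths) - n_short
    # Each group consumes min_short short words and min_long long words,
    # so the totals must cover all n_groups groups.
    return n_short >= n_groups * min_short and n_long >= n_groups * min_long


# The example from the text: only 100 short words among 4000.
lengths = [3] * 100 + [9] * 3900
print(mix_is_possible(lengths, n_groups=1000, cutoff=5))
```

With only 100 short words the check fails, since 1000 groups would need at least 1000 of them.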
Assuming that long and short words are both present in sufficient numbers, one approach is to create 1000 groups of 2 long words and 1000 groups of 2 short words, and then combine each long-word group with its correspondingly numbered short-word group. That might do it, again depending on exactly what you have in mind.
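To make the pairing idea concrete, here is a minimal sketch in Python rather than Stata. It rests on an assumption of mine: "short" and "long" are defined by splitting the words at the median length, with ties at the median broken at random. If Luca has a fixed cutoff in mind, the split step would change. The function name and the seed are hypothetical.

```python
import random


def make_mixed_groups(words, seed=12345):
    """Form groups of 4 words: 2 from the shorter half, 2 from the longer half."""
    if len(words) % 4 != 0:
        raise ValueError("number of words must be a multiple of 4")
    rng = random.Random(seed)

    # Shuffle first, then stable-sort by length, so that words of equal
    # length end up in random order (random tie-breaking at the median).
    pool = list(words)
    rng.shuffle(pool)
    pool.sort(key=len)

    half = len(pool) // 2
    short, long_ = pool[:half], pool[half:]

    # Randomize the order within each half, then take consecutive pairs.
    rng.shuffle(short)
    rng.shuffle(long_)

    # Group g combines the g-th short pair with the g-th long pair.
    return [short[2 * g:2 * g + 2] + long_[2 * g:2 * g + 2]
            for g in range(half // 2)]


# Toy data: 40 made-up "words" of varying length.
words = ["x" * (1 + i % 10) + str(i) for i in range(40)]
for group in make_mixed_groups(words)[:3]:
    print(group)
```

Note that with many ties at the median length, some words in the "long" half will be no longer than some in the "short" half, which is one more reason the definition of long and short matters.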
If Luca has in mind some more complex criterion such as constraints on the mean and variance of the number of characters in each group's words, that is something I would not try to accomplish in Stata. It could be done in C++ or a similar programming language using a branch-and-bound algorithm. But expect it to take a long time to run even on a fast machine: you are trying to tame a combinatorial explosion by imposing a few constraints. And, again, be prepared for the possibility that the actual distribution of word lengths precludes the existence of any solution at all--which you would only find out after a very long time.
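For a sense of what such a search involves, here is a toy sketch in Python. It is plain backtracking with pruning, a simpler relative of branch-and-bound, and not a tuned implementation. The constraint used (each group's mean length must lie between lo and hi) is my own stand-in for whatever criterion Luca has in mind; a variance constraint would be one more test at the pruning step. The function name is hypothetical.

```python
from itertools import combinations


def partition_with_mean(lengths, size, lo, hi):
    """Split word indices into groups of `size` whose mean length is in [lo, hi].

    Returns a list of index tuples, or None if no such partition exists.
    """
    def search(remaining):
        if not remaining:
            return []  # everything assigned: success
        first, rest = remaining[0], remaining[1:]
        # The first unassigned word must go somewhere, so try every
        # possible set of companions for it.
        for others in combinations(rest, size - 1):
            group = (first,) + others
            mean = sum(lengths[i] for i in group) / size
            if not (lo <= mean <= hi):
                continue  # prune: this group violates the constraint
            left = [i for i in rest if i not in others]
            tail = search(left)
            if tail is not None:
                return [group] + tail
        return None  # no valid group for `first`: backtrack

    return search(list(range(len(lengths))))


# Tiny example: 8 words, groups of 4, mean length required to be 5 to 7.
print(partition_with_mean([2, 2, 2, 2, 10, 10, 10, 10], 4, 5, 7))
print(partition_with_mean([2, 2, 2, 2, 2, 2, 10, 10], 4, 5, 7))
```

On 8 words this finishes instantly. The number of candidate groups grows combinatorially with the number of words, and when no solution exists the search has to exhaust every branch before it can report that.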
Best of luck.
Clyde Schechter
Dept. of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA