Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: creating combinations of all 49 variables and counting their frequencies
From: Jacob Model <[email protected]>
To: [email protected]
Subject: Re: st: creating combinations of all 49 variables and counting their frequencies
Date: Sat, 1 Mar 2014 21:50:18 -0800
I think there's a real question of whether order matters for your
combinations. Say A, B, and C are different adoptions. Is the
combination ABC equivalent to BAC?
Some folks at Stack Overflow have talked about this. Here's the link:
http://stackoverflow.com/questions/3467914/is-there-an-algorithm-to-generate-all-unique-circular-permutations-of-a-multiset
And here's a theoretical CS paper that discusses how you might
implement an algorithm to do this:
http://www.cis.uoguelph.ca/~sawada/papers/alph.pdf
http://www.sciencedirect.com/science/article/pii/S0196677400911088
My guess (as an amateur programmer) is that you could probably write
an algorithm that takes into account that larger combinations are
built from smaller ones, so whenever a smaller combination never
occurs you automatically know that everything containing it has zero
frequency. So if you're working from the bottom up and you already
know that AB has zero frequency and CD has zero frequency, then by
construction ABCD has zero frequency too (either one alone is
enough). You wouldn't have to compute any combination that contains
AB or CD.
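
A minimal sketch of that bottom-up idea, written in Python rather than
Stata purely for illustration. It assumes order does not matter (ABC is
the same as BAC) and that each observation has already been turned into
the set of techniques it adopted. The function name
frequent_combinations, the min_count parameter, and the sample data are
all hypothetical. This is essentially the pruning step used in
Apriori-style frequent itemset mining, not something taken from the
linked paper.

```python
from itertools import combinations


def frequent_combinations(rows, min_count=1):
    """Count combinations level by level, skipping any candidate that
    contains a smaller combination already known to be too rare.

    rows: list of sets, one per observation, holding adopted techniques.
    Returns a dict mapping frozenset(combination) -> frequency.
    """
    items = sorted({item for row in rows for item in row})
    current = [frozenset([item]) for item in items]  # size-1 candidates
    result = {}
    size = 1
    while current:
        # Count how many observations contain each candidate.
        counts = {c: sum(1 for row in rows if c <= row) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        result.update(survivors)

        # Build candidates one item larger. A candidate is kept only if
        # every sub-combination of the current size survived.
        size += 1
        candidates = set()
        for a, b in combinations(list(survivors), 2):
            union = a | b
            if len(union) != size:
                continue
            if all(frozenset(s) in survivors
                   for s in combinations(union, size - 1)):
                candidates.add(union)
        current = sorted(candidates, key=sorted)
    return result


if __name__ == "__main__":
    # Hypothetical data: AB and CD never occur together.
    sample = [{"A", "C"}, {"A", "C"}, {"B", "D"}, {"A", "D"}]
    for combo, n in sorted(frequent_combinations(sample).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print("".join(sorted(combo)), n)
```

Because AB and CD never occur in the sample, no combination of three
or four techniques is ever counted. With 49 variables the saving
depends entirely on how sparse the real adoption patterns are.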
Another way of thinking about this may be in a network framework,
with each adoption pair being a tie between nodes. If A and B were
both adopted, you could think of them as having a tie. If A, B, and C
were all adopted, that would be a triad, and so on. The advantage is
that you could store it as an edgelist, which is pretty efficient. In
other words, you could record every observed pair of features (a much
more manageable number) and create a resulting database that could
tell you the frequency of larger combinations.
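
A rough sketch of the edgelist idea, again in Python for illustration
only. The names pair_edgelist and upper_bound and the sample data are
hypothetical. One caveat is worth stating: pair counts by themselves
give an upper bound on the frequency of a larger combination (it cannot
occur more often than its rarest pair, and a pair with count zero rules
it out), but not necessarily the exact frequency.

```python
from collections import Counter
from itertools import combinations


def pair_edgelist(rows):
    """Return a Counter mapping (a, b) with a < b to the number of
    observations in which both a and b were adopted."""
    edges = Counter()
    for row in rows:
        for a, b in combinations(sorted(row), 2):
            edges[(a, b)] += 1
    return edges


def upper_bound(edges, combo):
    """Largest frequency the combination could have given the pair
    counts: the minimum count over all pairs inside it."""
    pairs = combinations(sorted(combo), 2)
    return min(edges[pair] for pair in pairs)


if __name__ == "__main__":
    # Hypothetical observations, each the set of adopted techniques.
    sample = [{"A", "B", "C"}, {"A", "B"}, {"B", "C"}]
    edges = pair_edgelist(sample)
    for (a, b), n in sorted(edges.items()):
        print(a, b, n)
    print("ABC occurs at most", upper_bound(edges, {"A", "B", "C"}), "time(s)")
```

With 49 variables there are at most 1,176 distinct pairs, so the
edgelist stays small no matter how many observations there are.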
-Jacob
On Sat, Mar 1, 2014 at 9:09 PM, Krisha Lim <[email protected]> wrote:
> Hi,
>
> I have 49 binary variables. I am interested in doing all combinations for those 49 variables and calculating the frequencies. I am not sure how to do this in STATA. The tuples command just generates all the tuples but it stopped after the 9999999 tuples. Would you be able to help me?
>
> To give a context, each binary variable indicates adoption (so 1= adopt). I want to figure out the most used technique or combination of techniques used in my dataset. I know this will be a very very large number, but hope there's a way to do it.
>
> Thanks!
>
> Krisha
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/