Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Contract/Collapse Combination
From: Lucas <[email protected]>
To: [email protected]
Subject: Re: st: Contract/Collapse Combination
Date: Tue, 22 May 2012 09:51:06 -0700

Nick,
A composite 6-digit identifier is not a problem. What I said was that I
did not think it possible to build such an identifier for each cell of
a 15-way crosstab. So we are not disagreeing.

I don't think -contract- is buggy. I think a conceptually simple
extension of -contract- (perhaps less simple to program) that allows
multiple, or at least two, frequency counts would be a good idea if
possible. It would also be consistent with the Stata-proposed solution
for slow estimation on big data: collapse the data and use frequency
counts.

I won't alert Stata. They are listening anyway, and they could easily
come back at me and say I should get more memory. And, of course, I'd
agree. But we'd still be left with a command seemingly within
whispering distance of providing a general solution to a common task,
yet not going that final distance.

Thanks, though.
Sam
On Tue, May 22, 2012 at 9:37 AM, Nick Cox <[email protected]> wrote:
> The solution here of producing a composite identifier looks likely to fail. You are putting a very big number into a -float- variable and expecting to retain every last bit of precision. See
>
> http://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/
>
> for why that is a bad idea.
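A minimal sketch of the precision point, for anyone reading the archive. It assumes only built-in commands; id_f and id_d are hypothetical names, and the 15-digit value stands in for a composite id built from fifteen 0/1 digits. A -float- holds integers exactly only up to 2^24, while a -double- is exact up to 2^53.

```stata
* float() rounds its argument to float precision.
* 2^24 = 16,777,216 survives; the next integer does not.
display %12.0f float(16777216)
display %12.0f float(16777217)

* A 15-digit id stored as float versus double (hypothetical names).
clear
set obs 1
generate float  id_f = 111111111111111
generate double id_d = 111111111111111
format id_f id_d %18.0f
list

* The float copy has been rounded; the double copy is exact.
assert id_f != id_d
assert id_d == 111111111111111
```

So a composite id of this kind is safer in a -long- or -double- than in the default -float-, as long as it stays below the limits of the chosen type.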
>
> As for the rest, you seem to be claiming that -contract- is buggy. That is important if true, and you should send in a report containing incontrovertible evidence to Stata tech-support.
>
> Nick
> [email protected]
>
> Lucas wrote:
>
> Brendan,
>
> My original note described exactly the solution you propose: doing
> it twice and merging. But this is incredibly risky, because there is
> no way to ensure that every combination appears in both files. Even the
> "zero" option apparently cannot ensure this. Believe me, I tried this
> with about 6 variables, and the file sizes did not match across
> runs, not to mention that one has to be pretty certain everything is
> sorted exactly right. I do not know *why* the problem occurred, but it
> did. Perhaps the file is so big that problems emerge that do not
> exist for smaller datasets (e.g., sorted cases fall out of sorts, as
> it were).
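One way to make the two-pass approach less fragile is to merge on the crosstab variables as a key and inspect _merge, instead of relying on sort order or equal file sizes. The following is only a sketch on simulated data: x1, x2, n0 and n1 are hypothetical names, entercol is the variable named in this thread, and it does not explain the mismatch reported above.

```stata
* Simulated data: two dichotomous crosstab variables plus entercol.
clear
set seed 2012
set obs 500
generate byte x1 = runiform() < 0.5
generate byte x2 = runiform() < 0.5
generate byte entercol = runiform() < 0.3

* Pass 1: cell counts for entercol == 1, saved to a temporary file.
preserve
contract x1 x2 if entercol == 1, freq(n1)
tempfile ones
save `ones'
restore

* Pass 2: cell counts for entercol == 0.
contract x1 x2 if entercol == 0, freq(n0)

* Key-based merge: cells present in only one file are kept and flagged.
merge 1:1 x1 x2 using `ones'
tabulate _merge

* A cell missing from one file simply had a zero count there.
replace n1 = 0 if missing(n1)
replace n0 = 0 if missing(n0)
drop _merge
```

Because -merge 1:1- matches on the key variables, a combination that appears in only one file shows up with _merge equal to 1 or 2 instead of silently misaligning rows.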
>
> At any rate, my response was to make an id based on the 6 variables:
>
> gen id=(x1*100000)+(x2*10000)+. . .+(x6) ;
>
> This works for 6 dichotomous variables; it will not work for 15
> variables of various types, because the id will exceed the largest
> value allowed in Stata.
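An alternative that avoids arithmetic on the digits altogether is -egen-'s group() function, which numbers the observed combinations 1, 2, 3, and so on. This is a sketch on simulated data, with x1 through x15 and cell as hypothetical names. The id can never exceed the number of observations, so a -long- is sufficient.

```stata
* Simulated data: fifteen dichotomous variables (hypothetical names).
clear
set seed 1
set obs 200
forvalues j = 1/15 {
    generate byte x`j' = runiform() < 0.5
}

* One integer id per observed combination of x1-x15.
egen long cell = group(x1-x15)
summarize cell

* Within each combination the id is constant.
bysort x1-x15: assert cell == cell[1]
```

group() works for variables of mixed types as well, which the place-value formula does not.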
>
> THUS, it seems a more general solution is needed, one that does not
> require a later merge.
>
> As for your collapse example, it is unclear to me, as you start with
> data that are already collapsed. The problem is that my data are not
> collapsed, and the aim is to get them into collapsed form.
>
> On Tue, May 22, 2012 at 7:50 AM, Brendan Halpin <[email protected]> wrote:
>> On Tue, May 22 2012, Lucas wrote:
>>
>>> Is there a way to use the contract command and obtain frequencies for
>>> TWO variables rather than just ONE? A corollary question would be, Is
>>> there a way to use the contract command and obtain the count of 1's on
>>> TWO separate dichotomous variables?
>>
>> That is what my example achieves, though using -collapse- instead of
>> -contract-.
>>
>> Another way of doing it would be to separate the data by entercol, and
>> -contract- or -collapse- it twice, once for entercol==1 and once for
>> entercol==0, and then merge the resulting files by the 15 crosstab
>> variables.
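For readers of the archive, the one-pass -collapse- route can be sketched as follows. This is not Brendan's original example, which is not reproduced in this message. It is a minimal illustration on simulated, uncollapsed data, where x1 to x3 stand in for the 15 crosstab variables and n, n_enter and n_noenter are hypothetical names.

```stata
* Simulated uncollapsed data: three crosstab variables plus entercol.
clear
set seed 12345
set obs 1000
forvalues j = 1/3 {
    generate byte x`j' = runiform() < 0.5
}
generate byte entercol = runiform() < 0.3

* One pass, no merge: cell size and number of 1s on entercol per cell.
collapse (count) n=entercol (sum) n_enter=entercol, by(x1 x2 x3)

* The count of 0s follows by subtraction.
generate long n_noenter = n - n_enter
list, sepby(x1)
```

A second dichotomous variable could be handled by adding another (sum) target to the same -collapse- call, which gives the two counts asked about in the original question without a later merge.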
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/