Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Creating square matrix with obs counts from 2 non unique id variables
From
Nick Cox <[email protected]>
To
[email protected]
Subject
Re: st: Creating square matrix with obs counts from 2 non unique id variables
Date
Tue, 19 Apr 2011 19:07:52 +0100
There was a write-up of -tabcount- in
SJ-3-4 pr0011 . . . . . . . . Speaking Stata: Problems with tables, Part II
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q4/03 SJ 3(4):420--439 (no commands)
reviews three user-written commands (tabcount, makematrix,
and groups) as different approaches to tabulation problems
On Tue, Apr 19, 2011 at 7:00 PM, Nick Cox <[email protected]> wrote:
> Here is another way to do it. This uses -tabcount- (SSC).
>
> levelsof ID1, local(l1)
> levelsof ID2, local(l2)
> local l : list l1 | l2
> tabcount ID1 ID2 , v1(`l') v2(`l')
>
> ----------------------------
> | ID2
> ID1 | 10 11 12
> ----------+-----------------
> 10 | 1 1
> 11 | 1 1
> 12 |
> ----------------------------
>
> . tabcount ID1 ID2 , v1(`l') v2(`l') matrix(foo)
>
> ----------------------------
> | ID2
> ID1 | 10 11 12
> ----------+-----------------
> 10 | 1 1
> 11 | 1 1
> 12 |
> ----------------------------
>
> . mat li foo
>
> foo[3,3]
> 10 11 12
> 10 0 1 1
> 11 1 0 1
> 12 0 0 0
>
>
>
> On Tue, Apr 19, 2011 at 5:44 PM, Nick Cox <[email protected]> wrote:
>> This is ugly but it works.
>>
>> The idea is that you clone the data and then flip the variables round
>> in the clone. That ensures a square matrix. You can take out the fake
>> entries by rounding down as each was given a very small weight.
>>
>> Someone should be able to improve on this.
>>
>> clear
>> input obs ID1 ID2
>> 1 10 12
>> 2 10 11
>> 3 11 12
>> 4 11 10
>> end
>> local N = _N
>> local Np1 = _N + 1
>> expand 2
>> replace ID1 = ID2[_n - `N'] in `Np1'/L
>> replace ID2 = ID1[_n - `N'] in `Np1'/L
>> gen wt = cond(_n <= `N', 1, 1e-9)
>>
>> (1)
>>
>> collapse (sum) wt, by(ID1 ID2)
>> replace wt = round(wt)
>>
>> OR
>>
>> (2)
>>
>> tab ID1 ID2 [iw=wt], matcell(foo)
>> levelsof ID1, local(names)
>> mata : st_matrix("foo", round(st_matrix("foo")))
>> matrix rownames foo = `names'
>> matrix colnames foo = `names'
>>
>> On Tue, Apr 19, 2011 at 4:47 PM, Diogo L Pinheiro
>> <[email protected]> wrote:
>>
>>> I've been trying to create a square matrix with counts of observations from
>>> two non unique ID variables. In this case, one id variable has the code for
>>> an organization an individual was and one for where the individual is. For
>>> example:
>>>
>>>
>>> Observation ID1 ID2
>>> 1. 10 12
>>> 2. 10 11
>>> 3. 11 12
>>> 4. 11 10
>>>
>>> In the example above, observation 1 was at organization #10 at one time and
>>> organization #12 at another. Observation 2 was at 10 and then 11. And so on.
>>>
>>>
>>> From that, I want to create a square matrix that would look something like:
>>>
>>>
>>> 10 11 12
>>>
>>> 10 0 1 1
>>> 11 1 0 1
>>> 12 0 0 0
>>>
>>> So that I could have all the counts of each combination of organizations.
>>>
>>> The main problem is getting a square matrix in the end. I can create a non
>>> square matrix in this case by using
>>> tab ID2, gen(newID)
>>> collapse (sum) newID, by(ID1)
>>>
>>> but the problem is that this won't be a square matrix, as not every ID
>>> number shows up in both variables.. In the example above it would be missing
>>> a line (since not every ID number present in ID1 is present in ID2, and vice
>>> versa).
>>>
>>> So how could I get a square matrix from the situation above? I've found
>>> commands to create similar matrices, but they require unique IDs and only
>>> provide binary results for the existing combinations.
>>
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/