Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Creating square matrix with obs counts from 2 non unique id variables
From
Nick Cox <[email protected]>
To
[email protected]
Subject
Re: st: Creating square matrix with obs counts from 2 non unique id variables
Date
Tue, 19 Apr 2011 19:00:29 +0100
Here is another way to do it. This uses -tabcount- (SSC).
levelsof ID1, local(l1)
levelsof ID2, local(l2)
local l : list l1 | l2
tabcount ID1 ID2 , v1(`l') v2(`l')
----------------------------
| ID2
ID1 | 10 11 12
----------+-----------------
10 | 1 1
11 | 1 1
12 |
----------------------------
. tabcount ID1 ID2 , v1(`l') v2(`l') matrix(foo)
----------------------------
| ID2
ID1 | 10 11 12
----------+-----------------
10 | 1 1
11 | 1 1
12 |
----------------------------
. mat li foo
foo[3,3]
10 11 12
10 0 1 1
11 1 0 1
12 0 0 0
On Tue, Apr 19, 2011 at 5:44 PM, Nick Cox <[email protected]> wrote:
> This is ugly but it works.
>
> The idea is that you clone the data and then flip the variables round
> in the clone. That ensures a square matrix. You can take out the fake
> entries by rounding down as each was given a very small weight.
>
> Someone should be able to improve on this.
>
> clear
> input obs ID1 ID2
> 1 10 12
> 2 10 11
> 3 11 12
> 4 11 10
> end
> local N = _N
> local Np1 = _N + 1
> expand 2
> replace ID1 = ID2[_n - `N'] in `Np1'/L
> replace ID2 = ID1[_n - `N'] in `Np1'/L
> gen wt = cond(_n <= `N', 1, 1e-9)
>
> (1)
>
> collapse (sum) wt, by(ID1 ID2)
> replace wt = round(wt)
>
> OR
>
> (2)
>
> tab ID1 ID2 [iw=wt], matcell(foo)
> levelsof ID1, local(names)
> mata : st_matrix("foo", round(st_matrix("foo")))
> matrix rownames foo = `names'
> matrix colnames foo = `names'
>
> On Tue, Apr 19, 2011 at 4:47 PM, Diogo L Pinheiro
> <[email protected]> wrote:
>
>> I've been trying to create a square matrix with counts of observations from
>> two non unique ID variables. In this case, one id variable has the code for
>> an organization an individual was and one for where the individual is. For
>> example:
>>
>>
>> Observation ID1 ID2
>> 1. 10 12
>> 2. 10 11
>> 3. 11 12
>> 4. 11 10
>>
>> In the example above, observation 1 was at organization #10 at one time and
>> organization #12 at another. Observation 2 was at 10 and then 11. And so on.
>>
>>
>> From that, I want to create a square matrix that would look something like:
>>
>>
>> 10 11 12
>>
>> 10 0 1 1
>> 11 1 0 1
>> 12 0 0 0
>>
>> So that I could have all the counts of each combination of organizations.
>>
>> The main problem is getting a square matrix in the end. I can create a non
>> square matrix in this case by using
>> tab ID2, gen(newID)
>> collapse (sum) newID, by(ID1)
>>
>> but the problem is that this won't be a square matrix, as not every ID
>> number shows up in both variables.. In the example above it would be missing
>> a line (since not every ID number present in ID1 is present in ID2, and vice
>> versa).
>>
>> So how could I get a square matrix from the situation above? I've found
>> commands to create similar matrices, but they require unique IDs and only
>> provide binary results for the existing combinations.
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/