Joe J. <[email protected]> needed to create a numeric, symmetric matrix
recording the number of agreements between firms and, when last we left off,
we had created a matrix M in Mata containing the desired result,
: M
[symmetric]
1 2 3 4 5
+---------------------+
1 | 0 |
2 | 0 0 |
3 | 2 1 0 |
4 | 1 0 1 0 |
5 | 0 1 1 0 0 |
+---------------------+
and copied the matrix back to the Stata data in memory, so we had,
. list
+------------------------+
| f1 f2 f3 f4 f5 |
|------------------------|
1. | 0 0 2 1 0 |
2. | 0 0 1 0 1 |
3. | 2 1 0 1 1 |
4. | 1 0 1 0 0 |
5. | 0 1 1 0 0 |
+------------------------+
The code we used to copy M to Stata was crude, but it worked.
Joe says
> [...] I need to:
>
> 1. Generate a string variable, let's call it Company, that contains the
> company code, as in the file 'mapping' in your code (you named this
> variable Partner) . This variable will help me merge other firm-level
> variables with the matrix, and
>
> 2. I would like the company codes to be the column headings (that's as the
> variable names) as well of the matrix, replacing f1, f2, etc in your
> code. In a sense these variables are the transpose of the string
> variable Company.
If I understand Joe, he wants this,
+-------------------------------------------------+
| company f_11A f_11K f_12Z f_14T f_21S |
|-------------------------------------------------|
1. | 11A 0 0 2 1 0 |
2. | 11K 0 0 1 0 1 |
3. | 12Z 2 1 0 1 1 |
4. | 14T 1 0 1 0 0 |
5. | 21S 0 1 1 0 0 |
+-------------------------------------------------+
I wonder whether Joe really wants that because I suspect that is not a
useful way to store or analyze these data, but I may be wrong, so
first, let's get the data organized in this way.
Obtaining what Joe wants
------------------------
Our example above is 5x5, but in Joe's real data, the result is
3,338 x 3,338. Joe got the program to work, however.
So let's go back to the point where we typed,
. mata: M = agmat(5, "p1 p2 p3")
and then went about moving M to Stata, and do the move to Stata in a
more sophisticated way.
To remind you, mapping.dta looked like this:
. use mapping, clear
. list
+----------------+
| partner code |
|----------------|
1. | 11A 1 |
2. | 11K 2 |
3. | 12Z 3 |
4. | 14T 4 |
5. | 21S 5 |
+----------------+
If I had partner from this dataset stored in a Mata colvector called partner,
then partner[i] would the identity of the i-th firm. So here is how we are
going to start:
. mata: M = agmat(5, "p1 p2 p3")
. use mapping
. mata:
: partner = st_sdata(., "partner")
: M
[symmetric]
1 2 3 4 5
+---------------------+
1 | 0 |
2 | 0 0 |
3 | 2 1 0 |
4 | 1 0 1 0 |
5 | 0 1 1 0 0 |
+---------------------+
: partner
1
+-------+
1 | 11A |
2 | 11K |
3 | 12Z |
4 | 14T |
5 | 21S |
+-------+
The solution is,
: st_dropvar(.) // equivalent to -drop all- in Stata
: st_addobs(cols(M))
: companyidx = st_addvar("str8", "company")
: varidx = st_addvar("int", "f_" :+ partner')
: st_sstore(., companyidx, partner)
: st_view(V=., ., varidx)
: V[.,.] = M
: end
The result is,
. list
+-------------------------------------------------+
| company f_11A f_11K f_12Z f_14T f_21S |
|-------------------------------------------------|
1. | 11A 0 0 2 1 0 |
2. | 11K 0 0 1 0 1 |
3. | 12Z 2 1 0 1 1 |
4. | 14T 1 0 1 0 0 |
5. | 21S 0 1 1 0 0 |
+-------------------------------------------------+
How I would organize these data
-------------------------------
I would organize these data in the long form, like this,
+-----------------------------------+
| c1 c2 company1 company2 n |
|-----------------------------------|
1. | 1 2 11A 11K 0 |
2. | 1 3 11A 12Z 2 |
3. | 1 4 11A 14T 1 |
4. | 1 5 11A 21S 0 |
5. | 2 3 11K 12Z 1 |
|-----------------------------------|
6. | 2 4 11K 14T 0 |
7. | 2 5 11K 21S 1 |
8. | 3 4 12Z 14T 1 |
9. | 3 5 12Z 21S 1 |
10. | 4 5 14T 21S 0 |
+-----------------------------------+
In this organization, the 0 agreements between 1 & 2 (11A & 11K) is recorded
only once; the 0 agreements between 2 & 1 (11K & 11A) is not recorded. The
rule is that if you want to know the agreement between i and j, order i and j
so that i<j, and look for the obs. coresponding to c1==i & c2==j. The same
rule applies for companies recorded in string form. In this organization, it
will be easy to merge in variables containing company characteristics.
Anyway, the code to do that is,
: st_dropvar(.) // equivalent to -drop all- in Stata
: st_addvar("int", ("c1", "c2"))
: st_addvar("str8", ("company1", "company2"))
: st_addvar("int", "n")
: N = rows(M)
: st_addobs(N*(N+1)/2 - N)
: st_view(n=., ., "n")
: st_view(c1=., ., "c1")
: st_view(c2=., ., "c2")
: st_sview(company1=., ., "company1")
: st_sview(company2=., ., "company2")
: for (i=obs=1; i<=rows(M); i++) {
: for (j=i+1; j<=cols(M); j++) {
: c1[obs] = i
: c2[obs] = j
: company1[obs] = partner[i]
: company2[obs] = partner[j]
: n[obs] = M[i,j]
: obs++
: }
: }
: end
-- Bill
[email protected]
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/