Joe J asked,
> What I am trying to do is create a numeric matrix from, let's say, 3
> string variables. My data set provides [...]
> [I] was wondering if Mata could help solve this ?
Yes. Mata is a great way to solve this problem.
Joe has data like,
    partner1   partner2   partner3
    11A        ---        12Z
    12Z        21S        11K
    14T        11A        12Z
and from this, he wants to create data like,
           11A   12Z   21S   11K   14T
    11A      0     2     0     0     1
    12Z      2     0     1     1     1
    21S      0     1     0     1     0
    11K      0     1     1     0     0
    14T      1     1     0     0     0
The numbers in the matrix record the number of pairs in the original data.
Observations in the original data record agreements among companies.
The companies are coded 11A, 12Z, etc. He wants a matrix recording the
number of agreements between companies.
I could do this entire problem using only Mata, but that would just be more
work than is necessary. Mata really shows its power when used with Stata,
and vice versa. So I took Joe's original data and formed from it:
. list
     +-----------------------------------------------+
     | partner1   partner2   partner3   p1   p2   p3 |
     |-----------------------------------------------|
  1. |      12Z        21S        11K    3    5    2 |
  2. |      14T        11A        12Z    4    1    3 |
  3. |      11A        ---        12Z    1    .    3 |
     +-----------------------------------------------+
The new variables p1, p2, and p3 are just like partner1, partner2,
and partner3, except that I have assigned numeric codes to the companies.
1 is company 11A, 2 is 11K, and so on. Those numbers will become my
row and column numbers in the Mata program I will write. Before
getting into Mata, however, let me show you the Stata code that took the
original data and added p1, p2, and p3:
------------------------------------------------------------------
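* Joe's original data, one agreement per observation
input str3 (partner1 partner2 partner3)
11A --- 12Z
12Z 21S 11K
14T 11A 12Z
end
save long1
list

* build the mapping from company name to numeric code
gen id = _n
reshape long partner, i(id)
drop id _j
drop if partner=="---"
sort partner
by partner: keep if _n==1
gen code = _n
* the maximum of code is the number of companies (5 here)
sum code
save mapping

* add a numeric-code variable for one partner variable
program fixvar
        args oldvar newvar
        rename `oldvar' partner
        sort partner
        * match-merge on partner; both datasets are sorted by partner
        merge partner using mapping
        keep if _merge==1 | _merge==3
        drop _merge
        rename partner `oldvar'
        rename code `newvar'
end

use long1, clear
fixvar partner1 p1
fixvar partner2 p2
fixvar partner3 p3
list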
------------------------------------------------------------------
I could write about the code, but I think it is self-explanatory for anyone
who wants to spend the time reading it.
With that dataset, here's the Mata code to produce the desired matrix:
------------------------------------------------------------------
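mata:
real matrix agmat(real scalar N, string scalar varnames)
{
        real matrix     V, A
        real scalar     j, i1, i2, k1, k2

        // V is a view onto the code variables, one row per observation
        st_view(V, ., tokens(varnames))
        A = J(N, N, 0)
        for (j=1; j<=rows(V); j++) {
                // every ordered pair of distinct columns in observation j
                for (i1=1; i1<=cols(V); i1++) {
                        for (i2=1; i2<=cols(V); i2++) {
                                if (i1!=i2) {
                                        k1 = V[j, i1]
                                        k2 = V[j, i2]
                                        // skip pairs with a missing partner
                                        if (k1!=. & k2!=.) {
                                                A[k1,k2] = A[k1,k2] + 1
                                        }
                                }
                        }
                }
        }
        return(A)
}
end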
------------------------------------------------------------------
To produce the desired result, I then typed
. mata:
: agmat(5, "p1 p2 p3")
[symmetric]
       1   2   3   4   5
    +---------------------+
  1 |  0                  |
  2 |  0   0              |
  3 |  2   1   0          |
  4 |  1   0   1   0      |
  5 |  0   1   1   0   0  |
    +---------------------+
: end
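The 5 passed to agmat() is the number of distinct companies, which is also
the largest code that -sum code- reported. As a small sketch, assuming p1,
p2, and p3 are the only code variables, that number could be read from the
data instead of typed (N is a name introduced here for illustration):
------------------------------------------------------------------
mata:
// largest code among p1, p2, p3; max() skips missing values
N = max(st_data(., tokens("p1 p2 p3")))
agmat(N, "p1 p2 p3")
end
------------------------------------------------------------------
This should display the same matrix as above, since the largest code in
these data is 5.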
The code was reasonably straightforward. We started with the data,
     +-----------------------------------------------+
     | partner1   partner2   partner3   p1   p2   p3 |
     |-----------------------------------------------|
  1. |      12Z        21S        11K    3    5    2 |
  2. |      14T        11A        12Z    4    1    3 |
  3. |      11A        ---        12Z    1    .    3 |
     +-----------------------------------------------+
Just look at the columns for p1, p2, and p3. We want to start with a
5x5 matrix A = 0, and then, starting at observation 1, add 1 to the elements
(3,5), (3,2), (5,3), (5,2), (2,3), and (2,5). Then we move on to observation
2, and so on.
That's what the code does: going across observations, it takes every
ordered pair of nonmissing codes among p1, p2, and p3 and adds 1 to the
corresponding element of a 5x5 matrix that started out containing 0.
That matrix is now in Mata. We could save it in a Mata variable by
typing,
: M = agmat(5, "p1 p2 p3")
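With M saved, a few quick checks are possible. This is only a sketch using
the standard Mata functions issymmetric(), sum(), and trace(). The expected
values follow from the data: the two observations with three partners
contribute 3 pairs each and the one with two partners contributes 1, for 7
pairs in all, each counted twice in M, and no company is paired with itself:
. mata:
: issymmetric(M)
  1
: sum(M)/2
  7
: trace(M)
  0
: end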
More likely, however, I'm guessing Joe will want to form a Stata dataset from
the result. There are lots of ways Joe could do that, and nearly all of them
are more clever than what I'm about to show you, but what follows is the
easiest to understand:
. mata: M = agmat(5, "p1 p2 p3")
. drop _all
. set obs 5
. gen f1 = 0
. gen f2 = 0
. gen f3 = 0
. gen f4 = 0
. gen f5 = 0
. mata:
: st_view(V=., ., .)
: V[.,.] = M
: end
The result of which is,
. list
     +------------------------+
     | f1   f2   f3   f4   f5 |
     |------------------------|
  1. |  0    0    2    1    0 |
  2. |  0    0    1    0    1 |
  3. |  2    1    0    1    1 |
  4. |  1    0    1    0    0 |
  5. |  0    1    1    0    0 |
     +------------------------+
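One of the other ways, offered here only as a sketch, lets Mata create the
observations and variables itself, so the number of -gen- statements does not
have to match the size of the matrix. It assumes M already holds the result
of agmat(); idx is a name introduced for illustration, and the variables come
out as doubles named f1, f2, ..., one per column of M:
------------------------------------------------------------------
drop _all
mata:
// one observation per row of M
st_addobs(rows(M))
// new variables "f1", "f2", ..., one per column of M
idx = st_addvar("double", "f" :+ strofreal(1..cols(M)))
// copy M into the new variables
st_store(., idx, M)
end
list
------------------------------------------------------------------
Because st_store() copies the values, no view onto the data is needed here.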
-- Bill