Hoetker, Glenn
>
> Hoping someone can help me with a problem involving dividing up a
> variable. My data consists of patent numbers and inventors and looks
> like this:
>
> nmi
> wku
> Schmitt, Ty; Gandre, Jerry
> 5586003
> Sato, N. Albert; Baker, David C.; Waldron, Christie J.
> 5586324
> Swamy, N. Deepak
> 5587885
>
> I would like it to look like this:
>
> nmi
> wku
> Schmitt, Ty
> 5586003
> Gandre, Jerry
> 5586003
> Sato, N. Albert
> 5586324
> Baker, David C.
> 5586324
> Waldron, Christie J.
> 5586324
> Swamy, N. Deepak
> 5587885
>
> That is, I want to create a record containing each inventor and his or
> her associated patent number. If Ty Schmitt had five patents, he should
> show up in five records. The number of inventors per patent varies from
> one to many.
>
> I've looked for egen functions (and their extensions) and done some
> experimenting, but am floundering. Any help would be very
> appreciated!
The "nmi wku" stuff I don't understand.
I am going to be optimistic and assume it is a preamble
you can strip off.
My suggestion is to use -split- from SSC and -reshape-.
For -split-,
. ssc inst split
For -reshape-, we're using the Third Law of Reshaping:
* You may need two -reshape-s to get where you want to be *.
Here's my log:
. l
whatever
1. Schmitt, Ty; Gandre, Jerry
2. 5586003
3. Sato, N. Albert; Baker, David C.; Waldron, Christie J.
4. 5586324
5. Swamy, N. Deepak
6. 5587885
First we set up row and column identifiers for a -reshape-:
. egen id = seq(), b(2)
. egen field = seq(), t(2)
. l
   whatever                                                id  field
1. Schmitt, Ty; Gandre, Jerry                               1      1
2. 5586003                                                  1      2
3. Sato, N. Albert; Baker, David C.; Waldron, Christie J.   2      1
4. 5586324                                                  2      2
5. Swamy, N. Deepak                                         3      1
6. 5587885                                                  3      2
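As an aside, the same two identifiers can be built without -egen-, directly from the observation number _n. This is only a sketch, to be used in place of the two -egen- calls. It assumes, as above, that the data are still in their original order and come in strict pairs, with the inventors line first and the patent number second:

```stata
* id: 1 1 2 2 3 3 ... (one value per pair of observations)
generate id = ceil(_n / 2)

* field: 1 2 1 2 ... (1 = inventors line, 2 = patent number line)
generate field = 2 - mod(_n, 2)
```

If any pair is incomplete or out of order, both this and the -egen- version will mislabel everything after it, so a quick -list- before reshaping is worthwhile.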
Now we map each pair of observations into one:
. reshape wide whatever, i(id) j(field)
. l
Observation 1
id 1    whatev~1 Schmitt, Ty;..    whatev~2 5586003
Observation 2
id 2    whatev~1 Sato, N. Alb..    whatev~2 5586324
Observation 3
id 3    whatev~1 Swamy, N. De..    whatev~2 5587885
. rename whatever1 who
-split- divides a string variable at a separator. Here it's a semicolon:
. split who, p(;)
variables created as string: who1 who2 who3
We have the original -who- and the parts -who?-.
The original will just be in the way:
. drop who
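One detail worth checking at this point: because the separator is ";" alone, the blank that followed each semicolon probably ends up at the start of who2, who3 and so on (my reading of how -split- parses, and not visible in a listing). Assuming that is so, a minimal sketch for cleaning the pieces:

```stata
* who* matches who1, who2, ... now that the original -who- has been dropped
foreach v of varlist who* {
    * remove leading and trailing blanks from each piece
    replace `v' = trim(`v')
}
```

Trimming is harmless if there are no stray blanks, so it can be run regardless.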
Now the finish is in sight:
. reshape long who, i(id)
. drop if who == ""
. compress
. l
   id  _j  whatever2  who
1.  1   1    5586003  Schmitt, Ty
2.  1   2    5586003  Gandre, Jerry
3.  2   1    5586324  Sato, N. Albert
4.  2   2    5586324  Baker, David C.
5.  2   3    5586324  Waldron, Christie J.
6.  3   1    5587885  Swamy, N. Deepak
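For convenience, the steps above can be gathered into one do-file. This is a sketch under the same assumptions: the data sit in a single string variable with alternating inventor and patent lines, and -split- is installed. The names -wku- and -inventor-, the str80 width, and the final -trim()- and -destring- are my own additions for illustration, not part of the log above.

```stata
clear
* toy data: one string variable, alternating inventors / patent number
input str80 whatever
"Schmitt, Ty; Gandre, Jerry"
"5586003"
"Sato, N. Albert; Baker, David C.; Waldron, Christie J."
"5586324"
"Swamy, N. Deepak"
"5587885"
end

* row (id) and column (field) identifiers for the first -reshape-
egen id = seq(), block(2)
egen field = seq(), to(2)

* one observation per patent: whatever1 = inventors, whatever2 = number
reshape wide whatever, i(id) j(field)
rename whatever1 who
rename whatever2 wku

* one variable per inventor, then drop the unsplit original
split who, parse(";")
drop who

* second -reshape-: one observation per inventor-patent pair
reshape long who, i(id) j(inventor)
drop if who == ""

* tidy up: strip blanks left by the split, make the patent number numeric
replace who = trim(who)
destring wku, replace
compress
list, sepby(id)
```

Keeping -id- and -inventor- after the second -reshape- costs nothing and preserves the original order of inventors within each patent.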
Nick