Ning <[email protected]> asked,
> How to reshape a data set from long form to wide form without using the
> command reshape?
and went on to explain why,
> Recently I am working on a data set from a remote access data library. The
> RADL receives STATA codes submitted by its users, it runs jobs and returns
> outputs via its web interface. For confidentiality reasons, not all the
> STATA commands are allowed in the RADL. One of these commands is reshape, we
> know reshape a dataset won't reveal extra information though. I tried to
> program but failed.)
Nick Cox <[email protected]> responded rather negatively to Ning's
question because, as he said, "-reshape- is no threat to anyone." I agree.
Nonetheless, if Ning is getting data from a remote library and they have
turned off -reshape-, what is Ning to do?
> HID PID INJDAM
> A 1 a
> A 2 b
> A 2 c
> B 1 b
> B 2 a
>
> where HID and PID represent household and personal ids respectively, and
> INJDAMC the damage type (eg, a=burns, b=bruising, c=..)in a person's most
> recent injury.
Here's how to convert this wide data to long form:
// setup
. use original, clear
. sort HID PID
. by HID PID: assert _N==1 // checking assumptions
. save original, replace // save sorted by HID
// do PID==1
. use original, clear
. keep if PID==1
. drop PID
. rename INJDAM INJDAM1
. save long, replace
// do PID==2
. use original, clear
. keep if PID==2
. drop PID
. rename INJDAM INJDAM2
. merge HID using long
. drop _merge
. sort HID
. save long, replace
If there was a PID==3, one would continue
// do PID==3
. use original, clear
. keep if PID==3
. drop PID
. rename INJDAM INJDAM3
. merge HID using long
. drop _merge
. sort HID
. save long, replace
An outline of the process is,
1. Check assumptions. HID PID should uniquely identify the
observations. In addition, resave original.dta sorted by
at least HID, because we are going to need it that way.
2. Determine the values of PID. Interactively, the easy way to
that is -tabulate PID-. Write down all the values that PID
takes on.
3. Do the following for each value of PID:
a. Use original.dta and keep just the observations for the
specific PID being processed right now.
Then drop variable PID.
b. rename all the other variables except HID to have a
suffix indicating the PID value. In the example, we
had one such variable: INJDAM.
c. If this is the first PID value, save the beginnings
of the dataset in long form: -save long-. For subsequent
PID values, you must merge by HID the data in memory with
long.dta, drop _merge, re-sort by HID, and resave long.dta.
I admit it's a lot of work.
For everyone's information, Stata was the first package to have a
-reshape- command, and the terms reshape, long, and wide were
introduced by us. In my past as an analyst, I had reshaped data
by hand many times and it was always a tedious experience. We were
very proud the day we realized that Stata's programming language was
up to mechanizing the task because many programming languages are not.
-- Bill
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/