Tae Ho Eom has several tasks on which assistance
is desired. I'll pick off one:
> I am working on two separate data manipulation tasks and
> am facing some difficulties.
> If you could advise me how to process the
> datasets, it would be highly appreciated.
>
> The first data manipulation task is:
>
> I have 20 datasets (the same dataset over 20 years) with
> the same variables and structure, as follows
> (the first line gives the variable names; some observations follow).
>
> STATE COUNTY OBJECTS AMOUNTS
>
> 01 178 GG 100,000
> 01 166 SW 200,000
> 01 778 DL 50,000
> 03 336 GG 86,000
> 03 227 SW 33,000
>
> This is a dataset that explains how federal
> government money is distributed to each State and County by
> Object (such as Salary, Insurance).
> OBJECTS has about 9 categories, and I have codes for OBJECTS
> (e.g. GG means federal grants).
> I have 20 years of data, so I will generate a year variable
> for each dataset.
>
> The final dataset I want is:
>
> YEAR STATE GG SW DL (and the other OBJECTS items)
>
> 1997 01 $1000,000 $2000,000 $340,000
> 1997 02 $2000,000 $3000,000 $345,000
> 1998 01 $3000,000 $2400,000 $345,000
> 1998 02 $5000,000 $3400,000 $367,000
> In short, I want to have the dataset that summarizes how
> much money is distributed based on OBJECTS categories by
> each STATE; in other words, I have to make each OBJECTS
> category a variable and want to collapse the dataset by STATE.
>
> I don't have a problem with appending the 20 datasets, but I
> think I need some foreach and local macro commands that
> perform the data manipulation work before appending the
> whole datasets.
If you do the manipulation before you -append-, you have to do
it 20 times; if afterwards, just once. It's possible, of
course, that memory is an issue, but let's be optimistic.
I assume data sets data1776-data1795 for years 1776-1795
with observations like
state county objects amounts
01 178 GG 100,000
01 166 SW 200,000
01 778 DL 50,000
03 336 GG 86,000
03 227 SW 33,000
To -append-, we read in the first data set and -append-
the rest one at a time. Also, -replace- the year values in the loop:
that beats reading in each data set, -generate-ing a variable
and then writing the set out again.
. use data1776
. gen year = 1776
. forval y = 1777/1795 {
. append using data`y'
. replace year = `y' if mi(year)
. }
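The listing shows -amounts- with thousands separators (100,000). If the variable came in as text, -collapse (sum)- cannot add it up, so -destring- it first. A minimal sketch on made-up toy data typed in with -input-; the storage types (str2, str3, str10) are assumptions, not taken from the real files:

```stata
* toy data only: amounts stored as text with comma separators
clear
input str2 state str3 county str2 objects str10 amounts
"01" "178" "GG" "100,000"
"01" "166" "SW" "200,000"
"03" "336" "GG" "86,000"
end

* strip the commas and convert amounts to numeric in place
destring amounts, replace ignore(",")
describe amounts
```

With the real data this is one extra line after the loop: . destring amounts, replace ignore(",")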
Now we are feeling ready to -collapse-:
. collapse (sum) amounts, by(year objects state)
The -reshape- is fairly standard:
. reshape wide amounts, i(year state) j(objects) string
The clean-up is (1) to map missings to 0:
. mvencode amounts*, mv(0)
and (2) to fix the variable names:
. renpfix amounts
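To check the recipe end to end, here is a self-contained sketch on two made-up yearly files held in -tempfile-s instead of data1776-data1795; the years 1997-1998 and all values are invented. It uses -rename amounts* *-, which needs Stata 12 or later, in place of -renpfix-:

```stata
* two hypothetical yearly files, typed in and saved to tempfiles
clear
tempfile y1 y2
input str2 state str2 objects double amounts
"01" "GG" 100000
"01" "SW" 200000
"01" "GG"  50000
"03" "GG"  86000
end
save "`y1'"

clear
input str2 state str2 objects double amounts
"01" "GG"    10
"03" "SW" 33000
end
save "`y2'"

* read the first file, append the second, fill in year
use "`y1'", clear
generate year = 1997
append using "`y2'"
replace year = 1998 if missing(year)

* one row per year-state-object, then one column per object
collapse (sum) amounts, by(year objects state)
reshape wide amounts, i(year state) j(objects) string

* categories absent for a year-state become 0; then drop the prefix
mvencode amounts*, mv(0)
rename amounts* *      // Stata 12 or later; older: renpfix amounts
list, noobs
```

One caution: -mvencode- refuses to map missing to 0 if some variable already contains a 0 (for example, a category whose amounts genuinely sum to 0). In that case add the -override- option.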
Nick