Re: st: sxpose -not possible; would exceed present limit on number of variables
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: sxpose -not possible; would exceed present limit on number of variables
Date: Thu, 20 Feb 2014 00:34:50 +0000
In your sample data, blocks *3, *4 and *5 seem to be the same information repeated.
With the sample data, this is code to play with:
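* suffix digit (1..5) identifies the block; missing on EntityID rows
gen j = substr(word(v1, 1), -1, 1) if word(v1, 1) != "EntityID"
* variable stub (corpid, begyr, gvkey, endyr) with the suffix removed
gen which = subinstr(v1, j, "", 1) if word(v1, 1) != "EntityID"
* carry each EntityID down to the rows that belong to it
gen EntityID = v2 if word(v1, 1) == "EntityID"
replace EntityID = EntityID[_n-1] if missing(EntityID)
drop if word(v1,1) == "EntityID"
drop v1
* one observation per EntityID and block
reshape wide v2, i(EntityID j) j(which) string
renpfix v2
* one observation per year from begyr to endyr
expand endyr - begyr + 1
rename begyr year
bysort EntityID j : replace year = year[_n-1] + 1 if _n > 1
drop endyr j
list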
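
The question describes the real data as already wide (one observation per EntityID, with corpid1 ... endyr5 as variables), so the transpose step may not be needed at all. -sxpose- turns observations into variables, which on 13,458 observations is probably what runs into the limit on the number of variables. A minimal sketch of going straight from the wide layout to one observation per year follows. It assumes the variable names given in the question, uses only the first two blocks of the sample row, and the -drop if missing()- line is a guess about what empty blocks look like in the full data.

```stata
clear
* first two blocks of the sample row from the question
input EntityID corpid1 begyr1 gvkey1 endyr1 corpid2 begyr2 gvkey2 endyr2
100091 8101 1961 1000 1970 8091 1971 1000 1973
end

* wide -> long: one observation per EntityID and block (pd)
reshape long corpid begyr gvkey endyr, i(EntityID) j(pd)

* assumption: unused blocks are missing in the real data
drop if missing(begyr, endyr)

* one observation per year from begyr to endyr
expand endyr - begyr + 1
bysort EntityID pd : gen year = begyr + _n - 1

drop begyr endyr pd
sort EntityID year
list, sepby(corpid)
```

If consecutive blocks share a boundary year (as endyr3 and begyr4 do in the sample row), that year appears twice per EntityID and needs a decision before merging on gvkey and year.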
Nick
[email protected]
On 19 February 2014 21:06, R Zhang <[email protected]> wrote:
> Hi Statalisters,
>
>
> My data has 13,458 observations and 21 variables.
> EntityID corpid1 begyr1 gvkey1 endyr1 corpid2 begyr2 gvkey2 endyr2
> corpid3 begyr3 gvkey3 endyr3 corpid4 begyr4 gvkey4 endyr4 corpid5
> begyr5 gvkey5 endyr5
> 100091 8101 1961 1000 1970 8091 1971 1000 1973 8011 1974 1001 2000
> 8012 2000 1001 2002 8012 2003 1001 2005
>
>
> For each unique EntityID, the corresponding gvkey and corpid can
> vary over time, as indicated by begyr and endyr.
>
> What I want is a dataset that gives me the gvkey and corpid for each
> time period, so I can match it to another dataset that has
> company-specific financial data. The match variables will be gvkey and year.
>
> As of now, I think I should reshape the data. Someone on the forum
> kindly offered me the following program to reshape my data. The sample
> code (see below) works for his hypothetical data, but when I ran it with
> my data (13,458 observations and 21 variables), I got the error "not
> possible; would exceed present limit on number of variables". Could
> you shed light on this?
>
> *****************
> input str20 v1 v2
> EntityID 100091
> corpid1 8101
> begyr1 1961
> gvkey1 1000
> endyr1 1970
> corpid2 8091
> begyr2 1971
> gvkey2 1000
> endyr2 1973
> corpid3 8011
> begyr3 1974
> gvkey3 1001
> endyr3 2000
> corpid4 8011
> begyr4 1974
> gvkey4 1001
> endyr4 2000
> corpid5 8011
> begyr5 1974
> gvkey5 1001
> endyr5 2000
> end
>
> compress
> sxpose, clear firstnames force
> reshape long corpid begyr gvkey endyr, i(EntityID) j(pd)
> ***********************
>
> what I ultimately want is :
> EntityID corpid year gvkey
> 100091 8101 1961 1000
> 100091 8101 1962 1000
> 100091 8101 1963 1000
> 100091 8101 1964 1000
> 100091 8101 1965 1000
> 100091 8101 1966 1000
> ...
> 100091 8091 1971 1000
> 100091 8091 1972 1000
> 100091 8091 1973 1000
> 100091 8091 1974 1000
>
> P.S. If you think there is a better way, please also share.
>
> thanks!!!
>
> -R
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/