st: RE: reshape long is changing j values
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: st: RE: reshape long is changing j values
Date: Fri, 27 Jan 2012 18:58:32 +0000
What is biting you here is that your date variable contains integers of the order of 20 million, which are being put into a -float- variable. A -float- doesn't have enough precision to hold every distinct value of that size, so your dates are being mangled.
We can see this directly:
. gen foo = 20080111
. di foo[1]
20080112
The default default [repetition intended] type for a new numeric variable is -float-. A -float- holds every integer exactly only up to 16,777,216 (2^24); for values of the order of 20 million only even integers can be held exactly, and odd integers are rounded to an adjacent even integer, as you have observed.
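To make the precision point concrete, here is a minimal sketch that stores the same date in three storage types. The variable names d_float, d_long and d_double are made up for illustration; only the -float- copy should change.

```stata
* Sketch: compare how float, long and double hold 20080111
clear
set obs 1
gen float  d_float  = 20080111
gen long   d_long   = 20080111
gen double d_double = 20080111
format d_* %12.0f
list

* The float copy is rounded to the adjacent even integer
assert d_float  == 20080112
assert d_long   == 20080111
assert d_double == 20080111

* float() shows the same rounding without creating a variable
assert float(20080111) == 20080112

* 2^24 is the last point up to which every integer fits in a float
assert float(16777216) == 16777216
assert float(16777217) == 16777216
```

If all the -assert- lines pass silently, the behaviour matches the explanation above: -long- and -double- keep the date intact, -float- does not.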
One way to fix this is:
reshape long maxtempF, i(store_city_id threshold) j(date) string
destring date, replace
Notes:
1. The above insists on mapping the dates to a string variable, so no precision is lost in the reshape. -destring- is then smart enough to choose a numeric storage type (here -long-) that holds the values without mangling them.
2. You don't need to specify values in the -j()- option.
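For anyone who wants to try the fix without the original file, the following is a small sketch on toy data. The toy values (two stores, four days, temperatures of 51 and 52) are invented purely for illustration; only the -reshape- and -destring- lines come from the suggestion above.

```stata
* Sketch: rebuild a tiny wide dataset, then reshape via a string j variable
clear
set obs 2
gen store_city_id = _n
gen threshold = 1
forvalues d = 20080101/20080104 {
    gen byte maxtempF`d' = 50 + _n
}

reshape long maxtempF, i(store_city_id threshold) j(date) string
destring date, replace

* date should now be a long holding every value, odd and even
confirm long variable date
assert inlist(date, 20080101, 20080102, 20080103, 20080104)
bysort store_city_id threshold (date): assert _N == 4
list, sepby(store_city_id)
```

The -confirm- and -assert- lines are only checks; they stop with an error if -date- ended up in some other storage type or if any date were mangled or duplicated.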
Nick
[email protected]
Tiffany Shih
I am trying to reshape wide weather data to long format and while the reshape command completes, the resulting long form data are incorrect.
In wide form, my variables are
. de

Contains data from tmpminmaxtempF.dta
  obs:            10
 vars:            14                          27 Jan 2012 09:16
 size:           240 (99.9% of memory free)
-------------------------------------------------------------------------------
                storage  display     value
variable name     type   format      label      variable label
-------------------------------------------------------------------------------
maxtem~20080101   byte   %10.0g                 20080101 maxtempF
maxtem~20080102   byte   %10.0g                 20080102 maxtempF
maxtem~20080103   byte   %10.0g                 20080103 maxtempF
maxtem~20080104   byte   %10.0g                 20080104 maxtempF
maxtem~20080105   byte   %10.0g                 20080105 maxtempF
maxtem~20080106   byte   %10.0g                 20080106 maxtempF
maxtem~20080107   byte   %10.0g                 20080107 maxtempF
maxtem~20080108   byte   %10.0g                 20080108 maxtempF
maxtem~20080109   byte   %10.0g                 20080109 maxtempF
maxtem~20080110   byte   %10.0g                 20080110 maxtempF
maxtem~20080111   byte   %10.0g                 20080111 maxtempF
maxtem~20080112   byte   %10.0g                 20080112 maxtempF
store_city_id     float  %9.0g
threshold         float  %9.0g
-------------------------------------------------------------------------------
The command I am using is:
"reshape long maxtempF, i(store_city_id threshold) j(date 20080101 20080102 20080103 20080104 20080105 20080106 20080107 20080108 20080109 20080110 20080111 20080112)"
The resulting long form data have the correct variable names, but all the odd values of "date" are missing and seem to have been doubled up into the even values. For example, it introduces a new value of date, 20080100, and there is no 20080101; the values from maxtempF20080101 appear in the long form data under 20080100. In addition, there is no "date" value for 20080103, 20080105, 20080107, etc.; instead there are two entries for each of 20080102, 20080104, ... for each store_city_id, with values in maxtempF that belong to the corresponding odd dates.
If I repeat the same command but leave out the j values, the reshape only reshapes the even values and treats the odd values like i variables. Same problem if I turn store_city_id and threshold into one variable.