st: Data Corruption?
Hello all,
I have searched within the archives and cannot find an answer to my 
problem. Please correct me if I am missing anything obvious.
I have strange results when calculating (basic) stats for a variable. I 
assume there is some sort of data corruption, but I have never seen this 
before, so any pointers would be helpful.
Here is a listing of my data (I have added an if condition to keep the 
example simple).
. li oil_level gdp if id==211
     +-------------------------------+
     | oil_le~l                  gdp |
     |-------------------------------|
1059. |    548.9   15492.840168784467 |
1060. |    575.7   16248.206511636326 |
1061. |    595.8   16370.615114025704 |
1062. |    635.5    17072.56241927471 |
1063. |    667.8    17501.07679210328 |
     |-------------------------------|
1064. |    694.6   17321.478178718633 |
1065. |    719.3    17792.90476504183 |
1066. |    775.8   18647.050437099166 |
1067. |      818   19551.838349719856 |
1068. |    782.6   19207.297273658987 |
     |-------------------------------|
1069. |    765.9   18931.991301627946 |
1070. |    822.4    19861.88173008627 |
1071. |    865.9   20652.321708883446 |
1072. |    888.8   21615.181061740117 |
1073. |      868    22041.69364428876 |
     |-------------------------------|
1074. |    794.1   21606.154441767136 |
1075. |      746   21955.530458150668 |
1076. |    705.5   21313.547357641153 |
1077. |    704.9   22154.364138248093 |
1078. |    723.3   23671.961781615486 |
     |-------------------------------|
1079. |    720.2   24387.448078662386 |
1080. |    749.3    24951.98356186933 |
1081. |    764.8   25520.697093487765 |
1082. |    796.7   26275.362569684687 |
1083. |    795.3   26927.174510472792 |
     |-------------------------------|
1084. |    781.8   27096.979921878978 |
1085. |    765.6    26688.35205330767 |
1086. |    782.2   27342.667747615116 |
1087. |    789.3   27871.529919979886 |
1088. |    809.8   28802.880024505936 |
     |-------------------------------|
1089. |    807.7   29248.767207023822 |
1090. |    836.5   30097.683751824916 |
1091. |      848   31237.956233486548 |
1092. |    863.8   32297.523769177868 |
1093. |    888.9   33443.544095306104 |
     |-------------------------------|
1094. |    897.6   34364.500620614825 |
1095. |    896.1    34162.90121322729 |
1096. |    897.4    34286.24328012388 |
1097. |    912.3    34875.37198079319 |
1098. |    948.7    36098.15411932452 |
     +-------------------------------+
The listing above is correct and matches the raw data.
Here is a summ of the two series:
. summ oil_level gdp if id==211
   Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  oil_level |        40      781.27    93.45438      548.9      948.7
        gdp |        40     3019.45    771.8664       1673       4237
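Note that the gdp row is obviously wrong: the listed values run from about 15492.8 to 36098.2, yet summ reports a mean of 3019.45, a minimum of 1673 and a maximum of 4237. Any ideas?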
FYI the data are stored as:
. desc oil_level gdp
             storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
oil_level       float  %9.0g                  Oil Consumption (Million Tonnes)
gdp             float  %18.0g      gdp   
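One possibility I have not ruled out, going by the describe output above: gdp has a value label (also named gdp) attached, so list may be showing the label text while summ works on the stored numeric codes. A minimal sketch of how this could be checked, assuming that is what is happening; gdp_txt and gdp_num are made-up variable names used only for illustration:

```stata
* Hypothetical check: is a value label hiding the stored values of gdp?

* Show the stored numbers rather than the label text
list oil_level gdp if id==211, nolabel

* Show the code-to-text mapping held in the value label named gdp
label list gdp

* If the label text turns out to be the real data, recover it as a double
* (gdp_txt and gdp_num are made-up names for this sketch)
decode gdp, generate(gdp_txt)
generate double gdp_num = real(gdp_txt)
summarize gdp_num if id==211
```

If the nolabel listing shows values in the 1673 to 4237 range, the stored data and the displayed data differ only because of the label, and the file itself would not be corrupted.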
Stata version: 9.2. The same problem occurs on both Windows and Linux.
Thanks,
Ed
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/