Re: st: Data Corruption?
FWIW, I have sometimes (once or twice) found this kind of
problem in datasets that were created by StatTransfer.
I solved it by having StatTransfer create text files
and then -infile-ing them.
hth,
Jeph
William Gould, StataCorp LP wrote:
Ed Blackburne <[email protected]> reports
[...] I have strange results when calculating (basic) stats for a variable.
I assume there is some sort of data corruption, but I have never seen this
before, so any pointers would be helpful.
Here is a listing of my data (I have added an if condition to keep the
example simple).
. li oil_level gdp if id==211
      +-------------------------------+
      | oil_le~l                  gdp |
      |-------------------------------|
1059. |    548.9   15492.840168784467 |
1060. |    575.7   16248.206511636326 |
1061. |    595.8   16370.615114025704 |
1062. |    635.5    17072.56241927471 |
1063. |    667.8    17501.07679210328 |
      |-------------------------------|
1064. |    694.6   17321.478178718633 |
        <output omitted by me, not Ed >
1098. |    948.7    36098.15411932452 |
      +-------------------------------+
and yet, Ed reports:
. summ oil_level gdp if id==211
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   oil_level |        40      781.27    93.45438      548.9      948.7
         gdp |        40     3019.45    771.8664       1673       4237
Svend Juul tried to reproduce the problem, couldn't, and said "Looks like
human error to me. Are you sure nothing happened to your data between
the -list- and the -summarize- command?"
Of course, my response is the same, but I also assume Ed is reasonably sure
that he typed -list- followed by -summarize-.
I suggest Ed go to our Technical Services by emailing [email protected].
Don't forget to include the serial number of your copy of Stata in your email.
Right now, I'm at a loss to explain the problem, although I'm thinking
(in no particular order) broken/corrupted hardware, corrupted Stata,
or corrupted dataset. Given what little I know right now, none of the
above exactly fits what Ed is reporting.
I have one experiment I want Ed to perform so he can report results
to Technical Services.
Do the following:
. log using problem.log, replace
. <use dataset>
. list oil_level gdp if id==211
. summarize oil_level gdp if id==211
. list oil_level gdp if id==211
. log close
Do them exactly like that, in that order, with nothing in between.
My questions are (1) is -summarize- still mistaken and (2) does the
second listing match the first?
Obviously, if the problem vanishes, we are back at human error. Otherwise,
we will want the log and the dataset.
Ed also reports
Stata versions: 9.2, both Windows and Linux experience the same problem.
Does that mean he ran the experiment on two *DIFFERENT* computers, or
one computer booted different ways?
For Ed's information, we have seen corrupted Statas. They tend to crash.
We have seen broken computers. They tend to crash, too, although we have
seen one with a broken Floating Point Coprocessor that simply gave the wrong
answers for the exp() function. We have seen computers with bad memory. The
data morphs, but importantly, it continues to morph as you use the computer.
We have seen corrupted datasets, but they are simply corrupted and all Stata's
routines agree as to the (corrupted) contents of the data.
Go to Technical Services. I suspect we at StataCorp and Ed are going to
need to exchange emails to figure out what the problem is.
-- Bill
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/