Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Stata crashes when loading a dataset
From: Alan Riley <[email protected]>
To: [email protected]
Subject: Re: st: Stata crashes when loading a dataset
Date: Wed, 25 May 2011 14:50:24 -0500
Dan Blanchette experienced a crash when he tried to use a dataset
he obtained from the Internet:
> I fell upon an odd situation where Stata 11 crashed when I tried to load
> a dataset that I downloaded from the internet (from a site in a foreign
> country) when I used the -use- command like so:
>
> . use "C:\data\foreign_data.dta"
>
> The person supplying the dataset reported that the dataset loaded fine
> for him on his computer. In the process of trying to figure out a way
> to get Stata to load the dataset without crashing, I stumbled on an odd
> solution. All I had to do was specify a varlist like so:
>
> . use * using "C:\data\foreign_data.dta"
>
> and Stata loaded the whole dataset just fine. I discovered that the
> dataset contained almost all numeric variables. The one string variable
> had no foreign characters. Neither the dataset nor the variables had any notes. Two
> of the numeric variables had two value labels that had 1 foreign character
> in them. I believe that is what caused Stata to crash when not specifying
> a variable list.
>
> Would you not expect these two commands to be identical?
>
> . use "C:\data\foreign_data.dta"
> . use * using "C:\data\foreign_data.dta"
Dan surmised that foreign characters in some of the value labels could
have caused the crash. I do not believe this is the case. Stata has
no problem with extended ASCII characters in string variables, variable
labels, or value labels.
I believe that the dataset Dan obtained is somehow corrupt, and
this is what is causing the crash. When a dataset is corrupt, it
can cause part of Stata's memory to have a 'hole' poked in it, and
that hole can lead to a crash.
It is merely fortuitous that Stata did not crash when Dan tried
-use * using ...-. To a human, -use- with a varlist that happens
to name every variable looks the same as -use- without a varlist,
but to Stata these take two different paths through the code.
With a varlist, even one that contains every variable, Stata
retrieves the data for each observation variable-by-variable
rather than the entire observation at once. With a corrupt
dataset, a hole could still get poked in memory along this path;
Dan was simply lucky that it did not happen here.
The corruption could have come from the download process or
perhaps the .dta file Dan downloaded was exported by another
package with something out-of-spec about it, such as a variable
or value label with more characters in it than Stata allows.
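One way to test the download hypothesis without opening the file in Stata at all is to compare a byte count and hash of Dan's copy against the supplier's copy. The following is only a minimal sketch in Python: the function name file_fingerprint and the choice of SHA-256 are assumptions made for illustration, not anything Stata itself uses.

```python
import hashlib
import sys


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at `path`.

    Hypothetical helper: reads the file in chunks so that a large
    .dta file does not have to fit in memory.
    """
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
            digest.update(chunk)
    return size, digest.hexdigest()


if __name__ == "__main__":
    # Usage: python fingerprint.py C:\data\foreign_data.dta
    for name in sys.argv[1:]:
        size, sha = file_fingerprint(name)
        print(f"{name}: {size} bytes, sha256={sha}")
```

If both parties run this on their own copy and the sizes or hashes differ, the file was altered somewhere in transit. If they match, the download is not the culprit, and an out-of-spec export becomes the more likely explanation.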
--Alan
[email protected]