Re: st: RE: Re: Stata appears to be eating some string IDs when saving a file
From
"Dimitriy V. Masterov" <[email protected]>
To
Statalist <[email protected]>
Subject
Re: st: RE: Re: Stata appears to be eating some string IDs when saving a file
Date
Tue, 2 Apr 2013 17:16:19 -0700
David,
The original file was only 1 or 1.5 GB. The crazy thing was that I wasn't
getting ANY errors in either Stata or Ubuntu when I was saving it.
I noticed the cause when I deleted some other stuff I was working on
and everything suddenly started working.
DVM
On Tue, Apr 2, 2013 at 5:06 PM, David Radwin <[email protected]> wrote:
> Just out of curiosity, approximately how large is the file? Gigabytes?
> Hundreds of gigabytes? (I realize that even a small file could be larger
> than the free space on a small server, but that seems unlikely these days.)
>
> I'm glad you identified the problem, and thank you for reporting back to the
> list for posterity.
>
> David
> --
> David Radwin
> Senior Research Associate
> MPR Associates, Inc.
> 2150 Shattuck Ave., Suite 800
> Berkeley, CA 94704
> Phone: 510-849-4942
> Fax: 510-849-0794
>
> www.mprinc.com
>
>
>> -----Original Message-----
>> From: [email protected] [mailto:owner-
>> [email protected]] On Behalf Of Dimitriy V. Masterov
>> Sent: Tuesday, April 02, 2013 4:35 PM
>> To: Statalist
>> Subject: st: Re: Stata appears to be eating some string IDs when saving a
>> file
>>
>> STS has confirmed that I am not a crazy person, at least not in this
>> instance. This is a real bug.
>>
>> The problem is that Stata does not return an error when the file
>> system fills up. The developers are now aware of this and they would
>> like to have Stata detect this problem in the future and report the
>> error correctly. They also plan to add some more error checking to the
>> -use- command so that it catches files that have been corrupted.
>>
>> For now, the best way to detect this type of issue is to use the
>> -datasignature- command to verify that the data set was not
>> modified/corrupted when saved.
>>
>> DVM
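
The -datasignature- check suggested above could look something like the
following in a do-file. This is only a sketch: it assumes the default line
delimiter (not the ";" delimiter used in the original log), it reuses the
file name pt_m2m_cat.dta from the original post, and the local macro name
sig_before is made up for illustration.

```stata
* Record the signature of the data in memory before saving
datasignature
local sig_before "`r(datasignature)'"

save "pt_m2m_cat.dta", replace

* Reload the file and recompute the signature
use "pt_m2m_cat.dta", clear
datasignature

* Stops with r(9) if the reloaded data differ from what was in memory
assert "`r(datasignature)'" == "`sig_before'"
```

An alternative along the same lines is -datasignature set- before saving and
-datasignature confirm- after reloading, which stores the signature with the
dataset itself instead of in a local macro.
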
>>
>> On Sun, Mar 31, 2013 at 10:32 PM, Dimitriy V. Masterov
>> <[email protected]> wrote:
>> > I believe I diagnosed the issue. This seems to happen when I am
>> > running low on space in my home directory on the server. When I freed
>> > up some space, the problem went away. I wish there was some sort of
>> > warning to alert users that this is happening. This has been a very
>> > frustrating and terrifying experience.
>> >
>> > DVM
>> >
>> > On Sat, Mar 30, 2013 at 2:25 PM, Dimitriy V. Masterov
>> > <[email protected]> wrote:
>> >> I am having a strange problem with Stata deleting the values for about
>> >> 80% of my data when I save a file. It only does it for string variables,
>> >> and this only happens some of the time that I run this code.
>> >>
>> >> Here's the relevant part:
>> >>
>> >> . des ;
>> >>
>> >> Contains data
>> >>   obs:    10,766,127
>> >>  vars:             4
>> >>  size:   387,580,572
>> >> ------------------------------------------------------------------
>> >>               storage   display    value
>> >> variable name   type    format     label        variable label
>> >> ------------------------------------------------------------------
>> >> slr_id          str10   %10s
>> >> byr_id          str10   %10s
>> >> item_id         str12   %12s
>> >> pt_m2m_cat      float   %21.0g     pt_m2m_cat
>> >> ------------------------------------------------------------------
>> >> Sorted by:
>> >>      Note: dataset has changed since last saved
>> >>
>> >> . assert !missing(slr_id) & !missing(byr_id) & !missing(item_id) &
>> >> !missing(pt_m2m_cat);
>> >>
>> >> . count;
>> >> 10766127
>> >>
>> >> . save "pt_m2m_cat.dta", replace;
>> >> file pt_m2m_cat.dta saved
>> >>
>> >> . use "pt_m2m_cat.dta", clear;
>> >>
>> >> . assert !missing(slr_id) & !missing(byr_id) & !missing(item_id) &
>> >> !missing(pt_m2m_cat);
>> >> 3407873 contradictions in 10766127 observations
>> >> assertion is false
>> >> r(9);
>> >>
>> >>
>> >> My Stata MP is 12.1 (March 20, 2013), on an Ubuntu box. Any ideas how
>> >> to diagnose this?
>> >>
>> >> DVM
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/