
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: Datamanagement: warning when using infile with optional if


From   <[email protected]>
To   <[email protected]>
Subject   RE: st: Datamanagement: warning when using infile with optional if
Date   Tue, 28 Feb 2012 16:17:23 -0000

Nick,

Thank you for your reply. Good to know that there is no obvious solution, and that you think pursuing the bail-out route is not worthwhile.
Editing the raw data file is not really an option for me, since I also re-run the import for cat=="A"; more specifically, I re-run it 20 times (there are many different categories).

I might use -qui- once I know that the code is working properly.

Thanks again,
Arne

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: 28 February 2012 16:08
To: [email protected]
Subject: Re: st: Datamanagement: warning when using infile with optional if

It's built in to Stata that -if- tests every (potential) observation.
How else is Stata to know -- at least in this problem -- that your test is satisfied? More to your point, adding extra code to ensure bail-out once a line is known to be invalid would slow -infile- down more frequently than it speeds it up: at least that's my guess.

-quietly- suppresses the little messages.

There are many ways to work with this kind of file, including deleting lines from a copy that don't match a regular expression using any decent text editor or scripting language before you enter Stata.
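As a minimal sketch of that pre-filtering approach, assuming Python as the scripting language and the fixed-column layout of the data.txt example quoted below (a one-digit identifier followed by a one-letter category): the function names are hypothetical, chosen only for illustration.

```python
import re
from pathlib import Path

# Assumption: the category code sits in column 2, directly after a
# one-digit identifier, as in the data.txt example in this thread.
def make_pattern(category):
    """Build a regex matching lines whose second character is `category`."""
    return re.compile(r"^\d" + re.escape(category))

def keep_matching(lines, category):
    """Return only the lines that belong to the given category."""
    pattern = make_pattern(category)
    return [line for line in lines if pattern.match(line)]

def write_filtered_copy(src, dst, category):
    """Write a filtered copy of src to dst; the original file is left untouched."""
    lines = Path(src).read_text().splitlines()
    kept = keep_matching(lines, category)
    Path(dst).write_text("".join(line + "\n" for line in kept))
    return len(kept)

if __name__ == "__main__":
    sample = ["1Ajohn1", "1B8724", "2Ajane0", "2B8625", "3Amark1"]
    print(keep_matching(sample, "B"))  # ['1B8724', '2B8625']
```

The dictionary would then point at the filtered copy rather than at data.txt, so -infile- never sees lines from the other categories and no -if- qualifier is needed. Writing one copy per category keeps the original file intact when the import is repeated for many categories.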

Nick

On Tue, Feb 28, 2012 at 3:41 PM,  <[email protected]> wrote:

> I am reading ASCII data with a dictionary using the command -infile- 
> whilst conditioning (using -if-) on a variable that is read in at the 
> same time. I created a simplified example to show you what is happening:
>
> The data looks like:
> -----------------data.txt--------------
> 1Ajohn1
> 1B8724
> 2Ajane0
> 2B8625
> 3Amark1
> -----------------------------------------
>
> With dictionary file
> -----------------dctB.dct--------------
> dictionary using data.txt {
>  _column(1)     int     id      %1f     "Identifier"
>  _column(2)     str1    cat     %1s     "Category"
>  _column(3)     int     dob     %2f     "Date of Birth"
>  _column(5)     int     age     %2f     "Age"
>  }
> -----------------------------------------
>
> My aim is to read only those lines where the variable cat is equal to "B".
> I do this with the command -infile using dctB if cat=="B"-.
>
> I do end up with the required result. Stata does a great job at 
> conditioning on a variable that it is reading at the same time. 
> However, it returns an error for every line where cat=="A", as those 
> lines contain non-numeric characters where Stata expects integers. 
> Not only does this produce messy .log files (especially with thousands 
> of lines), it indicates that Stata has to read every line completely 
> which is time consuming and somewhat unnecessary.
>
> Does anyone have a suggestion to improve on my current method?
> Preferably one that produces readable .log files?
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

