Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Datamanagement: warning when using infile with optional if
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Datamanagement: warning when using infile with optional if
Date: Tue, 28 Feb 2012 16:08:08 +0000
It's built into Stata that -if- tests every (potential) observation.
How else is Stata to know, at least in this problem, that your test is
satisfied? More to your point, adding extra code to bail out as soon as a
line is known to be invalid would slow -infile- down more often than it
speeds it up; at least, that's my guess.
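To make the trade-off concrete, here is a minimal sketch of the kind of early bail-out being discussed. It is written in Python purely for illustration and is not how Stata's -infile- works internally. The names parse_line and Record are hypothetical. The column positions are taken from the dictionary dctB.dct in the quoted message.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """One parsed line, with fields named as in dctB.dct."""
    id: int
    cat: str
    dob: int
    age: int


def parse_line(line: str, want_cat: str = "B") -> Optional[Record]:
    """Hypothetical reader: check the category before parsing anything else.

    The 0-based slices correspond to _column(1) id %1f, _column(2) cat %1s,
    _column(3) dob %2f and _column(5) age %2f in dctB.dct.
    """
    cat = line[1:2]
    if cat != want_cat:
        # Bail out early: dob and age are never converted to numbers,
        # so a line such as "1Ajohn1" causes no conversion problem.
        return None
    return Record(int(line[0:1]), cat, int(line[2:4]), int(line[4:6]))
```

The extra comparison runs on every line, including the wanted ones. That is the cost being weighed above against the time saved on the lines that are skipped.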
Prefixing the command with -quietly-, as in
-quietly infile using dctB if cat=="B"-, suppresses those little messages.
There are many ways to work with this kind of file. One is to make a copy
and, before you enter Stata, delete the lines that don't match a regular
expression, using any decent text editor or scripting language.
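Here is a minimal Python sketch of that preprocessing step. It assumes, as in dctB.dct, that the category sits in column 2, so a line is kept when the regular expression ^.B matches it. The function names filter_lines and filter_file, and the output name dataB.txt, are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Column 2 holds the category (_column(2) in dctB.dct), so keep a line only
# if its second character is "B". Adjust the pattern if your layout differs.
CAT_B = re.compile(r"^.B")


def filter_lines(lines: Iterable[str], pattern: re.Pattern = CAT_B) -> Iterator[str]:
    """Yield only the lines that match pattern at the start of the line."""
    for line in lines:
        if pattern.match(line):
            yield line


def filter_file(src: str, dst: str, pattern: re.Pattern = CAT_B) -> int:
    """Write the matching lines of src to dst and return how many were kept.

    latin-1 round-trips any byte, and newline="" preserves the original
    line endings, so kept lines are copied unchanged.
    """
    kept = 0
    with open(src, encoding="latin-1", newline="") as fin, \
         open(dst, "w", encoding="latin-1", newline="") as fout:
        for line in filter_lines(fin, pattern):
            fout.write(line)
            kept += 1
    return kept
```

For example, filter_file("data.txt", "dataB.txt") writes only the B lines. Stata can then read the copy with -infile using dctB, using(dataB.txt)-, assuming the using() option of fixed-format -infile-, which takes precedence over the file named in the dictionary. The -if- qualifier, and the warnings it triggers, are then no longer needed.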
Nick
On Tue, Feb 28, 2012 at 3:41 PM, <[email protected]> wrote:
> I am reading ASCII data with a dictionary using the command -infile-
> whilst conditioning (using -if-) on a variable that is read at the same
> time. I created a simplified example to show you what is happening:
>
> The data looks like:
> -----------------data.txt--------------
> 1Ajohn1
> 1B8724
> 2Ajane0
> 2B8625
> 3Amark1
> -----------------------------------------
>
> The dictionary file is:
> -----------------dctB.dct--------------
> dictionary using data.txt {
> _column(1) int id %1f "Identifier"
> _column(2) str1 cat %1s "Category"
> _column(3) int dob %2f "Date of Birth"
> _column(5) int age %2f "Age"
> }
> -----------------------------------------
>
> My aim is to read only those lines where the variable cat is equal to B.
> I do this by making use of the command
> infile using dctB if cat=="B"
>
> I do end up with the required result. Stata does a great job of
> conditioning on a variable that it is reading at the same time; however,
> it prints a warning for every line where cat=="A", because those lines
> contain non-numeric characters where Stata expects only integers. Not
> only does this produce messy .log files (especially with thousands of
> lines), it also indicates that Stata has to read every line completely,
> which is time-consuming and somewhat unnecessary.
>
> Does anyone have a suggestion to improve on my current method?
> Preferably one that produces readable .log files?
>