Radu Ban
>
> I have a flat (ASCII) dataset of 130 columns and roughly 250,000 lines. In
> theory all the observations should be numeric, but just by visual inspection
> I can tell that's not the case. So now I want to count the number of
> non-numeric observations by column.
> I'm trying to use -infix-, i.e.
> infix var1 1 var2 2 .... var130 130 using ../rawdata/raw.txt
>
> Is there a quick way to put each column into a variable (other than typing
> all the individual variable names and column numbers), and to read in the
> dataset only once?
> I know I can do something like
>
> forvalues i=1/130 {
> infix var`i' `i' using ../rawdata/raw.txt, clear
> sum var`i' /* to see how many numeric obs I have */
> }
>
> but this would mean having to read in a sizeable dataset 130 times, which
> would take a long time.
If you have Stata/SE, you can -infix- your data as a single str130
variable. If you don't, string variables are limited to 80 characters,
so you can -infix- them as a str80 and a str50 variable.
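For instance, a minimal sketch of the read step, assuming the file path
from the question and one character per column. The variable names
data, data1 and data2 are chosen only to match the loops that follow.

```stata
* Stata/SE: read each 130-character line into one string variable
infix str data 1-130 using ../rawdata/raw.txt, clear

* otherwise: split each line into an 80-character and a 50-character piece
infix str data1 1-80 str data2 81-130 using ../rawdata/raw.txt, clear
```

Only one of the two commands is needed; either way the file is read
just once.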
Then, within Stata, you can do something like this:
forval i = 1/130 {
gen str1 s`i' = substr(data,`i',1)
qui gen n`i' = real(s`i')
}
or
forval i = 1/80 {
gen str1 s`i' = substr(data1,`i',1)
qui gen n`i' = real(s`i')
}
forval i = 81/130 {
gen str1 s`i' = substr(data2,`i'-80,1)
qui gen n`i' = real(s`i')
}
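Then -summarize- on the numeric variables reports the number of
nonmissing observations for each one; the shortfall from the total
number of lines is the number of non-numeric values in that column.
There are many other ways of getting at that, e.g. -nmissing- from STB-60.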
And you can look at the string
variables to see why the numerics have
missings.
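A sketch of that last step, assuming the s1-s130 and n1-n130 variables
created by the loops above. Note that blank columns also count as
non-numeric here, since real() returns missing for them too.

```stata
* for each column, count the values that real() could not convert
forval i = 1/130 {
    quietly count if n`i' == .
    if r(N) > 0 {
        display "column `i': " r(N) " non-numeric values"
        * show which characters produced the missings
        tabulate s`i' if n`i' == ., missing
    }
}
```

Columns that are entirely numeric produce no output, so only the
problem columns are listed.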
Nick