That's an instructive example.
As I understand it, -insheet- peeks at the early bit of the file, makes
a guess at the number and type of variables, and assigns accordingly.
Whether guessing will also reliably give a workable answer with Joseph
Wagner's files, I can't say.
Nick
Friedrich Huebler
Assume we have a file "test.txt" that contains the following text
(without the Start and End lines). We are only interested in the
numbers.
=== Start of file ===
I am not clear how that this will help, as the header text and
the remainder of the file will give -insheet- quite different
ideas about what variables there are.
mpg trunk turn
22 11 40
17 11 40
22 12 35
20 16 40
=== End of file ===
Let's import the data with -insheet-.
. insheet using test.txt, nonames delimiter(" ")
(14 vars, 8 obs)
. drop if _n < 5
(4 observations deleted)
. drop v4 - v14
. list
+--------------+
| v1 v2 v3 |
|--------------|
1. | 22 11 40 |
2. | 17 11 40 |
3. | 22 12 35 |
4. | 20 16 40 |
+--------------+
Friedrich
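One possible follow-up to the session above, offered as a sketch rather than a tested recipe: because the first rows of test.txt hold text, -insheet- will presumably have stored v1-v3 as string variables, so the numbers would need converting before any analysis. The code assumes the session above has just been run; the new variable names are simply those on the fourth line of test.txt.

```stata
* Sketch only, continuing from the -list- above.
* Assumption: v1-v3 are string because the dropped header rows held text.
describe v1 v2 v3
destring v1 v2 v3, replace
* Names taken from line 4 of test.txt ("mpg trunk turn").
rename v1 mpg
rename v2 trunk
rename v3 turn
summarize mpg trunk turn
```

If -destring- declines to convert a variable, some non-numeric text probably survived the -drop-, and the row range should be checked.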
On Wed, Feb 20, 2008 at 6:35 AM, Nick Cox wrote:
> I am not clear how that this will help, as the header text and the
> remainder of the file will give -insheet- quite different ideas about
> what variables there are.
>
>
> Nick
>
> Friedrich Huebler
>
>
> You wrote that -insheet- with subsequent deletion of unwanted data is
> "sloppy". That approach might still be the easiest if all files have
> the same structure and your data always appear in the same columns.
>
> . insheet using filename, nonames
> . drop if _n < 30 | _n > 129
> . drop v1 - v20 v25 - v30
>
>
>
> On Feb 18, 2008 9:26 AM, Joseph Wagner wrote:
> > I have data I wish to input a portion of into STATA. Data is collected
> > on patients by a machine that measures their gait as they walk. A text
> > file is output for each patient with columns representing variables
> > (each about 130 lines long) but the multiple observation data doesn't
> > start until line 29. The first 28 lines are taken up with short lines
> > of data describing the patient. Unfortunately, I also need a couple of
> > those lines in 'header' area. The 29th line has the variables names but
> > they do not line up directly with the columns of data so I figured I
> > could just label the data later. The data I need starts 30 lines down
> > at column 115 and includes the next 4 columns and goes down 100 lines.
> >
> > I realize there are easier ways to do this but I have data on about 300
> > patients (and so one file for each person) and wanted to automate this
> > input (followed by successive merging of files to get my final dataset).
> >
> > I wanted to use the -infix- command but have never used this command
> > before and my attempts so far have failed. I also tried using -infile-
> > with the _first(30) option and the _line(30) option but those didn't
> > seem to work either.
> >
> > Here is a dictionary I attempted with just one of the variables:
> >
> > dictionary using "c:\data\gait\SBS00001_20050607_1.nrm" {
> > _line(30)
> > _column(115) r_grf_vrt_frc %5f
> > }
> >
> > infile using SBS00001_20050607_1.dct
> >
> > unexpected end of file
> > (5 observations read)
> >
> > The other problem is that it didn't seem to pull the data corresponding
> > to that column. I thought perhaps there was a problem with the data not
> > being in a fixed format but if I try -insheet- all the data imports and
> > the correct data lines up in the individual columns. Of course I could
> > write some programming whereby I delete the unneeded variables and line
> > but that's kind of sloppy.
> >
> > I am using STATA ver. 8.2
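
For the roughly 300 files mentioned above, the -insheet- and -drop- steps quoted earlier could be wrapped in a loop. This is only a sketch under assumptions: the second file name and the names gait_all and source are made up for illustration, the files are assumed to sit in the current working directory, the row and column ranges are those from the quoted suggestion and would need checking against a real file, and every file is assumed to share the same layout. It does not recover the lines needed from the header area.

```stata
* Sketch only. File names after the first are hypothetical placeholders;
* replace the list with the real patient files.
local files "SBS00001_20050607_1.nrm SBS00002_20050607_1.nrm"
local i = 0
foreach f of local files {
    * Read everything, then keep only the rows and columns of interest.
    insheet using "`f'", nonames clear
    drop if _n < 30 | _n > 129
    drop v1 - v20 v25 - v30
    * Header text may have forced string storage; convert back to numeric.
    destring v21-v24, replace
    * Record which file each observation came from (hypothetical variable).
    generate str40 source = "`f'"
    * Stack this file under those already processed.
    local i = `i' + 1
    if `i' > 1 {
        append using gait_all
    }
    save gait_all, replace
}
```

Saving the file name in source keeps each patient identifiable after the files are stacked, which should help with any later merging.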
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/