Infile dictionary options

Author: James Hardin, StataCorp
Date: January 1996
If you are reading a dataset with a dictionary, then Stata is reading that
data in record mode. This means that Stata treats each row of the raw data
file as a record and splits that record up into variables. The dictionary
gives you complete control over how the information in each record is
assigned to your variables in Stata. To make good use of that control, you
need to know what each part of a dictionary line does.
In a dictionary, there is one line for each variable that
you will be reading in. Each line can contain the following directives:
- An optional _column(#) directive stating where to begin reading.
This tells Stata to move to the specified column before reading the data
associated with this particular variable. By default, Stata will just
move to the next column.
- An optional _skip(#) directive stating how far over to skip from
the end of the last field that was processed.
This tells Stata how many columns to skip before reading the data associated
with this particular variable. By default, Stata will just move on to the
next column after the previous variable.
- An optional data type for how to store the variable.
Do not get this directive confused with the read format specifier. This
directive only affects how the data is stored and not how it is
read. Just because you specify that a variable is type str5
does not mean that Stata will read in 5 columns of data for this variable.
In order to control how many columns are read for a particular variable,
you must specify a read format. If you do not specify a type for the
variable, it will default to float.
- A required variable name.
You must specify a variable name for each of the fields that you will
read from the raw data records. There is no shortcut for this.
- An optional label name to specify a value label for the
variable.
You can use this to associate a value label name with a numeric variable.
(Value labels let a numeric variable display "strings", with each string
encoded as a number.)
This is rarely used (such data are typically read into string variables)
and may only be useful when
you will be applying a great number of value labels to your variables, or
when you are defining the value labels and the infile steps in a do-file.
- An optional read format used to read the field for this variable's values.
Many people assume that once the type is specified, the read format does
not have to be specified as well. On the contrary, the read format is even
more important in many cases, because it is what tells Stata how many
columns should be read for a particular variable.
- An optional variable label to apply to this variable.
This is the descriptive label associated with a variable that is printed
to the right of the variable name by the describe command. You do not
have to specify a variable label, though for large datasets it can be helpful.
There are times that you need to specify only a few of these, and there are
other times that you may need to specify many of these directives.
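As a rough sketch of how these directives fit together on each line, the
dictionary below describes a made-up raw file. The file names (people.raw,
people.dct), the variable names, the value label name sexlbl, and the column
layout are all assumptions chosen for illustration, not taken from any real
dataset.

```
infile dictionary using people.raw {
    * Hypothetical layout (assumed for this example):
    *   cols 1-4 id, cols 5-14 name, col 16 sex code, cols 18-22 income
    * The type (str10) only sets storage; the read format (%10s) is what
    * tells Stata to read 10 columns.
    _column(1)   int    id          %4f   "Respondent ID"
    _column(5)   str10  name        %10s  "First name"
    _skip(1)     byte   sex:sexlbl  %1f   "Sex"
    _skip(1)     long   income      %5f   "Annual income"
}
```

If this were saved as people.dct, a do-file along the following lines could
read the data and then define the value label that the dictionary attaches to
sex (the codes 1 and 2 are, again, only assumed):

```
infile using people.dct, clear
label define sexlbl 1 "Male" 2 "Female"
describe
```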
The above tools allow you to control, for each variable:
- how it is read
- where it is read from
- how it is stored
- what the variable is called
- what the variable is labeled
- how the values are labeled
This is usually enough for almost all datasets that you encounter. However,
some datasets have additional complexities in how they are
organized in the file. To address those complexities, you may
specify these other directives to further control the overall behavior of
Stata as it processes the data file:
- _lrecl(#) will allow you to specify the logical record length
for the file. In most files, there are line breaks from one record to the
next. In other files, there are no line breaks, but each record is a certain
number of characters long. In order to specify this length, you use this
directive.
- _newline(#) will allow you to specify that the next field described
begins # lines down in the file. Some datasets are organized in
such a way that each record extends across multiple lines in the file.
- Comments are allowed in the dictionary file and give you the
opportunity to add notes to the dictionary files that you create for reading
in your raw data files. This is the most overlooked of the optional features
available for the dictionary file. However, you should use it, as it will
allow you to return to old dictionaries and remind yourself of how you solved
problematic reads.
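The two sketches below illustrate these file-level directives under assumed
layouts. The file names (survey.raw, stream.raw), variable names, and column
positions are hypothetical. The first dictionary assumes each observation is
spread over two lines of the raw file and uses _newline to move down to the
second line; the lines beginning with * are comments.

```
infile dictionary using survey.raw {
    * Hypothetical layout: each observation spans two lines.
    *   line 1: cols 1-4 id, cols 5-6 age
    *   line 2: cols 1-8 income
    _column(1)   int    id      %4f   "Respondent ID"
    _column(5)   byte   age     %2f   "Age in years"
    _newline(1)
    _column(1)   float  income  %8f   "Annual income"
}
```

The second dictionary assumes a file with no line breaks in which every
record is exactly 10 characters long, so _lrecl(10) tells Stata where one
record ends and the next begins.

```
infile dictionary using stream.raw {
    * Hypothetical layout: no line breaks, 10 characters per record.
    *   cols 1-4 id, cols 5-10 score
    _lrecl(10)
    _column(1)   int    id      %4f   "Respondent ID"
    _column(5)   float  score   %6f   "Test score"
}
```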