. gen long code = real(rest) if first=="NUMB"
. assert code!=. if first=="NUMB" // check assumption
. replace code = code[_n-1] if code==. // carry down
Now look at the data. I assume that code is filled in right from the
first observation, which requires that the first record be a NUMB record.
Assuming that,
. sort code recnum
We have our code variable.
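One way to do the looking, offered as a suggestion rather than as part of
the steps above, is to count the observations in which code is still
missing and list them. A nonzero count would mean the file does not open
with a NUMB record, and the assumption fails.
. count if code==.
. list recnum first code if code==., noobs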
Step 6: Get the rest
--------------------
I suggest we code
. replace first = strlower(first)
With that, we may be done. Do we have the data in long form? If so,
we can now set about converting it to wide form and then changing the
numeric variables from string to numeric. If not, we have more to do,
so we do it.
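What the conversion looks like depends on what the records contain, so
take the following only as a sketch under assumptions not verified here:
that each value of first appears at most once within a code, that the
values of first are legal as variable names, and that the NUMB records
are no longer needed once code exists. The -isid- line checks the first
assumption before -reshape- runs.
. drop if first=="numb"
. keep code first rest
. isid code first
. reshape wide rest, i(code) j(first) string
. rename rest* *
. destring, replace
-destring- converts only the variables that contain nothing but numbers,
so variables holding genuine strings are left as they are.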
Final comment
-------------
Note how I proceeded: I just work interactively to solve little problems.
I don't know what the ultimate solution is, but I do know how to get
closer, and I keep doing that until I'm done.
Working interactively, however, is dangerous. It is too easy to make a
mistake and not detect it. So what I do is start a do-file. It started
like this:
------------------------------------------------- input.do ---
clear
infix str line 1-80 using <filename>
compress
------------------------------------------------- input.do ---
I ran that, then I looked around, tried a few things, and added to my
do-file:
------------------------------------------------- input.do ---
clear
infix str line 1-80 using <filename>
compress
assert strlen(line)<80                  // <- notice this line
gen long recnum = _n
gen blank = strpos(line, " ")
gen str first = strtrim(substr(line, 1, blank)) if blank
replace first = line if blank==0
gen str rest = strtrim(substr(line, blank, .)) if blank
------------------------------------------------- input.do ---
Then I rerun the do-file, and repeat the process. Thus, I build a do-file
as I go.
Note the line I flagged.
assert strlen(line)<80                  // <- notice this line
After each group of lines, I add -assert-s verifying what I found and
the assumptions I am making. This way, I can rerun the do-file later
on an updated version of the dataset and, if it completes, be certain
my original assumptions are still true and thus reasonably certain that
I have correctly created an updated Stata dataset.
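Applied to the steps in this post, the next round of additions to the
do-file might look like the following. The commands are the ones used
above. The second -assert- is an addition: it states the assumption that
the file starts with a NUMB record, so that code is never missing after
the carry down.
------------------------------------------------- input.do ---
* ... earlier lines as shown above ...
gen long code = real(rest) if first=="NUMB"
assert code!=. if first=="NUMB"         // rest is numeric on NUMB records
replace code = code[_n-1] if code==.    // carry down
assert code!=.                          // assumed: first record is NUMB
sort code recnum
replace first = strlower(first)
------------------------------------------------- input.do ---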
-- Bill