Dear Scott,
I tremendously appreciate the time you took to consider my problem. While
it did not give me what I needed, I carefully followed/studied it line by
line, and it has taught me some useful approaches. Particularly, your use
of -split-, -reverse-, and the -forvalues- loops were helpful.
When I input the table as you did, your code works for me as it did for you.
Interestingly, if I input your code --and before I process it I -outsheet-
it--, then examine it in a text editor, the reason might be revealed: my
native file begins each line with a space, there is no " character to demark
the beginning of the line (though there are end of line codes on every
line). If I -outsheet- the data run from your input table, each observation
came in demarked by " ". I suspect this lack of demarcation creates the
processing problem, without the beginning of the string demarked, Stata
appears to look for the first character and count from there...thus, in my
file and yours the var1 variable become different. Stata seems to count ok
from the first character, but if there are embedded blanks before the first
character and the string observation is not started by a " as in my native
file, Stata starts counting from the first character instead of from the
first character space - thus, where your code for var1 results in a space
("") for observation 2, using my native file delivers the first 3 characters
it sees, which are "123". Similarly, parsing of the remaining string
portion of the observation is incorrect because of different starting points
for the counting.
In any case, I will use a combination of your splitting and reversing
approach, and the approach I had come up with late yesterday which was
somewhat similar but used -g varX = word(v1,-1) - to get at some of what you
did with -split- and -reverse()-.
The tables pasted in below demonstrate what happens using your code on my
native file; I get the following which I have listed in two parts for the
sake of trying to keep it together in the email:
Following your last several lines:
order var1 R6 v1 price R4 R3 R2 R1
}
l, clean
I get the following list from the real data (I have split it in 2 sections)
var1 R6 v1
1. ANY 1206 Bunk1 Corn Silage $0.00
2. 123 4 Shed1 Grass Hay $0.00
3. 158 2 Purch1 Straw $0.00
4. 123 7 Shor1 Haylage 1st cut $0.00
5. 123 8 Bunk4 Haylage 2nd cut $0.00
6. 107 0 CornGrainGrndFine $135.00
7. 039 BakeryByProdBread $130.00
8. 105 2 BeetPulpPlCp $140.00
9. 102 2 EnergyBooster $1,200.
price R4 R3 R2 R1 R11
1. 97.00 FarmGrown Forag lb No
2. 146.00 Purchased Forag lb No
3. 164.00 Purchased Forag lb No
4. 149.00 FarmGrown Forag lb No 7
5. 150.00 FarmGrown Forag lb No 8
6. 205.00 Purchased Energ lb No
7. 308.00 Purchased Energ lb No
8. 313.00 Purchased Energ lb No
9. 521.00 Purchased Energ lb No
Again, I am most grateful for the time you took, and also to Svend who
responded earlier today. I should be able to use the alternative hybrid
approach I described above to get the data into a usable dataset. Thanks
again.
Buzz Burhans, Ph.D.
Dairy-Tech Group
Phone: 802-755-6842
Cell: 802-388-7214
Email: [email protected]
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of
[email protected]
Sent: Wednesday, January 24, 2007 8:16 AM
To: [email protected]
Subject: Re: RE: st: importing "irregular" columns with embedded blanks
It is not clear to me what the final data set should look like, but
perhaps this can help:
clear
input str68 v1
"ANY 1206 Bunk1 Corn Silage $0.00 97 FarmGrown Forag lb"
" 1234 Shed1 Grass Hay $0.00 146 Purchased Forag lb"
" 1582 Purch1 Straw $0.00 164 Purchased Forag lb"
" 1237 Shor1 Haylage 1st cut $0.00 149 FarmGrown Forag lb"
" 1238 Bunk4 Haylage 2nd cut $0.00 150 FarmGrown Forag lb"
" 1070 CornGrainGrndFine $135.00 205 Purchased Energ lb"
" 039 BakeryByProdBread $130.00 308 Purchased Energ lb"
" 1052 BeetPulpPlCp $140.00 313 Purchased Energ lb"
" 1022 EnergyBooster $1,200. 521 Purchased Energ lb"
" 00"
end
qui {
gen var1 = substr(v1,1,3)
replace v1 = substr(v1,4,.)
split v1, limit(1) gen(foo)
gen v2 = reverse(v1)
split v2, gen(R)
foreach var of varlist R* {
replace `var' = reverse(`var')
}
drop v2 R6-R10
rename foo R6
forv j = 1/6 {
forv i = 1/10 {
local a = v1[`i']
local r`j' = R`j'[`i']
local b : list a - r`j'
replace v1 = "`b'" in `i'
}
}
replace R5 = subinstr(R5,"$","",.)
replace R5 = subinstr(R5,",","",.)
gen price = real(R5)
format price %8.2f
drop R5
compress
order var1 R6 v1 price R4 R3 R2 R1
}
l, clean noobs
Which produces:
var1 R6 v1 price R4 R3
R2 R1
ANY 1206 Bunk1 Corn Silage 0.00 97 FarmGrown
Forag lb
1234 Shed1 Grass Hay 0.00 146 Purchased
Forag lb
1582 Purch1 Straw 0.00 164 Purchased
Forag lb
1237 Shor1 Haylage 1st cut 0.00 149 FarmGrown
Forag lb
1238 Bunk4 Haylage 2nd cut 0.00 150 FarmGrown
Forag lb
1070 CornGrainGrndFine 135.00 205 Purchased
Energ lb
039 BakeryByProdBread 130.00 308 Purchased
Energ lb
1052 BeetPulpPlCp 140.00 313 Purchased
Energ lb
1022 EnergyBooster 1200.00 521 Purchased
Energ lb
00 . 00
Scott
----- Original Message -----
From: "Dr. Buzz Burhans" <[email protected]>
Date: Wednesday, January 24, 2007 6:08 am
Subject: RE: st: importing "irregular" columns with embedded blanks
To: [email protected]
> Dear Svend,
>
> Thanks very much for the help.
>
> The problem is that while the file "looks" like a fixed format
> file, it is
> not fixed format...what look like demarked columns (by tabs or
> whatever) are
> actually empty blanks. Thus, running infix results in imported
> variablesthat are as below.
<snip>
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/