Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

RE: RE: st: importing "irregular" columns with embedded blanks


From   "Dr. Buzz Burhans" <[email protected]>
To   <[email protected]>
Subject   RE: RE: st: importing "irregular" columns with embedded blanks
Date   Wed, 24 Jan 2007 11:10:05 -0500

Dear Scott,

I tremendously appreciate the time you took to consider my problem.  While
it did not give me what I needed, I carefully followed/studied it line by
line, and it has taught me some useful approaches.  Particularly, your use
of -split-,  -reverse-, and the -forvalues- loops were helpful.  

When I input the table as you did, your code works for me as it did for you.
Interestingly, if I input your code --and before I process it I -outsheet-
it--, then examine it in a text editor, the reason might be revealed: my
native file begins each line with a space, there is no " character to demark
the beginning of the line (though there are end of line codes on every
line). If I  -outsheet- the data run from your input table, each observation
came in demarked by "  ".  I suspect this lack of demarcation creates the
processing problem, without the beginning of the string demarked, Stata
appears to look for the first character and count from there...thus, in my
file and yours the var1 variable become different.  Stata seems to count ok
from the first character, but if there are embedded blanks before the first
character and the string observation is not started by a " as in my native
file, Stata starts counting from the first character instead of from the
first character space - thus, where your code for var1 results in a space
("") for observation 2, using my native file delivers the first 3 characters
it sees, which are "123".  Similarly, parsing of the remaining string
portion of the observation is incorrect because of different starting points
for the counting.

In any case, I will use a combination of your splitting and reversing
approach, and the approach I had come up with late yesterday which was
somewhat similar but used -g varX = word(v1,-1) - to get at some of what you
did with -split- and -reverse()-.


The tables pasted in below demonstrate what happens using your code on my
native file; I get the following which I have listed in two parts for the
sake of trying to keep it together in the email:

Following your last several lines:

order var1 R6 v1 price R4 R3 R2 R1
}
l, clean 
I get the following list from the real data (I have split it in 2 sections) 

     var1                  R6                            v1  
1.    ANY                1206       Bunk1 Corn Silage $0.00  
2.    123                   4         Shed1 Grass Hay $0.00  
3.    158                   2            Purch1 Straw $0.00  
4.    123                   7   Shor1 Haylage 1st cut $0.00  
5.    123                   8   Bunk4 Haylage 2nd cut $0.00  
6.    107                   0     CornGrainGrndFine $135.00  
7.    039   BakeryByProdBread                       $130.00  
8.    105                   2          BeetPulpPlCp $140.00  
9.    102                   2         EnergyBooster $1,200.  


    price          R4      R3   R2   R1   R11  
1.  97.00   FarmGrown   Forag   lb   No        
2. 146.00   Purchased   Forag   lb   No        
3. 164.00   Purchased   Forag   lb   No        
4. 149.00   FarmGrown   Forag   lb   No     7  
5. 150.00   FarmGrown   Forag   lb   No     8  
6. 205.00   Purchased   Energ   lb   No        
7. 308.00   Purchased   Energ   lb   No        
8. 313.00   Purchased   Energ   lb   No        
9. 521.00   Purchased   Energ   lb   No  


Again, I am most grateful for the time you took, and also to Svend who
responded earlier today.  I should be able to use the alternative hybrid
approach I described above to get the data into a usable dataset.  Thanks
again.    
                                             

Buzz Burhans, Ph.D.

Dairy-Tech Group
Phone: 802-755-6842
Cell: 802-388-7214 

Email: [email protected]

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of
[email protected]
Sent: Wednesday, January 24, 2007 8:16 AM
To: [email protected]
Subject: Re: RE: st: importing "irregular" columns with embedded blanks

It is not clear to me what the final data set should look like, but 
perhaps this can help:

clear
input str68 v1
 "ANY 1206 Bunk1 Corn Silage      $0.00     97    FarmGrown Forag   lb"
 "    1234 Shed1 Grass Hay        $0.00    146    Purchased Forag   lb"
 "    1582 Purch1 Straw           $0.00    164    Purchased Forag   lb"
 "    1237 Shor1 Haylage 1st cut  $0.00    149    FarmGrown Forag   lb"
 "    1238 Bunk4 Haylage 2nd cut  $0.00    150    FarmGrown Forag   lb"
 "    1070 CornGrainGrndFine    $135.00    205    Purchased Energ   lb"
 "     039 BakeryByProdBread    $130.00    308    Purchased Energ   lb"
 "    1052 BeetPulpPlCp         $140.00    313    Purchased Energ   lb"
 "    1022 EnergyBooster        $1,200.    521    Purchased Energ   lb"
 "                                                                  00"
end
qui {
gen var1 = substr(v1,1,3)
replace v1 = substr(v1,4,.)
split v1, limit(1) gen(foo)
gen v2 = reverse(v1)
split v2, gen(R)

foreach var of varlist R* {
	replace `var' = reverse(`var')
}
drop v2 R6-R10
rename foo R6

forv j = 1/6 {
	forv i = 1/10 {
		local a = v1[`i']
		local r`j' = R`j'[`i']
		local b : list a - r`j'
		replace v1 = "`b'" in `i'

	}
}

replace R5 = subinstr(R5,"$","",.)
replace R5 = subinstr(R5,",","",.)
gen price = real(R5)
format price %8.2f
drop R5
compress
order var1 R6 v1 price R4 R3 R2 R1
}
l, clean noobs


Which produces:


var1     R6                      v1     price    R4          R3      
R2   R1  
 ANY   1206       Bunk1 Corn Silage      0.00    97   FarmGrown   
Forag   lb  
       1234         Shed1 Grass Hay      0.00   146   Purchased   
Forag   lb  
       1582            Purch1 Straw      0.00   164   Purchased   
Forag   lb  
       1237   Shor1 Haylage 1st cut      0.00   149   FarmGrown   
Forag   lb  
       1238   Bunk4 Haylage 2nd cut      0.00   150   FarmGrown   
Forag   lb  
       1070       CornGrainGrndFine    135.00   205   Purchased   
Energ   lb  
        039       BakeryByProdBread    130.00   308   Purchased   
Energ   lb  
       1052            BeetPulpPlCp    140.00   313   Purchased   
Energ   lb  
       1022           EnergyBooster   1200.00   521   Purchased   
Energ   lb  
         
00                                 .                             00  



Scott

----- Original Message -----
From: "Dr. Buzz Burhans" <[email protected]>
Date: Wednesday, January 24, 2007 6:08 am
Subject: RE: st: importing "irregular" columns with embedded blanks
To: [email protected]

> Dear Svend,
> 
> Thanks very much for the help.  
> 
> The problem is that while the file "looks" like a fixed format 
> file, it is
> not fixed format...what look like demarked columns (by tabs or 
> whatever) are
> actually empty blanks.  Thus, running infix results in imported 
> variablesthat are as below. 

<snip>
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2025 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index