Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: st: complete data import
From
"Sarah Edgington" <[email protected]>
To
<[email protected]>
Subject
RE: st: complete data import
Date
Wed, 9 Apr 2014 13:33:08 -0700
Doug,
See below for example code that I think does what you want. One assumption
I'm making here is that no record has more than 1 mem or tnd line. As long
as that's true this should work.
For this example I'm assuming that all the original variables ended up
importing as strings, but it should work find even if that isn't true. I
just didn't feel like figuring out which columns were pure numeric and which
ended up being a mix of numeric and strings.
There are probably many other ways to do this. This is just what occurred
to me first.
-Sarah
clear
input str15 v1 str15 v2 str15 v3 str15 v4 str15 v5 str15 v6 str15 v7 str15
v8
ITM S 0 1 1.35 5100001261 soup
TND N 0 0 0 1.35 0 0
ITM S 0 1 1.35 5100001261 soup
TND N 0 0 0 1.35 0 0
MEM N 280211400001
ITM S 0 1 1.35 5100001261 soup
TND N 0 0 0 0 11.28 0
ITM S 0 1 1.2`i' 5000001011 milk
TND N 0 0 0 1.2`i' 0 0
MEM N 28021140015`i'
ITM S 0 1 1.2`i' 5000001011 milk
ITM S 0 1 1.2`i' 5000001011 milk
ITM S 0 1 1.2`i' 5000001011 milk
TND N 4 0 0 0 0 0
end
/*create a line number to be able to retain original ordering */
gen orgline=_n
/*now create an identifier within the tnd line */
/*sort on v1 and within that orgline, assign id based on _n for tnd records
only */
bysort v1 (orgline) : gen id=_n if v1=="TND"
/*now back to original ordering */
sort orgline
/* since tnd is always the end of the record, flip the order of the original
data so that the last line is first */
/* this will make it easier to populate id to the next line until a new
record is encountered */
/* now new record will start with tnd, so non-missing id is sign of new
record. Only fill forward if missing id */
gsort -orgline
replace id=id[_n-1] if missing(id)
/*now create variables from the mem and tnd lines so that these can be
distributed to all itm lines within that id */
/*first make a loop to create v9-v15 out of tnd vars */
forv i=2/7 {
local j=7+`i'
gen v`j'=v`i' if v1=="TND"
replace v`i'="" if v1=="TND"
}
/* use same logic to create v15 & v16 out of mem vars */
forv i=2/3 {
local j=13+`i'
gen v`j'=v`i' if v1=="MEM"
replace v`i'="" if v1=="MEM"
}
/* now we can sort on the id var and use similar strategy as with id to
distribute to all lines values of v8-v15 */
/* create flags for the tnd and mem lines so this can be used in a sort */
mark tnd if v1=="TND"
mark mem if v1=="MEM"
gsort id -tnd
forv i=9/14 {
by id: replace v`i'=v`i'[_n-1] if missing(v`i')
}
gsort id -mem
forv i=15/16 {
by id: replace v`i'=v`i'[_n-1] if missing(v`i')
}
drop if inlist(v1,"TND","MEM")
/*now look at the example data to see if it looks as expected*/
/*may need to use -order- command to get variables into desired final order
*/
sort orgline
list
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/