This solution seems a bit more complicated than necessary.
I started with this soares.csv
Name,Company1,Company2,Company3
Code,0000(MV),0001(MV),0002(MV)
Currency,GBP,GBP,USD
31/12/1986,1,2,3
30/01/1987,4,5,6
30/12/2005,7,8,9
and then ran this
insheet using soares.csv, clear
foreach v of var * {
rename `v' `=`v'[1]'
}
drop in 1
gen long id = _n
reshape long Company , i(id)
gen Code = Company if Name == "Code"
gen Currency = Company if Name == "Currency"
bysort _j (Code) : replace Code = Code[_N]
bysort _j (Currency) : replace Currency = Currency[_N]
drop if inlist(Name, "Code", "Currency")
drop id
rename Company Value
rename _j Company
gen Date = date(Name, "dmy")
format Date %dd/n/CY
drop Name
destring Value
sort Company Date
The results are
. l
+----------------------------------------------------+
| Company Value Code Currency Date |
|----------------------------------------------------|
1. | 1 1 0000(MV) GBP 31/12/1986 |
2. | 1 4 0000(MV) GBP 30/1/1987 |
3. | 1 7 0000(MV) GBP 30/12/2005 |
4. | 2 2 0001(MV) GBP 31/12/1986 |
5. | 2 5 0001(MV) GBP 30/1/1987 |
|----------------------------------------------------|
6. | 2 8 0001(MV) GBP 30/12/2005 |
7. | 3 3 0002(MV) USD 31/12/1986 |
8. | 3 6 0002(MV) USD 30/1/1987 |
9. | 3 9 0002(MV) USD 30/12/2005 |
+----------------------------------------------------+
What are the key differences?
1. No need to resort to a user-written program -sxpose-,
no matter how much you trust it.
2. Date manipulations are much simpler with -date()-.
You _don't_ have to -destring- the date. Also dropping
the "/" is dangerous!
Nick
[email protected]
Nuno Soares
> Thanks a lot Radu! Just had to destring the date_ variables.
> Here goes the code. Might be useful to other statalisters:
>
> ***************** BEGIN CODE ***********************
>
> clear
> *reading in the data
> insheet using "C:\mv1.csv", clear
>
> *transposing
> sxpose, clear
>
> *renaming the first three variables
> foreach v of varlist _var1 _var2 _var3 {
> local i = `v'[1]
> rename `v' `i'
> }
>
> *getting rid of "/" in the dates
> *note that after renaming, only the date variables start with "_var"
> destring _var*, replace ignore("/") force
>
> *now renaming the remaining variables
> foreach v of varlist _var* {
> local i = `v'[1]
> rename `v' data`i'
> }
>
> *dropping the first observation (because this is just the
> name of the variable) drop in 1
>
> *reshaping
> reshape long data, i(Name) j(date) string
>
>
> *now you're left with the date in a string format, which you
> can convert to a proper date, in a couple of steps
>
> gen str date_year = substr(date, -4,.)
> gen str date_month = substr(date, 3, 2)
> gen str date_day = substr(date, 1, 2)
> destring date_year date_month date_day, replace
>
> gen truedate = mdy(date_month, date_day, date_year)
> format truedate %d
>
> ***************** END CODE ***********************
Radu Ban
> Here's a try. For this you would need to install the -sxpose-
> command (do a -findit sxpose- and then follow the link to
> install). This is needed because the built-in -xpose- command
> doesn't work with strings.
>
> ******************************
>
> *reading in the data
> insheet using "C:\somedir\myfile.csv", clear
>
> *transposing
> sxpose, clear
>
> *renaming the first three variables
> foreach v of varlist _var1 _var2 _var3 { local i = `v'[1]
> rename `v' `i'
> }
>
> *getting rid of "/" in the dates
> *note that after renaming, only the date variables start with "_var"
> destring _var*, replace ignore("/") force
>
> *now renaming the remaining variables
> foreach v of varlist _var* {
> local i = `v'[1]
> rename `v' data`i'
> }
>
> *dropping the first observation (because this is just the
> name of the variable) drop in 1
>
> *reshaping
> reshape long data, i(Name) j(date) string
>
>
> *now you're left with the date in a string format, which you
> can convert to a proper *date, in a couple of steps
>
> gen str date_year = substr(date, -4,.)
> gen str date_month = substr(date, 3, 2)
> gen str date_day = substr(date, 1, 2)
>
> gen truedate = mdy(date_month, date_day, date_year)
>
> *now you can format the date using the preferred date format,
> for example format truedate %d
> ********************
>
> I hope this helps. Also if you want to run this for several
> .csv files, check out the archives for last week, there's a
> thread on how to loop over all files in a directory.
Nuno Soares
> > I need to import some csv files that have an awkward
> format. The format is always the same:
> >
> > Name Company1 Company2 Company3 ...
> > Code 0000(MV) 0001(MV) 0002(MV) ...
> > Currency £ £ € ...
> > 31/12/1986 data1 data2 data3 ...
> > 30/01/1987 dataX dataY dataZ ...
> > ...
> > 30/12/2005 dataW dataJ dataK ...
> >
> > The number of companies is variable (ie, columns), and its
> width is also variable. Data1, data2, etc. are numeric.
> >
> > Can you please tell me if it is possible to import to Stata
> the data described, in the following format:
> >
> > Company Code Currency Date Data
> > Company1 0000(MV) £ 31/12/1986 data1
> > Company1 0000(MV) £ 30/01/1987 dataX
> > ...
> > Company1 0000(MV) £ 30/12/2005 dataW
> > Company2 0001(MV) £ 31/12/1986 data2
> > Company2 0001(MV) £ 30/01/1987 dataY
> > ...
> > Company2 0001(MV) £ 30/12/2005 dataJ
> > Company3 0002(MV) £ 31/12/1986 data3
> > Company3 0002(MV) £ 30/01/1987 dataZ
> > ...
> > Company3 0002(MV) £ 30/12/2005 dataK
> > and so on.
> >
> > Some treatment on the original data can be done in Excel,
> but I wanted to avoid it (I have almost one hundred files
> that I need to import...). If any one has an idea of how to
> do it, I would be greatly appreciated!
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/