If you give examples, people may take you at your word!
If the basic idea is sound, that should not be fatal.
I tried different "data"
Name,Foo and Bar LLP,Meganational Monsters,Stata is Smart
Code,0000(MV),0001(MV),0002(MV)
Currency,GBP,GBP,USD
31/12/1986,1,2,3
30/01/1987,4,5,6
30/12/2005,7,8,9
and tweaked the code
insheet using soares.csv, clear
gen long id = _n
rename v1 foo
reshape long v , i(id)
gen Company = v if foo == "Name"
gen Code = v if foo == "Code"
gen Currency = v if foo == "Currency"
bysort _j (Company) : replace Company = Company[_N]
bysort _j (Code) : replace Code = Code[_N]
bysort _j (Currency) : replace Currency = Currency[_N]
drop if inlist(foo, "Name", "Code", "Currency")
rename v Value
gen Date = date(foo, "dmy")
format Date %dd/n/CY
drop foo id _j
sort Company Date
destring Value, replace
     +------------------------------------------------------------------+
     | Value                 Company       Code   Currency         Date |
     |------------------------------------------------------------------|
  1. |     1         Foo and Bar LLP   0000(MV)        GBP   31/12/1986 |
  2. |     4         Foo and Bar LLP   0000(MV)        GBP    30/1/1987 |
  3. |     7         Foo and Bar LLP   0000(MV)        GBP   30/12/2005 |
  4. |     2   Meganational Monsters   0001(MV)        GBP   31/12/1986 |
  5. |     5   Meganational Monsters   0001(MV)        GBP    30/1/1987 |
     |------------------------------------------------------------------|
  6. |     8   Meganational Monsters   0001(MV)        GBP   30/12/2005 |
  7. |     3          Stata is Smart   0002(MV)        USD   31/12/1986 |
  8. |     6          Stata is Smart   0002(MV)        USD    30/1/1987 |
  9. |     9          Stata is Smart   0002(MV)        USD   30/12/2005 |
     +------------------------------------------------------------------+
This corrects a small bug in my previous code. -destring Value- needed
the -, replace-.
On the face of it, there might be a shorter solution involving
two -reshape-s.
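
As a rough illustration only (this is not code from the thread), one
-reshape--only route transposes the data with a long/wide pair and then
goes long again. That makes three calls rather than two, so it is not
obviously shorter. The sketch types in the example data with -input-
instead of reading soares.csv, assumes Stata 10 or later (hence the
uppercase "DMY" mask), and uses row and col as made-up helper names.

```stata
* example data typed in directly; stands in for -insheet using soares.csv-
clear
input str10 v1 str21 v2 str21 v3 str21 v4
"Name" "Foo and Bar LLP" "Meganational Monsters" "Stata is Smart"
"Code" "0000(MV)" "0001(MV)" "0002(MV)"
"Currency" "GBP" "GBP" "USD"
"31/12/1986" "1" "2" "3"
"30/01/1987" "4" "5" "6"
"30/12/2005" "7" "8" "9"
end

rename v1 foo
* row and col are hypothetical helper names
gen long row = _n

* one observation per cell of the original sheet
reshape long v, i(row) j(col)

* one observation per company; v1-v3 now hold the three header rows
reshape wide v foo, i(col) j(row)
rename v1 Company
rename v2 Code
rename v3 Currency
drop foo1 foo2 foo3

* one observation per company and date
reshape long v foo, i(col) j(row)
destring v, replace
rename v Value
gen Date = date(foo, "DMY")
format Date %td
drop foo row col
sort Company Date
list
```

The company names never become variable names here, so spaces and
characters such as "@" or "&" should do no harm.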
Nick
Nuno Soares
> Thanks for your insight. However, I've run into a problem
> when running your code. The code runs fine for your example,
> but when we replace the company names with actual names, we
> have problems with spaces in the companies' names or names
> that contain characters that aren't allowed in variable names by
> Stata (e.g. I have companies called "@UK", "Marks & Spencer",
> "Azure Holdings", etc.). These names can't be used in your first
> rename procedure... That's a pity, because your code is simpler...
Nick Cox
> This solution seems a bit more complicated than necessary.
>
> I started with this soares.csv
>
> Name,Company1,Company2,Company3
> Code,0000(MV),0001(MV),0002(MV)
> Currency,GBP,GBP,USD
> 31/12/1986,1,2,3
> 30/01/1987,4,5,6
> 30/12/2005,7,8,9
>
> and then ran this
>
> insheet using soares.csv, clear
> foreach v of var * {
> rename `v' `=`v'[1]'
> }
> drop in 1
> gen long id = _n
> reshape long Company , i(id)
> gen Code = Company if Name == "Code"
> gen Currency = Company if Name == "Currency"
> bysort _j (Code) : replace Code = Code[_N]
> bysort _j (Currency) : replace Currency = Currency[_N]
> drop if inlist(Name, "Code", "Currency")
> drop id
> rename Company Value
> rename _j Company
> gen Date = date(Name, "dmy")
> format Date %dd/n/CY
> drop Name
> destring Value
> sort Company Date
>
> The results are
>
> . l
>
>      +----------------------------------------------------+
>      | Company   Value       Code   Currency         Date |
>      |----------------------------------------------------|
>   1. |       1       1   0000(MV)        GBP   31/12/1986 |
>   2. |       1       4   0000(MV)        GBP    30/1/1987 |
>   3. |       1       7   0000(MV)        GBP   30/12/2005 |
>   4. |       2       2   0001(MV)        GBP   31/12/1986 |
>   5. |       2       5   0001(MV)        GBP    30/1/1987 |
>      |----------------------------------------------------|
>   6. |       2       8   0001(MV)        GBP   30/12/2005 |
>   7. |       3       3   0002(MV)        USD   31/12/1986 |
>   8. |       3       6   0002(MV)        USD    30/1/1987 |
>   9. |       3       9   0002(MV)        USD   30/12/2005 |
>      +----------------------------------------------------+
>
> What are the key differences?
>
> 1. No need to resort to a user-written program -sxpose-, no
> matter how much you trust it.
>
> 2. Date manipulations are much simpler with -date()-.
> You _don't_ have to -destring- the date. Also dropping the
> "/" is dangerous!
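
To see why dropping the "/" is risky: once the separators are gone,
dates written without leading zeros can collide. A small made-up
example (the two dates and the variable names are illustrative only,
and the uppercase "DMY" mask assumes Stata 10 or later):

```stata
clear
input str10 s
"1/11/1986"
"11/1/1986"
end

* stripping the separators makes two different dates identical
gen stripped = subinstr(s, "/", "", .)

* date() keeps them apart: 1 Nov 1986 versus 11 Jan 1986
gen d = date(s, "DMY")
format d %td
list
```
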
Nuno Soares
> > Thanks a lot Radu! Just had to destring the date_ variables.
> > Here goes the code. Might be useful to other statalisters:
> >
> > ***************** BEGIN CODE ***********************
> >
> > clear
> > *reading in the data
> > insheet using "C:\mv1.csv", clear
> >
> > *transposing
> > sxpose, clear
> >
> > *renaming the first three variables
> > foreach v of varlist _var1 _var2 _var3 {
> > local i = `v'[1]
> > rename `v' `i'
> > }
> >
> > *getting rid of "/" in the dates
> > *note that after renaming, only the date variables start with "_var"
> > destring _var*, replace ignore("/") force
> >
> > *now renaming the remaining variables
> > foreach v of varlist _var* {
> > local i = `v'[1]
> > rename `v' data`i'
> > }
> >
> > *dropping the first observation (because this is just the name of the variable)
> > drop in 1
> >
> > *reshaping
> > reshape long data, i(Name) j(date) string
> >
> >
> > *now you're left with the date in a string format, which you can
> > *convert to a proper date, in a couple of steps
> >
> > gen str date_year = substr(date, -4,.)
> > gen str date_month = substr(date, 3, 2)
> > gen str date_day = substr(date, 1, 2)
> > destring date_year date_month date_day, replace
> >
> > gen truedate = mdy(date_month, date_day, date_year)
> > format truedate %d
> >
> > ***************** END CODE ***********************
>
> Radu Ban
>
> > Here's a try. For this you would need to install the -sxpose- command
> > (do a -findit sxpose- and then follow the link to install). This is
> > needed because the built-in -xpose- command doesn't work with strings.
> >
> > ******************************
> >
> > *reading in the data
> > insheet using "C:\somedir\myfile.csv", clear
> >
> > *transposing
> > sxpose, clear
> >
> > *renaming the first three variables
> > foreach v of varlist _var1 _var2 _var3 {
> > local i = `v'[1]
> > rename `v' `i'
> > }
> >
> > *getting rid of "/" in the dates
> > *note that after renaming, only the date variables start with "_var"
> > destring _var*, replace ignore("/") force
> >
> > *now renaming the remaining variables
> > foreach v of varlist _var* {
> > local i = `v'[1]
> > rename `v' data`i'
> > }
> >
> > *dropping the first observation (because this is just the name of the variable)
> > drop in 1
> >
> > *reshaping
> > reshape long data, i(Name) j(date) string
> >
> >
> > *now you're left with the date in a string format, which you can
> > *convert to a proper date, in a couple of steps
> >
> > gen str date_year = substr(date, -4,.)
> > gen str date_month = substr(date, 3, 2)
> > gen str date_day = substr(date, 1, 2)
> >
> > gen truedate = mdy(date_month, date_day, date_year)
> >
> > *now you can format the date using the preferred date format, for example
> > format truedate %d
> > ********************
> >
> > I hope this helps. Also if you want to run this for several .csv
> > files, check out the archives for last week, there's a thread on how
> > to loop over all files in a directory.
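
A loop of that kind might be sketched as follows, using the extended
macro function -dir-. This is an illustration rather than the code from
the archived thread. It assumes the .csv files sit in the current
working directory and that one .dta file per input file is wanted; the
restructuring step is left as a placeholder, and stub is a made-up
macro name.

```stata
* collect the names of all .csv files in the current working directory
local files : dir "." files "*.csv"

foreach f of local files {
    insheet using "`f'", clear

    * ... restructure the data here, as in the code in this thread ...

    * save under the same name with a .dta extension
    local stub = subinstr("`f'", ".csv", "", .)
    save "`stub'.dta", replace
}
```

If no .csv files are found, the loop simply does nothing.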
>
>
> Nuno Soares
>
> > > I need to import some csv files that have an awkward
> > > format. The format is always the same:
> > >
> > > Name Company1 Company2 Company3 ...
> > > Code 0000(MV) 0001(MV) 0002(MV) ...
> > > Currency £ £ € ...
> > > 31/12/1986 data1 data2 data3 ...
> > > 30/01/1987 dataX dataY dataZ ...
> > > ...
> > > 30/12/2005 dataW dataJ dataK ...
> > >
> > > The number of companies (i.e., columns) is variable, and the
> > > column width is also variable. Data1, data2, etc. are numeric.
> > >
> > > Can you please tell me if it is possible to import the data
> > > described into Stata, in the following format:
> > >
> > > Company Code Currency Date Data
> > > Company1 0000(MV) £ 31/12/1986 data1
> > > Company1 0000(MV) £ 30/01/1987 dataX
> > > ...
> > > Company1 0000(MV) £ 30/12/2005 dataW
> > > Company2 0001(MV) £ 31/12/1986 data2
> > > Company2 0001(MV) £ 30/01/1987 dataY
> > > ...
> > > Company2 0001(MV) £ 30/12/2005 dataJ
> > > Company3 0002(MV) € 31/12/1986 data3
> > > Company3 0002(MV) € 30/01/1987 dataZ
> > > ...
> > > Company3 0002(MV) € 30/12/2005 dataK
> > > and so on.
> > >
> > > Some treatment of the original data can be done in Excel,
> > > but I wanted to avoid it (I have almost one hundred files
> > > that I need to import...). If anyone has an idea of how to
> > > do it, I would greatly appreciate it!