Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "POPE, REBECCA" <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: RE: Problems with the reshape command
Date: Wed, 19 Jan 2011 17:52:42 -0600
Since you mention that you are giving this data to someone else who might prefer the wide form, I'm going to expand on my original answer a bit. Nick's valid points about data structure aside, I can certainly understand having to supply data to another person in a format they like.
"line" is admittedly arbitrary, but my intent was to illustrate a dirty work-around if you absolutely had to convert to wide form. Ideally, if you were going to attempt to reshape your data, there would be some _meaningful_ variable to add to "port" (e.g. date of shipment - you're outside my industry, so this is a "best guess") that might enhance any analysis, whether done by you or someone else. Something like "line" would be a variable of last resort if there is no logical option in your data.
As noted in my original reply, it is possible to consolidate the data further after reshaping it. However, since I didn't know your ultimate objectives, I left that rather vague; I'm sorry if that caused any confusion. As Nick mentioned in a previous reply to this post, one powerful tool in Stata is the -collapse- command. If you wanted, for example, the average price of each item at each port, you could run -collapse- before or after -reshape- (or skip -reshape- entirely). If you need to keep the individual price observations, though, -collapse- won't be a good choice, because it replaces them with one summary value per group.
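To make the -collapse- route concrete, here is a minimal sketch that re-enters the hypothetical data from your original post and averages the repeated prices before reshaping. Storing price as double is my own assumption, made only to keep the decimals exact; it is not part of your data description.

```stata
* Sketch only: re-enter the hypothetical data from the original post
* (price stored as double by assumption, to keep the decimals exact)
clear
input port item double price
3 4029 27.62
3 4029 15.47
1 1006 37.55
3 2045 15.18
1 2045 16.21
1 4061 92.79
2 8041 12.55
2 2011 89.68
3 7031 68.01
2 2011 13.13
end

* Replace repeated prices with their mean: one row per port-item pair
collapse (mean) price, by(port item)

* item is now unique within port, so no extra identifier is needed
reshape wide price, i(port) j(item)
list, noobs
```

Because each port-item pair now appears only once, i(port) alone satisfies -reshape-. The cost is that the two prices for item 2011 at port 2 (89.68 and 13.13) become a single mean, which is exactly the trade-off mentioned above.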
Note: I don't know how useful this will be for your real data set, but in case it helps, the code to replicate the last table in your original post is:
. by port item, sort: generate line = _n
. reshape wide price, i(port line) j(item)
and optionally
. list port price*, noobs
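For anyone who wants to run this end to end, here is a self-contained sketch that re-enters the example data. One caveat: -sort- does not preserve the original order of tied observations, so with the plain -by port item, sort:- prefix it is arbitrary which of two repeated prices lands on line 1. The sketch adds a helper variable (obsno, a hypothetical name of my own) so that repeats are numbered in their original order; it must be dropped before -reshape-, which otherwise objects that obsno is not constant within port and line.

```stata
* Sketch only: re-enter the hypothetical data from the original post
clear
input port item double price
3 4029 27.62
3 4029 15.47
1 1006 37.55
3 2045 15.18
1 2045 16.21
1 4061 92.79
2 8041 12.55
2 2011 89.68
3 7031 68.01
2 2011 13.13
end

* Remember the original row order (obsno is a hypothetical helper name)
generate long obsno = _n

* Number the repeats of each port-item pair 1, 2, ... in original order
bysort port item (obsno): generate line = _n

* obsno varies within port-line, so drop it before reshaping
drop obsno
reshape wide price, i(port line) j(item)
list port price*, noobs
```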
Regards,
Rebecca
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Syed Basher
Sent: Wednesday, January 19, 2011 3:52 PM
To: [email protected]
Subject: Re: st: RE: Problems with the reshape command
In my case, the long structure is as informative as (or more informative than)
the wide structure. In fact, with the wide structure I get numerous empty cells,
which is visually uncomfortable. I guess I will leave the choice between the long
and wide structures to the end-users in my office by supplying them with both. I
appreciate Nick's meddling on this matter.
Syed
----- Original Message ----
From: Nick Cox <[email protected]>
To: [email protected]
Sent: Wed, January 19, 2011 10:19:58 PM
Subject: Re: st: RE: Problems with the reshape command
Rebecca is clearly right in the sense that if you create a
sufficiently fine identifier, -reshape- will oblige. But what is
useful about the data structure created? It rules out as many analyses
as it allows because -line- is arbitrary and separates things you
might want to compare.
But even if this is what Syed is asking for, there is a deeper
question: With this structure, why -reshape- at all? Almost all
analysis questions are easier to answer with a long structure.
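As an illustration of Nick's point (my own example, not something from the thread), here is a minimal sketch of a comparison that is simple in the long structure: each price relative to the average price of the same item across all ports. The variable names item_mean and rel_price are hypothetical.

```stata
* Sketch only: hypothetical data from the original post, in long form
clear
input port item double price
3 4029 27.62
3 4029 15.47
1 1006 37.55
3 2045 15.18
1 2045 16.21
1 4061 92.79
2 8041 12.55
2 2011 89.68
3 7031 68.01
2 2011 13.13
end

* Average price of each item across all ports and shipments
bysort item: egen double item_mean = mean(price)

* Each observed price relative to that item-level average
generate double rel_price = price / item_mean
list port item price item_mean rel_price, sepby(item) noobs
```

In the wide layout the same comparison would need a separate step for each price* variable, which is one concrete sense in which the long structure is easier to work with.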
Nick
On Wed, Jan 19, 2011 at 5:33 PM, POPE, REBECCA <[email protected]> wrote:
> Hi Syed,
> I'm a bit confused by your use of the term "cross-tab", but since you are using
>reshape, I'm going to assume you are just trying to get the prices for the
>different goods to become variables. If so, do you have some other additional
>identifying variable that you could use in your reshape command? If you have
>multiple prices for the same item at the same port, might the shipments be from
>different suppliers or have arrived on different dates? If so, you could use
>something like the following:
>
> . reshape wide price, i(port date) j(item)
>
> I'm guessing this won't give you exactly what you want because there will still
>be multiple lines per port (at least if your real data looks like the
>hypothetical data), but you'll have gotten around reshape's objections and can
>use other commands to consolidate after that. Other users might have more
>elegant solutions, but I hope this helps.
>
> If you don't have another logical ID variable to add to port, you can generate
>a fake one by doing the following:
>
> . by port, sort: generate line = _n
> . reshape wide price, i(port line) j(item)
>
>   port  line  pri~1006  pri~2011  pri~2045  pri~4029  pri~4061  pri~7031  pri~8041
> ----------------------------------------------------------------------------------
>      1     1         .         .         .         .     92.79         .         .
>      1     2     37.55         .         .         .         .         .         .
>      1     3         .         .     16.21         .         .         .         .
>      2     1         .         .         .         .         .         .     12.55
>      2     2         .     13.13         .         .         .         .         .
> ----------------------------------------------------------------------------------
>      2     3         .     89.68         .         .         .         .         .
>      3     1         .         .         .     27.62         .         .         .
>      3     2         .         .     15.18         .         .         .         .
>      3     3         .         .         .         .         .     68.01         .
>      3     4         .         .         .     15.47         .         .         .
>
>
> Regards,
> Rebecca
>
> Rebecca A. Pope
> Program Manager
> UAMS CCTR Health Services Research
> Fay W. Boozman College of Public Health
> Dept. of Health Policy and Management
>
> -----Original Message-----
> From: [email protected]
>[mailto:[email protected]] On Behalf Of Syed Basher
> Sent: Wednesday, January 19, 2011 10:49 AM
> To: [email protected]
> Subject: st: Problems with the reshape command
>
> Dear Statalist,
>
> I am using Stata 11.1. I have the following hypothetical data:
>
>      +---------------------+
>      | port   item   price |
>      |---------------------|
>   1. |    3   4029   27.62 |
>   2. |    3   4029   15.47 |
>   3. |    1   1006   37.55 |
>   4. |    3   2045   15.18 |
>   5. |    1   2045   16.21 |
>      |---------------------|
>   6. |    1   4061   92.79 |
>   7. |    2   8041   12.55 |
>   8. |    2   2011   89.68 |
>   9. |    3   7031   68.01 |
>  10. |    2   2011   13.13 |
>      +---------------------+
>
> I would like to reshape the data to wide format using:
> . reshape wide price, i(port) j(item)
>
> This is of course problematic in Stata since "item" is not unique within
> "port". Eventually I would like to obtain the following cross-tab (in wide
> format):
>
>   port |    1006    2011    2045    4029    4061    7031    8041
> --------+--------------------------------------------------------
>      1 |   37.55           16.21           92.79
>      2 |           89.68                                   12.55
>      2 |           13.13
>      3 |                   15.18   27.62           68.01
>      3 |                           15.47
>
> I have been consulting Stata's FAQs on this issue
> (http://www.stata.com/support/faqs/data/reshape3.html) without much success.
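Before calling -reshape-, it can help to see exactly which rows trigger its objection in the original question. Here is a minimal sketch that re-enters the data above so it runs on its own; the flag name dup is my own choice, not something from the thread.

```stata
* Sketch only: re-enter the hypothetical data shown above
clear
input port item double price
3 4029 27.62
3 4029 15.47
1 1006 37.55
3 2045 15.18
1 2045 16.21
1 4061 92.79
2 8041 12.55
2 2011 89.68
3 7031 68.01
2 2011 13.13
end

* Summarize how often each port-item combination occurs
duplicates report port item

* Flag rows whose port-item pair is repeated (dup counts the extra copies)
duplicates tag port item, generate(dup)
list port item price if dup > 0, noobs

* -isid- fails when port and item do not uniquely identify the rows;
* -capture- keeps the do-file running so the return code can be shown
capture isid port item
display "isid return code: " _rc
```

The flagged rows (item 4029 at port 3 and item 2011 at port 2) are the ones that need an extra identifier, such as a date or the -line- counter discussed above, before -reshape wide price, i(port) j(item)- can succeed.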
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
confidential and privileged information. Any unauthorized review,
use, disclosure or distribution is prohibited. If you are not the
intended recipient, please contact the sender by reply
e-mail and destroy all copies of the original message.