This looks like a -reshape- to me.
. l
+-----------------------------------+
| buyer price colour age id |
|-----------------------------------|
1. | X 10 Yellow 12 1 |
2. | X 10 Red 16 2 |
3. | Y 12 Red 16 2 |
4. | Z 13 Red 16 2 |
5. | Z 15 Blue 4 3 |
|-----------------------------------|
6. | Y 80 Green 1 4 |
7. | H 6 White 20 5 |
8. | H 10 White 20 5 |
9. | X 14 Red 16 2 |
+-----------------------------------+
. gen newid = _n
. reshape wide price, i(newid id colour age) j(buyer) string
(note: j = H X Y Z)
Data long -> wide
-----------------------------------------------------------------------------
Number of obs. 9 -> 9
Number of variables 6 -> 8
j variable (4 values) buyer -> (dropped)
xij variables:
price -> priceH priceX ... priceZ
-----------------------------------------------------------------------------
. renpfix price
. drop newid
. l
+---------------------------------------+
| colour age id H X Y Z |
|---------------------------------------|
1. | Yellow 12 1 . 10 . . |
2. | Red 16 2 . 10 . . |
3. | Red 16 2 . . 12 . |
4. | Red 16 2 . . . 13 |
5. | Blue 4 3 . . . 15 |
|---------------------------------------|
6. | Green 1 4 . . 80 . |
7. | White 20 5 6 . . . |
8. | White 20 5 10 . . . |
9. | Red 16 2 . 14 . . |
+---------------------------------------+
See [D] reshape and the FAQ on reshape.
Nick
[email protected]
I.A.C. van de Snepscheut
> I have a problem with my dataset. It is very huge, almost 400000
> lines. I want to change my dataset. It is sort of transposing a very
> big matrix. But the difference is that the columns with the same must
> be put together. To be clear is here a table how the situation is in
> small:
>
> Buyer Price buyer offers Colour Age Product ID
> X 10 Yellow 12 1
> X 10 Red 16 2
> Y 12 Red 16 2
> Z 13 Red 16 2
> Z 15 Blue 4 3
> Y 80 Green 1 4
> H 6 White 20 5
> H 10 White 20 5
> X 14 Red 16 2
>
> Wanted situation:
>
> Product ID Colour Age X Y Z H
> 1 Yellow 12 10 0 0 0
> 2 Red 16 10 12 13 0
> 2 Red 16 14 0 0 0
> 3 Blue 4 0 0 15 0
> 4 Green 1 0 80 0 0
> 5 White 20 0 0 0 6
> 5 White 20 0 0 0 10
>
> Or
>
> Product ID Colour Age X Y Z H
> 1 Yellow 12 10 0 0 0
> 2 Red 16 10 0 0 0
> 2 Red 16 0 12 0 0
> 2 Red 16 0 0 13 0
> 2 Red 16 14 0 0 0
> 3 Blue 4 0 0 15 0
> 4 Green 1 0 80 0 0
> 5 White 20 0 0 0 6
> 5 White 20 0 0 0 10
>
> Does anyone knows how to do this in stata?
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/