| |
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
RE: st: Reshape problem
I agree with Radu that this is a problem requiring
a transpose of the data.
As Radu hints, -xpose- requires all variables
to be numeric. But -sxpose- on SSC is a rough-and-ready
transpose for string variables.
So I had a go:
. sxpose, clear force
. l
+-------------------------------------------------------------------------------------------------------+
1. | _var1 | _var2 | _var3 | _var4 | _var5 | _var6 | _var7 | _var8 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|-------------------------------------------------------------------------------------------------------|
| _var9 | _var10 |
| 10 | 11 |
+-------------------------------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------------------------------+
2. | _var1 | _var2 | _var3 | _var4 | _var5 | _var6 | _var7 | _var8 |
| Sex | Age | Left eye | Right eye | Lower eyelid | Upper eyelid | Lateral canthus | Medial canthus |
|-------------------------------------------------------------------------------------------------------|
| _var9 | _var10 |
| Recurrent lesion | Primary lesion |
+-------------------------------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------------------------------+
3. | _var1 | _var2 | _var3 | _var4 | _var5 | _var6 | _var7 | _var8 |
| M | 47 | | Y | Y | | | Y |
|-------------------------------------------------------------------------------------------------------|
| _var9 | _var10 |
| | Y |
+-------------------------------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------------------------------+
4. | _var1 | _var2 | _var3 | _var4 | _var5 | _var6 | _var7 | _var8 |
| M | 66 | | Y | Y | | | Y |
|-------------------------------------------------------------------------------------------------------|
| _var9 | _var10 |
| | Y |
+-------------------------------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------------------------------+
5. | _var1 | _var2 | _var3 | _var4 | _var5 | _var6 | _var7 | _var8 |
| M | 56 | | Y | Y | | | |
|-------------------------------------------------------------------------------------------------------|
| _var9 | _var10 |
| | Y |
+-------------------------------------------------------------------------------------------------------+
. drop in 1
(1 observation deleted)
. foreach v of var _var* {
2. label var `v' "`= `v'[1]'"
3. }
. drop in 1
(1 observation deleted)
. destring, replace
_var1 contains non-numeric characters; no replace
_var2 has all characters numeric; replaced as byte
_var3 has all characters numeric; replaced as byte
(3 missing values generated)
_var4 contains non-numeric characters; no replace
_var5 contains non-numeric characters; no replace
_var6 has all characters numeric; replaced as byte
(3 missing values generated)
_var7 has all characters numeric; replaced as byte
(3 missing values generated)
_var8 contains non-numeric characters; no replace
_var9 has all characters numeric; replaced as byte
(3 missing values generated)
_var10 contains non-numeric characters; no replace
. d
Contains data from spiedel.dta
obs: 3
vars: 10 4 May 2006 20:39
size: 183 (99.9% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
_var1 str3 %9s Sex
_var2 byte %10.0g Age
_var3 byte %10.0g Left eye
_var4 str9 %9s Right eye
_var5 str12 %12s Lower eyelid
_var6 byte %10.0g Upper eyelid
_var7 byte %10.0g Lateral canthus
_var8 str14 %14s Medial canthus
_var9 byte %10.0g Recurrent lesion
_var10 str14 %14s Primary lesion
-------------------------------------------------------------------------------
Sorted by:
Note: dataset has changed since last saved
. l
+--------------------------------------------------------------------------------+
| _var1 _var2 _var3 _var4 _var5 _var6 _var7 _var8 _var9 _var10 |
|--------------------------------------------------------------------------------|
1. | M 47 . Y Y . . Y . Y |
2. | M 66 . Y Y . . Y . Y |
3. | M 56 . Y Y . . . Y |
+--------------------------------------------------------------------------------+
Naturally, I can't see what problems lurk in the rest of the data, but so
far, so good.
Nick
[email protected]
Radu Ban
> i don't know if this is the best way, but you could use the
> -xpose- command.
>
> but before that you need to make all you variables numerical.
> for example:
>
> /*coding males as 1 females as 0, Y as 1 no(missing) as 0 */
> /*coding age as numeric age*/
>
> forval i = 2/256 {
> gen real_v`i' = 1 if v`i' == "M" | v`i' == "Y"
> replace real_v`i' = 0 if v`i' == "F" | v`i' == ""
> replace real_v`i' = real(v`i') in 2
> }
>
> /*now you no longer really need id and v1, as long as you make note of
> what each row means*/
>
> drop id v1
>
> /*now you can transpose*/
> xpose, clear
>
> /*now you rename to get the variables name back*/
> rename v1 sex
> rename v2 age
> ...
>
> /*now you label values*/
> label define yesno 1 yes 2 no
> label define gender 1 male 0 female
> ...
>
> hope this helps. there are probably more elegant solutions though.
Thomas Speidel
> > I have a dataset that I need to re-organize in a long format.
> > Here is a sample:
> >
> > +--------------------------------------+
> > | id v1 v2 v3 v4 |
> > |--------------------------------------|
> > | 2 Sex M M M |
> > | 3 Age 47 66 56 |
> > | 4 Left eye |
> > | 5 Right eye Y Y Y |
> > | 6 Lower eyelid Y Y Y |
> > |--------------------------------------|
> > | 7 Upper eyelid |
> > | 8 Lateral canthus |
> > | 9 Medial canthus Y Y |
> > | 10 Recurrent lesion |
> > | 11 Primary lesion Y Y Y |
> > +--------------------------------------+
> >
> > There are 255 observations (v2-v256) occupying the columns.
> I need each
> > observation to be a distinct row. Almost all of the
> entries are string,
> > (missing means "NO"). I am having difficulties using the
> reshape command
> > to achieve my goal, possibly because of the strings. Any
> suggestion on
> > how to approach this? Should I create a loop to encode all
> variables?
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/