One way to get the memory back might be to -encode- the long "label"
variable. If you do that in the dataset of codes and labels, the
encoding will be consistent, and it will come along each time you
merge the labels into a dataset.
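
A minimal sketch of that idea, assuming the labelfile.dta dataset (with
variables cause and label) built in the quoted message below. The name
labelnum is made up for illustration, and this has not been run against
the real data:

```stata
* Start from the saved file of codes and labels
use labelfile, clear

* Turn the long string into integer codes; the text is kept once,
* in a value label that -encode- names after the new variable
encode label, generate(labelnum)

* Let Stata pick the smallest storage type that holds the codes
compress labelnum

* The long string is no longer needed in the label file
drop label

sort cause
save labelfile, replace
```

After merging this file into a dataset as before, something like
-table labelnum, c(mean age)- should display the label text, while each
observation stores only a small integer instead of a long string.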
--Nick
-----------------------------------------------------------
Nicholas Winter, Ph.D. P 202.939.5343
Policy Studies Associates F 202.939.5732
1718 Connecticut Avenue, NW [email protected]
Washington, DC 20009-1148 www.policystudies.com
-----------------------------------------------------------
> -----Original Message-----
> From: HealthMaps [mailto:[email protected]]
> Sent: Friday, October 25, 2002 1:11 PM
> To: [email protected]
> Subject: st: RE: RE: RE: RE: suggestion for Stata 8: value labels for string variables
>
>
> Thanks for taking the time to work this out; this is a big help!
> The problem with this is that it does add a lot to the size of my
> file, but .... I got 2 gigs of memory and lots of disk space. Again,
> thank you ...
>
> Richard Hoskins
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of Nick Winter
> Sent: Friday, October 25, 2002 10:04 AM
> To: [email protected]
> Subject: st: RE: RE: RE: suggestion for Stata 8: value labels for string variables
>
>
> It seems to me that the answer here is to store the "labels" in a
> separate string variable. Then you can use the short name variable or
> the long name variable (i.e., the "value" or the "label") as you like.
>
> This is, I think, the direction that Nick Cox is going here.
>
> To do this, and apply it across multiple files, first create a Stata
> dataset with two string variables: the short "value" variable and a
> longer "label" variable. This should have one record per label. (You
> could probably read this using -infix- directly from your SAS PROC
> FORMAT text file.)
>
> . clear
>
> . infix str4 cause 2-5 str80 label 9-89 using labels.txt
> (13 observations read)
>
> (You will need to play with the column positions given your SAS PROC
> FORMAT file, but this gives you the idea...)
>
> . list
>
>
>        cause   label
>   1.   A391    Waterhouse Friderichsen syndrome'
>   2.   A483    Toxic shock syndrome'
>   3.   A985    Hemorrhagic fever w renal syndrome'
>   4.   B222    HIV dis wasting syndrome'
>   5.   B230    Act HIV infect syndrome'
>   6.   D469    Myelodysplastic syndrome, unspec'
>   7.   D593    Hemolytic uremic syndrome'
>   8.   D65'    Diseminated intravascular coagulation [defibrination syndrome]'
>   9.   D762    Hemophagocytic syndrome, infect assoc'
>  10.   D814    Nezelofs syndrome'
>  11.   D820    Wiskott Aldrich syndrome'
>  12.   D821    Di Georges syndrome'
>  13.   D824    Hyperimmunoglobulin E [IgE] syndrome'
>
> . replace label=substr(label,1,index(label,"'")-1) if index(label,"'")
> (13 real changes made)
>
> . replace cause=substr(cause,1,index(cause,"'")-1) if index(cause,"'")
> (1 real change made)
>
> This gets rid of the trailing single quote characters in each
> variable...
>
> . list
>
>        cause   label
>   1.   A391    Waterhouse Friderichsen syndrome
>   2.   A483    Toxic shock syndrome
>   3.   A985    Hemorrhagic fever w renal syndrome
>   4.   B222    HIV dis wasting syndrome
>   5.   B230    Act HIV infect syndrome
>   6.   D469    Myelodysplastic syndrome, unspec
>   7.   D593    Hemolytic uremic syndrome
>   8.   D65     Diseminated intravascular coagulation [defibrination syndrome]
>   9.   D762    Hemophagocytic syndrome, infect assoc
>  10.   D814    Nezelofs syndrome
>  11.   D820    Wiskott Aldrich syndrome
>  12.   D821    Di Georges syndrome
>  13.   D824    Hyperimmunoglobulin E [IgE] syndrome
>
> . sort cause
>
> . save labelfile, replace
> (note: file labelfile.dta not found)
> file labelfile.dta saved
>
> Sort on the cause variable, and save the master labelling file.
>
> . use mydata
> . sort cause
> . merge cause using labelfile , nokeep
> . list cause age label
>
>        cause   age   label
>   1.   A391     71   Waterhouse Friderichsen syndrome
>   2.   A483     32   Toxic shock syndrome
>   3.   A985     45   Hemorrhagic fever w renal syndrome
>   4.   B222     85   HIV dis wasting syndrome
>   5.   B230     91   Act HIV infect syndrome
>   6.   D469     56   Myelodysplastic syndrome, unspec
>   7.   D593     74   Hemolytic uremic syndrome
>   8.   D65      44   Diseminated intravascular coagulation [defibrination syndrome]
>   9.   D762     58   Hemophagocytic syndrome, infect assoc
>  10.   D814     65   Nezelofs syndrome
>  11.   D820     69   Wiskott Aldrich syndrome
>  12.   D821     72   Di Georges syndrome
>  13.   D824     85   Hyperimmunoglobulin E [IgE] syndrome
>
>
> . table cause, c(mean age) stubwidth(40)
>
> -----------------------------------------------------
> cause | mean(age)
> -----------------------------------------+-----------
> A391 | 71
> A483 | 32
> A985 | 45
> B222 | 85
> B230 | 91
> D469 | 56
> D593 | 74
> D65 | 44
> D762 | 58
> D814 | 65
> D820 | 69
> D821 | 72
> D824 | 85
> -----------------------------------------------------
>
> . table label, c(mean age) stubwidth(40)
>
> -----------------------------------------------------
> label | mean(age)
> -----------------------------------------+-----------
> Act HIV infect syndrome | 91
> Di Georges syndrome | 72
> Diseminated intravascular coagulation [d | 44
> HIV dis wasting syndrome | 85
> Hemolytic uremic syndrome | 74
> Hemophagocytic syndrome, infect assoc | 58
> Hemorrhagic fever w renal syndrome | 45
> Hyperimmunoglobulin E [IgE] syndrome | 85
> Myelodysplastic syndrome, unspec | 56
> Nezelofs syndrome | 65
> Toxic shock syndrome | 32
> Waterhouse Friderichsen syndrome | 71
> Wiskott Aldrich syndrome | 69
> -----------------------------------------------------
>
>
> --Nick Winter
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/