Statalist



st: Alphabetical sort by value label


From   "David Elliott" <[email protected]>
To   [email protected]
Subject   st: Alphabetical sort by value label
Date   Tue, 16 Dec 2008 19:24:35 -0400

I need to sort data by the alphabetical order of a variable's value
labels, or to have value labels appear in alphabetical order in tabs &
tables.  Normally, I could -decode x,gen(y)- and -sort y- to do so, but
in this case I have a very large dataset, long value labels and no room
to add a str## variable.  Besides, I don't want to do this manually; I
want to generalize the process to something like -labvalsort x-.
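
For reference, the -decode- route mentioned above would look roughly
like this on the nlsw88 data (industry_str is just a name picked for
illustration).  It works, but the extra string variable is exactly the
overhead I can't afford on the real dataset:

-----------begin code - watch for wrapping--------------
* baseline: decode into a string variable, then sort on the string
sysuse nlsw88, clear
decode industry, gen(industry_str)
sort industry_str
---------------------------end code----------------------------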

I've looked in various nooks and crannies of the manual, I've
-hsearch-ed and -findit-ed without any luck.  Even Nick's redoubtable
labutils couldn't help in this case.

(1) Does anyone know if such a utility program exists?

(2) If such a program does not exist, I will outline how I am
approaching the problem on my own.

In order to get a list of labels into manipulatable form, there were
two basic approaches that I considered:
(A) using a -levelsof- on the value labeled variable of interest and
then looping to get the labels and sorting the resultant list
(B) using mata's -st_vlload()- function to allow processing of the
labels in a matrix

As a mata neophyte, I found (B) daunting, but it had a certain beauty
to it since a simple function can accomplish the work of a loop.
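
For comparison, the loop that (A) would need might start out along
these lines.  This is only a sketch of the collection step on the
nlsw88 data; the local names vallab, levels and txt are illustrative,
and sorting the collected label text is left out:

-----------begin code - watch for wrapping--------------
* approach (A): walk the observed levels and look up each label
sysuse nlsw88, clear
local vallab : value label industry
quietly levelsof industry, local(levels)
foreach v of local levels {
	* extended macro function: text of label `vallab' for value `v'
	local txt : label `vallab' `v'
	display %4.0f `v' "  `txt'"
	}
---------------------------end code----------------------------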

I created the following -labvalsort- and much to my surprise, it worked:

-----------begin code - watch for wrapping--------------
program define labvalsort
version 9.0

*! version 1.0.0  2008.12.16
*! Alphabetical sort by value label
*! by David C. Elliott

*! syntax is labvalsort varname
*! creates new variable prepending "_" to varname
*! creates new value label prepending "_" to varname's value label

local vallab : val lab `1'
if "`vallab'"=="" {
	error 182
	exit
	}
mata: vallabsort("`1'","`vallab'")
qui recode `1' `rc_list',gen(_`1')
lab val _`1' `new_lab'
end

mata:
void vallabsort(string scalar varname , string scalar val_lab)
{
	string scalar new_lab, rc_list
	string matrix lab_list, sort_lab
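	new_lab = "_" + val_lab
	// load the label's numeric values and their text
	st_vlload(val_lab, values=.,text=.)
	lab_list=(strofreal(values),text)
	// col 1: values in original order; cols 2-3: values and text sorted by text
	sort_lab=lab_list[.,1], sort(lab_list,2)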
	/*
	origorder = strtoreal(sort_lab[.,1]')
	neworder = strtoreal(sort_lab[.,2]')
	*/
// create new value label
	st_vlmodify(new_lab,strtoreal(sort_lab[.,1]),sort_lab[.,3])
// loop to create recode list
	for (i=1; i<=rows(sort_lab); i++) {
		rc_list = rc_list + "(" + sort_lab[i,2] + "=" + sort_lab[i,1]+")"
	    }
	st_local("rc_list",rc_list)
	st_local("new_lab",new_lab)
	}
end
---------------------------end code----------------------------

One can test this with:

-----------begin code - watch for wrapping--------------
* testing labvalsort
sysuse nlsw88.dta
lab list indlbl
tab ind
labvalsort industry
lab list _indlbl
tab _ind
sort _ind
---------------------------end code----------------------------

The -labvalsort- program creates a new variable and value label where
the new variable is numerically sorted in the new value label's
alphabetical order.  (Note that error checking is rudimentary at this
point.)

(3) I'd like to get rid of the recode list loop. I believe it may be
possible to do the recode from within mata and had created two vectors
origorder and neworder (currently commented out) that I intend(ed) to
use.  However, I don't currently see how that is possible and would
appreciate some suggestions on how to perform the recode from within
mata.  It may be that there isn't a -st_something- function available.

While I don't anticipate coming up against macro length limits in the
usual situation, recodes involving thousands of numbers could create
an rc_list longer than 65536 characters. Staying in mata would avoid
that problem, I think.
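
A rough sketch of what an in-mata recode might look like follows.  It
assumes st_view(), st_addvar() and st_vartype() behave as documented;
vlrecode and its arguments are hypothetical names, not part of
-labvalsort- as posted, and the double loop is written for clarity
rather than speed.  The new value label would still have to be built
with st_vlmodify() as in the code above.

-----------begin code - watch for wrapping--------------
mata:
// hypothetical helper: recode v into newv without building a macro
void vlrecode(string scalar v, string scalar lbl, string scalar newv)
{
	real colvector   values, neworder
	string colvector text
	real matrix      x, y
	real scalar      i, j

	st_vlload(lbl, values, text)
	// neworder[i] is the old value whose label is i-th alphabetically
	neworder = values[order(text, 1)]

	// add the new variable first, then set up the views
	(void) st_addvar(st_vartype(v), newv)
	st_view(x, ., v)
	st_view(y, ., newv)

	for (j=1; j<=rows(x); j++) {
		y[j,1] = x[j,1]		// values without a label pass through
		for (i=1; i<=rows(values); i++) {
			if (x[j,1]==neworder[i]) {
				y[j,1] = values[i]
				break
				}
			}
		}
	}
end

* toy data: apple < fig < pear, so 2 -> 1, 3 -> 2, 1 -> 3
clear
input byte grp
1
2
3
2
end
label define grplbl 1 "pear" 2 "apple" 3 "fig"
label values grp grplbl
mata: vlrecode("grp", "grplbl", "_grp")
list grp _grp
---------------------------end code----------------------------

Using views rather than st_data() avoids copying the variable into
mata, which matters when memory is as tight as described above.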

If there is a positive answer to (1) above, then (2) is a bit
redundant, albeit an interesting challenge.  If someone would like to
help with (3) it would be appreciated.  Indeed, if there is an
alternate approach to the -labvalsort- problem, I'd enjoy the
discussion.  If others would find -labvalsort- of use, I'll spruce it
up a bit with user choice of newvar name for the sorted variable.

Many thanks,

-- 
David Elliott


