Pooja Gupta wrote
>
> > one of my
> > variables has multiple alphanumeric characters that are not
> > separated by commas.
> > for eg, the first five observations of the variable are
> >
> > 1. ABC
> > 2. ABCEG
> > 3. BDEGHI
> > 4. ACDFGI
> > 5. AHI
> >
> > can I write a code which allows me to do a tabulation of each
> > of these alphabets
> > (i.e., how many As, how many B, how many C and so on) ?
and Tom Steichen suggested
>
> Something of the form
>
> . for any A B C D E F G H I: gen v_X=index(var, "X") \ replace
> v_X=1 if v_X>1
>
> where A B C D E F G H I is the list of possible alpha characters
> and var is the variable of interest
>
> will generate individual numeric (0,1) variables for each alpha code
> that can then be tabulated with the usual tabulation commands.
>
> Tom
>
Tom's code does give the right answer as it stands: a match in
the first position already returns 1 from index(), so only
values above 1 need recoding. The condition is perhaps clearer
written as

. for any A B C D E F G H I: gen v_X=index(var, "X") \ replace v_X=1 if v_X>0

which makes it plain that a match in any position counts.
In fact, his code can be telescoped:

. for any A B C D E F G H I: gen v_X=index(var, "X") > 0

That still leaves several variables, which as said can be
tabulated one by one, but you might want something more
compact.
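
For readers on newer versions of Stata, the telescoped one-liner can be sketched with -foreach- and strpos(), the current name for index(). This is only a sketch: it assumes the string variable is called var, as in Tom's example, and that no v_A ... v_I variables exist yet. The -tabstat- line at the end is my addition, one possible way of seeing all the counts in a single table.

```stata
* Sketch only: var is the assumed name of the string variable
foreach X in A B C D E F G H I {
    * 1 if the letter occurs anywhere in var, 0 otherwise
    gen byte v_`X' = strpos(var, "`X'") > 0
}

* one row per indicator; the sum is the number of observations
* containing that letter
tabstat v_*, statistics(sum) columns(statistics)
```

Note that each indicator records whether a letter occurs, not how often, so the sums count observations rather than occurrences. In the example data no letter is repeated within an observation, so the two coincide.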
Here's another way to approach it. I assume a string variable
-v-.

1. -save- the data set if not already saved.

2. -trim()- any leading or trailing spaces:

   replace v = trim(v)

3. calculate the length of each string:

   gen l = length(v)

4. record the observation number:

   gen long obs = _n

5. -expand- using -l-, so that each observation appears once
   per character:

   expand l

6. within each original observation, take one character per copy:

   bysort obs: gen str1 char = substr(v,_n,1)

7. -tabulate- the results:

   tab char

8. -save- this data set if needed in future.

9. return to the original data set.

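
The steps above can be sketched as one block to run from a do-file. Two assumptions are mine rather than part of the steps: the variable names l, obs and char are taken to be free, and -preserve- and -restore- stand in for saving and reloading the data set (steps 1, 8 and 9), which is only suitable if the expanded data are not needed afterwards.

```stata
* Sketch only: v is the assumed name of the string variable
preserve                      // keep a copy of the data in memory

replace v = trim(v)           // drop leading and trailing spaces
gen l = length(v)             // number of characters in each string
gen long obs = _n             // identifier for the original observation
expand l                      // one copy of each observation per character
bysort obs: gen str1 char = substr(v, _n, 1)   // k-th copy gets k-th character
tab char

restore                       // back to the original data
```

With the five example strings (ABC, ABCEG, BDEGHI, ACDFGI, AHI) this should give 23 observations after -expand-, with frequencies A 4, B 3, C 3, D 2, E 2, F 1, G 3, H 2 and I 3. Unlike the indicator approach, this counts every occurrence, so a letter repeated within one string is counted each time.
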
Nick