Kazi Ali Toufique
> I have a simple problem. I have a variable (say SEX) with
> codes 1 (say Female) and 2 (Male). When I tabulate for a dataset
that
> has all male then
> it does not show 1 (i. e. female) = 0 instead it shows 2 =
> 100%. How do I
> make a tabulate (or some other similar command) command
> show frequencies
> for all codes, not just only for the codes that exist in a
> particular database?
This is an example of a fairly common and more general
problem, which Kit Baum and I call the -tabcond- problem:
tabulate how many observations satisfy one or more
conditions, and show all counts explicitly, whether zero or more.
We wrote some programs without feeling that we had a good general
solution.
For this specific problem, however,
something cooked up ad hoc may help:
-tabcount- counts occurrences of specified values.
If the values are numeric, specify which you
want in the required -values()- option (numlists
are fine):
. tabcount rep78 , values(1/5)
----------------------------------------------
Repair |
Record |
1978 | Freq. Percent Cum.
----------+-----------------------------------
1 | 2 2.90 2.90
2 | 8 11.59 14.49
3 | 30 43.48 57.97
4 | 18 26.09 84.06
5 | 11 15.94 100.00
|
Total | 69 100.00
----------------------------------------------
. bysort foreign : tabcount rep78 , values(1/5)
______________________________________________________________________
_________
-> foreign = Domestic
----------------------------------------------
Repair |
Record |
1978 | Freq. Percent Cum.
----------+-----------------------------------
1 | 2 4.17 4.17
2 | 8 16.67 20.83
3 | 27 56.25 77.08
4 | 9 18.75 95.83
5 | 2 4.17 100.00
|
Total | 48 100.00
----------------------------------------------
______________________________________________________________________
_________
-> foreign = Foreign
----------------------------------------------
Repair |
Record |
1978 | Freq. Percent Cum.
----------+-----------------------------------
1 | 0 0.00 0.00
2 | 0 0.00 0.00
3 | 3 14.29 14.29
4 | 9 42.86 57.14
5 | 9 42.86 100.00
|
Total | 21 100.00
----------------------------------------------
If the values are string, again you need
to specify them. For this, -levels- from
SSC can save some typing. Here is
a silly example:
. egen Make = ends(make), head
. levels Make
`"AMC"' `"Audi"' `"BMW"' `"Buick"' `"Cad."' `"Chev."' `"Datsun"'
`"Dodge"' `"Fiat"' `"Ford"' `"Honda"' `"Linc
> ."' `"Mazda"' `"Merc."' `"Olds"' `"Peugeot"' `"Plym."' `"Pont."'
`"Renault"' `"Subaru"' `"Toyota"' `"VW"' `
> "Volvo"'
. bysort rep78 : tabcount Make , values(`r(levels)')
______________________________________________________________________
_________
-> rep78 = 1
----------------------------------------------
Make | Freq. Percent Cum.
----------+-----------------------------------
AMC | 0 0.00 0.00
Audi | 0 0.00 0.00
BMW | 0 0.00 0.00
Buick | 0 0.00 0.00
Cad. | 0 0.00 0.00
Chev. | 0 0.00 0.00
Datsun | 0 0.00 0.00
Dodge | 0 0.00 0.00
Fiat | 0 0.00 0.00
Ford | 0 0.00 0.00
Honda | 0 0.00 0.00
Linc. | 0 0.00 0.00
Mazda | 0 0.00 0.00
Merc. | 0 0.00 0.00
Olds | 1 50.00 50.00
Peugeot | 0 0.00 50.00
Plym. | 0 0.00 50.00
Pont. | 1 50.00 100.00
Renault | 0 0.00 100.00
Subaru | 0 0.00 100.00
Toyota | 0 0.00 100.00
VW | 0 0.00 100.00
Volvo | 0 0.00 100.00
|
Total | 2 100.00
----------------------------------------------
______________________________________________________________________
_________
-> rep78 = 2
< snip >
Here is the code. I'll ask Kit to
put it up on SSC.
---------------------- begin tabcount.ado
*! NJC 1.0.0 3 Nov 2002
program def tabcount, byable(recall)
version 7
syntax varname [if] [in] , values(str asis)
marksample touse, strok
tempvar Values Freq SFreq Percent Cum
qui gen long `Freq' = .
qui gen str1 `SFreq' = ""
capture numlist "`values'"
* string values
qui if _rc { /* */
gen str1 `Values' = ""
tokenize `"`values'"'
local nvals : word count `values'
forval i = 1 / `nvals' { /* */
replace `Values' = `"``i''"' in `i'
count if `touse' & `varlist' == `"``i''"'
replace `Freq' = r(N) in `i'
}
}
* numeric values
else qui { /* */
gen `Values' = .
tokenize `r(numlist)'
local nvals : word count `r(numlist)'
forval i = 1 / `nvals' { /* */
replace `Values' = ``i'' in `i'
count if `touse' & `varlist' == ``i''
replace `Freq' = r(N) in `i'
}
}
qui { /* */
su `Freq', meanonly
gen `Percent' = 100 * `Freq' / r(sum)
gen `Cum' = sum(`Percent') in 1 / `nvals'
local nvalsp1 = `nvals' + 1
replace `Freq' = r(sum) in `nvalsp1'
replace `Percent' = 100 in `nvalsp1'
replace `SFreq' = string(`Freq')
}
_crcslbl `Values' `varlist'
local vallbl : value label `varlist'
if "`vallbl'" != "" { /* */
label val `Values' `vallbl'
}
label var `SFreq' "Freq."
label var `Percent' "Percent"
label var `Cum' "Cum."
tabdisp `Values' in 1 / `nvalsp1', /*
*/ cell(`SFreq' `Percent' `Cum') format(%3.2f) total
end
----------------------- end tabcount.ado
Nick
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/