st: sample by() vs. bysort: sample; and some unexpected ssc install trouble with a Mata library
From: Gabi Huiber <[email protected]>
To: [email protected]
Date: Fri, 19 Feb 2010 17:54:58 -0500
Hello Statalist,
I am trying to get a stratified random sample without replacement.
There are three ways that I can think of, and I am curious about the
differences between them.
Suppose I define a sample count as a local named sample_ct. Then in a
local named strata I list the variables that define my strata. The
three ways that I can think of go like this:
sample `sample_ct', count by(`strata') // (1) as suggested in [D] sample
bysort `strata': sample `sample_ct', count // (2) per http://www.ats.ucla.edu/stat/Stata/faq/sample.htm
gsample `sample_ct', strata(`strata') replace // (3) using gsample by Ben Jann, via ssc install
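As a quick sanity check of form (1), here is a minimal sketch on the shipped lifeexp.dta. It assumes region alone as the stratum variable and a count of 1; both are illustration choices, not the setup used in the do-file further down. The assert only confirms that exactly one observation per stratum survives the draw.

```stata
* Sketch only: region as the single stratum variable is an assumption
sysuse lifeexp, clear
set seed 1234567

* draw one observation per region, without replacement
sample 1, count by(region)

* each region should now contribute exactly one observation
bysort region: assert _N == 1
```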
You can replicate this setup with one of your .dta files and a
variable list of your choice, side-by-side with one of the included
data sets. I chose lifeexp.dta, with the do-file below:
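___do-file starts here___
* strata and data source for each run: 1 = shipped example, 2 = your own
local strata1 region lexp
local strata2 ${LAT_strata} // use your own varlist
local whichfile1 sysuse lifeexp
local whichfile2 use "${sub_file}" // use your own file
local sample_ct 1
local formulaz "bysortsample sampleby gsample"
* the next line overrides the one above and leaves gsample out
local formulaz "bysortsample sampleby"
forvalues i=1/2 {
    * the three sampling commands, stored as strings and run inside the inner loop
    local bysortsample "bysort `strata`i'': sample `sample_ct', count"
    local sampleby "sample `sample_ct', count by(`strata`i'')"
    local gsample "gsample `sample_ct', strata(`strata`i'')"
    foreach k in `formulaz' {
        tempfile `k'_file`i'
        * reload the data, reset the seed, draw the sample, save it
        `whichfile`i''
        set seed 1234567
        ``k''
        save "``k'_file`i''", replace
        count
    }
    di ""
    * show the data source for this run (compound quotes: whichfile2 contains quotes)
    di `"`whichfile`i''"'
    drop _all
    * compare the sample from (1) with the sample from (2)
    use "`sampleby_file`i''"
    cf _all using "`bysortsample_file`i''"
}
___and ends here___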
The cf command will turn up all sorts of discrepancies between the
files generated by (1) and (2) and I have no idea why that would be
so. That is my first question.
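One thing that may be worth ruling out before reading too much into the cf output: cf compares the two files observation by observation, so two samples holding the same rows in a different order would also be reported as mismatches. The sketch below sorts both samples the same way before comparing them. It assumes region as the only stratum variable, and the tempfile names a and b are hypothetical. It does not explain any discrepancies that remain after sorting.

```stata
* Sketch only: region as the stratum variable and tempfiles a, b are assumptions
tempfile a b

* sample via the by() option
sysuse lifeexp, clear
set seed 1234567
sample 1, count by(region)
sort region country
save "`a'"

* sample via the bysort prefix, same seed
sysuse lifeexp, clear
set seed 1234567
bysort region: sample 1, count
sort region country
save "`b'"

* compare after both files are in the same order;
* capture noisily keeps a do-file running if cf reports mismatches
use "`a'", clear
capture noisily cf _all using "`b'", verbose
display "cf return code: " _rc
```

If mismatches persist here, the two forms are drawing different observations and not merely ordering them differently.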
But gsample as applied with (3) is making further trouble. Here's the output:
mm_sample() from -moremata- is required; type ssc install moremata
r(499);
end of do-file
r(499);
Yet I do have moremata. I checked:
. ssc install moremata
checking moremata consistency and verifying not already installed...
all files already exist and are up to date.
.
Has anybody seen this kind of phantom ssc install before? How did you
fix yours?
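A possible first diagnostic, offered as a sketch and not as a confirmed fix: the ssc install output above appears to confirm only that the package files are on disk, whereas gsample needs Mata to find mm_sample() at run time. The commands below are standard Stata and Mata commands; whether any of them resolves this particular case is an assumption.

```stata
* confirm which gsample.ado Stata will actually run
which gsample

* ask Mata whether it can locate mm_sample(); an error here means it cannot
capture noisily mata: mata which mm_sample()

* rebuild Mata's index of .mlib libraries along the ado-path, then check again
mata: mata mlib index
capture noisily mata: mata which mm_sample()

* if the function is still not found, forcing a reinstall is another option:
* ssc install moremata, replace
```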
Thank you,
Gabi