Re: st: Questions for random data generation and value label
From: "Joseph Coveney" <[email protected]>
To: <[email protected]>
Subject: Re: st: Questions for random data generation and value label
Date: Tue, 12 Mar 2013 10:28:27 +0900
Mark Yu Xue wrote:
Let me use an example to describe my question more clearly.
There is an actual dataset that has three variables: Var1, Var2, Var3.
Each of them has continuous numeric values. I get the max, min,
SD, and mean of each of them, save them in several macros, and then
clear the memory.
Then, I want to generate a synthetic dataset, which also includes three
variables: SynVar1, SynVar2, SynVar3. They should keep the same max, min,
SD, and mean as Var1, Var2, Var3, respectively, in the actual data.
--------------------------------------------------------------------------------
If you have the actual data available, then you can try fitting a Johnson
distribution to each variable (with one of the user-written commands -jnsn- or
-jnsw-), and then generate the artificial dataset from the parameters of the
Johnson distribution (using the user-written command -ajv-). All three
user-written commands are in the same package, "JNSN", which you can download
from SSC. Type -findit jnsn- to see more.
These commands will not reproduce the original variable's exact mean, SD,
minimum and maximum in each generated sample, but Johnson distributions are
considered useful for creating artificial data that follow the same arbitrary
(unknown) distribution as the actual data of interest, for example, in order to
characterize the behavior of candidate estimators or tests.
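To make "generating from the parameters of the Johnson distribution" concrete,
here is a minimal sketch of drawing from a bounded (SB) Johnson distribution
using only built-in Stata functions. It is not taken from -ajv- and may differ
from how that command works internally. The variable names z and sb_draw are
made up for illustration, and the parameter values are the rounded -jnsn-
estimates for mpg from the example further down. The sketch assumes the
standard SB transformation z = gamma + delta*ln((x - xi)/(xi + lambda - x)),
with z standard normal.

```stata
* Sketch only: draw from a Johnson SB distribution with built-in functions.
* Not the -ajv- implementation. Note that -clear- drops the data in memory.
clear
set seed 12345
set obs 100

* Rounded -jnsn- estimates for mpg in the auto dataset (illustrative values)
local gamma  = 2.248
local delta  = 1.541
local xi     = 9.616
local lambda = 56.418

* Draw z ~ N(0,1), then invert the SB transformation:
*   z = gamma + delta*ln((x - xi)/(xi + lambda - x))
*   => x = xi + lambda*invlogit((z - gamma)/delta)
generate double z = rnormal()
generate double sb_draw = `xi' + `lambda' * invlogit((z - `gamma') / `delta')

summarize sb_draw
```

Every draw lies strictly between xi and xi + lambda, which is why an SB fit
bounds the artificial data but does not pin its minimum and maximum to the
observed ones.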
The commands' help files might be a little busy-looking your first time through
them, but the commands' use together is rather simple, with just two required
lines of code: first either -jnsn- or -jnsw-, and then -ajv- using the returned
scalars and macros of the first command. I've illustrated their use in a simple
example below.
Joseph Coveney
. sysuse auto
(1978 Automobile Data)
. jnsn mpg
Johnson's system of transformations
Mean and moments for mpg
Mean = 21.297
Variance = 33.472
Skewness = 0.949
Kurtosis = 3.975
Johnson distribution type: SB
gamma = 2.248
delta = 1.541
xi = 9.616
lambda = 56.418
Note: Program terminated normally
. return list
scalars:
r(lambda) = 56.41802121562024
r(xi) = 9.615504048256971
r(delta) = 1.54090335776377
r(gamma) = 2.247612125156365
macros:
r(fault) : "Program terminated normally"
r(johnson_type) : "SB"
. ajv , distribution(`r(johnson_type)') generate(fake_mpg) lambda(`r(lambda)')
> xi(`r(xi)') gamma(`r(gamma)') delta(`r(delta)') seed(12345) n(100)
. summarize mpg fake_mpg
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |        74     21.2973    5.785503         12         41
    fake_mpg |       100    20.84794    5.561717   12.62255   37.59033
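
If the artificial variable must match the original mean and SD exactly, one
possible follow-up (not part of the JNSN package) is a linear rescaling. The
sketch below is self-contained: it uses a hypothetical normally distributed
stand-in variable, fake, where in practice the variable created by -ajv- would
go. The names fake, fake_rescaled, target_mean and target_sd are made up for
illustration.

```stata
* Sketch only: linearly rescale a synthetic variable so that its mean and SD
* match those of the original variable. -sysuse auto, clear- replaces the
* data in memory.
sysuse auto, clear
set seed 12345

* Hypothetical stand-in for a synthetic variable (in practice, the -ajv- output)
generate double fake = 20 + 5 * rnormal()

* Store the target moments from the original variable
quietly summarize mpg
scalar target_mean = r(mean)
scalar target_sd   = r(sd)

* Standardize the synthetic variable, then rescale to the target moments
quietly summarize fake
generate double fake_rescaled = ///
    target_mean + (fake - r(mean)) * target_sd / r(sd)

summarize mpg fake_rescaled
```

This matches the mean and SD only. The minimum and maximum shift along with
everything else under the rescaling, and clipping values to the original range
afterwards would change the mean and SD again.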