Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Questions for random data generation and value label
From: Yu Xue <[email protected]>
To: [email protected]
Subject: Re: st: Questions for random data generation and value label
Date: Tue, 12 Mar 2013 11:33:14 -0500
Thanks everyone for answering my question, especially Joseph! I think
you offered the right solution to my problem, although the result is
still not exact.
Mark
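
For anyone following along, the setup described in the quoted question below
(saving each variable's summary statistics in macros and then clearing memory)
could be sketched roughly as follows. This is only a minimal illustration: the
auto dataset and the variables mpg, weight, and length are stand-ins for the
actual data and Var1-Var3, and the macro names are made up for the example.

```stata
* Sketch only: mpg, weight, length stand in for Var1, Var2, Var3,
* and the macro names (mean_mpg, sd_mpg, ...) are hypothetical.
sysuse auto, clear

foreach v of varlist mpg weight length {
    quietly summarize `v'
    * -summarize- leaves its results in r(); copy them into local macros
    local mean_`v' = r(mean)
    local sd_`v'   = r(sd)
    local min_`v'  = r(min)
    local max_`v'  = r(max)
}

* The data are dropped, but the local macros remain available
clear
display "mpg: mean = `mean_mpg', SD = `sd_mpg', min = `min_mpg', max = `max_mpg'"
```

Local macros exist only within the do-file or program that defines them, so the
lines above need to be run together as one block rather than one at a time.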
On Mon, Mar 11, 2013 at 8:28 PM, Joseph Coveney <[email protected]> wrote:
> Mark Yu Xue wrote:
>
> Let me use an example to describe my question more clearly.
>
> There is an actual dataset with three variables: Var1, Var2, Var3.
> Each of them holds continuous numeric values. I get the max, min,
> SD, and mean of each, save them in several macros, and then
> clear the memory.
>
> Then I want to generate a synthetic dataset that also includes three
> variables: SynVar1, SynVar2, SynVar3. They should keep the same max, min,
> SD, and mean as Var1, Var2, Var3, respectively, in the actual data.
>
> --------------------------------------------------------------------------------
>
> If you have the actual data available, then you can try fitting a Johnson
> distribution to each variable (with one of the user-written commands -jnsn- or
> -jnsw-), and then generate the artificial dataset from the parameters of the
> Johnson distribution (using the user-written command -ajv-). All three
> user-written commands are in the same package, "JNSN", which you can download
> from SSC. Type -findit jnsn- to see more.
>
> These commands will not get you the exact-same mean, SD, minimum and maximum of
> the original variable each time, but Johnson distributions have been considered
> useful in creating artificial data following the same arbitrary (unknown)
> distribution of actual data of interest, for example, in order to characterize
> the behavior of candidate estimators or tests.
>
> The commands' help files might be a little busy-looking your first time through
> them, but the commands' use together is rather simple, with just two required
> lines of code: first either -jnsn- or -jnsw-, and then -ajv- using the returned
> scalars and macros of the first command. I've illustrated their use in a simple
> example below.
>
> Joseph Coveney
>
> . sysuse auto
> (1978 Automobile Data)
>
> . jnsn mpg
> Johnson's system of transformations
>
>
> Mean and moments for mpg
> Mean = 21.297
> Variance = 33.472
> Skewness = 0.949
> Kurtosis = 3.975
>
>
> Johnson distribution type: SB
> gamma = 2.248
> delta = 1.541
> xi = 9.616
> lambda = 56.418
>
>
> Note: Program terminated normally
>
> . return list
>
> scalars:
> r(lambda) = 56.41802121562024
> r(xi) = 9.615504048256971
> r(delta) = 1.54090335776377
> r(gamma) = 2.247612125156365
>
> macros:
> r(fault) : "Program terminated normally"
> r(johnson_type) : "SB"
>
> . ajv , distribution(`r(johnson_type)') generate(fake_mpg) lambda(`r(lambda)')
> xi(`r(xi)') gamma(`r(gamma)') delta(`r(delta)') seed(12345) n(100)
>
> . summarize mpg fake_mpg
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>          mpg |        74     21.2973    5.785503         12         41
>     fake_mpg |       100    20.84794    5.561717   12.62255   37.59033
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
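
The two-step use of -jnsn- and -ajv- shown in the quoted log can also be written
as a short do-file. The sketch below only rearranges the commands and options
that appear in Joseph's example, using /// to continue the long -ajv- line. It
assumes the user-written JNSN package is already installed (see -findit jnsn-),
and the generated values may differ across package or Stata versions.

```stata
* Sketch of the quoted example in do-file form.
* Assumes the user-written commands -jnsn- and -ajv- are installed.
sysuse auto, clear

* Step 1: fit a Johnson distribution to mpg; results are left in r()
jnsn mpg

* Step 2: generate 100 artificial values from the fitted parameters.
* The r() results are expanded before -ajv- runs, so they must still
* be those left behind by -jnsn-.
ajv , distribution(`r(johnson_type)') generate(fake_mpg)  ///
    lambda(`r(lambda)') xi(`r(xi)')                        ///
    gamma(`r(gamma)') delta(`r(delta)')                    ///
    seed(12345) n(100)

* Compare the original and artificial variables side by side
summarize mpg fake_mpg
```

As the quoted reply notes, the mean, SD, minimum, and maximum of the generated
variable will be close to, but not exactly equal to, those of the original.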