Statalist


RE: st: simple scripting and formulas


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: simple scripting and formulas
Date   Mon, 5 Jan 2009 19:59:02 -0000

Friedrich's answer gives a good low-level way to do this in Stata. 

In programs or do-files, you will get a little more speed or efficiency
by using a convenient option of -summarize-. 

su var, meanonly 
gen var2 = (var - r(min)) / (r(max) - r(min)) 

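The name -meanonly- is misleading here: the option suppresses the display
and the calculation of the variance, but it still leaves the minimum and
maximum behind in r(min) and r(max). 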

This answer uses the minimum and maximum across all panels, as Allison
seems to be asking for. 
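
As a quick check of the overall scaling, here is a minimal sketch on the
auto dataset shipped with Stata. The dataset, the variable mpg and the
new name mpg01 are my choices for illustration only; they are not from
Allison's data. 

```stata
* Illustration only: auto.dta and mpg stand in for the real data
sysuse auto, clear

* -meanonly- skips the variance but still saves r(min) and r(max)
summarize mpg, meanonly
generate double mpg01 = (mpg - r(min)) / (r(max) - r(min))

* the scaled variable should now run from 0 to 1
summarize mpg01, meanonly
assert r(min) == 0 & r(max) == 1
```

The -assert- is silent when the condition holds and stops with an error
when it does not, so it is a cheap guard to leave in a do-file. 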

Sometimes people will want to scale by the extremes in each panel. In
that circumstance 

egen min = min(var), by(panel) 
egen max = max(var), by(panel) 
gen var2 = (var - min) / (max - min) 

is one convenient (if not especially efficient) way to proceed using
official Stata commands. 
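
The panelwise version can be checked in the same spirit. This is only a
sketch: it assumes the auto dataset, with foreign standing in for the
panel identifier and lo, hi and mpg01 as made-up variable names. 

```stata
* Illustration only: foreign plays the role of the panel variable
sysuse auto, clear

egen double lo = min(mpg), by(foreign)
egen double hi = max(mpg), by(foreign)
generate double mpg01 = (mpg - lo) / (hi - lo)

* within each group the smallest scaled value should be 0, the largest 1
bysort foreign (mpg01): assert mpg01[1] == 0 & mpg01[_N] == 1
```

Note that a panel in which the variable is constant has max - min equal
to zero, so the scaled values there will be missing. 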

One canned solution is available from Stas Kolenikov: 

_gstd01 from http://web.missouri.edu/~kolenikovs/stata
    _gstd01 -- Standardize to [0,1]
    Author: Stas Kolenikov, [email protected]
    This program is an extension to the egen command that standardizes
    the specified variable into the [0,1] range, so that 0 corresponds
    to the minimum value, and 1

I don't know if that Russian email address still works. Stas has been
based in the US for some years now, as his mailings to this list and the
Missouri URL above do indicate. Stas' code is informative: 

program define _gstd01
   version 6
   gettoken type 0 : 0
   gettoken g    0 : 0
   gettoken eqs  0 : 0
   syntax varname [if] [in], [BY(varlist)]
   marksample touse
   if "`by'"=="" {
     tempvar by
     qui g byte `by'=0 if `touse'
   }
   tempname byvar vmin vmax t
   tokenize `varlist'
   sort `touse' `by' `varlist'
   qui by `touse' `by' : g long `byvar'=1 if _n==1
   qui replace `byvar'=sum(`byvar')
   qui by `touse' `by': g double `vmin'=`varlist'[1]
   qui g double `t'=-`varlist'
   sort `touse' `by' `t'
   qui by `touse' `by': g double `vmax'=-`t'[1]
   qui g `type' `g'=(`1'-`vmin')/(`vmax'-`vmin') if `touse'
   lab var `g' "`1' standardised to [0,1]"
end

In fact, that can be slimmed down a bit. The variable `byvar' does
nothing and the double sorting to get maxima as well as minima is
unnecessary given that any missing values are segregated by
-marksample-. 

*! 1.0.0 NJC 5 Jan 2009 after Stas Kolenikov 
program _gstdminmax 
	version 8 
	gettoken type 0 : 0
	gettoken g    0 : 0
	gettoken eqs  0 : 0
	syntax varname [if] [in], [BY(varlist)]
	marksample touse
	local y `varlist' 
	* sorting on `y' within groups puts the minimum first and the maximum last
	qui bysort `touse' `by' (`y') : /// 
	g `type' `g'= (`y' - `y'[1])/(`y'[_N] -`y'[1]) if `touse'
	lab var `g' "`y' standardised to [0,1]"
end

With this code in _gstdminmax.ado on your -adopath- an example of
panelwise scaling would be 

egen var2 = stdminmax(var), by(panel) 

Overall scaling would omit the option call: 

egen var2 = stdminmax(var)
 
Nick 
[email protected] 

Friedrich Huebler

Do the commands below yield the values you need?

. sum var
. gen var2 = (var-r(min))/(r(max)-r(min))

<[email protected]> 

> I have a question about creating formulas or scripts in STATA for my
> panel data set.
> I wish to normalise my panel data using the following formula:
> (Vi-Vmin)/(Vmax-Vmin) (where Vi is the actual value of a variable, Vmax
> is the maximum value in a complete data series, and Vmin is the
> minimum). How do I generate a new list of variables using this formula?



