Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Mata Data Structure or "variable" variable names for timeseries computations
From: Tirthankar Chakravarty <[email protected]>
To: [email protected]
Subject: Re: st: Mata Data Structure or "variable" variable names for timeseries computations
Date: Wed, 1 Aug 2012 02:23:32 -0700
Matt,
You presumably have panel data spread across multiple files. My solution
would be to append the datasets and then use Mata's panel-handling
functions, starting with -panelsetup()-. Instead of taking submatrices
based on the panel identifier, you can take submatrices based on the
time variable.
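To make the idea concrete, here is a minimal sketch on a toy matrix (the data and the names X, info, prev, curr and d are made up for illustration). It assumes the rows are already sorted by year and then by id, and that every id appears in both years:

```stata
mata:
// toy data (illustrative only): columns are (id, year, x)
X = (1, 1995, 10 \
     2, 1995, 20 \
     1, 1996, 13 \
     2, 1996, 26)
// treat column 2 (year) as the "panel" variable;
// info has one row per year holding (first row, last row)
info = panelsetup(X, 2)
prev = panelsubmatrix(X, 1, info)   // the 1995 block
curr = panelsubmatrix(X, 2, info)   // the 1996 block
// year-on-year change in x, row by row (i.e. id by id)
d = curr[., 3] - prev[., 3]
d
end
```

Here info is (1, 2 \ 3, 4) and d is (3 \ 6).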
I have written you a fairly general ado-file which, with modifications,
can be made to do most things in the neighbourhood of your stated
requirements. It asks for the start and end years and for the variables
from the read-in files that you would like to difference, and then it
saves the matrices of differences to binary files, which can be read
back in for later use.
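As a rough sketch of the "read in for later use" step, the round trip below writes a small matrix with fputmatrix() and reads it back with fgetmatrix(). The file name demo_diff and the matrix contents are placeholders for illustration; with the program's output you would open a file such as file_1997_1996 in "r" mode instead.

```stata
mata:
fname = "demo_diff"        // hypothetical file name
unlink(fname)              // erase any old copy; does nothing if absent
// write a matrix in Mata's binary format
fh = fopen(fname, "w")     // "w" requires that the file not already exist
fputmatrix(fh, (1, 2 \ 3, 4))
fclose(fh)
// read it back
fh = fopen(fname, "r")
m = fgetmatrix(fh)
fclose(fh)
m
end
```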
1) First, the ado file
*------------------------------------------------- matdiff.ado
cap program drop matdiff
program matdiff, rclass
    version 12
    syntax varlist(min=1 numeric), ///
        [STartyr(integer 1995) ENdyr(integer 2010)] ///
        ID(varname) ///
        Time(varname)
    // put the variable names together
    local allvars `id' `time' `varlist'
    // read the files in and append
    use "file_`startyr'", clear
    forvalues year=`startyr'/`endyr' {
        if("file_`year'"~="file_`startyr'") {
            append using file_`year'
        }
    }
    // call the Mata function
    mata: a=fnCalcDiffMat(st_local("allvars"), ///
        strtoreal(st_local("startyr")), ///
        strtoreal(st_local("endyr")))
end
version 12
set matastrict off
mata
// define a structure to store the returned matrices
struct stMattData {
    real matrix mDiff
    real scalar yearCurrent
    real scalar yearPrevious
}
// Mata function: differences each year's block against the previous year's
struct stMattData colvector function ///
    fnCalcDiffMat(string scalar varlist, ///
    real scalar startyr, real scalar endyr) {
    // declare the vector of structures
    struct stMattData colvector stDiff
    real scalar fh // the file handle
    string scalar sFileName // filename string
    // read the data; columns are (id, time, variables to difference)
    st_view(mV=., ., varlist, .)
    mV=sort(mV, (2, 1)) // sort by id within year
    // pull the variables to difference into a subview
    st_subview(mV1=., mV, ., (3..length(tokens(varlist))))
    st_subview(mV2=., mV, ., 2) // year variable is "panel"
    stInfo = panelsetup(mV2, 1) // setup as panel
    // instantiate the structures
    stDiff= J(rows(stInfo)-1, 1, stMattData())
    // fill 'er up
    for(i=1; i<rows(stInfo); i++) {
        mCurrent = panelsubmatrix(mV1, i+1, stInfo)
        mPrevious = panelsubmatrix(mV1, i, stInfo)
        // do the computation (both years must have the same number of rows)
        stDiff[i].mDiff = mCurrent - mPrevious
        stDiff[i].yearCurrent = mV2[stInfo[i+1,1],1]
        stDiff[i].yearPrevious = mV2[stInfo[i,1],1]
        // file name is file_<current year>_<previous year>
        sFileName = invtokens(("file",
            strofreal(stDiff[i].yearCurrent) ,
            strofreal(stDiff[i].yearPrevious)),"_")
        // write the file to disk
        fh = fopen(sFileName, "rw")
        fputmatrix(fh, stDiff[i].mDiff)
        fclose(fh)
    }
    return(stDiff) // no use currently, but handy
}
end
*------------------------------------------------- matdiff.ado
2) Next, a script that implements the ado file
*---------------------------------- matdiff_script.do
clear*
// start year and endyear
local startyr 1995
local endyr 2010
// generate dummy data
forvalues year=`startyr'/`endyr' {
    drop _all
    set obs 10
    drawnorm myvar1-myvar5
    g year = `year'
    g id=_n
    save file_`year', replace
}
run matdiff.ado
// call the program
matdiff myvar1-myvar5, startyr(1996) ///
endyr(2005) id(id) time(year)
*---------------------------------- matdiff_script.do
Much simpler solutions are possible without invoking Mata, but this
program should be extensible.
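For instance, one such Stata-only route (a sketch only, and just one of several possibilities) is to append the yearly files, declare the panel with -xtset-, and use the D. time-series operator. It assumes the dummy files file_1995 through file_2010 created by the script above, with variables id, year and myvar1-myvar5; the d_ prefix is an arbitrary naming choice.

```stata
* append the yearly files into one long dataset
use file_1995, clear
forvalues year = 1996/2010 {
    append using file_`year'
}
* declare the panel structure, then difference within id over year
xtset id year
foreach v of varlist myvar1-myvar5 {
    generate d_`v' = D.`v'
}
```

This leaves the differences as variables in the dataset rather than as separate matrices on disk; they are missing for the first year of each id.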
T
On Tue, Jul 31, 2012 at 9:49 PM, Matthew McKay <[email protected]> wrote:
> Dear StataList,
>
> I am trying to compute differences in sets of matrices using MATA
> for a time series 1995 to 2010.
> I am producing a mata function to compute the differences for each year
> after I load in ALL my yearly data matrices.
> My issue in Step #2 is that I want to use a methodology similar to local
> macros in do-files (which can't be done).
>
> How can I pass part of a variable name (like Year) into the function and
> then perform a set of computations over the data that is loaded in MATA?
> In MATA how can you cycle through different variables (substituting in
> different year append) and compute a new set of matrices?
> Should I be constructing a Struct that contains a Data Matrix and Time
> Series Indicator?
>
> I understand the MATA function is compiled and therefore local macro
> substitution can't be done ... but was wondering if anyone else has an
> elegant solution to reference different variables in MATA by changing a
> component of the variablename (knowing the variable is defined in memory).
>
> Many Thanks,
> Matthew
>
> Step #1:
> **
> ** Import ALL MCP Matrix
> **
> local Years = "1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010"
> clear all
> local END = "end"
> foreach year of local Years {
> use [file_`year'.dta], clear // Load MCP Matrices into Mata Memory //
> drop ReporterISO3C
> mata
> Mcp_`year' = st_data(., .)
> `END'
> }
>
> Step#2:
> ** MATA FUNCTION **
>
> clear all
> mata
> void calcdiffvectors(real scalar StartYear, real scalar EndYear) {
> for(year = StartYear; year < EndYear; year++) {
> !!!!!!!!! local NextYear = `year' + 1 !!!!!!!!!!!!!
> !!!!!!!!! Mcp_`year'_`NextYear' = Mcp_`year' :- Mcp_`NextYear' !!!!!!!!!!
> }
> }
> end
>
> Step #3:
> mata: calcdiffvectors(1995, 2010)
> // I can then retrieve the difference vectors using get mata etc. //
>
>
--
Tirthankar Chakravarty
[email protected]
[email protected]