Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Mata: Calculating conditional expectation
From
Nick Cox <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Mata: Calculating conditional expectation
Date
Thu, 25 Apr 2013 12:13:30 +0100
su Y_T if inlist(Y_T, 0, 1), meanonly
scalar fp = r(mean)
scalar fn = 1 - fp
Nick
[email protected]
On 25 April 2013 12:09, Nick Cox <[email protected]> wrote:
> In this there is a lot of putting constants into variables, only to
> drop them later.
>
> In addition, whenever you want only one mean, -egen-'s -mean()-
> function is overkill.
>
> If you want to handle constants, that usually calls for scalars, not variables.
>
> -summarize, meanonly- leaves r(mean) behind in memory, which you can
> exploit more than you do.
>
> Your first block I would rewrite as
>
> count if Y == 1 & Y_T == 0
> scalar fp = r(N)/_N
> count if Y == 0 & Y_T == 1
> scalar fn = r(N)/_N
>
> Indeed
>
> su Y_T if inlist(Y_T, 0, 1), meanonly
> scalar fp = r(mean)
> scalar fn = 1 - fppct
>
> is better technique, I suggest. It does not presume there are no
> missing values on -Y_T-. (In addition, you calculated fractions, not
> percents.)
>
> The block
>
> local dummies
> forvalues i = 1/1000 {
> egen fpmean`i' = mean(id`i') if Y == 1 & Y_T == 0
> gen idfp`i' = fppct * fpmean`i'
> local dummies "`dummies' idfp`i'"
> drop fpmean`i'
> }
>
> could be simplified to
>
> local dummies
> forvalues i = 1/1000 {
> su id`i' if Y == 1 & Y_T == 0, meanonly
> gen idfp`i' = fppct * r(mean)
> local dummies "`dummies' idfp`i'"
> }
>
> These are small changes. I've not tried to understand the whole of
> what you are doing.
>
> Nick
> [email protected]
>
>
> On 25 April 2013 11:22, Ivan Png <[email protected]> wrote:
>> Let me try to explain how I did it. I did not use Mata to direct
>> computing the vector of conditional expectation of the explanatory
>> variables. Rather, I constructed the conditional expectation of each
>> explanatory variable separately, and then collected them into the
>> vector. Not very elegant, but seems to work.
>>
>> == code fragment ==
>>
>> . count if Y == 1 & Y_T == 0 /* Y = observed Y; Y_T = true Y */
>> . gen fppct = r(N)/total /* total = no. of observations */
>> /* fppct = percent false positives */
>> . count if Y == 0 & Y_T == 1
>> . gen fnpct = r(N)/total /* fppct = percent false negatives */
>>
>> . gen const = 1
>>
>>
>> *** create X including dummy variables (fixed effects) ***
>>
>> . local dummies
>> . forvalues i=1/1000 {
>> local dummies "`dummies' id`i'"
>> }
>>
>> . mata : st_view(X = .,., ("hhr", "const", "`dummies'")) /* hhr =
>> higher degree */
>> . mata : rows(X), cols(X) /* check matrix */
>> . mata : X[|1,1 \ 7,7|] /* check matrix for empty rows */
>>
>>
>> ** construct conditional mean(X) **
>> . sort lower year
>>
>> . foreach var in const hhr {
>> egen `var'_fpmean = mean(`var') if Y == 1 & Y_T == 0
>> gen `var'_fp = fppct * `var'_fpmean
>> egen `var'_fnmean = mean(`var') if Y == 0 & Y_T == 1
>> gen `var'_fn = fnpct * `var'_fnmean
>> drop `var'_fnmean
>> }
>>
>> ** construct P = mean(X) conditional on false positive **
>> . local dummies
>> . forvalues i = 1/1000 {
>> egen fpmean`i' = mean(id`i') if Y == 1 & Y_T == 0
>> gen idfp`i' = fppct * fpmean`i'
>> local dummies "`dummies' idfp`i'"
>> drop fpmean`i'
>> }
>>
>>
>> . mata : st_view(F = .,., ("hhr_fp", "const_fp", "`dummies'"))
>> . mata : rows(F), cols(F) /* check matrix */
>> . mata : F[|1,1 \ 15,15|] /* check matrix for empty rows */
>>
>> . mata : st_subview(P = . , F, 1 , .) /* choose first non-empty row */
>> . mata : rows(P), cols(P)
>> . mata : P[|1,1 \ 1,20|] /* check vector */
>>
>>
>> ** construct N = mean(X) conditional on false negative **
>> . sort lower year
>>
>> . local dummies
>> . forvalues i = 1/1000 {
>> egen fnmean`i' = mean(id`i') if Y == 0 & Y_T == 1
>> gen idfn`i' = fnpct * fnmean`i'
>> local dummies "`dummies' idfn`i'"
>> drop fnmean`i'
>> }
>>
>> . mata : st_view(G = .,., ("hhr_fn", "const_fn", "`dummies'"))
>> . mata : rows(G), cols(G) /* check matrix */
>> . mata : G[|1,1 \ 20,20|] /* check matrix for empty rows */
>>
>> . mata : st_subview(N = . , G, 3 , .) /* choose first non-empty row */
>> . mata : rows(N), cols(N)
>> . mata : N[|1,1 \ 1,20|] /* check vector */
>>
>>
>> ** estimate bias **
>> . mata : D = invsym(cross(X,X))*(P' - N')
>> . mata : rows(D), cols(D)
>> . mata : D[|1,1 \ 9,1|] /* check vector */
>>
>> . mata : st_subview(E = . , D, (1::5) , .) /* coefficients of focal
>> variables */
>> . mata : E
>>
>> . mata : H = 9898 * E /* total = 9898 */
>> . mata : H
>>
>> === end fragment ===
>>
>>
>> On 23 April 2013 11:45, Ivan Png <[email protected]> wrote:
>>>
>>> Many thanks to Statalist members for their previous help on
>>> constructing the matrix.
>>>
>>> To recall, my dependent variable is categorical: Mobility = 1 if
>>> inventor changed employer, else = 0. I'm investigating the effect of
>>> classification error. Obviously, this cannot be classical. Let Y =
>>> true mobility and Z = recorded mobility. If Y = 0 and Z = 1, then
>>> error = -1, while if Y = 1 and Z = 0, error = +1.
>>>
>>> Meyer and Mittag, U of Chicago (2012) characterize the bias as
>>> N(X'X)^{-1} [ Pr( Y = 0 & Z = 1) E(X : Y = 0 & Z = 1) - Pr(Y = 1 & Z
>>> = 0) E(X : Y = 1 & Z = 0) ]. I have a benchmark data set with both
>>> the true and inaccurate mobility data, and would like to compute the
>>> bias.
>>>
>>> So, I need to compute the conditional expectations, E(X : Y = 0 & Z =
>>> 1) and E(X : Y = 1 & Z = 0), and then weight by the probabilities,
>>> take the difference, and pre-multiply by N(X'X)^{-1}. My idea:
>>>
>>> . keep if Y = 0 & Z = 1
>>> . mata : F = mean(X)
>>> . mata : mata matsave filename
>>>
>>> and repeat for Y = 1 & Z = 0.
>>>
>>> But, sadly, I don't how to proceed. How to combine the original data
>>> with the two new files containing the conditional expectations, and
>>> then going back to MATA to calculate the bias. Grateful to
>>> Statalisters for help.
>>>
>>
>>
>>
>> --
>> Best wishes
>> Ivan Png
>> Skype: ipng00
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/