引言
马尔科夫链蒙特卡洛模拟(MCMC)
计量模型的贝叶斯估计
特定模型的似然函数或后验分布
贝叶斯预测
贝叶斯方法在参数估计、模型选择等方面提供了与传统频数(频率学派)方法不同的视角。
贝叶斯估计通过蒙特卡洛模拟从参数的后验分布中抽样,更适合于复杂的模型。
贝叶斯推断的优势:将多种不确定性,包括数据、模型和参数,融合到统一的框架内。
引言
马尔科夫链蒙特卡洛模拟(MCMC)
计量模型的贝叶斯估计
特定模型的似然函数或后验分布
贝叶斯预测
随机过程\(X_t\)的取值属于状态空间\(S=\{1,2,\cdots, s\}\),转移概率为 \[ p_{ij} = P(X_{t+1}=j|X_t=i), \quad i,j \in S. \]
比如, \[ P=\begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} = \begin{bmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{bmatrix}. \]
马尔科夫特征:随机变量第\(t\)期的状态只取决于第\(t-1\)期的状态。即下一步去哪里只取决于变量当前所在的位置,而不取决于是如何到达当前位置的。
马尔科夫链:满足马尔科夫特征的随机过程。
设\(\pi_t=(\pi_{1t}, \cdots, \pi_{st})\)为第\(t\)期的状态分布(行向量),那么\(\pi_{t+1} = \pi_t P\). \[ \begin{bmatrix} \pi_{1,t+1} & \pi_{2,t+1} \end{bmatrix} = \begin{bmatrix} \pi_{1t} & \pi_{2t} \end{bmatrix} \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} \]
\(\pi_{t+n} = \pi_{t+n-1} P=\pi_{t+n-2} P^2=\pi_{t} P^n\).
平稳分布:如果分布\(\pi_t\)不再变化,即\(\pi = \pi P\),\(\pi\)称为平稳分布。
\(\pi (I-P) = 0\),所以\(\pi\)为矩阵\(P\)的特征值1所对应的(左)特征向量(归一化使各元素之和为1)。
如果转移矩阵为 \[ \begin{bmatrix} 0.75 & 0.25 \\ 0.125 & 0.875 \end{bmatrix} \] 可以求出其对应的平稳分布为(1/3, 2/3).
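作为演示,下面用Mata对该例做幂迭代验证(代码为补充的示意,初始分布可任取):

mata:
P  = (0.75, 0.25 \ 0.125, 0.875)
pi = (0.5, 0.5)                       // 任取的初始分布
for (i=1; i<=200; i++) {
    pi = pi*P                         // 迭代 pi_{t+1} = pi_t * P
}
pi                                    // 收敛到上文的平稳分布 (1/3, 2/3)
end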
对于一个转移矩阵,平稳分布是不是唯一的? \[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
无穷多平稳分布:转移矩阵为 \[ \begin{bmatrix} 1/3 & 2/3 & 0 \\ 1/3 & 2/3 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
令\(P^{(n)} = P^n\)为\(n\)步转移矩阵。如果存在\(n\)使得\(P^{(n)}_{ij}>0\),称\(j\)从\(i\)可达(accessible)。
如果\(j\)从\(i\)可达且\(i\)从\(j\)可达,称\(i\)与\(j\)互通(communicate)。一组相互互通的状态构成一个互通类(communicating class)。
只包含一个互通类的马尔科夫链称为不可约的(irreducible)。
定理:不可约的马尔科夫链具有唯一的平稳分布。
含义:对于目标分布,如果能构造一个以其为平稳分布的不可约马尔科夫链,那么沿该链反复抽样,最终总能收敛到目标分布。
如何找到马尔科夫链:MCMC
MCMC: Gibbs抽样(Geman and Geman, 1984), Metropolis-Hastings抽样(Metropolis et al., 1953; Hastings, 1970).
MH抽样:设目标分布为\(f(x)\)。
给定\(x_t\),定义工具分布(proposal)\(q\),从\(q(y|x_t)\)中抽取随机数\(y_t\)。
定义接受概率(acceptance probability): \(\rho(x_t, y_t)=\min \left[ \frac{f(y_t)}{f(x_t)} \frac{q(x_t|y_t)}{q(y_t|x_t)}, 1 \right]\), \[ x_{t+1} = \begin{cases} y_{t} & \text{以概率 } \rho(x_t, y_t) \\ x_{t} & \text{以概率 } 1- \rho(x_t, y_t) \\ \end{cases} \]
对称情形:\(\rho(x_t, y_t)=\min \left[ \frac{f(y_t)}{f(x_t)}, 1 \right]\),
独立情形:\(\rho(x_t, y_t)=\min \left[ \frac{f(y_t)}{f(x_t)} \frac{q(x_t)}{q(y_t)}, 1 \right]\),
target: Beta(2.7, 6.3); proposal: uniform(0,1).
. clear
. set obs 5000
Number of observations (_N) was 0, now 5,000.
. gen x=.
(5,000 missing values generated)
. mata:
──────────────────────────── mata (type end to exit) ────────────────────────────
: n = 5000
: x = runiform(n,1)
: for (i=2; i<=n; i++) {
>         y = runiform(1,1)
>         rho = betaden(2.7, 6.3, y)/betaden(2.7, 6.3, x[i-1])
>         x[i] = runiform(1,1)<rho ? y : x[i-1]
> }
: st_store((1,n), "x", x)
: end
──────────────────────────────────────────────────────────────────────────────────
. twoway (function y=betaden(2.7,6.3,x), range(0 1)) (kdensity x, lp(dash))
target: Cauchy; proposal: t(1).
. clear
. set obs 2000
Number of observations (_N) was 0, now 2,000.
. gen x=.
(2,000 missing values generated)
. mata:
──────────────────────────── mata (type end to exit) ────────────────────────────
: n = 5000                                  // burn-in sample: 3000
: x = J(n,1,0)                              // initializing
: for (i=2; i<=n; i++) {
>         y = rt(1,1,1)                     // proposal density: t(1)
>         denom = cauchyden(0,1,x[i-1])*tden(1,y)
>         rho = cauchyden(0,1,y)*tden(1,x[i-1])/denom
>         x[i] = runiform(1,1)<rho ? y : x[i-1]
> }
: st_store((1,2000), "x", x[3001..n])
: end
──────────────────────────────────────────────────────────────────────────────────
. twoway (function y=1/(c(pi)*(1+x^2)), range(-10 10)) ///
>        (kdensity x if abs(x)<10, lp(dash))
给定\(x_t\), 抽取随机数\(y_t = x_t + u_t, u_t \sim q(u)\)
target: Beta(2.7, 6.3); proposal: 随机游走,\(y_t = x_{t-1} + u_t\),\(u_t \sim N(0,1)\).
. clear
. set obs 5000
Number of observations (_N) was 0, now 5,000.
. gen x=.
(5,000 missing values generated)
. mata:
──────────────────────────── mata (type end to exit) ────────────────────────────
: n = 5000
: x = runiform(n,1)
: for (i=2; i<=n; i++) {
>         y = x[i-1] + rnormal(1,1,0,1)
>         rho = betaden(2.7, 6.3, y)/betaden(2.7, 6.3, x[i-1])
>         x[i] = runiform(1,1)<rho ? y : x[i-1]
> }
: st_store((1,n), "x", x)
: end
──────────────────────────────────────────────────────────────────────────────────
. twoway (function y=betaden(2.7,6.3,x), range(0 1)) (kdensity x, lp(dash)) , legend(off) xtitle("")
将随机变量分为两组\((x, y)\),联合密度为\(f(x,y)\)。
抽取\(x_0\)
对\(t=0,1,2,\cdots\),抽取\(y_{t+1} \sim f(y|x_t)\), \(x_{t+1} \sim f(x|y_{t+1})\)
例:二元正态分布。
抽取\(x_0=0\)
对\(t=0,1,2,\cdots\),抽取 \[ \begin{eqnarray*} y_{t+1} | x_t & \sim & N \left( \frac{\sigma_1}{\sigma_2} \rho x_t,(1-\rho^2) \sigma_1^2 \right) \\ x_{t+1} | y_{t+1} & \sim & N \left( \frac{\sigma_2}{\sigma_1} \rho y_{t+1},(1-\rho^2) \sigma_2^2 \right) \\ \end{eqnarray*} \]
可以直接扩展到随机变量分为多组的情况。
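下面给出上文二元正态例子的Gibbs抽样的一个Mata示意(假设\(\sigma_1=\sigma_2=1\)、\(\rho=0.8\),取值仅为演示):

mata:
n   = 10000
rho = 0.8                             // 假设的相关系数
s1  = 1                               // y 的标准差(假设值)
s2  = 1                               // x 的标准差(假设值)
x = J(n,1,0)                          // 初始值 x_0 = 0
y = J(n,1,0)
for (t=2; t<=n; t++) {
    y[t] = rnormal(1,1, (s1/s2)*rho*x[t-1], sqrt(1-rho^2)*s1)   // y | x
    x[t] = rnormal(1,1, (s2/s1)*rho*y[t],   sqrt(1-rho^2)*s2)   // x | y
}
correlation((x, y))                   // 样本相关系数应接近 rho
end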
对协方差矩阵的抽样一般采用Gibbs抽样,inverse Wishart为最广泛采用的先验分布。
设\(T_0\)为burn-in样本量,\(T\)为MCMC的样本量,整个MCMC迭代的次数为\(T_{total}=T_0+(T-1)\times thinning + 1\)。
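例如,按Stata默认的\(T_0=2500\)、\(T=10000\)、thinning=1,\(T_{total}=2500+(10000-1)\times 1+1=12500\),对应后文输出中的"MCMC iterations = 12,500"。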
MH抽样生成试验值的公式为\(\theta_{*} = \theta_{t-1} + e\),\(e \sim N(0, \rho_k^2 \Sigma_k)\)。所谓适应性MH抽样,是指每隔一定的区间调整\(\rho_k\)和\(\Sigma_k\),以达到目标接受率(TAR, target acceptance rate)。Gelman, Gilks and Roberts (1997)证明,单一参数的TAR为0.44,多个参数的TAR为0.234。
设Alen为每个适应性区间的长度(adaptation(every()))。可以设置最小迭代次数(adaptation(miniter()))和最大迭代次数(adaptation(maxiter()))。
初始化:设定初始值\(\theta_0\);\(\rho_0=2.38/\sqrt{d}\),其中\(d\)为参数个数;\(\Sigma_0 = I\),或者由covariance()选项自行设定。
生成试验值: \(\theta_{*} = \theta_{t-1} + e, e \sim N(0, \rho_k^2 \Sigma_k)\)。
接受概率为 \[ \alpha(\theta_{*}|\theta_{t-1}) = \min \left\{ \frac{p(\theta_{*}|y)}{p(\theta_{t-1}|y)}, 1 \right\}. \]
其中,\(p(\theta_{*}|y) \propto f(y|\theta_{*}) p(\theta_{*})\)为\(\theta\)的后验分布(核)。
\(\rho_k\)和\(\Sigma_k\)的更新公式为: \[ \rho_k = \rho_{k-1} \exp( \beta_k [\Phi^{-1}(AR_k/2) - \Phi^{-1}(TAR/2) ] ) \]
其中, \[ AR_k = (1-\alpha) AR_{k-1} + \alpha AP \]
AP为接受概率\(\alpha(\theta_{*}|\theta_{t-1})\)的均值。\(\alpha\)的默认值为0.75,或者由adaptation(alpha())自行设定。
\[ \Sigma_k = (1-\beta_k) \Sigma_{k-1} + \beta_k \hat{\Sigma}_k, \beta_k = \beta_0/k^{\gamma}. \]
其中,\(\Sigma_0 = I\)或者由covariance()选项自行设定;\(\hat{\Sigma}_k\)为MCMC样本的协方差。\(\beta_0\)的默认值为0.8,或者由adaptation(beta())自行设定。\(\gamma\)的默认值为0,或者由adaptation(gamma())自行设定。
引言
马尔科夫链蒙特卡洛模拟(MCMC)
计量模型的贝叶斯估计与检验
特定模型的似然函数或后验分布
贝叶斯预测
\[ \pi(\theta|y) = \frac{f(y|\theta) \pi(\theta)}{f(y)} \propto f(y|\theta) \pi(\theta). \]
其中,\(f(y|\theta) \pi(\theta)\)叫做贝叶斯核(kernel)。
如果先验分布\(\pi(\theta)\)对后验分布的影响非常小, 称其是无信息的(noninformative, vague, diffuse, flat)。
如果\(\pi(\theta)\)的积分不等于1,称其是不恰当的(improper)。
Jeffreys先验:\(\pi(\theta) \propto |I(\theta)|^{1/2}\),其中,\(I(\theta)\)为Fisher信息矩阵, \[ I(\theta) = -E\left[ \frac{\partial^2 \log(p(y|\theta))}{\partial \theta^2} \right]. \]
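例如(补充的一个标准例子):对Bernoulli(\(p\))模型,单次观测的对数似然为\(y\log p+(1-y)\log(1-p)\),于是 \[ I(p) = -E\left[ \frac{\partial^2 \log p(y|p)}{\partial p^2} \right] = \frac{1}{p(1-p)}, \qquad \pi(p) \propto p^{-1/2}(1-p)^{-1/2}, \] 即Jeffreys先验为Beta(1/2, 1/2)。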
模型: \(y=x\beta + u\)
似然函数: \(y_i \sim Normal(x_i \beta, \sigma^2)\)
先验: \[ \begin{eqnarray*} \beta & \sim & \text{Multivariate-Normal}(\beta_0, V_0) \\ \sigma^2 & \sim & \text{Inverse-Gamma}(\alpha_0/2, \delta_0/2) \\ \end{eqnarray*} \]
一般的逆Gamma分布,InverseGamma[\(\alpha, \beta,\gamma,\mu\)] 定义域为(\(\mu, \infty\)),\(\mu\)为位置参数,\(\alpha, \gamma\)为形状参数,\(\beta\)为刻度参数。
InverseGamma[\(\alpha, \beta\)]表示InverseGamma[\(\alpha, \beta, 1, 0\)], pdf为 \[ pdf(x) = \frac{\exp(-\beta/x) (\beta/x)^{\alpha}}{x \Gamma(\alpha)} \]
\[ E(x) = \frac{\beta}{\alpha-1} (\text{ for } \alpha>1); Var(x) = \frac{\beta^2}{(\alpha-2)(\alpha-1)^2} (\text{ for } \alpha>2). \]
如果\(x\sim \text{Gamma}(\alpha, \beta)\),那么\(1/x \sim \text{Inv-Gamma}(\alpha, 1/\beta)\).
条件后验分布(full conditional): \[ \beta | \sigma^2,y \sim \text{Multivariate-Normal} \left( \beta_1, V_1 \right) \]
其中, \[ V_1 = \left[ \sigma^{-2} X'X + V_0^{-1} \right]^{-1}, \beta_1 = V_1 \left[ \sigma^{-2} X'y + V_0^{-1} \beta_0 \right] \]
\[ \sigma^2 | \beta,y \sim \text{Inverse-Gamma}\left( (\alpha_0+n)/2, [\delta_0+(y-x\beta)'(y-x\beta)]/2 \right) . \]
\(\beta\) are sampled in one block (Greenberg, 2013).
设定初始值\(\sigma^{2(0)}\).
第\(t\)次迭代, 从下面的分布抽样 \[ \begin{eqnarray*} \beta^{(t)} | \sigma^{2(t-1)},y & \sim & \text{MNormal} \left( \beta_1, V_1 \right) \\ \sigma^{2(t)} | \beta^{(t)},y & \sim & \text{IG}\left( (\alpha_0+n)/2, [\delta_0+(y-X\beta^{(t)})'(y-X\beta^{(t)})]/2 \right) . \end{eqnarray*} \]
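下面用Mata给出该Gibbs抽样的一个自包含示意(数据为模拟数据;先验取\(\beta_0=0\)、\(V_0=100I\)、\(\alpha_0=\delta_0=0.01\),均为演示用的假设值):

mata:
n = 200
X = (rnormal(n,1,0,1), J(n,1,1))          // 模拟数据:一个解释变量加常数项
y = X*(1\2) + rnormal(n,1,0,0.7)          // "真实"系数取 (1, 2)
k = cols(X)
b0    = J(k,1,0)                          // 先验均值 beta_0
V0inv = I(k)/100                          // 先验方差 V_0 = 100*I 的逆
a0 = 0.01
d0 = 0.01
T  = 5000
B  = J(T,k,.)
S2 = J(T,1,.)
sig2 = 1                                  // 初始值 sigma^2(0)
for (t=1; t<=T; t++) {
    V1   = invsym(X'X/sig2 + V0inv)
    b1   = V1*(X'y/sig2 + V0inv*b0)
    beta = b1 + cholesky(V1)*rnormal(k,1,0,1)      // beta^(t) ~ MVN(b1, V1)
    e    = y - X*beta
    sig2 = ((d0 + e'e)/2)/rgamma(1,1,(a0+n)/2,1)   // sigma^2(t) ~ IG((a0+n)/2, (d0+e'e)/2)
    B[t,.] = beta'
    S2[t]  = sig2
}
mean(B), mean(S2)                         // 后验均值(示意;未丢弃burn-in)
end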
bayes [, bayesopts] : estimation_command [, estopts]
bayesopts | note |
---|---|
gibbs | 只适用于regress, mvreg等模型 |
normalprior(#) | 回归系数的正态先验分布的标准差,默认值为100 |
igammaprior(# #) | 回归模型的方差的先验分布:inverse-gamma分布(shape, scale),默认值为igammaprior(0.01 0.01) |
iwishartprior(#) | 随机效应协方差矩阵的先验分布:Inverse-wishart分布(degree of freedom) |
prior(priorspec) | 自己设定先验分布 |
其中,priorspec的形式为:{[eqname:]param[, matrix]}, priordist。比如,
prior(educ, normal(0, 9)) prior(age, uniform(-3,3))
prior({dep:}, mvnormal(3, mvec, vmat))
. use mroz, clear . bayes, normalprior(10) igammaprior(0.01 0.01) : regress lwage educ age exper Burn-in ... Simulation ... Model summary ────────────────────────────────────────────────────────────────────────────── Likelihood: lwage ~ regress(xb_lwage,{sigma2}) Priors: {lwage:educ age exper _cons} ~ normal(0,100) (1) {sigma2} ~ igamma(.01,.01) ────────────────────────────────────────────────────────────────────────────── (1) Parameters are elements of the linear form xb_lwage. Bayesian linear regression MCMC iterations = 12,500 Random-walk Metropolis–Hastings sampling Burn-in = 2,500 MCMC sample size = 10,000 Number of obs = 428 Acceptance rate = .3342 Efficiency: min = .06385 avg = .0982 Log marginal-likelihood = -467.9285 max = .2099 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── lwage │ educ │ .1087602 .0140838 .000557 .1092518 .081044 .1356251 age │ -.0011846 .0048595 .000184 -.0011781 -.0108816 .008066 exper │ .0160051 .0044727 .000167 .0162019 .0069124 .0243769 _cons │ -.3458634 .2641602 .009615 -.3501628 -.885464 .1888002 ─────────────┼──────────────────────────────────────────────────────────────── sigma2 │ .4509639 .0315607 .000689 .4500597 .394099 .5204984 ─────────────┴──────────────────────────────────────────────────────────────── Note: Default priors are used for some model parameters.
. use mroz, clear . bayes, gibbs normalprior(10) igammaprior(0.01 0.01) : regress lwage educ age exper Burn-in ... Simulation ... Model summary ────────────────────────────────────────────────────────────────────────────── Likelihood: lwage ~ normal(xb_lwage,{sigma2}) Priors: {lwage:educ age exper _cons} ~ normal(0,100) (1) {sigma2} ~ igamma(.01,.01) ────────────────────────────────────────────────────────────────────────────── (1) Parameters are elements of the linear form xb_lwage. Bayesian linear regression MCMC iterations = 12,500 Gibbs sampling Burn-in = 2,500 MCMC sample size = 10,000 Number of obs = 428 Acceptance rate = 1 Efficiency: min = .9761 avg = .9952 Log marginal-likelihood = -467.84857 max = 1 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── lwage │ educ │ .1092317 .0142794 .000143 .1093622 .0815355 .1376194 age │ -.0013859 .0048177 .000048 -.0013814 -.0109141 .0080745 exper │ .0163677 .0046453 .000046 .0163429 .0072553 .0253992 _cons │ -.3475648 .2653856 .002654 -.3446139 -.8709864 .1654805 ─────────────┼──────────────────────────────────────────────────────────────── sigma2 │ .4504266 .0311056 .000315 .4491257 .393057 .5144107 ─────────────┴──────────────────────────────────────────────────────────────── Note: Default priors are used for some model parameters.
引言
马尔科夫链蒙特卡洛模拟(MCMC)
计量模型的贝叶斯估计与检验
特定模型的似然函数或后验分布
贝叶斯预测
收敛性(是否达到平稳分布)
踪迹图(围绕常数均值波动)
核密度图(模拟样本分为前后两段,比较其分布,或者做等均值等方差检验)
自相关
Gelman-Rubin统计量(shrink factor):用多条马尔科夫链进行抽样,链之间的平均差异与链内的平均差异应该大致相等,二者的比值称作收缩因子(shrink factor)。作为经验规则,收缩因子应低于1.1。
Efficiency: 大于0.1视为良好;低于0.01则需要高度关注。
bayesgraph type which
其中,type包括:diag, trace, histogram, kdensity, ac, cusum, matrix。
which包括: _all, {[*eqname*:]*param*}
. bayesgraph diag {lwage:educ}
. bayesgraph ac {lwage:}, byparm
所有参数:
. bayesstats ic Bayesian information criteria ─────────────┬──────────────────────────────── │ DIC log(ML) log(BF) ─────────────┼──────────────────────────────── active │ 877.4558 -467.8486 . ─────────────┴──────────────────────────────── Note: Marginal likelihood (ML) is computed using Laplace–Metropolis approximation. . bayesstats ess Efficiency summaries MCMC sample size = 10,000 Efficiency: min = .9761 avg = .9952 max = 1 ─────────────┬────────────────────────────────────── │ ESS Corr. time Efficiency ─────────────┼────────────────────────────────────── lwage │ educ │ 10000.00 1.00 1.0000 age │ 10000.00 1.00 1.0000 exper │ 10000.00 1.00 1.0000 _cons │ 10000.00 1.00 1.0000 ─────────────┼────────────────────────────────────── sigma2 │ 9761.25 1.02 0.9761 ─────────────┴──────────────────────────────────────
所有参数:
bayesgraph trace {lwage:}, byparm
bayesgraph kdensity {lwage:}, byparm
bayesgraph hist {lwage:}, byparm
bayesgraph ac {lwage:}, byparm
单个参数:
bayesgraph trace {lwage:educ}
bayesgraph kdensity {lwage:educ}
bayesgraph hist {lwage:educ}
bayesgraph ac {lwage:educ}
bayesstats ic: 三个统计指标。
边际似然函数
dic (应选择dic最低的模型)
bf (贝叶斯因子=两个模型的边际似然函数的比(相同的样本))
贝叶斯因子 (Jeffreys, 1961): \[ BF_{12} = \frac{P(y|M_1)}{P(y|M_2)} = \frac{m_1(y)}{m_2(y)}. \]
posterior odds: \[ \frac{P(M_1|y)}{P(M_2|y)} = \frac{P(y|M_1)P(M_1)}{P(y|M_2)P(M_2)} = BF_{12} \frac{P(M_1)}{P(M_2)}. \]
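例如(补充):若两个模型的先验概率相等(先验比为1),且\(BF_{12}=5\),则后验比为5,即\(P(M_1|y)=5/6\approx 0.83\)。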
\(\log_{10}(bf)\) | bf | 结论 |
---|---|---|
0 - 1/2 | 1 - 3.2 | 微弱 (not worth more than a bare mention) |
1/2 - 1 | 3.2 - 10 | 较强 (substantial) |
1 - 2 | 10 - 100 | 强 (strong) |
>2 | > 100 | 一致 (decisive) |
Kass and Raftery (1995): 贝叶斯因子:
\(2\log_{e}(bf)\) | bf | 结论 |
---|---|---|
0 - 2 | 1 - 3 | 微弱 (not worth more than a bare mention) |
2 - 6 | 3 - 20 | positive |
6 - 10 | 20 - 150 | strong |
>10 | > 150 | very strong |
bayestest model ...
可以用于比较不同的先验、不同的后验分布、不同的回归函数、不同的解释变量等。
条件:不同模型使用完全相同的样本;具有恰当(proper)的后验分布;MCMC收敛。
对参数落在某个区间的检验:
bayestest interval ({y:x1},lower(.9) upper(1.02)) ({y:x2},lower(-1.1) upper(-.8)), joint
Stata对不同模型的不同类型的参数设置了不同的先验分布。
回归系数的先验分布默认为独立的正态分布\(N(0, \sigma_{prior}^2)\),其中\(\sigma_{prior}^2=10,000\)。
正值的参数(比如方差)默认先验分布为Inverse-Gamma(\(\alpha, \beta\))。默认值为\(\alpha=0.01\), \(\beta=0.01\)。
排序选择模型的切点:先验分布为Uniform(0,1)。
多水平模型:随机效应协方差矩阵
independent, identity: Inverse-Gamma(\(\alpha, \beta\))
unstructured: Inverse-Wishart(d+1, I(d)),其中d为协方差矩阵的维度,d+1为自由度。
引言
马尔科夫链蒙特卡洛模拟(MCMC)
计量模型的贝叶斯估计与检验
特定模型的似然函数或后验分布
贝叶斯预测
bayesmh可以用于估计单方程线性模型、单方程非线性模型、多方程线性模型、多方程非线性模型、多水平模型、概率分布等。
bayesmh depvar [indepvarspec] [if] [in] [weight],
    likelihood(modelspec) prior(priorspec)
    llevaluator(log-likelihood) evaluator(log-postdist) [options]
三种用法:
likelihood() + prior(): 内置似然函数 + 内置先验分布
llevaluator() + prior(): 自定义的对数似然函数 + 内置先验分布
evaluator(): 自定义的对数后验分布
似然函数即概率密度函数,不同分布涉及到的参数也不同。比如,正态分布有均值和方差两个参数;泊松分布只有一个参数,既为均值也是方差。
bayesmh中,分布的均值参数通过模型来设定。分布的形式和其它参数则通过likelihood()来设定。比如,
bayesmh y x1 x2, likelihood(normal({var})),表示\(y\sim N(b_0+b_1 x_1 + b_2 x_2, var)\)
bayesmh y x1 x2, likelihood(poisson),表示\(y\sim Poisson(exp(b_0+b_1 x_1 + b_2 x_2))\)
bayesmh y x1 x2, likelihood(probit),表示\(y\sim Bernoulli(\Phi(b_0+b_1 x_1 + b_2 x_2))\)
likelihood() | 似然函数 | 分布形式 |
---|---|---|
normal(var) | 正态分布 | \(y\sim N(xb, var)\) |
lognormal(var) | 对数正态分布 | \(\ln(y)\sim N(xb, var)\) |
exp | 指数分布 | \(y \sim \exp(xb)\) |
probit | 贝努里分布 | \(P(y=1)=\Phi(xb)\) |
logit | 贝努里分布 | \(P(y=1)=\Lambda(xb)\) |
oprobit | 多项分布,概率由排序probit给出 | \(P(y=j)=P(c_{j-1}<y^*<c_j)\) |
ologit | 多项分布,概率由排序logit给出 | \(P(y=j)=P(c_{j-1}<y^*<c_j)\) |
poisson | 泊松分布 | \(y\sim \text{Poisson}(\exp(xb))\) |
binomial(n) | 二项分布 | \(y\sim \text{Binomial}(n, p(xb))\) |
stexp | 指数生存函数 | |
stgamma(lns) | Gamma生存函数(对数标度参数lns) | |
stloglogistic(lns) | log-logistic生存函数(对数标度参数lns) | |
stlognormal(lnstd) | 对数正态生存函数(对数标准差lnstd) | |
stweibull(lnp) | Weibull生存函数(对数形状参数lnp) | |
llf() | 对数似然函数的表达式 |
Stata对不同模型的不同类型的参数设置了不同的先验分布。
回归系数的先验分布默认为独立的正态分布\(N(0, \sigma_{prior}^2)\),其中\(\sigma_{prior}^2=10,000\)。
正值的参数(比如方差)默认先验分布为Inverse-Gamma(\(\alpha, \beta\))。默认值为\(\alpha=0.01\), \(\beta=0.01\)。
排序选择模型的切点:先验分布为Uniform(0,1)。
多水平模型:随机效应协方差矩阵
independent, identity: Inverse-Gamma(\(\alpha, \beta\))
unstructured: Inverse-Wishart(d+1, I(d)),其中d为协方差矩阵的维度,d+1为自由度。
priordist | note |
---|---|
normal(mu, var) | Normal(m, v) |
t(mu, s2, df) | location-scale t with mean mu, squared scale sigma2, and degrees of freedom df |
lognormal(mu, var) | log-normal(mu, var) |
uniform(a,b) | uniform on (a,b) |
gamma(a,b) | gamma with shape a and scale b |
igamma(a,b) | inverse-gamma with shape a and scale b |
exponential(b) | exponential with scale b |
beta(a,b) | beta with shape a and b |
laplace(mu,b) | Laplace with location mu and scale b |
cauchy(loc,b) | Cauchy with location loc and scale b |
chi2(df) | chi-squared with df degrees of freedom |
pareto(a,b) | pareto with shape a and scale b |
jeffreys | Jeffreys prior |
priordist | note |
---|---|
mvnormal(d, mu, var) | d-dimensions with mean (mu) and variance (var) |
mvnormal0(d, var) | d-dimensions with zero mean and variance (var) |
zellnersg(d, g, mu, {var}) | Zellner-g prior of dimension d with g dof, mean mu and variance {var} |
zellnersg0(d, g, mu, {var}) | Zellner-g prior of dimension d with g dof, zero-mean and variance {var} |
dirichlet(a1, a2, …, ad) | Dirichlet (multivariate beta) of dimension d with shape (a1,a2,…,ad) |
wishart(d,df, V) | Wishart of dimension d with DOF df and scale matrix V |
iwishart(d,df, V) | inverse-Wishart of dimension d with DOF df and scale matrix V |
jeffreys(d) | Jeffreys prior of d-variate normal dist. |
priordist | note |
---|---|
bernoulli(p) | 0 or 1 (with probability p) |
geometric(p) | 0, 1, 2, … with geometric(p) |
index(p1,…,pk) | 1, 2, …, k with prob. (p1, p2, …, pk) |
poisson(mu) | 0, 1, 2, … with Poisson(mu) |
priordist | note |
---|---|
flat | |
density(f) | generic density f |
logdensity(logf) | generic log-density logf |
note: 扁平(flat)先验要谨慎使用。Flat先验是不恰当的(improper),即先验分布的积分并不为1(事实上为无穷大),可能导致不恰当的后验分布而使得无法进行贝叶斯推断。
Jeffreys先验:均值参数的Jeffreys先验即扁平先验: \(\pi(\mu)=1\);标准差参数的Jeffreys先验为倒数先验(inverse prior): \(\pi(\sigma)=1/\sigma\).
. use mroz, clear . bayesmh lwage educ age exper, likelihood(normal({var})) prior({lwage:}, flat) prior({var}, igamma(0.01, 0.01)) Burn-in ... Simulation ... Model summary ────────────────────────────────────────────────────────────────────────────── Likelihood: lwage ~ normal(xb_lwage,{var}) Priors: {lwage:educ age exper _cons} ~ 1 (flat) (1) {var} ~ igamma(0.01,0.01) ────────────────────────────────────────────────────────────────────────────── (1) Parameters are elements of the linear form xb_lwage. Bayesian normal regression MCMC iterations = 12,500 Random-walk Metropolis–Hastings sampling Burn-in = 2,500 MCMC sample size = 10,000 Number of obs = 428 Acceptance rate = .2752 Efficiency: min = .03065 avg = .04804 Log marginal-likelihood = -455.02345 max = .07165 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── lwage │ educ │ .1093384 .0144143 .000675 .1094341 .0807039 .1364698 age │ -.001229 .0048322 .000221 -.0012917 -.0106893 .0081376 exper │ .0163394 .0045813 .000171 .0162681 .0074164 .0253667 _cons │ -.3510312 .2650894 .012581 -.3514681 -.8570201 .1873868 ─────────────┼──────────────────────────────────────────────────────────────── var │ .4498116 .030934 .001767 .4483777 .3933691 .5150745 ─────────────┴──────────────────────────────────────────────────────────────── Note: Adaptation tolerance is not met.
. bayesmh lwage educ age exper, likelihood(normal({var})) prior({lwage:}, normal(0,10000)) prior({var},igamma(0.01, 0.01)) Burn-in ... Simulation ... Model summary ────────────────────────────────────────────────────────────────────────────── Likelihood: lwage ~ normal(xb_lwage,{var}) Priors: {lwage:educ age exper _cons} ~ normal(0,10000) (1) {var} ~ igamma(0.01,0.01) ────────────────────────────────────────────────────────────────────────────── (1) Parameters are elements of the linear form xb_lwage. Bayesian normal regression MCMC iterations = 12,500 Random-walk Metropolis–Hastings sampling Burn-in = 2,500 MCMC sample size = 10,000 Number of obs = 428 Acceptance rate = .253 Efficiency: min = .03411 avg = .04046 Log marginal-likelihood = -477.13055 max = .05554 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── lwage │ educ │ .1086604 .0143401 .000608 .1092476 .0799753 .1362948 age │ -.0018785 .0048822 .000256 -.0019494 -.0110175 .0084529 exper │ .0169774 .0044795 .000243 .0168361 .008705 .0264318 _cons │ -.3258351 .2698549 .013326 -.3212487 -.857303 .1921515 ─────────────┼──────────────────────────────────────────────────────────────── var │ .450493 .0319438 .001698 .4494177 .3939383 .5191762 ─────────────┴────────────────────────────────────────────────────────────────
. bayesmh inlf age educ kids*, likelihood(logit) prior({inlf:}, flat) Burn-in ... Simulation ... Model summary ────────────────────────────────────────────────────────────────────────────── Likelihood: inlf ~ logit(xb_inlf) Prior: {inlf:age educ kidslt6 kidsge6 _cons} ~ 1 (flat) (1) ────────────────────────────────────────────────────────────────────────────── (1) Parameters are elements of the linear form xb_inlf. Bayesian logistic regression MCMC iterations = 12,500 Random-walk Metropolis–Hastings sampling Burn-in = 2,500 MCMC sample size = 10,000 Number of obs = 753 Acceptance rate = .1607 Efficiency: min = .02098 avg = .04698 Log marginal-likelihood = -475.37675 max = .05996 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed inlf │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── age │ -.0647113 .0128096 .000581 -.0650733 -.089756 -.0393152 educ │ .2020967 .0383368 .001757 .2038444 .1280694 .2722752 kidslt6 │ -1.496497 .1902067 .013131 -1.483813 -1.880425 -1.124643 kidsge6 │ -.0939495 .0671387 .002742 -.0927716 -.2155951 .0316578 _cons │ 1.041166 .7975184 .033216 1.062803 -.5794387 2.618541 ─────────────┴────────────────────────────────────────────────────────────────
. bayesmh lwage educ age exper, likelihood(normal({var})) prior({lwage:educ}, normal(0.1, 0.4)) prior({lwage:age}, normal(-0. > 01, 1)) prior({lwage:exper}, normal(0.1, 1)) prior({lwage:_cons}, normal(2, 10)) prior({var},igamma(0.01, 0.01)) Burn-in ... Simulation ... Model summary ────────────────────────────────────────────────────────────────────────────── Likelihood: lwage ~ normal(xb_lwage,{var}) Priors: {lwage:educ} ~ normal(0.1,0.4) (1) {lwage:age} ~ normal(-0.01,1) (1) {lwage:exper} ~ normal(0.1,1) (1) {lwage:_cons} ~ normal(2,10) (1) {var} ~ igamma(0.01,0.01) ────────────────────────────────────────────────────────────────────────────── (1) Parameters are elements of the linear form xb_lwage. Bayesian normal regression MCMC iterations = 12,500 Random-walk Metropolis–Hastings sampling Burn-in = 2,500 MCMC sample size = 10,000 Number of obs = 428 Acceptance rate = .2158 Efficiency: min = .001864 avg = .02366 Log marginal-likelihood = -459.584 max = .08343 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── lwage │ educ │ .1080999 .0163228 .003015 .1085795 .076095 .1408135 age │ -.0016291 .0051764 .000886 -.0016677 -.0122493 .0081089 exper │ .0164365 .004882 .000169 .0164146 .0069865 .0262597 _cons │ -.3244789 .3187475 .073838 -.3318201 -.977361 .3921511 ─────────────┼──────────────────────────────────────────────────────────────── var │ .4539588 .030861 .00189 .4522007 .3942803 .5171512 ─────────────┴──────────────────────────────────────────────────────────────── Note: There is a high autocorrelation after 500 lags.
. qui reg lwage educ age exper . matrix vec=e(b) . matrix cov=e(V) . bayesmh lwage educ age exper, likelihood(normal({var})) prior({lwage:}, mvnormal(4,vec,cov)) prior({var},igamma(0.01, 0.01) > ) Burn-in ... Simulation ... Model summary ────────────────────────────────────────────────────────────────────────────── Likelihood: lwage ~ normal(xb_lwage,{var}) Priors: {lwage:educ age exper _cons} ~ mvnormal(4,vec,cov) (1) {var} ~ igamma(0.01,0.01) ────────────────────────────────────────────────────────────────────────────── (1) Parameters are elements of the linear form xb_lwage. Bayesian normal regression MCMC iterations = 12,500 Random-walk Metropolis–Hastings sampling Burn-in = 2,500 MCMC sample size = 10,000 Number of obs = 428 Acceptance rate = .1926 Efficiency: min = .001502 avg = .02856 Log marginal-likelihood = -442.16765 max = .07191 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── lwage │ educ │ .1078397 .0078649 .001125 .1079189 .0934599 .1234291 age │ -.0019469 .0027718 .000339 -.0018452 -.0075171 .0033644 exper │ .0165112 .003203 .000119 .016684 .0101089 .0225106 _cons │ -.3074617 .1048001 .027043 -.3049968 -.5511863 -.1230595 ─────────────┼──────────────────────────────────────────────────────────────── var │ .4480066 .0291848 .001213 .4463937 .3955503 .5106466 ─────────────┴──────────────────────────────────────────────────────────────── Note: There is a high autocorrelation after 500 lags.
prior({eq:}, normal(0,10000))设定各个参数是独立的。可以通过两种方式允许多个参数是相关的:
设定多个参数服从多元分布(多元正态分布等)。
Zellner-g prior
\[ \beta | \sigma^2 \sim MVN(\mu_0, g\sigma^2 (x'x)^{-1}) \]
其中,\(g\)为g-prior的自由度。
Stata设定:zellnersg(d, g, mu, sigsq)
其中,sigsq往往直接设定为{var},即zellnersg(d, g, mu, {var})
. bayesmh lwage educ age exper, likelihood(normal({var})) prior({lwage:}, zellnersg(4,30,vec,{var})) prior({var},igamma(0.01, > 0.01)) Burn-in ... Simulation ... Model summary ────────────────────────────────────────────────────────────────────────────── Likelihood: lwage ~ normal(xb_lwage,{var}) Priors: {lwage:educ age exper _cons} ~ zellnersg(4,30,vec,{var}) (1) {var} ~ igamma(0.01,0.01) ────────────────────────────────────────────────────────────────────────────── (1) Parameters are elements of the linear form xb_lwage. Bayesian normal regression MCMC iterations = 12,500 Random-walk Metropolis–Hastings sampling Burn-in = 2,500 MCMC sample size = 10,000 Number of obs = 428 Acceptance rate = .1603 Efficiency: min = .00119 avg = .01802 Log marginal-likelihood = -446.16778 max = .04551 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── lwage │ educ │ .1012417 .0115183 .001895 .1013191 .0791962 .1233749 age │ -.0037958 .0041686 .000711 -.0037038 -.0119122 .0040406 exper │ .01706 .0043192 .000227 .017137 .0083099 .025188 _cons │ -.1531522 .1799528 .052174 -.1626343 -.46208 .1529861 ─────────────┼──────────────────────────────────────────────────────────────── var │ .4450226 .032087 .001504 .4429504 .387583 .5111025 ─────────────┴──────────────────────────────────────────────────────────────── Note: There is a high autocorrelation after 500 lags.
Gelman et al. (2014): Logit回归模型中,系数的先验分布为Cauchy分布,位置参数为\(a = 0\), 刻度参数为\(b = 2.5\)。
bayesmh y x, likelihood(logit) prior({y:x}, logdensity(ln(2.5)-ln(2.5^2+{y:x}^2)-ln(_pi))) prior({y:_cons}, logdensity(ln(10)-ln(10^2+{y:_cons}^2)-ln(_pi)))
引言
马尔科夫链蒙特卡洛模拟(MCMC)
计量模型的贝叶斯估计与检验
特定模型的似然函数或后验分布
贝叶斯预测
非线性模型需要明确指定参数和变量构成的表达式。
参数以大括号括起来,比如{b0}, {b1}等。
例:bayesmh y={b0} + {b1}*x1^{p} + {b2}*x2, 表示\(y=b_0+b_1 x_1^{p} + b_2 x_2\),
非线性单方程模型:eqname : dep = subexp
例: bayesmh mpg = {b0}+{b1}*price^{b2}, likelihood(…) prior(…)
非线性多方程模型:(eqname : dep = subexp) (…..)
估计CES生产函数;预测当capital=1.5, labor=2时的产出。
. use production, clear . bayesmh lnoutput = ({b0}-1/{rho=1}*ln({delta=0.5}*capital^(-1*{rho})+(1-{delta})*labor^(-1*{rho}))), likelihood(normal({sig > 2})) prior({sig2}, igamma(0.01, 0.01)) prior({b0}, normal(0, 100)) prior({delta}, uniform(0, 1)) prior({rho}, gamma(1, 1 > )) block({b0 delta rho sig2}, split) rseed(16) saving(ces_mcmc, replace) Burn-in ... Simulation ... Model summary ────────────────────────────────────────────────────────────────────────────── Likelihood: lnoutput ~ normal({b0}-1/{rho=1}*ln({delta=0.5}*capital^(-1*{rho})+(1-{delta })*labor^(-1*{rho})),{sig2}) Priors: {sig2} ~ igamma(0.01,0.01) {b0} ~ normal(0,100) {delta} ~ uniform(0,1) {rho} ~ gamma(1,1) ────────────────────────────────────────────────────────────────────────────── Bayesian normal regression MCMC iterations = 12,500 Random-walk Metropolis–Hastings sampling Burn-in = 2,500 MCMC sample size = 10,000 Number of obs = 100 Acceptance rate = .4385 Efficiency: min = .02283 avg = .1225 Log marginal-likelihood = -93.731032 max = .2461 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── b0 │ 3.829453 .1047645 .0059 3.829384 3.633927 4.027828 delta │ .4830306 .0636595 .001283 .4820599 .3540333 .6146103 rho │ 1.838675 .8665057 .057343 1.624436 .8088482 4.053971 sig2 │ .3098183 .043935 .001009 .3062381 .233141 .4054633 ─────────────┴──────────────────────────────────────────────────────────────── file ces_mcmc.dta saved.
线性多方程模型:(eqname : dep indepvars [, noconstant]) (……)
例:
bayesmh (eq:mpg price weight) (eq2:price weight foreign), likelihood(…) prior(…)
表示模型:
mpg = {eq:price}*price + {eq:weight}*weight + {eq:_cons}
price = {eq2:weight}*weight + {eq2:foreign}*foreign + {eq2:_cons}
先验分布设定:prior( parameter, dist )
例:定义方程eq中x1的系数的先验分布:prior({y:x1}, normal(0,100))
bayesmh y x1 x2, likelihood(normal({var})) prior({y:x1}, normal(0,100))
基本规则:参数以大括号括起来。比如,
单个参数{mu}
特定方程的单个参数:{eq:mu}
参数矩阵(一般为协方差矩阵):{Sigma, matrix}
特定方程的参数矩阵:{Cov:Sigma, matrix}
如果似然函数或先验分布比较复杂,不能由内置的分布函数或表达式来表示,可以写似然函数或后验分布程序来实现。
用evaluator()设置对数后验分布,用llevaluator()设置对数似然函数。
设置自己的对数后验分布:
bayesmh depvar indepvars [if] [in] , evaluator(program, spec)
设置自己的对数似然函数:
bayesmh depvar indepvars [if] [in] , llevaluator(program, spec)
基本架构:
program progname
args lnden xb1 [xb2 ...] [ modelparams]
... computations ...
scalar `lnden' = ...
end
args后面依次为:存放对数后验(或对数似然)值的标量(lnden)、各方程的线性组合(xb1, xb2, …)以及模型的其它参数。
bayesmh设定的模型必须要与evaluator程序的args保持一致。比如,args lnden xb1 var,对应的bayesmh指令:
. bayesmh y x1 x2, evaluator(normaljeffreys, parameters({var}))
y x1 x2:因变量为y,自变量为x1和x2。
b0+b1*x1+b2*x2对应args的xb1
parameters({var})则对应args中的var。
例:args lnden xb1 xb2 tau var1 var2,对应的bayesmh指令:
. bayesmh (y1 x1 x2) (y2 x1 z2), evaluator(normaljeffreys, parameters({tau} {var1} {var2}))
其中,(y1 x1 x2) (y2 x1 z2): 模型包含两个方程,因变量分别为y1、y2,自变量分别为(x1, x2)和(x1, z2)。
b0+b1*x1+b2*x2对应args的xb1,c0+c1*x1+c2*z2对应args的xb2。
parameters({tau} {var1} {var2})则对应args中的tau var1 var2。
model | global macro | note |
---|---|---|
单方程 | $MH_y | 因变量 |
 | $MH_x1, $MH_x2, … | 第1个方程的第1、2、…个自变量 |
 | $MH_xn | 第1个方程的自变量的个数 |
多方程 | $MH_y1, $MH_y2, $MH_y3, … | 第1、2、3、…个方程的因变量 |
 | $MH_y1x1, $MH_y1x2, … | 第1个方程的第1、2、…个自变量 |
 | $MH_y2x1, $MH_y2x2, … | 第2个方程的第1、2、…个自变量 |
 | $MH_y1xn, $MH_y2xn, … | 第1、2、…个方程的自变量的个数 |
通用 | $MH_touse | 样本范围 |
 | $MH_n | 样本量 |
 | $MH_b | 系数向量 |
 | $MH_bn | 参数个数 |
 | $MH_m1, $MH_m2, … | 第1、2、…个矩阵参数 |
 | $MH_mn | 矩阵参数的个数 |
 | $MH_extravars | 其它变量 |
回归系数的先验为flat(密度为1,对数为0,因此在后验分布中不用考虑),方差的先验分布设定为Jeffreys先验。
. type normaljeffreys.ado
program normaljeffreys
        version 16.1
        args lnp xb var
        /* compute log likelihood */
        tempname sd
        scalar `sd' = sqrt(`var')
        tempvar lnfj
        quietly generate double `lnfj' = lnnormalden($MH_y,`xb',`sd') if $MH_touse
        quietly summarize `lnfj', meanonly
        if r(N) < $MH_n {
                scalar `lnp' = .
                exit
        }
        tempname lnf
        scalar `lnf' = r(sum)
        /* compute log prior */
        tempname lnprior
        /* case 1: variance: Jeffereys prior, coefficient: flat prior. */
        scalar `lnprior' = -2*ln(`sd')
        /* compute log posterior */
        scalar `lnp' = `lnf' + `lnprior'
end
由于已经在evaluator设置了对数后验分布,bayesmh指令中不能再设置prior()或likelihood()。
. use mroz, clear . bayesmh lwage educ age, evaluator(normaljeffreys, parameters({var})) Burn-in ... note: invalid initial state. Simulation ... Model summary ────────────────────────────────────────────────────────────────────────────── Posterior: lwage ~ normaljeffreys(xb_lwage,{var}) ────────────────────────────────────────────────────────────────────────────── Bayesian regression MCMC iterations = 12,500 Random-walk Metropolis–Hastings sampling Burn-in = 2,500 MCMC sample size = 10,000 Number of obs = 428 Acceptance rate = .2765 Efficiency: min = .06077 avg = .07804 Log marginal-likelihood = -452.12848 max = .09761 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── lwage │ educ │ .1099913 .013842 .000443 .1105567 .0814029 .1356931 age │ .0067848 .0043219 .000175 .0067199 -.0016034 .0153288 _cons │ -.4874439 .2635526 .009471 -.4873747 -1.001772 .0526065 ─────────────┼──────────────────────────────────────────────────────────────── var │ .4626251 .0312279 .00113 .4615565 .4051353 .5263745 ─────────────┴────────────────────────────────────────────────────────────────
等价的做法:likelihood() + prior()
bayesmh lwage educ age, likelihood(normal({var})) prior({lwage:}, flat) prior({var}, jeffreys)
可以单独写对数似然函数,先验分布采用内置的分布。
例:注意被注释掉的几行语句。
. type normallnf.ado
program normallnf
        version 16.1
        args lnf xb var                   // lnp --> lnf
        tempname sd
        scalar `sd' = sqrt(`var')
        tempvar lnfj
        quietly generate double `lnfj'=lnnormalden($MH_y,`xb',`sd') if $MH_touse
        quietly summarize `lnfj', meanonly
        if r(N) < $MH_n {
                scalar `lnf' = .          // lnp --> lnf
                exit
        }
        // tempname lnf
        scalar `lnf' = r(sum)
        // tempname lnprior
        // scalar `lnprior' = -2*ln(`sd')
        // scalar `lnp' = `lnf' + `lnprior'
end
. bayesmh lwage educ age, llevaluator(normallnf, parameters({var})) prior({lwage:}, flat) prior({var}, jeffreys) Burn-in ... note: invalid initial state. Simulation ... Model summary ────────────────────────────────────────────────────────────────────────────── Likelihood: lwage ~ normallnf(xb_lwage,{var}) Priors: {lwage:educ age _cons} ~ 1 (flat) (1) {var} ~ jeffreys ────────────────────────────────────────────────────────────────────────────── (1) Parameters are elements of the linear form xb_lwage. Bayesian regression MCMC iterations = 12,500 Random-walk Metropolis–Hastings sampling Burn-in = 2,500 MCMC sample size = 10,000 Number of obs = 428 Acceptance rate = .1388 Efficiency: min = .001144 avg = .005087 Log marginal-likelihood = -464.96066 max = .009786 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── lwage │ educ │ .1713085 .013087 .002016 .1711292 .1469124 .1960288 age │ .0262109 .0036357 .000504 .0263546 .0189688 .0330134 _cons │ -2.103083 .1503325 .04444 -2.137789 -2.353303 -1.825682 ─────────────┼──────────────────────────────────────────────────────────────── var │ .5224401 .038405 .003882 .5194123 .448735 .603441 ─────────────┴──────────────────────────────────────────────────────────────── Note: There is a high autocorrelation after 500 lags.
Logit模型的对数似然函数。
. type logitlnf.ado
program logitlnf
        version 16.1
        args lnf xb
        tempvar lnfj
        quietly generate `lnfj' = ln(invlogit(`xb')) if $MH_y == 1 & $MH_touse
        quietly replace `lnfj' = ln(invlogit(-`xb')) if $MH_y == 0 & $MH_touse
        quietly summarize `lnfj', meanonly
        if r(N) < $MH_n {
                scalar `lnf' = .
                exit
        }
        scalar `lnf' = r(sum)
end
. bayesmh inlf kids* educ age, llevaluator(logitlnf) prior({inlf:}, flat) Burn-in ... Simulation ... Model summary ────────────────────────────────────────────────────────────────────────────── Likelihood: inlf ~ logitlnf(xb_inlf) Prior: {inlf:kidslt6 kidsge6 educ age _cons} ~ 1 (flat) (1) ────────────────────────────────────────────────────────────────────────────── (1) Parameters are elements of the linear form xb_inlf. Bayesian regression MCMC iterations = 12,500 Random-walk Metropolis–Hastings sampling Burn-in = 2,500 MCMC sample size = 10,000 Number of obs = 753 Acceptance rate = .2605 Efficiency: min = .03826 avg = .0633 Log marginal-likelihood = -475.38324 max = .0815 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed inlf │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── kidslt6 │ -1.483831 .1929396 .007068 -1.476455 -1.871804 -1.096678 kidsge6 │ -.096295 .0680707 .00348 -.0951521 -.2328876 .0339874 educ │ .201044 .0379359 .001761 .200977 .129362 .275341 age │ -.0638044 .0124056 .000435 -.0633938 -.08888 -.0399602 _cons │ 1.024522 .7731857 .028078 1.0043 -.4649703 2.55663 ─────────────┴────────────────────────────────────────────────────────────────
似然函数程序文件: normalthresh.ado
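文中未列出normalthresh.ado的具体内容。下面给出与后文bayesmh调用(两个线性方程、额外参数{tau} {var1} {var2}、extravars(ygap1))相匹配的一种可能写法,仅为示意,并非原文件:

program normalthresh
        version 16.1
        args lnf xb1 xb2 tau var1 var2
        tempname sd1 sd2
        scalar `sd1' = sqrt(`var1')
        scalar `sd2' = sqrt(`var2')
        tempvar lnfj
        // 门限变量为extravars()中指定的ygap1(即$MH_extravars):
        // 当 ygap1 <= tau 时适用第一个方程,否则适用第二个方程
        quietly generate double `lnfj' = lnnormalden($MH_y1, `xb1', `sd1') ///
                if $MH_extravars <= `tau' & $MH_touse
        quietly replace `lnfj' = lnnormalden($MH_y1, `xb2', `sd2') ///
                if $MH_extravars >  `tau' & $MH_touse
        quietly summarize `lnfj', meanonly
        if r(N) < $MH_n {
                scalar `lnf' = .
                exit
        }
        scalar `lnf' = r(sum)
end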
. sysuse taylor, clear . gen ygap1 = L.ygap (1 missing value generated) . bayesmh (eq1:ffr l.ffr l.pi l.ygap) (eq2: ffr l.ffr l.pi l.ygap), /// > llevaluator(normalthresh, parameters({tau} {var1} {var2}) extravars(ygap1)) /// > prior({eq1:}, normal(0,0.5)) prior({var1}, igamma(2,1)) /// > prior({eq2:}, normal(0,0.5)) prior({var2}, igamma(2,1)) /// > prior({tau}, uniform(-2,2)) /// > initial({eq1:} 0 {eq2:} 0 {tau} -2 {var1} 1 {var2} 1) /// > block({eq1:}) block({eq2:}) block({var1}) block({var2}) blocksummary /// > burnin(2000) mcmcsize(2000) Burn-in ... Simulation ... Model summary ────────────────────────────────────────────────────────────────────────────── Likelihood: ffr ffr ~ normalthresh(xb_eq1,xb_eq2,{tau},{var1},{var2}) Priors: {eq1:L.ffr L.pi L.ygap _cons} ~ normal(0,0.5) (1) {eq2:L.ffr L.pi L.ygap _cons} ~ normal(0,0.5) (2) {tau} ~ uniform(-2,2) {var1 var2} ~ igamma(2,1) ────────────────────────────────────────────────────────────────────────────── (1) Parameters are elements of the linear form xb_eq1. (2) Parameters are elements of the linear form xb_eq2. Block summary ────────────────────────────────────────────────────────────────────────────── 1: {eq1:L.ffr L.pi L.ygap _cons} 2: {eq2:L.ffr L.pi L.ygap _cons} 3: {var1} 4: {var2} 5: {tau} ────────────────────────────────────────────────────────────────────────────── Bayesian regression MCMC iterations = 4,000 Random-walk Metropolis–Hastings sampling Burn-in = 2,000 MCMC sample size = 2,000 Number of obs = 113 Acceptance rate = .3433 Efficiency: min = .03825 avg = .08808 Log marginal-likelihood = -132.2379 max = .261 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── eq1 │ ffr │ L1. │ .4263503 .1543339 .013155 .4467987 .1165634 .7892567 │ pi │ L1. │ 1.005484 .2931961 .025911 .9806879 .3473836 1.622831 │ ygap │ L1. │ .1055092 .2932414 .032391 .0986647 -.4638702 .6225725 │ _cons │ -.3022746 .5587455 .063881 -.2833094 -1.369574 .7045184 ─────────────┼──────────────────────────────────────────────────────────────── eq2 │ ffr │ L1. │ .9153278 .0236862 .002067 .9171459 .8683087 .9619808 │ pi │ L1. │ .3238712 .0551187 .004669 .3243831 .2182006 .4343945 │ ygap │ L1. │ .0801297 .0641431 .00581 .0773504 -.0436599 .1989345 │ _cons │ -.3283278 .1315534 .011183 -.3253656 -.6088118 -.0825758 ─────────────┼──────────────────────────────────────────────────────────────── tau │ -1.144866 .0370001 .002212 -1.157775 -1.182468 -1.030312 var1 │ 2.230633 .7411225 .055102 2.107737 1.227005 4.183789 var2 │ .2537051 .0384719 .001684 .2511057 .1879426 .3328634 ─────────────┴──────────────────────────────────────────────────────────────── Note: Adaptation continues during simulation.
. bayesgraph diagnostics {tau}
. bayesstats ess Efficiency summaries MCMC sample size = 2,000 Efficiency: min = .03825 avg = .08808 max = .261 ─────────────┬────────────────────────────────────── │ ESS Corr. time Efficiency ─────────────┼────────────────────────────────────── eq1 │ ffr │ L1. │ 137.64 14.53 0.0688 │ pi │ L1. │ 128.04 15.62 0.0640 │ ygap │ L1. │ 81.96 24.40 0.0410 │ _cons │ 76.50 26.14 0.0383 ─────────────┼────────────────────────────────────── eq2 │ ffr │ L1. │ 131.38 15.22 0.0657 │ pi │ L1. │ 139.38 14.35 0.0697 │ ygap │ L1. │ 121.88 16.41 0.0609 │ _cons │ 138.39 14.45 0.0692 ─────────────┼────────────────────────────────────── tau │ 279.67 7.15 0.1398 var1 │ 180.90 11.06 0.0905 var2 │ 521.93 3.83 0.2610 ─────────────┴──────────────────────────────────────
引言
马尔科夫链蒙特卡洛模拟(MCMC)
计量模型的贝叶斯估计与检验
特定模型的似然函数或后验分布
贝叶斯预测
参数是随机的,预测时从参数的后验分布随机抽样,得到参数的R个随机数,进而得到对每个观测值的预测,即每个观测值有R个预测(模拟)数值。
bayespredict {_ysim#[numlist]}
得到的模拟因变量的名称为:_ysim1_1, _ysim1_2, …,表示第1个方程中第1、2、…个观测值的模拟(预测)值。
numlist:设定对哪些观测值进行预测。
. use production, clear . qui bayesmh lnoutput = ({b0}-1/{rho=1}*ln({delta=0.5}*capital^(-1*{rho})+(1-{delta})*labor^(-1*{rho}))), likelihood(normal( > {sig2})) prior({sig2}, igamma(0.01, 0.01)) prior({b0}, normal(0, 100)) prior({delta}, uniform(0, 1)) prior({rho}, gamma( > 1, 1)) block({b0 delta rho sig2}, split) rseed(16) saving(ces_mcmc, replace) . . bayespredict {_ysim}, saving(ces_pred, replace) rseed(16) Computing predictions ... file ces_pred.dta saved. file ces_pred.ster saved. . describe _ysim1_1 _ysim1_2 _ysim1_3 using ces_pred Variable Storage Display Value name type format label Variable label ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── _ysim1_1 double %10.0g Simulated lnoutput, obs #1 _ysim1_2 double %10.0g Simulated lnoutput, obs #2 _ysim1_3 double %10.0g Simulated lnoutput, obs #3
模拟的前5个观测值的统计指标。
. bayesstats summary {_ysim[1/5]} using ces_pred Posterior summary statistics MCMC sample size = 10,000 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── _ysim1_1 │ 3.111305 .564188 .005642 3.114486 2.013625 4.230432 _ysim1_2 │ 4.47848 .5647274 .006264 4.487651 3.362394 5.576457 _ysim1_3 │ 2.033675 .5562368 .005692 2.033444 .9360682 3.115028 _ysim1_4 │ 2.234464 .5692011 .005783 2.234235 1.130026 3.361621 _ysim1_5 │ 2.894212 .5655813 .005948 2.894129 1.789959 4.013839 ─────────────┴────────────────────────────────────────────────────────────────
预测值的其它统计函数:
bayesgraph histogram [label:]@func(arg1 [, arg2]), saving(file)
bayesgraph histogram [label:]@userprog arg1 [arg2], saving(file)
其中,func为mata函数(该函数对列向量操作得到标量),比如mean, variance等。
arg1和arg2为{_ysim[#]}, {_resid[#]}(残差), 或{_mu[#]}(因变量的期望)。 预测值的函数(比如均值、方差、偏度等):
. bayesgraph histogram (resvar:@variance({_resid})) using ces_pred
可以直接得到R次预测数值的均值(或中位数、标准差)等统计量。
bayespredict newvar, mean median std cri outcome(depvar)
. bayespredict pmean, mean rseed(123) Computing predictions ... . twoway (kdensity lnoutput) (kdensity pmean, lp(dash)), legend(off) title(Observed vs replicated data)
贝叶斯预测:对每个观测值,随机生成对应因变量的R个随机数,再计算其均值。
. set obs 101 Number of observations (_N) was 100, now 101. . qui replace capital = 1.5 in 101 . qui replace labor = 2 in 101 . bayespredict pmean2, mean rseed(123) Computing predictions ... . bayespredict cilow ciupp, cri rseed(123) Computing predictions ... . list labor capital lnoutput pmean2 cilow ciupp in 96/101 ┌─────────────────────────────────────────────────────────────────┐ │ labor capital lnoutput pmean2 cilow ciupp │ ├─────────────────────────────────────────────────────────────────┤ 96. │ .4023916 .2614668 2.737468 2.678401 1.561842 3.798991 │ 97. │ 3.801785 .6329353 3.619768 3.782753 2.692533 4.906854 │ 98. │ .3353206 .2557932 2.292713 2.597483 1.527138 3.708973 │ 99. │ 31.5595 2.72437 5.026271 5.273558 4.144 6.378542 │ 100. │ 3.80443 .946865 3.363704 4.154781 3.045156 5.251562 │ ├─────────────────────────────────────────────────────────────────┤ 101. │ 2 1.5 . 4.362791 3.242146 5.457026 │ └─────────────────────────────────────────────────────────────────┘
直接预测用户自定义的其它指标:
bayespredict [label:]@func(arg1 [, arg2]), saving(file)
bayespredict [label:]@userprog arg1 [arg2], saving(file)
. bayespredict (mu:@mean({_mu})), saving(mc2, replace) Computing predictions ... file mc2.dta saved. file mc2.ster saved.
计算统计量的后验预测概率值(posterior predictive p-value)。统计量可以利用Mata函数来计算。
. bayesstats ppvalues (mean:@mean({_ysim})) (min:@min({_ysim})) (max:@max({_ysim})) using ces_pred Posterior predictive summary MCMC sample size = 10,000 ─────────────┬───────────────────────────────────────────── T │ Mean Std. dev. E(T_obs) P(T>=T_obs) ─────────────┼───────────────────────────────────────────── mean │ 3.045143 .0787588 3.044554 .5026 min │ .5130189 .3401942 1.049675 .0365 max │ 5.84806 .3703789 5.703145 .626 ─────────────┴───────────────────────────────────────────── Note: P(T>=T_obs) close to 0 or 1 indicates lack of fit.
这些统计量的概率值越接近0.5,表明模拟值与实际值越吻合。
可以通过mata函数或者ado程序(或者二者混合使用)定义自己的统计量。
例:偏度和峰度。
. mata
──────────────────────────── mata (type end to exit) ────────────────────────────
: real scalar skew(real colvector x)
> {
>         return (sqrt(length(x))*sum((x:-mean(x)):^3)/(sum((x:-mean(x)):^2)^1.5))
> }
: end
──────────────────────────────────────────────────────────────────────────────────
. capture prog drop kurt
. program kurt
  1.         version 17
  2.         args kurtosis x
  3.         quietly summarize `x', detail
  4.         scalar `kurtosis' = r(kurtosis)
  5.         end
. bayespredict (skewness:@skew({_ysim})) (kurtosis:@kurt {_ysim}), saving(ces_teststat, replace) rseed(16) Computing predictions ... file ces_teststat.dta saved. file ces_teststat.ster saved.
. bayesstats summary {skewness} {kurtosis} using ces_teststat Posterior summary statistics MCMC sample size = 10,000 ─────────────┬──────────────────────────────────────────────────────────────── │ Equal-tailed │ Mean Std. dev. MCSE Median [95% cred. interval] ─────────────┼──────────────────────────────────────────────────────────────── skewness │ .1352403 .1606844 .001607 .1350271 -.181804 .4555612 kurtosis │ 2.646662 .2969499 .003685 2.613873 2.165103 3.318124 ─────────────┴──────────────────────────────────────────────────────────────── . bayesstats ppvalues {skewness} {kurtosis} using ces_teststat Posterior predictive summary MCMC sample size = 10,000 ─────────────┬───────────────────────────────────────────── T │ Mean Std. dev. E(T_obs) P(T>=T_obs) ─────────────┼───────────────────────────────────────────── skewness │ .1352403 .1606844 .1562801 .4449 kurtosis │ 2.646662 .2969499 2.252134 .9359 ─────────────┴───────────────────────────────────────────── Note: P(T>=T_obs) close to 0 or 1 indicates lack of fit.
利用bayesreps可以生成模拟(重复)的因变量样本变量,比如生成前3个:
. bayesreps yrep*, nreps(3) rseed(123) Computing predictions ... . list lnoutput yrep1 yrep2 yrep3 in 1/5 ┌───────────────────────────────────────────┐ │ lnoutput yrep1 yrep2 yrep3 │ ├───────────────────────────────────────────┤ 1. │ 2.933451 3.397611 3.147219 4.039656 │ 2. │ 4.613716 4.896418 4.197628 5.014223 │ 3. │ 1.654005 1.758073 1.464527 2.596067 │ 4. │ 2.025361 2.30658 1.71803 2.535869 │ 5. │ 3.165065 2.221415 2.840843 3.330061 │ └───────────────────────────────────────────┘
贝叶斯预测也可以用来评估模型的拟合优度。利用后验分布随机模拟因变量的取值,并与实际值进行比较。