Bayesian Inference (2)

The general steps of Bayesian inference described in the previous article are:

  • Combine the population, the sample, and the prior information to obtain the Bayesian posterior distribution of the parameter, $\pi(\theta|\mathbf{x})$.
  • Form the posterior predictive distribution $p(x|\mathbf{x}) = \int_{\theta} p(x|\theta)\pi(\theta|\mathbf{x})\,d\theta$. Compared with the prior predictive distribution $p(x) = \int_{\theta} p(x|\theta)p(\theta)\,d\theta$, the posterior predictive distribution incorporates the sample information and is therefore closer to the true predictive distribution.
  • By summarizing the posterior predictive distribution $p(X|\mathbf{x})$ appropriately, we obtain the posterior predictive mean $E(X|\mathbf{x}) = \int_{x} x\,p(x|\mathbf{x})\,dx$ and the posterior predictive variance $Var(X|\mathbf{x}) = \int_{x} (x - E(X|\mathbf{x}))^2\,p(x|\mathbf{x})\,dx$.

The second and third steps both involve integrals that rarely admit closed-form solutions. There are two main families of methods for obtaining approximate solutions: Monte Carlo sampling and variational inference.

Monte Carlo Sampling

Monte Carlo sampling draws samples from a target distribution and then uses those samples in place of the distribution for the relevant computations. The basic steps for solving a problem with this method are:

  • Construct a probabilistic model such that the quantity to be solved for is exactly a property of that model, e.g., its expectation.
  • Generate samples from the constructed probabilistic model.
  • Build an estimator from the samples and use it as an approximate solution to the problem.

For example, to evaluate the integral $\int f(x)p(x)\,dx$ when a direct solution is hard, Monte Carlo sampling proceeds as follows (a minimal sketch follows the list):

  • The target integral is exactly $E_{x\sim p(x)}[f(x)]$, so the probabilistic model is simply $x \sim p(x)$.
  • Draw samples $x_1, x_2, ..., x_n$ from $p(x)$ and evaluate $f(x_1), f(x_2), ..., f(x_n)$.
  • Use the sample mean $\overline{f} = \frac{1}{n}\sum_{i=1}^n f(x_i)$ as the approximate value of the integral.
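
A minimal sketch of these steps, taking $p(x) = N(0, 1)$ and $f(x) = x^2$ as an assumed example (the true value of the integral is 1.0):

import numpy as np

# Monte Carlo integration sketch: approximate E[f(x)] for x ~ p(x).
# Assumed example: p(x) = N(0, 1) and f(x) = x^2, so the true value is 1.0.
def f(x):
    return x ** 2

n = 100000
samples = np.random.normal(0.0, 1.0, n)  # draw x_1, ..., x_n from p(x)
estimate = np.mean(f(samples))           # sample mean approximates the integral
print(estimate)                          # close to 1.0 for large n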

Suppose $E(X|\mathbf{x})$ is to be computed by Monte Carlo sampling. In the following, $D$ denotes the sample $\mathbf{x}$ (or a sufficient statistic $T(\mathbf{x})$ of the sample), $x$ denotes the features, and $y$ the value to be estimated:

$$E[y|x, D] = \int y\left(\int p(y|x,\theta)\,p(\theta|D)\,d\theta\right)dy$$

$$= \int\int y\,p(y|x,\theta)\,p(\theta|D)\,d\theta\,dy$$

$$= \int\left(\int y\,p(y|x,\theta)\,dy\right)p(\theta|D)\,d\theta$$

$$= \int E[y|x,\theta]\,p(\theta|D)\,d\theta$$

When modeling the likelihood, we usually model its expectation. For example, in linear regression:

$$y = \beta_0 + \beta_1 x + \epsilon, \quad \epsilon \sim N(0, \sigma^2)$$

Taking the conditional distribution of both sides gives:

$$y|x \sim N(\beta_0 + \beta_1 x, \sigma^2)$$

Alternatively, we can take the output of a neural network as the mean of the Gaussian:

$$y|x \sim N(NN(x,\theta), \sigma^2)$$

The formula above then becomes:

$$E[y|x,D] = \int E[y|x,\theta]\,p(\theta|D)\,d\theta = \int NN(x,\theta)\,p(\theta|D)\,d\theta \approx \frac{1}{n}\sum_{i=1}^n NN(x,\theta_i)$$

where $\theta_i,\ i = 1, 2, ..., n$ are parameters sampled from the distribution $p(\theta|D)$.
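
A minimal sketch of this ensemble average, assuming the posterior samples $\theta_i$ are already available (here faked as random weights of a hypothetical one-layer network, purely for illustration):

import tensorflow as tf

# Ensemble-average sketch of the posterior predictive mean E[y|x, D].
# Assumption: `thetas` holds n parameter samples theta_i from p(theta|D);
# here they are faked as (weight, bias) pairs of a hypothetical one-layer net.
def nn(x, theta):
    w, b = theta
    return tf.matmul(x, w) + b

thetas = [(tf.random.normal([3, 1]), tf.zeros([1])) for _ in range(10)]
x = tf.ones([1, 3])                                  # one feature vector
preds = [nn(x, theta) for theta in thetas]           # NN(x, theta_i) per sample
posterior_mean = tf.reduce_mean(tf.stack(preds), 0)  # (1/n) sum_i NN(x, theta_i)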

If we can sample from $p(\theta|D)$, the problem is solved. The following sections consider this sampling problem.

Rejection Sampling

Suppose the target distribution $p(x)$ is relatively complex and hard to sample from directly. The target distribution may also be unnormalized; denote the unnormalized form by $\tilde{p}(x)$:

$$\tilde{p}(x) = Z_p\,p(x)$$

However, we can sample from another, easy-to-sample distribution $q(x)$, e.g., a Gaussian; this is called the proposal distribution. We multiply $q(x)$ by a constant $k$ so that the following inequality always holds:

$$\tilde{q}(x) = kq(x) \ge \tilde{p}(x)$$

The steps of rejection sampling are:

  • Draw a sample $x_i$ from the proposal distribution $q(x)$.
  • Discard $x_i$ with probability $\frac{kq(x_i)-\tilde{p}(x_i)}{kq(x_i)}$, i.e., accept it with probability $\frac{\tilde{p}(x_i)}{kq(x_i)}$.
  • Repeat the two steps above until the required sample size is reached.

The actual density of an accepted sample is $q(x_i) \cdot \frac{Z_p\,p(x_i)}{kq(x_i)} = \frac{Z_p}{k}p(x_i)$, i.e., the target distribution times a constant that does not change its shape, so the accepted samples follow the target distribution $p(x)$.

Below is a complete example. Suppose the target distribution is the (unnormalized) Gaussian mixture $\tilde{p}(x)$:

$$\tilde{p}(x) = \frac{1}{\sqrt{2\pi\sigma_1^2}}e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} + \frac{1}{\sqrt{2\pi\sigma_2^2}}e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}} = \frac{1}{\sqrt{2\pi \cdot 0.5^2}}e^{-\frac{(x-1.0)^2}{2 \cdot 0.5^2}} + \frac{1}{\sqrt{2\pi \cdot 1^2}}e^{-\frac{(x-0)^2}{2 \cdot 1^2}}$$

The proposal distribution is the standard Gaussian $N(0.0, 1.0)$:

$$q(x) = \frac{1}{\sqrt{2\pi\sigma_0^2}}e^{-\frac{(x-\mu_0)^2}{2\sigma_0^2}} = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$$

The constant $k$ is derived as follows:

$$kq(x) \ge \tilde{p}(x)$$

$$k \ge \frac{\tilde{p}(x)}{q(x)} = 2e^{(-3x^2+8x-4)/2} + 1 = 2e^{-\frac{3}{2}(x-4/3)^2 + 2/3} + 1$$

$$k = 2e^{2/3} + 1$$
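
Before sampling, it is worth checking numerically that $kq(x)$ really dominates $\tilde{p}(x)$. A minimal sketch of this check, using scipy.stats for the densities:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Sanity check of the envelope: k*q(x) must dominate p~(x) everywhere.
k = 2 * np.exp(2.0 / 3.0) + 1
xs = np.linspace(-4.0, 5.0, 500)
p_tilde = norm.pdf(xs, 1.0, 0.5) + norm.pdf(xs, 0.0, 1.0)  # unnormalized target
kq = k * norm.pdf(xs, 0.0, 1.0)                            # scaled proposal

plt.plot(xs, p_tilde, label='p~(x)')
plt.plot(xs, kq, label='k q(x)')
plt.legend()
plt.grid()
assert np.all(kq >= p_tilde)  # the envelope inequality holds on this grid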

Here is an example implementation:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf


def gaussian_pdf(x, mu, sigma):
    return tf.compat.v1.distributions.Normal(mu, sigma).prob(x)


def p_pdf(x):
    # unnormalized target distribution: mixture of two Gaussians
    return gaussian_pdf(x, 1.0, 0.5) + gaussian_pdf(x, 0.0, 1.0)


def q_pdf(x):
    # proposal distribution: standard Gaussian
    return gaussian_pdf(x, 0.0, 1.0)


def accept_rate(x):
    # acceptance probability p~(x) / (k q(x))
    k = 2 * np.exp(2.0 / 3.0) + 1
    return p_pdf(x) / (k * q_pdf(x))


X = []
AC = []
while len(X) < 2000:
    # draw from the proposal q(x); float32 to match the TF distribution dtype
    x = np.random.normal(0.0, 1.0, 1).astype(np.float32)
    ar = accept_rate(x)
    AC.append(ar)
    if tf.random.uniform([1]) < ar:  # accept with probability ar
        X.append(x)

plt.plot(AC)
plt.grid()


(Figure: acceptance rate of each proposed sample.)

import pandas as pd

# density plot of the accepted samples
pd.DataFrame(X).plot(kind='density')


(Figure: density plot of the accepted samples.)

Importance Sampling

The target integral decomposes as follows:

$$E_{x\sim p(x)}[f(x)] = \int f(x)p(x)\,dx$$

$$= \int f(x)\frac{p(x)}{q(x)}q(x)\,dx$$

$$\approx \frac{1}{n}\sum_{i=1}^n f(x^i)\frac{p(x^i)}{q(x^i)}$$

In other words, after substituting the samples $x^i$ drawn from $q(x)$ into $f(x)$, multiplying each term by the weight $p(x^i)/q(x^i)$ yields the target result. For both rejection sampling and importance sampling, the closer the proposal distribution is to the target distribution, the better.
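
A minimal sketch of this weighting, assuming for illustration a normalized target $p(x) = N(1.0, 0.5^2)$, a standard-Gaussian proposal, and $f(x) = x$ (true value 1.0):

import numpy as np
from scipy.stats import norm

# Plain importance sampling sketch: estimate E[f(x)] under p(x) using
# samples from q(x) = N(0, 1).
# Assumed example: p(x) = N(1.0, 0.5^2) and f(x) = x, true value 1.0.
n = 100000
xs = np.random.normal(0.0, 1.0, n)                         # x^i ~ q(x)
weights = norm.pdf(xs, 1.0, 0.5) / norm.pdf(xs, 0.0, 1.0)  # p(x^i)/q(x^i)
estimate = np.mean(xs * weights)                           # (1/n) sum f(x^i) w_i
print(estimate)                                            # close to 1.0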

When the densities are unnormalized, the normalization problem can be handled as follows:

$$E_{x\sim p(x)}[f(x)] = \int f(x)p(x)\,dx$$

$$= \frac{Z_q}{Z_p}\int f(x)\frac{\tilde{p}(x)}{\tilde{q}(x)}q(x)\,dx$$

$$\approx \frac{Z_q}{Z_p}\frac{1}{n}\sum_{i=1}^n f(x^i)\frac{\tilde{p}(x^i)}{\tilde{q}(x^i)}$$

The ratio $\frac{Z_p}{Z_q}$ can itself be estimated from the same samples:

$$\frac{Z_p}{Z_q} = \frac{1}{Z_q}\int \tilde{p}(x)\,dx$$

$$= \int \frac{q(x)}{\tilde{q}(x)}\tilde{p}(x)\,dx$$

$$= \int \frac{\tilde{p}(x)}{\tilde{q}(x)}q(x)\,dx$$

$$\approx \frac{1}{n}\sum_{i=1}^n\frac{\tilde{p}(x^i)}{\tilde{q}(x^i)}$$

Dividing the first estimate by the second cancels the unknown constants, which solves the normalization problem; a sketch follows.
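
A minimal sketch of the resulting self-normalized estimator, reusing the unnormalized mixture $\tilde{p}(x)$ from the rejection-sampling example with $f(x) = x$; scipy.stats supplies the densities, and the constant in $\tilde{q}(x) = kq(x)$ cancels out:

import numpy as np
from scipy.stats import norm

# Self-normalized importance sampling sketch: estimate E[f(x)] under the
# unnormalized mixture p~(x) used above, with f(x) = x. The proposal is
# q~(x) = k * q(x); any positive constant k cancels in the final ratio.
def p_tilde(x):
    return norm.pdf(x, 1.0, 0.5) + norm.pdf(x, 0.0, 1.0)

def q_tilde(x):
    k = 2 * np.exp(2.0 / 3.0) + 1
    return k * norm.pdf(x, 0.0, 1.0)

n = 100000
xs = np.random.normal(0.0, 1.0, n)     # x^i ~ q(x)
w = p_tilde(xs) / q_tilde(xs)          # unnormalized weights p~(x^i)/q~(x^i)
estimate = np.sum(xs * w) / np.sum(w)  # the Z factors cancel in this ratio
print(estimate)                        # close to the mixture mean 0.5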

MCMC (Markov Chain Monte Carlo)

Here we introduce the Metropolis-Hastings (M-H) MCMC algorithm. Its steps are:

  • Choose a distribution we can already sample from (e.g., a Gaussian) as the transition probability of a homogeneous Markov chain: $T(x^i\rightarrow x^j)$.
  • Obtain an initial sample $x_0$ in any way, and fix the burn-in length $M$.
  • From the current state $x^i$, sample a candidate state $x^j$ according to $T$, and accept it with probability $\min\left(1, \frac{\tilde{p}(x^j)T(x^j\rightarrow x^i)}{\tilde{p}(x^i)T(x^i\rightarrow x^j)}\right)$; if accepted, set $x^{(m)} = x^j$, otherwise $x^{(m)} = x^{(m-1)}$.
  • Repeat the third step, and keep only the samples after $x^{(M)}$ as the sampling result.

An example implementation:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf


def gaussian_pdf(x, mu, sigma):
    return tf.compat.v1.distributions.Normal(mu, sigma).prob(x)


def p_pdf(x):
    # unnormalized target distribution: mixture of two Gaussians
    return gaussian_pdf(x, 1.0, 0.5) + gaussian_pdf(x, 0.0, 1.0)


def accept_rate(x_i, x_j):
    # M-H acceptance ratio; the Gaussian random-walk proposal is symmetric,
    # so the two transition terms actually cancel
    t1 = p_pdf(x_j) * gaussian_pdf(x_j - x_i, 0.0, 0.1)
    t2 = p_pdf(x_i) * gaussian_pdf(x_i - x_j, 0.0, 0.1)
    return tf.minimum(1.0, t1 / t2)


X = []
AC = []
M = 1000                             # burn-in length
x_i = np.zeros(1, dtype=np.float32)  # initial state x_0
r = 0
while len(X) < 1000:
    # propose a candidate state from a Gaussian random walk
    x_j = x_i + np.random.normal(0.0, 0.1, 1).astype(np.float32)
    ac = accept_rate(x_i, x_j)
    AC.append(ac)
    if tf.random.uniform([1]) < ac:
        x_i = x_j                    # accept the candidate state
    if r > M:
        X.append(x_i)                # keep samples only after burn-in
    r += 1

plt.plot(AC)
plt.grid()

(Figure: M-H acceptance rate over iterations.)

import pandas as pd

# density plot of the MCMC samples
pd.DataFrame(X).plot(kind='density')

(Figure: density plot of the MCMC samples.)

Variational Inference

Variational inference looks for a parametric distribution $q(x;w)$ that is close to the target distribution. Concretely, we minimize the KL divergence between $q(x;w)$ and the target distribution:

$$KL(q(x;w),\ p(x)) = \int q(x)\log\left(\frac{q(x)}{p(x)}\right)dx$$

$$= \int \left(q(x)\log(q(x)) - q(x)\log(p(x))\right)dx$$

$$= \int q(x)\log(q(x))\,dx - \int q(x)\log(p(x))\,dx$$

$$\approx \frac{1}{n}\sum \log(q(x_i)) - \frac{1}{n}\sum \log(p(x_i)), \quad x_i \sim q(x;w)$$

This Monte Carlo estimate serves as the loss:

$$Loss(w) \approx \frac{1}{n}\sum \log(q(x_i)) - \frac{1}{n}\sum \log(p(x_i))$$

With the loss defined, we can search for the parameters $w$ by gradient descent. The implementation below still uses the Gaussian mixture above as the target distribution and takes $q(x;w) = N(x_{mu}, x_{rho}^2)$ as the approximating distribution, with the reparameterization $x = x_{mu} + x_{rho}\,\epsilon,\ \epsilon \sim N(0,1)$ so that gradients can flow through the sampling step:

import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.optimizers import Adam


def gaussian_pdf(x, mu, sigma):
    return tf.compat.v1.distributions.Normal(mu, sigma).prob(x)


def p_pdf(x):
    # unnormalized target distribution: mixture of two Gaussians
    return gaussian_pdf(x, 1.0, 0.5) + gaussian_pdf(x, 0.0, 1.0)


# trainable parameters w = {x_mu, x_rho}
x_mu = tf.Variable(0.0)
x_rho = tf.Variable(0.1)
trainables = [x_mu, x_rho]

r = 0
mus = []
rhos = []
losses = []
optimizer = Adam(0.0001)
while r < 20000:
    with tf.GradientTape() as g:
        # reparameterization trick: x = mu + rho * eps, eps ~ N(0, 1)
        x = x_mu + x_rho * tf.random.normal(x_mu.shape)
        # single-sample Monte Carlo estimate of KL(q || p)
        loss = tf.math.log(gaussian_pdf(x, x_mu, x_rho)) - tf.math.log(p_pdf(x))
    gradients = g.gradient(loss, trainables)
    optimizer.apply_gradients(zip(gradients, trainables))
    losses.append(loss.numpy())
    mus.append(x_mu.numpy())
    rhos.append(x_rho.numpy())
    r += 1

# plot the loss
plt.plot(losses)
plt.grid()

(Figure: loss curve during training.)

# plot how the mean x_mu evolves during training
plt.plot(mus)

(Figure: evolution of the mean x_mu.)

# plot how the standard deviation x_rho evolves during training
plt.plot(rhos)

(Figure: evolution of the standard deviation x_rho.)

import pandas as pd

# draw samples x from the trained distribution q(x; w)
r = 0
X = []
while r < 1000:
    x = x_mu + x_rho * tf.random.normal(x_mu.shape)
    X.append(x.numpy())
    r += 1

pd.DataFrame(X).plot(kind='density')

(Figure: density plot of samples drawn from the fitted q(x;w).)
