NCE (Noise Contrastive Estimation) and the NCE Loss

NCE Loss

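The NCE (Noise Contrastive Estimation) loss is a loss function used when training language models on large datasets. It recasts the problem as binary classification (telling real data apart from noise samples), which avoids normalizing over the entire vocabulary and so greatly reduces the computational cost. The NCE loss is: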

$\text{NCELoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\log \frac{P_{\text{model}}(x_i)}{P_{\text{model}}(x_i) + kP_n(x_i)} + \sum_{j=1}^{k} \log \frac{kP_n(x_{ij})}{P_{\text{model}}(x_{ij}) + kP_n(x_{ij})}\right]$

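Here $P_{\text{model}}(x)$ is the probability the model assigns to $x$, $P_n(x)$ is the probability of $x$ under the noise distribution, and $k$ is the number of noise samples $x_{ij}$ drawn for each positive sample $x_i$. The ratio $\frac{P_{\text{model}}(x_i)}{P_{\text{model}}(x_i) + kP_n(x_i)}$ is the binary-classification probability, that is, the probability with which the model judges $x_i$ to be a real (positive) sample rather than noise.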
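As a sanity check on the formula, here is a minimal sketch in plain Python that evaluates it term by term. The function name `nce_loss_sketch` and its argument layout are assumptions made for illustration; it takes the probabilities as given numbers and is not how any library implements NCE.

```python
import math

def nce_loss_sketch(p_model_pos, p_noise_pos, p_model_neg, p_noise_neg, k):
    """Evaluate the NCE loss formula for N positives with k noise samples each.

    p_model_pos[i], p_noise_pos[i]: model / noise probability of positive x_i.
    p_model_neg[i][j], p_noise_neg[i][j]: the same for the j-th noise sample of x_i.
    (Hypothetical argument layout, chosen only for this illustration.)
    """
    n = len(p_model_pos)
    total = 0.0
    for i in range(n):
        pm, pn = p_model_pos[i], p_noise_pos[i]
        # First term: log-probability that the positive is classified as data.
        total += math.log(pm / (pm + k * pn))
        for pm_j, pn_j in zip(p_model_neg[i], p_noise_neg[i]):
            # Second term: log-probability that each noise sample is classified as noise.
            total += math.log(k * pn_j / (pm_j + k * pn_j))
    return -total / n

# One positive and k = 2 noise samples.
loss = nce_loss_sketch([0.5], [0.25], [[0.1, 0.1]], [[0.2, 0.2]], k=2)
```

The loss shrinks as the model assigns higher probability to positives and lower probability to noise samples.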

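InfoNCE loss is a variant of the NCE loss. It is also used in self-supervised learning, typically to learn feature representations. Motivated by information theory, it learns the model parameters by contrasting the similarity of positive pairs with that of negative pairs. The InfoNCE loss is: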

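$\text{InfoNCE Loss} = -\frac{1}{N} \sum_{i=1}^{N} \log \left( \frac{\exp \left( \frac{q_i \cdot k_{i^+}}{\tau} \right)}{\exp \left( \frac{q_i \cdot k_{i^+}}{\tau} \right) + \sum_{j=1}^{K} \exp \left( \frac{q_i \cdot k_{j^-}}{\tau} \right)} \right)$

Here $q_i$ is the feature of the $i$-th query sample, $k_{i^+}$ is the feature of its matching positive sample (key), $k_{j^-}$ are the features of the $K$ negative samples, and $\tau$ is a temperature hyperparameter. The denominator contains the positive term as well as the negatives, so each summand is a softmax cross-entropy that picks the positive out of $K + 1$ candidates.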
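To make the InfoNCE computation concrete, the following is a minimal sketch in plain Python. It assumes the common form in which the denominator holds the positive key together with a shared list of negative keys; the function name `info_nce_sketch` and the default temperature are assumptions for illustration only.

```python
import math

def info_nce_sketch(queries, pos_keys, neg_keys, tau=0.07):
    """Average InfoNCE loss over queries (hypothetical helper, not a library API).

    queries[i] and pos_keys[i] form a positive pair; neg_keys is a list of
    negative key vectors shared by all queries. tau is the temperature.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    losses = []
    for q, k_pos in zip(queries, pos_keys):
        # Index 0 holds the positive logit, the rest are negatives.
        logits = [dot(q, k_pos) / tau] + [dot(q, k_neg) / tau for k_neg in neg_keys]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # -log(exp(positive) / sum(exp(all))) = log_denominator - positive
        losses.append(log_denominator - logits[0])
    return sum(losses) / len(losses)

loss = info_nce_sketch([[1.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]], tau=1.0)
```

A query that is more similar to its positive key than to the negatives gets a lower loss, and a smaller `tau` sharpens that contrast.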

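The NCE loss is widely used for large-scale classification problems in natural language processing, recommender systems, and similar fields. In natural language processing, for example, Word2Vec-style models learn word vectors by predicting a target word from its context; the original Word2Vec uses negative sampling, a simplified variant of NCE, and TensorFlow implementations of the skip-gram model commonly train it with nce_loss. In recommender systems, the NCE loss is also used for tasks such as user behavior prediction: by modeling users' historical behavior, the system predicts which items a user is likely to be interested in and recommends them.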

The nce_loss function

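import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.random_uniform)

# batch_size, vocabulary_size and embedding_size are hyperparameters defined elsewhere.
# Ids of the input words and of the target words for one batch.
train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
# Embedding matrix, initialized uniformly in [-1, 1).
embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
# Look up the embedding vector of each input word: shape [batch_size, embedding_size].
embed = tf.nn.embedding_lookup(embeddings, train_inputs)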

The signature of nce_loss (TensorFlow 1.x) is:

def nce_loss(weights, biases, labels, inputs, num_sampled, num_classes,
             num_true=1,
             sampled_values=None,
             remove_accidental_hits=False,
             partition_strategy="mod",
             name="nce_loss"):
    ...  # body omitted

Parameters

weights: A `Tensor` of shape `[num_classes, dim]`, or a list of `Tensor`
    objects whose concatenation along dimension 0 has shape
    `[num_classes, dim]`.  The (possibly-partitioned) class embeddings.
biases: A `Tensor` of shape `[num_classes]`.  The class biases.
labels: A `Tensor` of type `int64` and shape `[batch_size,
    num_true]`. The target classes.
inputs: A `Tensor` of shape `[batch_size, dim]`.  The forward
    activations of the input network.

How the NCE loss is computed

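First, num_sampled negative classes are sampled. Each input vector then takes the inner product with (num_sampled + 1) class vectors (the true class plus the sampled negatives), which gives the predicted values x. A sigmoid cross-entropy loss is computed between x and the labels y (1 for the true class, 0 for the negatives), giving a loss matrix of shape batch_size * (num_sampled + 1).
The loss matrix is then multiplied by a ones matrix, which sums each row, to give a loss vector of shape batch_size * 1. Finally, the mean over the batch is taken.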
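The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the TensorFlow source: the helper names are hypothetical, the negatives are passed in rather than sampled, and the `log_expected_count` argument (a per-class correction subtracted from the logits, as NCE requires) is an assumption that does not appear in the description above.

```python
import math

def sigmoid_xent(logit, label):
    # Numerically stable sigmoid cross-entropy for one logit and one 0/1 label.
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def nce_row_loss(x, true_id, sampled_ids, weights, biases, log_expected_count):
    """Loss of one example: sum over the true class and the sampled negatives."""
    ids = [true_id] + list(sampled_ids)          # num_sampled + 1 classes
    labels = [1.0] + [0.0] * len(sampled_ids)    # 1 for the true class, 0 for negatives
    loss = 0.0
    for class_id, y in zip(ids, labels):
        # Inner product of the input vector with the class vector, plus bias.
        logit = sum(w * v for w, v in zip(weights[class_id], x)) + biases[class_id]
        # Assumed correction for how often the sampler proposes this class.
        logit -= log_expected_count[class_id]
        loss += sigmoid_xent(logit, y)           # row sum (the "ones matrix" product)
    return loss

def nce_batch_loss(xs, true_ids, sampled_ids, weights, biases, log_expected_count):
    # Mean of the per-example losses over the batch.
    rows = [nce_row_loss(x, t, sampled_ids, weights, biases, log_expected_count)
            for x, t in zip(xs, true_ids)]
    return sum(rows) / len(rows)

# Tiny example: 3 classes, 2-dimensional inputs, all parameters zero.
weights = [[0.0, 0.0] for _ in range(3)]
biases = [0.0] * 3
log_q = [0.0] * 3
batch_loss = nce_batch_loss([[1.0, 2.0], [0.5, -1.0]], [0, 0], [1, 2],
                            weights, biases, log_q)
```

With all parameters at zero every logit is 0, so each of the three terms per row contributes log 2.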

Implementation details

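By default (sampled_values=None), nce_loss draws the negative samples with log_uniform_candidate_sampler. So how does log_uniform_candidate_sampler sample? Its signature is:

def log_uniform_candidate_sampler(true_classes, num_true, num_sampled, unique,
                                  range_max, seed=None, name=None):
    ...  # body omitted

It samples an integer $k$ from $[0, \text{range\_max})$ with probability

$P(k) = \frac{\log(k + 2) - \log(k + 1)}{\log(\text{range\_max} + 1)}$

Small ids are therefore sampled much more often than large ones, so the sampler works best when classes are sorted by decreasing frequency.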
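The distribution above can be checked with a short sketch in plain Python. The function names are hypothetical, and the inverse-CDF sampling shown here is one way to draw from this distribution, not necessarily how TensorFlow implements it.

```python
import math
import random

def log_uniform_prob(k, range_max):
    # P(k) = (log(k + 2) - log(k + 1)) / log(range_max + 1)
    return (math.log(k + 2) - math.log(k + 1)) / math.log(range_max + 1)

def sample_log_uniform(range_max, rng=random):
    """Draw k in [0, range_max) by inverting the CDF log(k + 2) / log(range_max + 1)."""
    u = rng.random()  # uniform in [0, 1)
    k = int(math.exp(u * math.log(range_max + 1))) - 1
    return min(k, range_max - 1)  # guard against floating-point rounding at the top end

# The probabilities telescope, so they sum to 1 over [0, range_max).
total = sum(log_uniform_prob(k, 1000) for k in range(1000))
```

Because the numerators telescope, the cumulative probability up to `k` is `log(k + 2) / log(range_max + 1)`, which is what the sampler inverts.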

References

A source-code walkthrough of the nce_loss loss function, with negative sampling, as used by the word2vec skip-gram model
