CONTENTS

Model

Five components (sketched in code after this list):

  • Input layer: the input sentence to the model
  • Embedding layer: map each word into a low-dimensional vector
  • LSTM layer: utilize a BLSTM to get high-level features from the embedding layer output
  • Attention layer: produce a weight vector, and merge the word-level features from each time step into a sentence-level feature vector by multiplying them by the weight vector
  • Output layer: the sentence-level feature vector is finally used for relation classification
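
A minimal PyTorch sketch of this five-layer pipeline. The class name, dimensions, and attribute names are illustrative assumptions, not fixed by the notes; the 19 output classes match the SemEval-2010 Task 8 label set used later.

```python
import torch
import torch.nn as nn

class AttBLSTM(nn.Module):
    """Sketch of the five components: input -> embedding -> BLSTM -> attention -> output."""

    def __init__(self, vocab_size, emb_dim=100, hidden_dim=100, num_classes=19):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)            # embedding layer
        self.blstm = nn.LSTM(emb_dim, hidden_dim,
                             bidirectional=True, batch_first=True)    # BLSTM layer
        self.att_w = nn.Parameter(torch.randn(hidden_dim))            # attention vector w
        self.fc = nn.Linear(hidden_dim, num_classes)                  # output layer

    def forward(self, token_ids):                     # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)                 # (batch, seq_len, emb_dim)
        out, _ = self.blstm(x)                        # (batch, seq_len, 2*hidden_dim)
        h = out.size(-1) // 2
        H = out[..., :h] + out[..., h:]               # element-wise sum of both directions
        M = torch.tanh(H)
        alpha = torch.softmax(M @ self.att_w, dim=1)  # one attention weight per time step
        r = (H * alpha.unsqueeze(-1)).sum(dim=1)      # weighted sum -> sentence vector
        return self.fc(torch.tanh(r))                 # logits over the relation classes
```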

Word Embeddings

Each word of the input sentence is converted into its vector representation by looking up the embedding matrix.
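
A small illustration of the lookup, assuming a hypothetical 4-word vocabulary and 100-dimensional pre-trained vectors: each token id selects a row of the embedding matrix.

```python
import torch
import torch.nn as nn

# hypothetical pre-trained embedding matrix: one row per vocabulary word
pretrained = torch.randn(4, 100)                  # |V| = 4 words, 100-dim vectors
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

token_ids = torch.tensor([[0, 2, 3, 1]])          # one sentence as word indices
vectors = embedding(token_ids)                    # (1, 4, 100): one row per word
```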

Bidirectional Network

The network runs a forward and a backward pass over the sentence, and the outputs of the two directions are combined by element-wise sum rather than concatenation.
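
In PyTorch, for example, a bidirectional nn.LSTM concatenates the two directions in its output, so the element-wise sum can be recovered by splitting the last dimension (dimensions below are assumed):

```python
import torch
import torch.nn as nn

hidden = 100
blstm = nn.LSTM(input_size=100, hidden_size=hidden,
                bidirectional=True, batch_first=True)

x = torch.randn(10, 25, 100)                 # (batch, seq_len, emb_dim)
out, _ = blstm(x)                            # (batch, seq_len, 2*hidden): [fwd; bwd]
fwd, bwd = out[..., :hidden], out[..., hidden:]
H = fwd + bwd                                # element-wise sum, (batch, seq_len, hidden)
```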

Attention

The representation r of the sentence is formed by a weighted sum of these output vectors H = [h_1, ..., h_T]:

  M = tanh(H)
  α = softmax(wᵀM)
  r = Hαᵀ

where w is a trained parameter vector; the final sentence representation used for classification is h* = tanh(r).
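
The equations map directly onto batched tensor operations; a sketch with assumed names and dimensions:

```python
import torch

batch, seq_len, hidden = 10, 25, 100
H = torch.randn(batch, seq_len, hidden)      # BLSTM outputs after the element-wise sum
w = torch.randn(hidden, requires_grad=True)  # trained attention parameter vector

M = torch.tanh(H)                            # M = tanh(H)
alpha = torch.softmax(M @ w, dim=1)          # α = softmax(wᵀM), one weight per step
r = (H * alpha.unsqueeze(-1)).sum(dim=1)     # r = Hαᵀ, weighted sum over time steps
h_star = torch.tanh(r)                       # h* = tanh(r), the sentence representation
```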

Classifying

A softmax classifier predicts the relation label ŷ from the sentence representation h*; the cost function is the negative log-likelihood (cross-entropy) of the true labels.
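
In code the classifier and cost reduce to a linear layer plus cross-entropy (a sketch; the 19 classes match SemEval-2010 Task 8):

```python
import torch
import torch.nn as nn

hidden, num_classes = 100, 19                 # SemEval-2010 Task 8 has 19 labels
classifier = nn.Linear(hidden, num_classes)

h_star = torch.randn(10, hidden)              # sentence representations for a batch
logits = classifier(h_star)                   # p(y|S) = softmax(W h* + b), via logits
labels = torch.randint(0, num_classes, (10,))
cost = nn.functional.cross_entropy(logits, labels)  # negative log-likelihood
```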

Regularization

  • Dropout: applied to the embedding, LSTM, and penultimate layers (see the sketch after this list)
  • L2-norm regularization of the model parameters
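
A sketch of where the three dropout modules would sit, using the rates reported in the experimental setup below; the L2 term is typically realized as weight decay in the optimizer:

```python
import torch.nn as nn

drop_emb = nn.Dropout(p=0.3)    # after the embedding layer
drop_lstm = nn.Dropout(p=0.3)   # after the BLSTM outputs
drop_pen = nn.Dropout(p=0.5)    # on the penultimate vector h*, before the classifier
```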

Experiments

Dataset

The SemEval-2010 Task 8 dataset: 8,000 training sentences and 2,717 test sentences, annotated with 9 directed relation types plus an Other class (19 labels in total).

Experimental Setup

Training uses AdaDelta with a learning rate of 1.0 and a minibatch size of 10.

The model parameters were regularized with a per-minibatch L2 regularization strength of 10⁻⁵.

Dropout is applied to the embedding layer, the LSTM layer, and the penultimate layer; the model performs best when the dropout rates are set to 0.3, 0.3, and 0.5 respectively.

Other model parameters are initialized randomly.
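
Taken together, the reported hyperparameters would translate to roughly this optimizer setup (a sketch; the stand-in model and the use of weight_decay for the L2 term are assumptions):

```python
import torch
import torch.nn as nn

model = nn.Linear(100, 19)      # stand-in for the Att-BLSTM model sketched above
optimizer = torch.optim.Adadelta(model.parameters(),
                                 lr=1.0,              # learning rate 1.0
                                 weight_decay=1e-5)   # L2 strength of 10^-5
batch_size = 10                                       # minibatch size
```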

Experimental Results

The model achieves an F1-score of 84.0% on the SemEval-2010 Task 8 test set, using only word vectors as input features.

REFERENCES

  • Zhou et al., "Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification", ACL 2016