CONTENTS
Model
The model consists of five components:
- Input layer: the input sentence to the model
- Embedding layer: maps each word into a low-dimensional vector
- LSTM layer: a bidirectional LSTM (BLSTM) extracts high-level features from the embedding layer at each time step
- Attention layer: produces a weight vector and merges the word-level features from each time step into a sentence-level feature vector by multiplying them by the weight vector
- Output layer: the sentence-level feature vector is finally used for relation classification
Word Embeddings
Each word in the input sentence is mapped to a real-valued vector by looking it up in the embedding matrix.
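A minimal sketch of the lookup, assuming PyTorch's nn.Embedding holds the (possibly pre-trained) embedding matrix; the vocabulary size, dimension, and token ids are illustrative:

```python
import torch
import torch.nn as nn

# Illustrative sizes: vocabulary of 10k words, 100-dim word vectors.
vocab_size, embed_dim = 10_000, 100

# The embedding layer stores the embedding matrix; forward() is the lookup.
embedding = nn.Embedding(vocab_size, embed_dim)

# A "sentence" as a batch of word indices (batch=1, seq_len=5).
token_ids = torch.tensor([[4, 17, 250, 3, 9]])
word_vectors = embedding(token_ids)   # shape: (1, 5, 100)
```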
Bidirectional Network
The output at each time step combines the forward and backward pass outputs by element-wise sum (rather than concatenation).
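A minimal sketch in PyTorch: nn.LSTM with bidirectional=True returns the two directions concatenated, so we split them and sum element-wise; all sizes are illustrative:

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 100, 128

# bidirectional=True runs a forward and a backward LSTM over the sequence.
blstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

x = torch.randn(1, 5, embed_dim)   # embedded sentence (batch, seq, dim)
out, _ = blstm(x)                  # (1, 5, 2 * hidden_dim), directions concatenated

# Split the concatenated directions and combine them by element-wise sum.
fwd, bwd = out[..., :hidden_dim], out[..., hidden_dim:]
H = fwd + bwd                      # (1, 5, hidden_dim)
```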
Attention
The representation r of the sentence is formed by a weighted sum of these output vectors:
M = tanh(H)
α = softmax(wᵀM)
r = Hαᵀ
where H is the matrix of BLSTM output vectors and w is a trained parameter vector; the final sentence representation used for classification is h* = tanh(r).
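A minimal sketch of these equations in PyTorch; H stands in for the BLSTM outputs above and all sizes are illustrative:

```python
import torch
import torch.nn as nn

hidden_dim, seq_len = 128, 5

w = nn.Parameter(torch.randn(hidden_dim))   # trained attention vector w
H = torch.randn(1, seq_len, hidden_dim)     # BLSTM outputs (batch, seq, dim)

M = torch.tanh(H)                           # M = tanh(H)
scores = M @ w                              # wᵀM at every time step -> (1, seq)
alpha = torch.softmax(scores, dim=1)        # attention weights α
r = (alpha.unsqueeze(-1) * H).sum(dim=1)    # r = Hαᵀ, weighted sum -> (1, dim)
h_star = torch.tanh(r)                      # h* = tanh(r), sentence representation
```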
Classifying
A softmax classifier predicts the relation label from the sentence representation h*; the cost function is the negative log-likelihood of the true class labels.
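A minimal sketch, assuming a linear layer plus CrossEntropyLoss (which combines the softmax with the negative log-likelihood); the 19 classes match SemEval-2010 Task 8, and the batch values are illustrative:

```python
import torch
import torch.nn as nn

hidden_dim, num_classes = 128, 19

classifier = nn.Linear(hidden_dim, num_classes)
loss_fn = nn.CrossEntropyLoss()      # softmax + negative log-likelihood

h_star = torch.randn(4, hidden_dim)  # sentence vectors for a batch of 4
labels = torch.tensor([0, 3, 18, 7]) # gold relation labels

logits = classifier(h_star)          # unnormalized p(y|S)
cost = loss_fn(logits, labels)       # J(θ) = -Σ log p(y_true|S)
```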
Regularization
- Dropout (on the embedding layer, the LSTM layer, and the penultimate layer; placement sketched below)
- L2-norm regularization
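A minimal sketch of where the three dropout layers sit, using the rates reported in the experimental setup below; the L2 term is applied through the optimizer (see the setup sketch further down):

```python
import torch.nn as nn

# Rates from the experimental setup below (0.3 / 0.3 / 0.5).
drop_embed = nn.Dropout(0.3)  # applied to the embedding-layer output
drop_lstm = nn.Dropout(0.3)   # applied to the BLSTM-layer output
drop_final = nn.Dropout(0.5)  # applied to the penultimate sentence vector h*
```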
Experiments
Dataset
SemEval-2010 Task 8 dataset (9 directed relation types plus an Other class; scored with the official macro-averaged F1)
Experimental Setup
Optimization: AdaDelta with a learning rate of 1.0 and a minibatch size of 10.
The model parameters were regularized with a per-minibatch L2 regularization strength of 10⁻⁵.
Evaluating the effect of dropout on the embedding layer, the LSTM layer, and the penultimate layer, the model performs best when the dropout rates are set to 0.3, 0.3, and 0.5 respectively.
Other model parameters are initialized randomly.
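A minimal sketch of this setup in PyTorch, assuming weight_decay is used for the per-minibatch L2 term; the model here is a stand-in for the full network's parameters:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 19)  # stand-in for the full model's parameters

# AdaDelta with lr = 1.0; weight_decay applies the L2 penalty at each
# minibatch update. Minibatches of size 10 come from the data loader.
optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0, weight_decay=1e-5)
```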
Experimental Results
The model achieves an F1-score of 84.0% on the SemEval-2010 Task 8 test set, using only word vectors as input features.
REFERENCES
- Zhou et al., "Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification", ACL 2016.