CONTENTS
Introduction
BERT: Bidirectional Encoder Representations from Transformers.
The authors propose to extract context-sensitive features from a pre-trained language model.
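A minimal sketch of what such feature extraction looks like in practice, assuming the Hugging Face transformers library and the public "bert-base-uncased" checkpoint (neither is named in these notes):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The bank raised interest rates."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One vector per token, conditioned on the whole sentence, so the same
# word receives different features in different contexts.
features = outputs.last_hidden_state  # shape: (1, num_tokens, 768)
```

Because each token's vector depends on its surrounding words, the word "bank" here gets a different representation than it would in "the river bank".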
Related Work
There are two existing strategies for applying pre-trained language representations to downstream tasks: feature-based (e.g., ELMo), which uses the pre-trained representations as additional input features to a task-specific model, and fine-tuning (e.g., OpenAI GPT), which updates all pre-trained parameters on the downstream task.
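A hedged sketch of how the two strategies differ in code, again assuming the transformers library; the checkpoint name, the linear classification head, and the optimizer settings are illustrative assumptions, not details from the paper:

```python
import torch
from transformers import AutoModel

encoder = AutoModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)  # hypothetical 2-class head

# Strategy 1: feature-based -- freeze the encoder and train only the
# task-specific head on top of its fixed representations.
for param in encoder.parameters():
    param.requires_grad = False
feature_based_params = list(classifier.parameters())

# Strategy 2: fine-tuning -- unfreeze the encoder and update all
# pre-trained parameters together with the head on the downstream task.
for param in encoder.parameters():
    param.requires_grad = True
fine_tuning_params = list(encoder.parameters()) + list(classifier.parameters())

optimizer = torch.optim.AdamW(fine_tuning_params, lr=2e-5)
```

The practical trade-off: the feature-based route trains far fewer parameters, while fine-tuning adapts the entire pre-trained network to the task.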