CONTENTS

Introduction

BERT: Bidirectional Encoder Representations from Transformers.

The authors propose to extract context-sensitive features from a pre-trained language model.

Two existing strategies for applying pre-trained language representations to downstream tasks:

Feature-based Approaches

Fine-tuning Approaches
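The difference between the two strategies comes down to which parameters are updated on the downstream task: feature-based approaches freeze the pre-trained encoder and train only a task-specific model on top of its output features, while fine-tuning approaches update the encoder's weights as well. A minimal sketch, using a toy one-weight "encoder" as a stand-in for a large pre-trained model like BERT (all names and values here are illustrative assumptions, not the paper's setup):

```python
# Toy "pretrained encoder": a single weight mapping input -> feature.
# A stand-in for a large pre-trained model; not BERT itself.
PRETRAINED_ENC_W = 0.5

# Tiny regression task: target y = 2 * x.
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0]]

def train(fine_tune, steps=200, lr=0.01):
    enc_w = PRETRAINED_ENC_W  # encoder weight, pre-trained
    clf_w = 0.0               # task head, trained from scratch
    for _ in range(steps):
        for x, y in data:
            feature = enc_w * x          # "contextual feature" extraction
            pred = clf_w * feature
            err = pred - y               # gradient of 0.5 * err**2
            clf_w -= lr * err * feature  # task head always updated
            if fine_tune:
                # Fine-tuning: gradients flow into the encoder too.
                enc_w -= lr * err * (clf_w * x)
            # Feature-based: enc_w stays frozen at its pre-trained value.
    return enc_w, clf_w

enc_feat, clf_feat = train(fine_tune=False)  # feature-based
enc_ft, clf_ft = train(fine_tune=True)       # fine-tuning
```

In the feature-based run `enc_feat` remains at its pre-trained value and only the head adapts; in the fine-tuning run both weights move, and together they fit the task. The same distinction holds for real encoders, just over millions of parameters instead of one.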