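Causes
There are several possible causes:
- Other programs have already allocated a large amount of memory
- The current program requests too much memory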
Solutions
- Close other programs running on the same GPU
- Request fewer resources
- Set a smaller batch_size
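The batch_size suggestion can be automated by retrying with a smaller value whenever a step runs out of memory. The following is only a minimal sketch in plain Python: find_fitting_batch_size and fake_train_step are hypothetical names, and the built-in MemoryError stands in for whatever out-of-memory error your framework raises (in TensorFlow that would typically be tf.errors.ResourceExhaustedError).

```python
def find_fitting_batch_size(train_step, start_batch_size, oom_error=MemoryError):
    """Halve the batch size until train_step runs without an out-of-memory error.

    train_step: hypothetical callable that takes a batch size and runs one step.
    oom_error: exception type that signals out-of-memory (assumption: MemoryError).
    """
    batch_size = start_batch_size
    while batch_size >= 1:
        try:
            train_step(batch_size)
            return batch_size
        except oom_error:
            batch_size //= 2  # retry with half as many examples per step
    raise RuntimeError("even batch_size=1 does not fit in memory")


def fake_train_step(batch_size, capacity=100):
    """Stand-in for a real training step: pretends memory holds 100 examples."""
    if batch_size > capacity:
        raise MemoryError("simulated out-of-memory")


if __name__ == "__main__":
    # Tries 512, 256, 128 (all too large for the fake step), then succeeds.
    print(find_fitting_batch_size(fake_train_step, 512))
```

Halving keeps the number of retries small; in a real job you would replace fake_train_step with a function that builds and runs one training step at the given batch size.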
If a single large tensor (for example, an embedding) does not fit in memory, the recommended way is to use a partitioner to shard it into several parts:
    import tensorflow as tf  # TensorFlow 1.x API (tf.compat.v1 in 2.x)

    embedding = tf.get_variable("embedding", [1000000000, 20],
                                partitioner=tf.fixed_size_partitioner(3))
This will split the tensor into 3 shards along axis 0, but the rest of the program will see it as an ordinary tensor. The biggest benefit comes from using a partitioner together with parameter server replication, like this:
    # Variables created inside this block are placed on the parameter servers.
    with tf.device(tf.train.replica_device_setter(ps_tasks=3)):
        embedding = tf.get_variable("embedding", [1000000000, 20],
                                    partitioner=tf.fixed_size_partitioner(3))
The key function here is tf.train.replica_device_setter. It places variables on 3 separate processes, called parameter servers, which store all of the model variables. The large embedding tensor will be split across these servers, as shown in this picture.
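To make the resulting layout concrete, the sketch below works out how many rows each shard holds and which parameter server it would land on. It is an illustration in plain Python, not TensorFlow's implementation: shard_row_counts and round_robin_devices are hypothetical helpers, and it assumes float32 values (the default dtype of tf.get_variable), a near-even contiguous split with any remainder going to the first shards, and round-robin placement (the default strategy of replica_device_setter).

```python
def shard_row_counts(num_rows, num_shards):
    """Split num_rows into num_shards contiguous, near-equal pieces.

    Assumption: leftover rows are given to the first shards, one each.
    """
    base, extra = divmod(num_rows, num_shards)
    return [base + 1 if i < extra else base for i in range(num_shards)]


def round_robin_devices(num_variables, ps_tasks):
    """Assign variables to parameter server tasks in round-robin order."""
    return ["/job:ps/task:%d" % (i % ps_tasks) for i in range(num_variables)]


ROWS, COLS = 1000000000, 20  # shape of the embedding in the example
BYTES_PER_VALUE = 4          # assumption: float32

rows_per_shard = shard_row_counts(ROWS, 3)
devices = round_robin_devices(len(rows_per_shard), ps_tasks=3)

for rows, device in zip(rows_per_shard, devices):
    gigabytes = rows * COLS * BYTES_PER_VALUE / 1e9
    print("%s holds shape [%d, %d], about %.1f GB" % (device, rows, COLS, gigabytes))
```

Under these assumptions the full tensor would need about 80 GB in one place, while each parameter server only has to hold roughly a third of that.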