
AdagradDecay Optimizer

Important

This topic contains important information that requires your attention. Ignoring it may affect your business, so read it carefully.

This topic describes how to use the AdagradDecay Optimizer for ultra-large-scale training.

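Warning

GPU servers on the public cloud are about to go out of warranty and be taken offline. You can continue to submit CPU-based TensorFlow jobs. To train models on GPUs, submit your jobs in DLC. For details, see "Create a training job".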

Background information

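Ultra-large-scale models are typically trained on more than one billion samples, and continuous incremental training often runs for more than a month. Over such a long run, the accumulator of the original Adagrad algorithm keeps growing. To address this problem, PAI-TF provides the AdagradDecay optimizer, which decays the accumulator at regular step intervals.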

Enable the AdagradDecay Optimizer

To use the AdagradDecay Optimizer for ultra-large-scale training, define a tf.train.AdagradDecayOptimizer. The AdagradDecay Optimizer is used in the same way as the native TensorFlow optimizers. Its definition is as follows.

class AdagradDecayOptimizer(optimizer.Optimizer):
  """Optimizer that implements the Adagrad algorithm with accumulator decay.
  Unlike the original Adagrad algorithm, AdagradDecay decays the accumulator
  at a given step interval by a given rate, so that the accumulator does not
  grow without bound.
  """
  def __init__(self,
               learning_rate,
               global_step,
               initial_accumulator_value=0.1,
               accumulator_decay_step=100000,
               accumulator_decay_rate=0.9,
               use_locking=False,
               name="AdagradDecay"):
    """Construct a new AdagradDecay optimizer.
    Args:
      learning_rate: A `Tensor` or a floating point value.  The learning rate.
      global_step: The global step variable, used for calculating t % T.
      initial_accumulator_value: A floating point value. Starting and baseline
        value for the accumulators, must be positive. The accumulators will not
        be less than it.
      accumulator_decay_step: When global_step reaches a multiple of
        accumulator_decay_step, the accumulator is decayed by
        accumulator_decay_rate: accumulator *= accumulator_decay_rate.
      accumulator_decay_rate: The decay rate described above.
      use_locking: If `True` use locks for update operations.
      name: Optional name prefix for the operations created when applying
        gradients.  Defaults to "AdagradDecay".
    Raises:
      ValueError: If the `initial_accumulator_value`, `accumulator_decay_step`
        or `accumulator_decay_rate` is invalid.
    """
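
The effect of the decay parameters described in the docstring can be illustrated with a small pure-Python simulation. This is only a sketch of the documented behavior, not the PAI-TF implementation: the helper names `adagrad_decay_step` and `final_accumulator` are hypothetical, the order of the decay and the gradient accumulation is an assumption, and the variable update is the standard Adagrad rule.

```python
import math


def adagrad_decay_step(var, accum, grad, step, learning_rate,
                       initial_accumulator_value=0.1,
                       accumulator_decay_step=100000,
                       accumulator_decay_rate=0.9):
    """Apply one scalar update (hypothetical helper, assumed update order)."""
    # Decay the accumulator when the step reaches a multiple of
    # accumulator_decay_step, as the docstring describes.
    if step > 0 and step % accumulator_decay_step == 0:
        accum *= accumulator_decay_rate
        # The docstring states the accumulator never drops below its
        # initial (baseline) value.
        accum = max(accum, initial_accumulator_value)
    # Standard Adagrad: accumulate the squared gradient, then scale the
    # update by 1 / sqrt(accumulator).
    accum += grad * grad
    var -= learning_rate * grad / math.sqrt(accum)
    return var, accum


def final_accumulator(num_steps, decay_rate, decay_step=10):
    """Run num_steps updates with a constant gradient of 1.0."""
    var, accum = 1.0, 0.1
    for step in range(1, num_steps + 1):
        var, accum = adagrad_decay_step(
            var, accum, grad=1.0, step=step, learning_rate=0.1,
            accumulator_decay_step=decay_step,
            accumulator_decay_rate=decay_rate)
    return accum


# A decay rate of 1.0 disables the decay and behaves like plain Adagrad.
plain = final_accumulator(1000, decay_rate=1.0)
decayed = final_accumulator(1000, decay_rate=0.9)
print("accumulator without decay:", plain)
print("accumulator with decay:   ", decayed)
```

With a constant gradient, the undecayed accumulator grows linearly with the step count, while the decayed accumulator levels off. A bounded accumulator keeps the effective step size `learning_rate / sqrt(accum)` from shrinking indefinitely during long incremental training.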
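
Because the optimizer is used like a native TensorFlow optimizer, a typical call pattern might look like the following sketch. It assumes a PAI-TF runtime with the TensorFlow 1.x API, where `tf.train.AdagradDecayOptimizer` is available. The wrapper function `build_train_op` and its `loss` argument are hypothetical names introduced for illustration, and the parameter values shown are simply the defaults from the definition above.

```python
def build_train_op(loss, learning_rate=0.01):
    """Build a training op with AdagradDecay (hypothetical wrapper)."""
    # Imported lazily: tf.train.AdagradDecayOptimizer is assumed to exist
    # only in the PAI-TF build of TensorFlow 1.x.
    import tensorflow as tf

    # The optimizer takes the global step so that it can decide when the
    # accumulator should be decayed.
    global_step = tf.train.get_or_create_global_step()
    optimizer = tf.train.AdagradDecayOptimizer(
        learning_rate,
        global_step,
        initial_accumulator_value=0.1,
        accumulator_decay_step=100000,
        accumulator_decay_rate=0.9)
    # Passing the same global_step to minimize() increments it once per
    # training step.
    return optimizer.minimize(loss, global_step=global_step)
```

Passing the same `global_step` variable to both the constructor and `minimize()` keeps the decay schedule aligned with the actual number of training steps.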