Deep Learning: What are the differences between DeepMind's Learning to Learn method and a grid search of a network's hyperparameters?

2017-04-21 11:52:51

If we have a meta-learner that trains an optimizer (which contains certain hyperparameters), and the optimizer is fine-tuned by the meta-learner depending on how it performs, how is this different from a usual grid search over the best hyperparameters? One difference, I think, is that the meta-learner can supposedly find the best optimizer 'intelligently', whereas a grid search is a brute-force enumeration (see the sketch below). On the other hand, a grid search would likely incorporate human knowledge about the range of hyperparameters in which the model is likely to perform well.
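To make the grid-search side of my question concrete, here is a minimal sketch of what I mean. The `train_and_eval` function and the grid values are hypothetical placeholders standing in for a real training run:

```python
import itertools

# Hypothetical stand-in for an expensive training run: trains a model with
# the given hyperparameters and returns a validation loss.
def train_and_eval(learning_rate, momentum):
    return (learning_rate - 0.01) ** 2 + (momentum - 0.9) ** 2

def grid_search(objective, grid):
    """Brute force: try every combination in a fixed, human-chosen grid."""
    best_loss, best_hp = float("inf"), None
    for values in itertools.product(*grid.values()):
        hp = dict(zip(grid.keys(), values))
        loss = objective(**hp)
        if loss < best_loss:
            best_loss, best_hp = loss, hp
    return best_hp

# Human knowledge enters only through the choice of grid values.
grid = {"learning_rate": [1e-3, 1e-2, 1e-1], "momentum": [0.0, 0.5, 0.9]}
print(grid_search(train_and_eval, grid))  # {'learning_rate': 0.01, 'momentum': 0.9}
```

The search here never looks at how training unfolds; it only compares final scores over an enumerated grid.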

My current impression is that the meta-learner simply tweaks the hyperparameters of the optimizer (which are usually kept fixed in other settings) after a certain number of epochs, evaluates the performance, and then dynamically changes how the hyperparameters should be tweaked. Is this what the authors of the paper did?
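Or is the mechanism closer to the following sketch, where the optimizer itself is a small RNN whose weights are the thing being meta-trained? This is my own simplified toy reconstruction in PyTorch, not code from the paper; the `LearnedOptimizer` class, the quadratic optimizee, and all constants are my inventions, and I may be misreading the method:

```python
import torch
import torch.nn as nn

HIDDEN = 20  # arbitrary choice for this sketch

class LearnedOptimizer(nn.Module):
    """Hypothetical reconstruction: an LSTM that maps each coordinate's
    gradient to an additive parameter update."""
    def __init__(self):
        super().__init__()
        self.cell = nn.LSTMCell(1, HIDDEN)
        self.out = nn.Linear(HIDDEN, 1)

    def forward(self, grad, state):
        # Treat each optimizee coordinate as a batch element with a
        # single input feature: its current gradient.
        h, c = self.cell(grad.unsqueeze(-1), state)
        return self.out(h).squeeze(-1), (h, c)

def meta_train_step(opt_net, meta_opt, dim=10, steps=20):
    # A fresh toy optimizee: minimize ||W x - y||^2 over x.
    W, y = torch.randn(dim, dim), torch.randn(dim)
    x = torch.randn(dim, requires_grad=True)
    state = (torch.zeros(dim, HIDDEN), torch.zeros(dim, HIDDEN))
    meta_loss = 0.0
    for _ in range(steps):
        loss = ((W @ x - y) ** 2).sum()
        meta_loss = meta_loss + loss  # accumulate loss along the whole trajectory
        grad, = torch.autograd.grad(loss, x, retain_graph=True)
        # Detach the gradient before feeding it to the optimizer,
        # so no second derivatives are required.
        update, state = opt_net(grad.detach(), state)
        x = x + update  # the RNN's output *is* the update rule
    meta_opt.zero_grad()
    meta_loss.backward()  # gradient descent on the optimizer's own weights
    meta_opt.step()

opt_net = LearnedOptimizer()
meta_opt = torch.optim.Adam(opt_net.parameters(), lr=1e-3)
for _ in range(100):
    meta_train_step(opt_net, meta_opt)
```

If this second sketch is closer to the truth, then what is being learned is not a handful of hyperparameters but the weights of the update rule itself, which would make the comparison with grid search much less direct.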