Deep Learning: What are the differences between DeepMind's Learning to Learn method and a grid search over a network's hyperparameters?
If we have a meta-learner that trains an optimizer (which has certain hyperparameters), and the optimizer is fine-tuned by the meta-learner based on how it performs, how is this different from an ordinary grid search over the hyperparameters? One way I think it differs is that the meta-learner can supposedly find the best optimizer 'intelligently', whereas a grid search is a brute-force method. On the other hand, a grid search can incorporate human knowledge about the ranges of hyperparameters where the model is likely to perform well.
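For concreteness, here is a minimal sketch of what I mean by a brute-force grid search. `train_and_evaluate` is a hypothetical stand-in for a full training run (I give it a known optimum so the example is self-contained), and the candidate lists are the part supplied by human knowledge:

```python
import itertools

# Hypothetical stand-in for a full training run; it just has a known
# optimum at lr=0.1, momentum=0.9 so the example runs on its own.
def train_and_evaluate(lr, momentum):
    return (lr - 0.1) ** 2 + (momentum - 0.9) ** 2

# The human supplies the candidate ranges; the search itself is exhaustive.
learning_rates = [0.001, 0.01, 0.1, 1.0]
momenta = [0.0, 0.5, 0.9, 0.99]

best = min(itertools.product(learning_rates, momenta),
           key=lambda hp: train_and_evaluate(*hp))
print("best (lr, momentum):", best)  # -> (0.1, 0.9)
```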
My current impression is that the meta-learner simply tweaks the optimizer's hyperparameters (which are usually kept fixed in other settings) after a certain number of epochs, evaluates the performance, and then dynamically changes how the hyperparameters should be tweaked. Is this what the authors of the paper did?
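To make that impression concrete, here is a toy sketch of the loop I have in mind. This is only my reading, not the paper's actual LSTM optimizer: the "optimizer" is reduced to a single learned step size on a toy quadratic, and the meta-gradient is estimated by finite differences, whereas (as I understand it) the paper backpropagates through the unrolled computation graph:

```python
# Toy sketch (my reading, NOT the paper's method): the learned "optimizer"
# is a single parameter eta (the step size), and the meta-learner improves
# eta by gradient descent on the summed optimizee loss over a short
# unrolled run. All names here are illustrative.
def unrolled_meta_loss(eta, theta0=5.0, steps=3):
    theta, total = theta0, 0.0
    for _ in range(steps):
        grad = 2.0 * theta          # gradient of the toy loss f(theta) = theta^2
        theta = theta - eta * grad  # the (learned) update rule
        total += theta ** 2         # meta-loss: accumulate loss along the trajectory
    return total

eta, meta_lr, eps = 0.01, 5e-4, 1e-5
for _ in range(300):
    # Finite-difference meta-gradient for brevity; the paper instead
    # backpropagates through the unrolled computation graph.
    g = (unrolled_meta_loss(eta + eps) - unrolled_meta_loss(eta - eps)) / (2 * eps)
    eta -= meta_lr * g
print(eta)  # converges to ~0.5, the optimal step size for f(theta) = theta^2
```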