Confused about Nesterov momentum gradient descent algorithm
I've found several variants of how Nesterov momentum is written, but I cannot understand why they cannot simply be expanded into a one-liner.
Here is one I found that seems to just rearrange; can someone explain why I am wrong?
$$\theta_t = y_t - \gamma \nabla f(y_t) \\
y_{t+1} = \theta_t + \rho\,(\theta_t - \theta_{t-1})$$
Plug the first equation into the second:
$$y_{t+1} = y_t - \gamma \nabla f(y_t) + \rho\,(\theta_t - \theta_{t-1}).$$
Let $\Delta y_t = y_{t+1} - y_t$; then it simply becomes
$$\Delta y_t = - \gamma \nabla f(y_t) + \rho\,\bigl(y_t - \gamma \nabla f(y_t) - y_{t-1} + \gamma \nabla f(y_{t-1})\bigr) \\
= - \gamma \nabla f(y_t) + \rho\,\bigl(\Delta y_{t-1} + \gamma\,(\nabla f(y_{t-1}) - \nabla f(y_t))\bigr).$$
So what am I doing wrong?
In fact I've found a similar form in a paper: Ning Qian, "On the momentum term in gradient descent learning algorithms", Neural Networks, 12(1):145–151, 1999,
where gradient descent with momentum is defined as
$$\Delta \theta_t = - \gamma \nabla f(\theta) + \rho\, \Delta \theta_{t-1}$$
(I'm also not sure why it's $f(\theta)$ and not $f(\theta_t)$.)
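To make the question concrete, here is a minimal numerical sketch of the two forms (the quadratic $f$, the values of $\gamma$ and $\rho$, and the starting point are illustrative assumptions on my part, not taken from the paper). It runs the two-step update and the expanded one-liner side by side, seeding the one-liner with the first two iterates of the two-step run so both recursions start from the same place:

```python
import numpy as np

# Numerical check of the substitution above on a toy quadratic
# f(x) = 0.5 * x^T A x, so grad f(x) = A x.  The matrix A, the step
# size gamma, the momentum rho and the starting point are illustrative
# choices only (not from the question or from Qian 1999).
A = np.diag([1.0, 10.0])

def grad(x):
    return A @ x

gamma, rho, T = 0.05, 0.9, 50

# Two-step form: theta_t = y_t - gamma * grad(y_t),
#                y_{t+1} = theta_t + rho * (theta_t - theta_{t-1}).
y = np.array([1.0, 1.0])
theta_prev = y - gamma * grad(y)   # convention: theta_{-1} := theta_0, so the first step has no momentum
ys = [y.copy()]
for _ in range(T):
    theta = y - gamma * grad(y)
    y = theta + rho * (theta - theta_prev)
    theta_prev = theta
    ys.append(y.copy())

# Expanded one-liner:
# Delta y_t = -gamma*grad(y_t) + rho*(Delta y_{t-1} + gamma*(grad(y_{t-1}) - grad(y_t))),
# seeded with y_0 and y_1 from the two-step run so both recursions start identically.
y_prev, y = ys[0].copy(), ys[1].copy()
zs = [y_prev.copy(), y.copy()]
for _ in range(T - 1):
    delta = -gamma * grad(y) + rho * ((y - y_prev) + gamma * (grad(y_prev) - grad(y)))
    y_prev, y = y, y + delta
    zs.append(y.copy())

# Maximum deviation between the two sequences of iterates (should be ~0).
print(np.max(np.abs(np.array(ys) - np.array(zs))))
```

With matched seeds the two sequences agree to machine precision, which at least confirms that the substitution above is algebraically consistent.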
optimization numerical-optimization gradient-descent
edited Mar 17 at 12:49
asked Mar 17 at 12:44
Alexis Drakopoulos
Why do you think it’s wrong to write it in one line? – David M. Mar 17 at 16:24
What about the update for the momentum term in the one-liner? – user3658307 2 days ago