Confused about Nesterov momentum gradient descent algorithm

I've found several ways of writing Nesterov momentum, but I cannot understand why they can't simply be expanded into a one-liner.

Here is one I found that can just be rearranged. Can someone explain why I am wrong?

$\theta_t = y_t - \gamma \nabla f(y_t) \\
y_{t+1} = \theta_t + \rho (\theta_t - \theta_{t-1})$

Plugging the first equation into the second gives

$y_{t+1} = y_t - \gamma \nabla f(y_t) + \rho (\theta_t - \theta_{t-1})$

Let $\Delta y_t = y_{t+1} - y_t$; then it simply becomes

$$\Delta y_t = - \gamma \nabla f(y_t) + \rho \left(y_t - \gamma \nabla f(y_t) - y_{t-1} + \gamma \nabla f(y_{t-1})\right) \\
= - \gamma \nabla f(y_t) + \rho \left(\Delta y_{t-1} + \gamma (\nabla f(y_{t-1}) - \nabla f(y_t))\right)$$

So what am I doing wrong?
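One way to sanity-check the rearrangement is to run both forms side by side. The following is a minimal sketch, not taken from any reference: the scalar objective, the step size, the momentum coefficient, and the starting convention $y_{-1} = y_0$ (so that $\Delta y_{-1} = 0$ and $\theta_{-1} = \theta_0$) are all assumptions chosen for illustration.

```python
import math

def grad_f(x):
    # Gradient of an illustrative objective f(x) = x**4 / 4 + x**2 / 2 (an assumption).
    return x**3 + x

def nesterov_two_step(y0, gamma, rho, steps):
    # theta_t = y_t - gamma * grad f(y_t)
    # y_{t+1} = theta_t + rho * (theta_t - theta_{t-1})
    y = y0
    theta_prev = y0 - gamma * grad_f(y0)  # assumed start: theta_{-1} = theta_0
    ys = [y]
    for _ in range(steps):
        theta = y - gamma * grad_f(y)
        y = theta + rho * (theta - theta_prev)
        theta_prev = theta
        ys.append(y)
    return ys

def nesterov_one_line(y0, gamma, rho, steps):
    # Delta y_t = -gamma * g_t + rho * (Delta y_{t-1} + gamma * (g_{t-1} - g_t))
    y = y0
    dy_prev = 0.0          # assumed start: Delta y_{-1} = 0
    g_prev = grad_f(y0)    # assumed start: y_{-1} = y_0, so g_{-1} = g_0
    ys = [y]
    for _ in range(steps):
        g = grad_f(y)
        dy = -gamma * g + rho * (dy_prev + gamma * (g_prev - g))
        y = y + dy
        dy_prev, g_prev = dy, g
        ys.append(y)
    return ys

two_step = nesterov_two_step(1.0, 0.1, 0.9, 50)
one_line = nesterov_one_line(1.0, 0.1, 0.9, 50)
same = all(math.isclose(a, b, rel_tol=0.0, abs_tol=1e-9)
           for a, b in zip(two_step, one_line))
print("iterates agree:", same)
```

Under these assumptions the two loops trace the same sequence $y_t$ up to floating-point rounding. The one-line form still has to carry the previous step and the previous gradient as state between iterations.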

In fact, I've found a similar form in a paper: Ning Qian. On the momentum term in gradient descent learning algorithms. Neural Networks, 12(1):145–151, 1999.

where gradient descent with momentum is defined as

$$\Delta \theta_t = - \gamma \nabla f(\theta) + \rho \Delta \theta_{t-1}$$ (I'm also not sure why it's $f(\theta)$ and not $f(\theta_t)$)
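To see how this momentum update compares with the rearranged Nesterov update, here is a small sketch that runs both from the same start. It assumes the gradient is evaluated at the current iterate (that is, $\nabla f(\theta_t)$), uses a made-up scalar objective, and starts with no previous step; none of these choices come from the paper.

```python
def grad_f(x):
    # Gradient of an illustrative objective f(x) = x**4 / 4 + x**2 / 2 (an assumption).
    return x**3 + x

def iterate(y0, gamma, rho, steps, nesterov):
    y = y0
    delta_prev = 0.0       # assumed start: no previous step
    g_prev = grad_f(y0)    # assumed start: previous gradient equals the first one
    path = [y]
    for _ in range(steps):
        g = grad_f(y)
        # Classical momentum: Delta_t = -gamma * grad + rho * Delta_{t-1}
        delta = -gamma * g + rho * delta_prev
        if nesterov:
            # Extra gradient-difference term from the rearranged Nesterov update
            delta += rho * gamma * (g_prev - g)
        y += delta
        delta_prev, g_prev = delta, g
        path.append(y)
    return path

classical_path = iterate(1.0, 0.1, 0.9, 3, nesterov=False)
nesterov_path = iterate(1.0, 0.1, 0.9, 3, nesterov=True)
print("classical:", classical_path)
print("nesterov: ", nesterov_path)
```

With these starting assumptions the first step is identical for both. From the second step on, the paths separate because of the term $\rho \gamma (\nabla f(y_{t-1}) - \nabla f(y_t))$, which is the only place the two one-line updates differ.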

edited Mar 17 at 12:49

asked Mar 17 at 12:44

Alexis Drakopoulos


Why do you think it's wrong to write it in one line?
– David M.
Mar 17 at 16:24

What about the update for the momentum term in the one-liner?
– user3658307
2 days ago

optimization numerical-optimization gradient-descent
