Confused about Nesterov momentum gradient descent algorithm



































I've found several different ways of writing Nesterov momentum, but I cannot understand why they cannot simply be expanded into a one-liner.



Here is one formulation I found that seems to rearrange directly. Can someone explain where I go wrong?



$$\begin{aligned}
\theta_t &= y_t - \gamma \nabla f(y_t) \\
y_{t+1} &= \theta_t + \rho (\theta_t - \theta_{t-1})
\end{aligned}$$



Plugging the first equation into the second gives



$y_{t+1} = y_t - \gamma \nabla f(y_t) + \rho (\theta_t - \theta_{t-1})$



Let $\Delta y_t = y_{t+1} - y_t$; then it simply becomes



$$\begin{aligned}
\Delta y_t &= - \gamma \nabla f(y_t) + \rho \left(y_t - \gamma \nabla f(y_t) - y_{t-1} + \gamma \nabla f(y_{t-1})\right) \\
&= - \gamma \nabla f(y_t) + \rho \left(\Delta y_{t-1} + \gamma \left(\nabla f(y_{t-1}) - \nabla f(y_t)\right)\right)
\end{aligned}$$
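
As a sanity check on the algebra, here is a minimal numerical sketch comparing the two-equation scheme with the one-line recursion. The quadratic test function $f(y) = \tfrac12 y^\top A y$, the matrix `A`, the values of `gamma` and `rho`, and the choice to start with $y_1 = \theta_0$ (no momentum on the first step) are all assumptions made for illustration; none of them come from the sources quoted here.

```python
import numpy as np

# Hypothetical test problem: f(y) = 0.5 * y^T A y, so grad f(y) = A y.
A = np.array([[3.0, 0.5], [0.5, 1.0]])

def grad(y):
    return A @ y

gamma, rho, steps = 0.1, 0.9, 50   # arbitrary illustrative values
y0 = np.array([1.0, -2.0])

def two_step(y0):
    """theta_t = y_t - gamma*grad(y_t); y_{t+1} = theta_t + rho*(theta_t - theta_{t-1})."""
    theta_prev = y0 - gamma * grad(y0)    # theta_0
    ys = [y0, theta_prev]                 # assumed start: y_1 = theta_0
    for _ in range(steps - 1):
        y = ys[-1]
        theta = y - gamma * grad(y)
        ys.append(theta + rho * (theta - theta_prev))
        theta_prev = theta
    return np.array(ys)

def one_line(y0):
    """Delta y_t = -gamma*g(y_t) + rho*(Delta y_{t-1} + gamma*(g(y_{t-1}) - g(y_t)))."""
    ys = [y0, y0 - gamma * grad(y0)]      # same first step as two_step
    for _ in range(steps - 1):
        y_prev, y = ys[-2], ys[-1]
        dy = -gamma * grad(y) + rho * ((y - y_prev) + gamma * (grad(y_prev) - grad(y)))
        ys.append(y + dy)
    return np.array(ys)

# Largest componentwise difference between the two sequences of iterates.
max_gap = np.max(np.abs(two_step(y0) - one_line(y0)))
print(max_gap)
```

On this toy problem the two sequences should agree up to floating-point rounding, which is consistent with the rearrangement being valid, at least under these assumptions.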



So what am I doing wrong?



In fact, I've found a similar form in a paper: Ning Qian. On the momentum term in gradient descent learning algorithms. Neural Networks, 12(1):145–151, 1999.



where gradient descent with momentum is defined as



$$\Delta \theta_t = - \gamma \nabla f(\theta) + \rho \Delta \theta_{t-1}$$ (I'm also not sure why it's $f(\theta)$ and not $f(\theta_t)$)
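
To make the comparison between the two one-line forms concrete, here is a rough sketch. It assumes the gradient in the momentum formula is evaluated at the current iterate $\theta_t$ (my reading, which the paper's notation does not confirm), and it again uses a made-up quadratic with arbitrary `gamma` and `rho`. The function names `classical_step` and `nesterov_step` are hypothetical labels for illustration only.

```python
import numpy as np

A = np.diag([3.0, 1.0])           # hypothetical quadratic f(x) = 0.5 * x^T A x
gamma, rho = 0.1, 0.9             # arbitrary illustrative values

def grad(x):
    return A @ x

def classical_step(x, x_prev):
    # Delta x_t = -gamma*grad(x_t) + rho*Delta x_{t-1}, with Delta x_{t-1} = x - x_prev
    return x - gamma * grad(x) + rho * (x - x_prev)

def nesterov_step(y, y_prev):
    # Same update plus the extra term rho*gamma*(grad(y_{t-1}) - grad(y_t))
    return classical_step(y, y_prev) + rho * gamma * (grad(y_prev) - grad(y))

x_prev, x = np.array([1.0, 1.0]), np.array([0.9, 0.8])
extra = nesterov_step(x, x_prev) - classical_step(x, x_prev)
print(extra)
```

Under these assumptions, the only difference between one step of each method is the gradient-difference term $\rho \gamma (\nabla f(y_{t-1}) - \nabla f(y_t))$ from the one-line recursion derived earlier.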







optimization numerical-optimization gradient-descent














edited Mar 17 at 12:49







Alexis Drakopoulos

















asked Mar 17 at 12:44









Alexis Drakopoulos











  • Why do you think it’s wrong to write it in one line?
    – David M.
    Mar 17 at 16:24










  • What about the update for the momentum term in the one-liner?
    – user3658307
    2 days ago
































