When can the sigmoid-activated output of a neural network be interpreted as a probability?Can a regular neural network be used as a probablistic model rather than classifier?How can I derive the back propagation formula in a more elegant way?How to evaluate the quality of the probability distribution output of a classifier?Which functional space does feedforward neural network approximate?Neural network cost function - why squared error?Neural Network Sigmoid ProblemCan a regular neural network be used as a probablistic model rather than classifier?Expected Value of neural network outputCross-Entropy loss in Reinforcement LearningWhat does it essentially mean if the neural network has convex error surface?Why does a $sum_k$ appear when using the chain rule to derive $delta^L_j?$

Can I make popcorn with any corn?

Revoked SSL certificate

expand `ifthenelse` immediately

What doth I be?

Why does Kotter return in Welcome Back Kotter?

What does the "remote control" for a QF-4 look like?

What would happen to a modern skyscraper if it rains micro blackholes?

dbcc cleantable batch size explanation

A newer friend of my brother's gave him a load of baseball cards that are supposedly extremely valuable. Is this a scam?

What is a clear way to write a bar that has an extra beat?

How to format long polynomial?

How to source a part of a file

How do I deal with an unproductive colleague in a small company?

Is it possible to run Internet Explorer on OS X El Capitan?

"You are your self first supporter", a more proper way to say it

Can a vampire attack twice with their claws using Multiattack?

Was any UN Security Council vote triple-vetoed?

What typically incentivizes a professor to change jobs to a lower ranking university?

How is the claim "I am in New York only if I am in America" the same as "If I am in New York, then I am in America?

Theorems that impeded progress

How old can references or sources in a thesis be?

How much of data wrangling is a data scientist's job?

Alternative to sending password over mail?

Could an aircraft fly or hover using only jets of compressed air?



When can the sigmoid-activated output of a neural network be interpreted as a probability?


Can a regular neural network be used as a probablistic model rather than classifier?How can I derive the back propagation formula in a more elegant way?How to evaluate the quality of the probability distribution output of a classifier?Which functional space does feedforward neural network approximate?Neural network cost function - why squared error?Neural Network Sigmoid ProblemCan a regular neural network be used as a probablistic model rather than classifier?Expected Value of neural network outputCross-Entropy loss in Reinforcement LearningWhat does it essentially mean if the neural network has convex error surface?Why does a $sum_k$ appear when using the chain rule to derive $delta^L_j?$













1












$begingroup$


I've created a neural network whose final layer outputs a single sigmoid-activated value. I've trained the network on binary-labeled data (i.e. data labeled either 0 for negative class or 1 for positive class). Normally, when predicting the class of some unlabeled data, I would assume it to be of positive class if the output of the network for that data is above some cutoff (say, 0.5) and of negative class otherwise. However, I want to know whether I can correctly interpret the output of such a network as a probability that a given sample is of positive class.



Since the sigmoid function -- specifically, in this case, the logistic function
$$frac11 + e^-x$$
-- has range $[0, 1]$, it seems reasonable to interpret its outputs as probabilities, and I've seen a few sources that lead me to think that this is in fact a valid interpretation (e.g. this post), although I'm unsure about why, mathematically, this would be the case and under what conditions this would hold.










share|cite|improve this question









$endgroup$
















    1












    $begingroup$


    I've created a neural network whose final layer outputs a single sigmoid-activated value. I've trained the network on binary-labeled data (i.e. data labeled either 0 for negative class or 1 for positive class). Normally, when predicting the class of some unlabeled data, I would assume it to be of positive class if the output of the network for that data is above some cutoff (say, 0.5) and of negative class otherwise. However, I want to know whether I can correctly interpret the output of such a network as a probability that a given sample is of positive class.



    Since the sigmoid function -- specifically, in this case, the logistic function
    $$frac11 + e^-x$$
    -- has range $[0, 1]$, it seems reasonable to interpret its outputs as probabilities, and I've seen a few sources that lead me to think that this is in fact a valid interpretation (e.g. this post), although I'm unsure about why, mathematically, this would be the case and under what conditions this would hold.










    share|cite|improve this question









    $endgroup$














      1












      1








      1





      $begingroup$


      I've created a neural network whose final layer outputs a single sigmoid-activated value. I've trained the network on binary-labeled data (i.e. data labeled either 0 for negative class or 1 for positive class). Normally, when predicting the class of some unlabeled data, I would assume it to be of positive class if the output of the network for that data is above some cutoff (say, 0.5) and of negative class otherwise. However, I want to know whether I can correctly interpret the output of such a network as a probability that a given sample is of positive class.



      Since the sigmoid function -- specifically, in this case, the logistic function
      $$frac11 + e^-x$$
      -- has range $[0, 1]$, it seems reasonable to interpret its outputs as probabilities, and I've seen a few sources that lead me to think that this is in fact a valid interpretation (e.g. this post), although I'm unsure about why, mathematically, this would be the case and under what conditions this would hold.










      share|cite|improve this question









      $endgroup$




      I've created a neural network whose final layer outputs a single sigmoid-activated value. I've trained the network on binary-labeled data (i.e. data labeled either 0 for negative class or 1 for positive class). Normally, when predicting the class of some unlabeled data, I would assume it to be of positive class if the output of the network for that data is above some cutoff (say, 0.5) and of negative class otherwise. However, I want to know whether I can correctly interpret the output of such a network as a probability that a given sample is of positive class.



      Since the sigmoid function -- specifically, in this case, the logistic function
      $$frac11 + e^-x$$
      -- has range $[0, 1]$, it seems reasonable to interpret its outputs as probabilities, and I've seen a few sources that lead me to think that this is in fact a valid interpretation (e.g. this post), although I'm unsure about why, mathematically, this would be the case and under what conditions this would hold.







      probability-theory machine-learning






      share|cite|improve this question













      share|cite|improve this question











      share|cite|improve this question




      share|cite|improve this question










      asked Mar 21 at 17:47









      quevivasbienquevivasbien

      153




      153




















          1 Answer
          1






          active

          oldest

          votes


















          1












          $begingroup$

          The logistic regression does not give you a real probability it is rather a measure for the confidence of the model (not in the meaning of statistical confidence interval). In order to understand the difference between confidence and probability imagine you have trained a logistic regression as a classifier for predicting if an image does show a snowy or dry road. Assume that we are using the mean pixel value and we are able to discriminate between snowy and dry roads. Now, we take a new image which is showing a wooden floor. We calculate the mean pixel value and put it into the logistic regression. As the mean pixel value will be somewhere between snowy road and dry road the logistic regression will give you some value from the magnitude of $approx 0.5$. Hence, if we interpret this result as probability this would mean that the logistic regression thinks that chances are $50-50$ for dry road vs. snowy road. But we know this is nonsense. But it is not a surprise because the logistic regression does only give us the confidence of the model, not the probability. This is why logistic regression is called a discriminative model.



          In contrast to discriminative models, other models like generative probabilistic models try to model the distribution (often normal distribution) of the mean pixel value for the classes $mathcalC_1$ for dry road and $mathcalC_2$ for the snowy road. If we use such a procedure we will see that we get two distributions of the mean pixel values $x$. The following figure demonstrates the two distributions. If the distributions are very well separated (not much overlap) a new value of the mean pixel value $x$ will result in a low probability for both classes (e.g. imagine the mean pixel value of the wooden floor is at the intersection of both distributions).



          enter image description here






          share|cite|improve this answer









          $endgroup$












          • $begingroup$
            Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
            $endgroup$
            – quevivasbien
            Mar 25 at 18:36










          • $begingroup$
            Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
            $endgroup$
            – MachineLearner
            Mar 25 at 18:50











          Your Answer





          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "69"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          noCode: true, onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3157129%2fwhen-can-the-sigmoid-activated-output-of-a-neural-network-be-interpreted-as-a-pr%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1












          $begingroup$

          The logistic regression does not give you a real probability it is rather a measure for the confidence of the model (not in the meaning of statistical confidence interval). In order to understand the difference between confidence and probability imagine you have trained a logistic regression as a classifier for predicting if an image does show a snowy or dry road. Assume that we are using the mean pixel value and we are able to discriminate between snowy and dry roads. Now, we take a new image which is showing a wooden floor. We calculate the mean pixel value and put it into the logistic regression. As the mean pixel value will be somewhere between snowy road and dry road the logistic regression will give you some value from the magnitude of $approx 0.5$. Hence, if we interpret this result as probability this would mean that the logistic regression thinks that chances are $50-50$ for dry road vs. snowy road. But we know this is nonsense. But it is not a surprise because the logistic regression does only give us the confidence of the model, not the probability. This is why logistic regression is called a discriminative model.



          In contrast to discriminative models, other models like generative probabilistic models try to model the distribution (often normal distribution) of the mean pixel value for the classes $mathcalC_1$ for dry road and $mathcalC_2$ for the snowy road. If we use such a procedure we will see that we get two distributions of the mean pixel values $x$. The following figure demonstrates the two distributions. If the distributions are very well separated (not much overlap) a new value of the mean pixel value $x$ will result in a low probability for both classes (e.g. imagine the mean pixel value of the wooden floor is at the intersection of both distributions).



          enter image description here






          share|cite|improve this answer









          $endgroup$












          • $begingroup$
            Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
            $endgroup$
            – quevivasbien
            Mar 25 at 18:36










          • $begingroup$
            Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
            $endgroup$
            – MachineLearner
            Mar 25 at 18:50















          1












          $begingroup$

          The logistic regression does not give you a real probability it is rather a measure for the confidence of the model (not in the meaning of statistical confidence interval). In order to understand the difference between confidence and probability imagine you have trained a logistic regression as a classifier for predicting if an image does show a snowy or dry road. Assume that we are using the mean pixel value and we are able to discriminate between snowy and dry roads. Now, we take a new image which is showing a wooden floor. We calculate the mean pixel value and put it into the logistic regression. As the mean pixel value will be somewhere between snowy road and dry road the logistic regression will give you some value from the magnitude of $approx 0.5$. Hence, if we interpret this result as probability this would mean that the logistic regression thinks that chances are $50-50$ for dry road vs. snowy road. But we know this is nonsense. But it is not a surprise because the logistic regression does only give us the confidence of the model, not the probability. This is why logistic regression is called a discriminative model.



          In contrast to discriminative models, other models like generative probabilistic models try to model the distribution (often normal distribution) of the mean pixel value for the classes $mathcalC_1$ for dry road and $mathcalC_2$ for the snowy road. If we use such a procedure we will see that we get two distributions of the mean pixel values $x$. The following figure demonstrates the two distributions. If the distributions are very well separated (not much overlap) a new value of the mean pixel value $x$ will result in a low probability for both classes (e.g. imagine the mean pixel value of the wooden floor is at the intersection of both distributions).



          enter image description here






          share|cite|improve this answer









          $endgroup$












          • $begingroup$
            Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
            $endgroup$
            – quevivasbien
            Mar 25 at 18:36










          • $begingroup$
            Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
            $endgroup$
            – MachineLearner
            Mar 25 at 18:50













          1












          1








          1





          $begingroup$

          The logistic regression does not give you a real probability it is rather a measure for the confidence of the model (not in the meaning of statistical confidence interval). In order to understand the difference between confidence and probability imagine you have trained a logistic regression as a classifier for predicting if an image does show a snowy or dry road. Assume that we are using the mean pixel value and we are able to discriminate between snowy and dry roads. Now, we take a new image which is showing a wooden floor. We calculate the mean pixel value and put it into the logistic regression. As the mean pixel value will be somewhere between snowy road and dry road the logistic regression will give you some value from the magnitude of $approx 0.5$. Hence, if we interpret this result as probability this would mean that the logistic regression thinks that chances are $50-50$ for dry road vs. snowy road. But we know this is nonsense. But it is not a surprise because the logistic regression does only give us the confidence of the model, not the probability. This is why logistic regression is called a discriminative model.



          In contrast to discriminative models, other models like generative probabilistic models try to model the distribution (often normal distribution) of the mean pixel value for the classes $mathcalC_1$ for dry road and $mathcalC_2$ for the snowy road. If we use such a procedure we will see that we get two distributions of the mean pixel values $x$. The following figure demonstrates the two distributions. If the distributions are very well separated (not much overlap) a new value of the mean pixel value $x$ will result in a low probability for both classes (e.g. imagine the mean pixel value of the wooden floor is at the intersection of both distributions).



          enter image description here






          share|cite|improve this answer









          $endgroup$



          The logistic regression does not give you a real probability it is rather a measure for the confidence of the model (not in the meaning of statistical confidence interval). In order to understand the difference between confidence and probability imagine you have trained a logistic regression as a classifier for predicting if an image does show a snowy or dry road. Assume that we are using the mean pixel value and we are able to discriminate between snowy and dry roads. Now, we take a new image which is showing a wooden floor. We calculate the mean pixel value and put it into the logistic regression. As the mean pixel value will be somewhere between snowy road and dry road the logistic regression will give you some value from the magnitude of $approx 0.5$. Hence, if we interpret this result as probability this would mean that the logistic regression thinks that chances are $50-50$ for dry road vs. snowy road. But we know this is nonsense. But it is not a surprise because the logistic regression does only give us the confidence of the model, not the probability. This is why logistic regression is called a discriminative model.



          In contrast to discriminative models, other models like generative probabilistic models try to model the distribution (often normal distribution) of the mean pixel value for the classes $mathcalC_1$ for dry road and $mathcalC_2$ for the snowy road. If we use such a procedure we will see that we get two distributions of the mean pixel values $x$. The following figure demonstrates the two distributions. If the distributions are very well separated (not much overlap) a new value of the mean pixel value $x$ will result in a low probability for both classes (e.g. imagine the mean pixel value of the wooden floor is at the intersection of both distributions).



          enter image description here







          share|cite|improve this answer












          share|cite|improve this answer



          share|cite|improve this answer










          answered Mar 22 at 23:18









          MachineLearnerMachineLearner

          1,319112




          1,319112











          • $begingroup$
            Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
            $endgroup$
            – quevivasbien
            Mar 25 at 18:36










          • $begingroup$
            Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
            $endgroup$
            – MachineLearner
            Mar 25 at 18:50
















          • $begingroup$
            Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
            $endgroup$
            – quevivasbien
            Mar 25 at 18:36










          • $begingroup$
            Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
            $endgroup$
            – MachineLearner
            Mar 25 at 18:50















          $begingroup$
          Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
          $endgroup$
          – quevivasbien
          Mar 25 at 18:36




          $begingroup$
          Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
          $endgroup$
          – quevivasbien
          Mar 25 at 18:36












          $begingroup$
          Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
          $endgroup$
          – MachineLearner
          Mar 25 at 18:50




          $begingroup$
          Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
          $endgroup$
          – MachineLearner
          Mar 25 at 18:50

















          draft saved

          draft discarded
















































          Thanks for contributing an answer to Mathematics Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid


          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.

          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3157129%2fwhen-can-the-sigmoid-activated-output-of-a-neural-network-be-interpreted-as-a-pr%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Moe incest case Sentencing See also References Navigation menu"'Australian Josef Fritzl' fathered four children by daughter""Small town recoils in horror at 'Australian Fritzl' incest case""Victorian rape allegations echo Fritzl case - Just In (Australian Broadcasting Corporation)""Incest father jailed for 22 years""'Australian Fritzl' sentenced to 22 years in prison for abusing daughter for three decades""RSJ v The Queen"

          Who is our nearest planetary neighbor, on average?Santa Claus flies to the South PoleSeven Spheres of Unequal Mass, a weighing problem with a twistDescribe a large integerFast Mental Calculation of $7.5^7$Math in Space (without the help of celebrities)Find the value of $bigstar$: Puzzle 8 - InequalityWho drinks beer while running anyway?A Crucial DeliveryRanking And AverageHow long will my money last at roulette?

          Daza language Contents Vocabulary Phonology References External links Navigation menudaza1242Daza"Dazaga"eeee178086576