When can the sigmoid-activated output of a neural network be interpreted as a probability?Can a regular neural network be used as a probablistic model rather than classifier?How can I derive the back propagation formula in a more elegant way?How to evaluate the quality of the probability distribution output of a classifier?Which functional space does feedforward neural network approximate?Neural network cost function - why squared error?Neural Network Sigmoid ProblemCan a regular neural network be used as a probablistic model rather than classifier?Expected Value of neural network outputCross-Entropy loss in Reinforcement LearningWhat does it essentially mean if the neural network has convex error surface?Why does a $sum_k$ appear when using the chain rule to derive $delta^L_j?$
Can I make popcorn with any corn?
Revoked SSL certificate
expand `ifthenelse` immediately
What doth I be?
Why does Kotter return in Welcome Back Kotter?
What does the "remote control" for a QF-4 look like?
What would happen to a modern skyscraper if it rains micro blackholes?
dbcc cleantable batch size explanation
A newer friend of my brother's gave him a load of baseball cards that are supposedly extremely valuable. Is this a scam?
What is a clear way to write a bar that has an extra beat?
How to format long polynomial?
How to source a part of a file
How do I deal with an unproductive colleague in a small company?
Is it possible to run Internet Explorer on OS X El Capitan?
"You are your self first supporter", a more proper way to say it
Can a vampire attack twice with their claws using Multiattack?
Was any UN Security Council vote triple-vetoed?
What typically incentivizes a professor to change jobs to a lower ranking university?
How is the claim "I am in New York only if I am in America" the same as "If I am in New York, then I am in America?
Theorems that impeded progress
How old can references or sources in a thesis be?
How much of data wrangling is a data scientist's job?
Alternative to sending password over mail?
Could an aircraft fly or hover using only jets of compressed air?
When can the sigmoid-activated output of a neural network be interpreted as a probability?
Can a regular neural network be used as a probablistic model rather than classifier?How can I derive the back propagation formula in a more elegant way?How to evaluate the quality of the probability distribution output of a classifier?Which functional space does feedforward neural network approximate?Neural network cost function - why squared error?Neural Network Sigmoid ProblemCan a regular neural network be used as a probablistic model rather than classifier?Expected Value of neural network outputCross-Entropy loss in Reinforcement LearningWhat does it essentially mean if the neural network has convex error surface?Why does a $sum_k$ appear when using the chain rule to derive $delta^L_j?$
$begingroup$
I've created a neural network whose final layer outputs a single sigmoid-activated value. I've trained the network on binary-labeled data (i.e. data labeled either 0 for negative class or 1 for positive class). Normally, when predicting the class of some unlabeled data, I would assume it to be of positive class if the output of the network for that data is above some cutoff (say, 0.5) and of negative class otherwise. However, I want to know whether I can correctly interpret the output of such a network as a probability that a given sample is of positive class.
Since the sigmoid function -- specifically, in this case, the logistic function
$$frac11 + e^-x$$
-- has range $[0, 1]$, it seems reasonable to interpret its outputs as probabilities, and I've seen a few sources that lead me to think that this is in fact a valid interpretation (e.g. this post), although I'm unsure about why, mathematically, this would be the case and under what conditions this would hold.
probability-theory machine-learning
$endgroup$
add a comment |
$begingroup$
I've created a neural network whose final layer outputs a single sigmoid-activated value. I've trained the network on binary-labeled data (i.e. data labeled either 0 for negative class or 1 for positive class). Normally, when predicting the class of some unlabeled data, I would assume it to be of positive class if the output of the network for that data is above some cutoff (say, 0.5) and of negative class otherwise. However, I want to know whether I can correctly interpret the output of such a network as a probability that a given sample is of positive class.
Since the sigmoid function -- specifically, in this case, the logistic function
$$frac11 + e^-x$$
-- has range $[0, 1]$, it seems reasonable to interpret its outputs as probabilities, and I've seen a few sources that lead me to think that this is in fact a valid interpretation (e.g. this post), although I'm unsure about why, mathematically, this would be the case and under what conditions this would hold.
probability-theory machine-learning
$endgroup$
add a comment |
$begingroup$
I've created a neural network whose final layer outputs a single sigmoid-activated value. I've trained the network on binary-labeled data (i.e. data labeled either 0 for negative class or 1 for positive class). Normally, when predicting the class of some unlabeled data, I would assume it to be of positive class if the output of the network for that data is above some cutoff (say, 0.5) and of negative class otherwise. However, I want to know whether I can correctly interpret the output of such a network as a probability that a given sample is of positive class.
Since the sigmoid function -- specifically, in this case, the logistic function
$$frac11 + e^-x$$
-- has range $[0, 1]$, it seems reasonable to interpret its outputs as probabilities, and I've seen a few sources that lead me to think that this is in fact a valid interpretation (e.g. this post), although I'm unsure about why, mathematically, this would be the case and under what conditions this would hold.
probability-theory machine-learning
$endgroup$
I've created a neural network whose final layer outputs a single sigmoid-activated value. I've trained the network on binary-labeled data (i.e. data labeled either 0 for negative class or 1 for positive class). Normally, when predicting the class of some unlabeled data, I would assume it to be of positive class if the output of the network for that data is above some cutoff (say, 0.5) and of negative class otherwise. However, I want to know whether I can correctly interpret the output of such a network as a probability that a given sample is of positive class.
Since the sigmoid function -- specifically, in this case, the logistic function
$$frac11 + e^-x$$
-- has range $[0, 1]$, it seems reasonable to interpret its outputs as probabilities, and I've seen a few sources that lead me to think that this is in fact a valid interpretation (e.g. this post), although I'm unsure about why, mathematically, this would be the case and under what conditions this would hold.
probability-theory machine-learning
probability-theory machine-learning
asked Mar 21 at 17:47
quevivasbienquevivasbien
153
153
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
$begingroup$
The logistic regression does not give you a real probability it is rather a measure for the confidence of the model (not in the meaning of statistical confidence interval). In order to understand the difference between confidence and probability imagine you have trained a logistic regression as a classifier for predicting if an image does show a snowy or dry road. Assume that we are using the mean pixel value and we are able to discriminate between snowy and dry roads. Now, we take a new image which is showing a wooden floor. We calculate the mean pixel value and put it into the logistic regression. As the mean pixel value will be somewhere between snowy road and dry road the logistic regression will give you some value from the magnitude of $approx 0.5$. Hence, if we interpret this result as probability this would mean that the logistic regression thinks that chances are $50-50$ for dry road vs. snowy road. But we know this is nonsense. But it is not a surprise because the logistic regression does only give us the confidence of the model, not the probability. This is why logistic regression is called a discriminative model.
In contrast to discriminative models, other models like generative probabilistic models try to model the distribution (often normal distribution) of the mean pixel value for the classes $mathcalC_1$ for dry road and $mathcalC_2$ for the snowy road. If we use such a procedure we will see that we get two distributions of the mean pixel values $x$. The following figure demonstrates the two distributions. If the distributions are very well separated (not much overlap) a new value of the mean pixel value $x$ will result in a low probability for both classes (e.g. imagine the mean pixel value of the wooden floor is at the intersection of both distributions).

$endgroup$
$begingroup$
Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
$endgroup$
– quevivasbien
Mar 25 at 18:36
$begingroup$
Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
$endgroup$
– MachineLearner
Mar 25 at 18:50
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
);
);
, "mathjax-editing");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "69"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3157129%2fwhen-can-the-sigmoid-activated-output-of-a-neural-network-be-interpreted-as-a-pr%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
The logistic regression does not give you a real probability it is rather a measure for the confidence of the model (not in the meaning of statistical confidence interval). In order to understand the difference between confidence and probability imagine you have trained a logistic regression as a classifier for predicting if an image does show a snowy or dry road. Assume that we are using the mean pixel value and we are able to discriminate between snowy and dry roads. Now, we take a new image which is showing a wooden floor. We calculate the mean pixel value and put it into the logistic regression. As the mean pixel value will be somewhere between snowy road and dry road the logistic regression will give you some value from the magnitude of $approx 0.5$. Hence, if we interpret this result as probability this would mean that the logistic regression thinks that chances are $50-50$ for dry road vs. snowy road. But we know this is nonsense. But it is not a surprise because the logistic regression does only give us the confidence of the model, not the probability. This is why logistic regression is called a discriminative model.
In contrast to discriminative models, other models like generative probabilistic models try to model the distribution (often normal distribution) of the mean pixel value for the classes $mathcalC_1$ for dry road and $mathcalC_2$ for the snowy road. If we use such a procedure we will see that we get two distributions of the mean pixel values $x$. The following figure demonstrates the two distributions. If the distributions are very well separated (not much overlap) a new value of the mean pixel value $x$ will result in a low probability for both classes (e.g. imagine the mean pixel value of the wooden floor is at the intersection of both distributions).

$endgroup$
$begingroup$
Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
$endgroup$
– quevivasbien
Mar 25 at 18:36
$begingroup$
Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
$endgroup$
– MachineLearner
Mar 25 at 18:50
add a comment |
$begingroup$
The logistic regression does not give you a real probability it is rather a measure for the confidence of the model (not in the meaning of statistical confidence interval). In order to understand the difference between confidence and probability imagine you have trained a logistic regression as a classifier for predicting if an image does show a snowy or dry road. Assume that we are using the mean pixel value and we are able to discriminate between snowy and dry roads. Now, we take a new image which is showing a wooden floor. We calculate the mean pixel value and put it into the logistic regression. As the mean pixel value will be somewhere between snowy road and dry road the logistic regression will give you some value from the magnitude of $approx 0.5$. Hence, if we interpret this result as probability this would mean that the logistic regression thinks that chances are $50-50$ for dry road vs. snowy road. But we know this is nonsense. But it is not a surprise because the logistic regression does only give us the confidence of the model, not the probability. This is why logistic regression is called a discriminative model.
In contrast to discriminative models, other models like generative probabilistic models try to model the distribution (often normal distribution) of the mean pixel value for the classes $mathcalC_1$ for dry road and $mathcalC_2$ for the snowy road. If we use such a procedure we will see that we get two distributions of the mean pixel values $x$. The following figure demonstrates the two distributions. If the distributions are very well separated (not much overlap) a new value of the mean pixel value $x$ will result in a low probability for both classes (e.g. imagine the mean pixel value of the wooden floor is at the intersection of both distributions).

$endgroup$
$begingroup$
Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
$endgroup$
– quevivasbien
Mar 25 at 18:36
$begingroup$
Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
$endgroup$
– MachineLearner
Mar 25 at 18:50
add a comment |
$begingroup$
The logistic regression does not give you a real probability it is rather a measure for the confidence of the model (not in the meaning of statistical confidence interval). In order to understand the difference between confidence and probability imagine you have trained a logistic regression as a classifier for predicting if an image does show a snowy or dry road. Assume that we are using the mean pixel value and we are able to discriminate between snowy and dry roads. Now, we take a new image which is showing a wooden floor. We calculate the mean pixel value and put it into the logistic regression. As the mean pixel value will be somewhere between snowy road and dry road the logistic regression will give you some value from the magnitude of $approx 0.5$. Hence, if we interpret this result as probability this would mean that the logistic regression thinks that chances are $50-50$ for dry road vs. snowy road. But we know this is nonsense. But it is not a surprise because the logistic regression does only give us the confidence of the model, not the probability. This is why logistic regression is called a discriminative model.
In contrast to discriminative models, other models like generative probabilistic models try to model the distribution (often normal distribution) of the mean pixel value for the classes $mathcalC_1$ for dry road and $mathcalC_2$ for the snowy road. If we use such a procedure we will see that we get two distributions of the mean pixel values $x$. The following figure demonstrates the two distributions. If the distributions are very well separated (not much overlap) a new value of the mean pixel value $x$ will result in a low probability for both classes (e.g. imagine the mean pixel value of the wooden floor is at the intersection of both distributions).

$endgroup$
The logistic regression does not give you a real probability it is rather a measure for the confidence of the model (not in the meaning of statistical confidence interval). In order to understand the difference between confidence and probability imagine you have trained a logistic regression as a classifier for predicting if an image does show a snowy or dry road. Assume that we are using the mean pixel value and we are able to discriminate between snowy and dry roads. Now, we take a new image which is showing a wooden floor. We calculate the mean pixel value and put it into the logistic regression. As the mean pixel value will be somewhere between snowy road and dry road the logistic regression will give you some value from the magnitude of $approx 0.5$. Hence, if we interpret this result as probability this would mean that the logistic regression thinks that chances are $50-50$ for dry road vs. snowy road. But we know this is nonsense. But it is not a surprise because the logistic regression does only give us the confidence of the model, not the probability. This is why logistic regression is called a discriminative model.
In contrast to discriminative models, other models like generative probabilistic models try to model the distribution (often normal distribution) of the mean pixel value for the classes $mathcalC_1$ for dry road and $mathcalC_2$ for the snowy road. If we use such a procedure we will see that we get two distributions of the mean pixel values $x$. The following figure demonstrates the two distributions. If the distributions are very well separated (not much overlap) a new value of the mean pixel value $x$ will result in a low probability for both classes (e.g. imagine the mean pixel value of the wooden floor is at the intersection of both distributions).

answered Mar 22 at 23:18
MachineLearnerMachineLearner
1,319112
1,319112
$begingroup$
Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
$endgroup$
– quevivasbien
Mar 25 at 18:36
$begingroup$
Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
$endgroup$
– MachineLearner
Mar 25 at 18:50
add a comment |
$begingroup$
Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
$endgroup$
– quevivasbien
Mar 25 at 18:36
$begingroup$
Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
$endgroup$
– MachineLearner
Mar 25 at 18:50
$begingroup$
Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
$endgroup$
– quevivasbien
Mar 25 at 18:36
$begingroup$
Just to make sure I understand what you're saying here: a logistic regression, being a discriminative model, is a model of the probability of seeing certain outcomes given certain features. But when we use it for, say, image classification, the images we are trying to classify have definite classes (they either are or are not part of a given class and are maybe even of a class that the model wasn't trained to deal with, as in your example), so we should think about the model's predictions as confidence levels rather than probabilities. Is this correct?
$endgroup$
– quevivasbien
Mar 25 at 18:36
$begingroup$
Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
$endgroup$
– MachineLearner
Mar 25 at 18:50
$begingroup$
Yes, you are correct. The logistic regression in modeling $p(mathcalC_iboldsymbolx)$. In the binary case, we will be forced to assign one of the two classes to the observation.
$endgroup$
– MachineLearner
Mar 25 at 18:50
add a comment |
Thanks for contributing an answer to Mathematics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3157129%2fwhen-can-the-sigmoid-activated-output-of-a-neural-network-be-interpreted-as-a-pr%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown