
Floating point representation in 8 bit


$begingroup$



A computer has 8 bits of memory for floating point representation.



The first bit is assigned to the sign, the next four bits to the exponent and the last three to the mantissa.



The computer has no representation for $\infty$, and 0 is represented as in IEEE 754. Assume that the mantissa starts with $\text{base}^{-1}$ and that to the left of the mantissa there is an implied 1 that does not consume a place value.



  1. What is the smallest positive number that can be represented?


  2. What is the machine epsilon?


  3. How many numbers in base 10 can be represented?




  1. In general the number of exponent values is $2^{\text{bits}}-2$, so in this case we have $2^4-2=14$, so the exponent ranges from $-6$ to $7$, and the smallest positive number is $1.001_2 \times 2^{-6}=2^{-6}+2^{-9}=0.017578125$


  2. To find machine epsilon we take $\text{base}^{-(p-1)}$, where $p$ is the number of significant bits in the mantissa, which gives $2^{-(3-1)}=2^{-2}=0.25$


How should I approach 3, and are my solutions to 1 and 2 correct?





















$endgroup$











  • $begingroup$
    Regarding part 3, there are 8 bits available, so there are 256 possible bit patterns. If all of these represent distinct numbers, then 256 is the answer. So the question boils down to whether there are any cases where two distinct bit patterns represent the same number. Can this happen? Offhand I would think the only possible candidate would be 0 (with the sign bit set or not set), but I don't know the details of IEEE754 representation. I'm not sure what base 10 has to do with this.
    $endgroup$
    – Bungo
    Mar 5 '17 at 18:30











  • $begingroup$
    The quoted question is unfortunately incompletely specified. Since some of the encoding is "like IEEE-754," I would guess the exponent is probably meant to use excess-7 encoding, but could it be excess-6 or excess-8? $2^{\text{bits}} - 2$ is the number of possible exponents when we have infinities, NaN, and denormals; without infinities and NaN, another exponent is possible, and if 00000001 is treated as a normalized positive number it has yet another exponent.
    $endgroup$
    – David K
    Feb 12 '18 at 2:25















numerical-methods floating-point














edited Mar 14 at 7:52 by Winfield Chen

asked Dec 27 '16 at 16:33 by gbox





















1 Answer












$begingroup$

Due to the finite precision of the computer, numbers used in calculations must conform to the format imposed by the machine. So only real numbers with a finite number of digits can be represented. A normalized floating point system $\mathbb{F}=F(\beta,p,e_{\text{min}},e_{\text{max}})$ consists of a set of real numbers written in normalized floating point form $x=\pm m \times \beta^e$, where $m$ is the mantissa of $x$ and $e$ is the exponent.



If $x \neq 0$ then the mantissa $m$ can be written as:
\begin{equation}
m = a_N + a_{N-1} \beta^{-1} + \dots + a_{-p} \beta^{-p-N}
\end{equation}

with $a_N \neq 0$ and $e_{\text{min}} \leq e \leq e_{\text{max}}$. If $x=0$ then the mantissa $m=0$ while the exponent $e$ can take any value.



In the above expressions, $p$ is the precision of the system, $\beta$ the base, and $[e_{\text{min}},e_{\text{max}}]$ the exponent range, with $e_{\text{min}}<0$ and $e_{\text{max}}=|e_{\text{min}}|+1$.



According to the definition, the mantissa $m$ belongs to the range $[1,\beta)$. The machine epsilon is $\beta^{1-p}$ and represents the difference between the mantissae of two successive positive numbers.
Now a positive number $x \in \mathbb{F}$ belongs to the range $[x_{\text{min}}, x_{\text{max}}]$ where:
\begin{equation}
x_{\text{min}} = \beta^{e_{\text{min}}}
\end{equation}

and
\begin{equation}
x_{\text{max}} = (\beta-1)(1+\beta^{-1}+\beta^{-2}+\dots+\beta^{-(p-1)})\, \beta^{e_{\text{max}}} < \beta^{e_{\text{max}}+1}
\end{equation}
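As a quick check of the two bounds, here is a minimal Python sketch that evaluates them with exact rational arithmetic. The function name `fp_range` and the toy decimal system $F(10, 3, -2, 3)$ are illustrative assumptions, not part of the original answer.

```python
from fractions import Fraction

def fp_range(beta, p, e_min, e_max):
    """Return (x_min, x_max) for the normalized system F(beta, p, e_min, e_max)."""
    b = Fraction(beta)
    x_min = b ** e_min
    # Largest mantissa: each of the p digits equals beta - 1.
    m_max = (b - 1) * sum(b ** (-i) for i in range(p))
    x_max = m_max * b ** e_max
    return x_min, x_max

# Hypothetical toy system: base 10, 3 digits, exponents -2..3.
x_min, x_max = fp_range(10, 3, -2, 3)
print(x_min, x_max)               # 1/100 9990
print(x_max < Fraction(10) ** 4)  # True: x_max < beta^(e_max + 1)
```

Here the largest mantissa is $9.99$, so $x_{\text{max}} = 9.99 \times 10^3 = 9990$, just below $10^4$.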

We now prove the statement above. The general representation of $x \in \mathbb{R}$ in base $\beta$ is:
\begin{equation}
x=\pm (a_N \beta^N + a_{N-1} \beta^{N-1} + \dots + a_1 \beta + a_0 + a_{-1} \beta^{-1} + \dots + a_{-p} \beta^{-p}) = \pm m \times \beta^e
\end{equation}

Factoring out $\beta^N$, we have:
\begin{equation}
x=\pm (a_N + a_{N-1} \beta^{-1} + \dots + a_1 \beta^{-N+1} + a_0 \beta^{-N} + a_{-1} \beta^{-1-N} + \dots + a_{-p} \beta^{-p-N}) \times \beta^N = \pm m \times \beta^e
\end{equation}

We can identify $N$ with $e$ ($N=e$). Then:
\begin{equation}
m=\sum_{i=-p}^{N} a_i \beta^{i-N}
\end{equation}
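The normalization step above can be sketched in a few lines of Python. The helper `normalize` and the example value $13.25 = 1101.01_2$ are assumptions chosen only for illustration.

```python
from fractions import Fraction

def normalize(digits, N, beta=2):
    """digits lists a_N, a_{N-1}, ..., leading digit first; returns (m, e) with x = m * beta^e."""
    if digits[0] == 0:
        raise ValueError("leading digit a_N must be nonzero")
    b = Fraction(beta)
    # m = sum_i a_i * beta^(i - N); for the k-th listed digit, i = N - k, so the power is beta^(-k).
    m = sum(a * b ** (-k) for k, a in enumerate(digits))
    return m, N

# 13.25 = 1101.01 in base 2, so N = 3 and the digits run from a_3 down to a_{-2}.
m, e = normalize([1, 1, 0, 1, 0, 1], N=3)
x = m * Fraction(2) ** e
print(m, e, x)  # 53/32 3 53/4
```

The mantissa $53/32 = 1.65625$ lies in $[1, 2)$, and $1.65625 \times 2^3 = 13.25$ recovers the original number.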

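The minimum value of $m$ is reached when the leading digit is $1$ and all other digits are $0$. Writing the $p$ digits of the mantissa as $a_0, a_1, \dots, a_{p-1}$ (leading digit first, so that $m=\sum_{i=0}^{p-1} a_i \beta^{-i}$), this means $a_0=1$ and $a_i=0$ for $1 \leq i \leq p-1$. In this case $m=1$ and $x_{\text{min}} = \beta^{e_{\text{min}}}$.
The maximum value of $m$ is obtained when $a_i=\beta-1$ for all $0 \leq i \leq p-1$, which gives the factor $(\beta-1)(1+\beta^{-1}+\dots+\beta^{-(p-1)})$ in $x_{\text{max}}$.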



The machine epsilon is defined as $\epsilon_M=\beta^{1-p}$. It is a measure of the precision of the system, since it is an upper bound on the relative distance between two consecutive numbers. It also represents the difference between the mantissae of two successive positive numbers. In a normalized floating point system, only numbers that fit the finite format imposed by the computer can be represented.
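The bound on the relative distance can be checked numerically. The sketch below assumes Python floats are IEEE double precision (true on common platforms) and uses a small hypothetical system $F(2, 4, -3, 4)$; the helper names are illustrative.

```python
from fractions import Fraction
import sys

def machine_epsilon(beta, p):
    return Fraction(beta) ** (1 - p)

def positive_elements(beta, p, e_min, e_max):
    """All positive elements of F(beta, p, e_min, e_max) in increasing order."""
    b = Fraction(beta)
    values = []
    for e in range(e_min, e_max + 1):
        # Integers beta^(p-1) .. beta^p - 1, scaled by beta^(1-p), give every mantissa in [1, beta).
        for k in range(beta ** (p - 1), beta ** p):
            values.append(k * b ** (1 - p) * b ** e)
    return sorted(values)

eps = machine_epsilon(2, 53)
print(float(eps) == sys.float_info.epsilon)  # True when float is IEEE double

xs = positive_elements(2, 4, -3, 4)
max_rel_gap = max((hi - lo) / lo for lo, hi in zip(xs, xs[1:]))
print(max_rel_gap == machine_epsilon(2, 4))  # True: the bound is attained where m = 1
```

The largest relative gap occurs right after a power of $\beta$, where the mantissa is $1$ and the absolute gap is $\beta^{1-p}\beta^{e}$.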



The total number of elements in $\mathbb{F}$ is given by the following expression:
\begin{equation}
2 (\beta-1) \beta^{p-1} (e_{\text{max}}-e_{\text{min}}+1)+2
\end{equation}
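The count can be verified by brute-force enumeration. The sketch below applies it to the 8-bit format from the question under two assumptions that the question leaves open: the exponent range $-6$ to $7$ proposed by the asker, and $p=4$ (three stored bits plus the implied leading 1). It also reads the final $+2$ as the two signed zeros, which the answer does not state explicitly.

```python
from fractions import Fraction

def count_formula(beta, p, e_min, e_max):
    return 2 * (beta - 1) * beta ** (p - 1) * (e_max - e_min + 1) + 2

def enumerate_nonzero(beta, p, e_min, e_max):
    """Set of all nonzero elements of F(beta, p, e_min, e_max)."""
    values = set()
    for e in range(e_min, e_max + 1):
        for k in range(beta ** (p - 1), beta ** p):
            x = Fraction(k, beta ** (p - 1)) * Fraction(beta) ** e
            values.add(x)
            values.add(-x)
    return values

# Assumed parameters for the question's 8-bit format (see the lead-in).
beta, p, e_min, e_max = 2, 4, -6, 7
nonzero = enumerate_nonzero(beta, p, e_min, e_max)
print(len(nonzero) + 2, count_formula(beta, p, e_min, e_max))  # 226 226
print(min(v for v in nonzero if v > 0))                        # 1/64
```

Under these assumptions the smallest positive element is $2^{-6}$ and the machine epsilon is $2^{1-4} = 0.125$; a different exponent bias would shift these values.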

Computers can work with single or double precision. IEEE standard single-precision floating point numbers belong to the normalized floating point system $F(2, 24, -126, +127)$, while IEEE standard double-precision floating point numbers belong to the normalized floating point system $F(2, 53, -1022, +1023)$.
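These two parameter sets can be compared against actual hardware values. The following sketch assumes that Python's `float` is an IEEE double and uses the standard `struct` module to reinterpret 32-bit patterns as single-precision values; `fp_range` and `float32_from_bits` are illustrative helper names.

```python
import struct
import sys
from fractions import Fraction

def fp_range(beta, p, e_min, e_max):
    b = Fraction(beta)
    x_min = b ** e_min
    x_max = (b - 1) * sum(b ** (-i) for i in range(p)) * b ** e_max
    return x_min, x_max

def float32_from_bits(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

single_min, single_max = fp_range(2, 24, -126, 127)
double_min, double_max = fp_range(2, 53, -1022, 1023)

# 0x00800000 is the smallest normal single, 0x7F7FFFFF the largest finite single.
print(Fraction(float32_from_bits(0x00800000)) == single_min)  # True
print(Fraction(float32_from_bits(0x7F7FFFFF)) == single_max)  # True
print(Fraction(sys.float_info.min) == double_min)             # True
print(Fraction(sys.float_info.max) == double_max)             # True
```

Note that these are the normalized ranges only; IEEE formats also include subnormal numbers below $x_{\text{min}}$, which the normalized model does not cover.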

















$endgroup$












    Your Answer





    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "69"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    noCode: true, onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2073753%2ffloating-point-representation-in-8-bit%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0












    $begingroup$

    Due to the finite precision of the computer, numbers used in calculations must conform to the format imposed by the machine. So only real numbers with a finite number of digits can be represented. A normalized floating point system $mathbbF=F(beta,p,e_textmin,e_textmax)$ consists of a set of real numbers written in normalized floating point form $x=pm m times beta^e$, where $m$ is the mantissa of $x$ and $e$ is the exponent.



    If $x neq 0$ then the mantissa $m$ can be written as:
    beginequation
    m = a_N +a_N-1 beta^-1+...+a_-p beta^-p-N
    endequation

    with $a_N neq 0$ and $e_textmin leq e leq e_textmax$. If $x=0$ then the mantissa $m=0$ while the exponent $e$ can take any value.



    In the above expressions, $p$ is the precision of the system, $beta$ the base, and $[e_textmin,e_textmax]$ the exponent range, with $e_textmin<0$, and $e_textmax=|e_textmin|+1$.



    According to the definition the mantissa $m$ belongs to the range $[1,beta)$. The machine epsilon is $beta^1-p$ and represents the difference between the
    mantissae of two successive positive numbers.
    Now a number $x$ belong to the range $[x_textmin, x_textmax]$ where:
    beginequation
    x_textmin = beta^e_textmin
    endequation

    and
    beginequation
    x_textmax = (beta-1)(1+beta^-1+beta^-2+... + beta^-(p-1)) beta^e_textmax< beta^e_textmax+1
    endequation

    We now prove the statement above. The general representation of $x in mathbbR$ in base $beta$ is:
    beginequation
    x=pm (a_N beta^N+a_N-1 beta^N-1+...+a_1 beta+a_0+a_-1 beta^-1+...+a_-p beta^-p)= pm m times beta^e
    endequation

    When we collect the terms $beta^N$ we have:
    beginequation
    x=pm (a_N +a_N-1 beta^-1+...+a_1 beta^-N+1+a_0 beta^-N+a_-1 beta^-1-N+...+a_-p beta^-p-N) times beta^N= pm m times beta^e
    endequation

    We can identify $N$ with $e$ ($N=e$). Then:
    beginequation
    m=sum_i=-p^N a_i beta^i-N
    endequation

    The minimum value of $m$ is reached when $a_0=1$ and $a_i=0$ with $1 leq i leq p-1$. In this case $m=1$ and $x_textmin = beta^e_textmin$.
    The maximum value of $m$ is obtained when $a_i=beta-1$ for all $0 leq i leq p-1$.



    The machine epsilon is defined as $epsilon_M=beta^1-p$. It is a measure of the precision of the system, since it is a maximum bound on the relative distance between two consecutive numbers. It also represents the difference between the mantissae of two successive positive numbers. In normalized floating point systems, no number that does not fit the finite format imposed by the computer can be represented.



    The total number of elements in $mathbbF$ is given by the following expression:
    beginequation
    2 (beta-1) beta^p-1 (e_textmax-e_textmin+1)+2
    endequation

    Computers can work with single- or double-precision. IEEE standard single-precision floating point numbers belong to the normalized floating point system $F(2, 24, −126, +127)$, while IEEE standard double-precision floating point numbers belong to the normalized floating point system $F(2, 53, −1022, +1023)$.






    share|cite|improve this answer











    $endgroup$

















      0












      $begingroup$

      Due to the finite precision of the computer, numbers used in calculations must conform to the format imposed by the machine. So only real numbers with a finite number of digits can be represented. A normalized floating point system $mathbbF=F(beta,p,e_textmin,e_textmax)$ consists of a set of real numbers written in normalized floating point form $x=pm m times beta^e$, where $m$ is the mantissa of $x$ and $e$ is the exponent.



      If $x neq 0$ then the mantissa $m$ can be written as:
      beginequation
      m = a_N +a_N-1 beta^-1+...+a_-p beta^-p-N
      endequation

      with $a_N neq 0$ and $e_textmin leq e leq e_textmax$. If $x=0$ then the mantissa $m=0$ while the exponent $e$ can take any value.



      In the above expressions, $p$ is the precision of the system, $beta$ the base, and $[e_textmin,e_textmax]$ the exponent range, with $e_textmin<0$, and $e_textmax=|e_textmin|+1$.



      According to the definition the mantissa $m$ belongs to the range $[1,beta)$. The machine epsilon is $beta^1-p$ and represents the difference between the
      mantissae of two successive positive numbers.
      Now a number $x$ belong to the range $[x_textmin, x_textmax]$ where:
      beginequation
      x_textmin = beta^e_textmin
      endequation

      and
      beginequation
      x_textmax = (beta-1)(1+beta^-1+beta^-2+... + beta^-(p-1)) beta^e_textmax< beta^e_textmax+1
      endequation

      We now prove the statement above. The general representation of $x in mathbbR$ in base $beta$ is:
      beginequation
      x=pm (a_N beta^N+a_N-1 beta^N-1+...+a_1 beta+a_0+a_-1 beta^-1+...+a_-p beta^-p)= pm m times beta^e
      endequation

      When we collect the terms $beta^N$ we have:
      beginequation
      x=pm (a_N +a_N-1 beta^-1+...+a_1 beta^-N+1+a_0 beta^-N+a_-1 beta^-1-N+...+a_-p beta^-p-N) times beta^N= pm m times beta^e
      endequation

      We can identify $N$ with $e$ ($N=e$). Then:
      beginequation
      m=sum_i=-p^N a_i beta^i-N
      endequation

      The minimum value of $m$ is reached when $a_0=1$ and $a_i=0$ with $1 leq i leq p-1$. In this case $m=1$ and $x_textmin = beta^e_textmin$.
      The maximum value of $m$ is obtained when $a_i=beta-1$ for all $0 leq i leq p-1$.



      The machine epsilon is defined as $epsilon_M=beta^1-p$. It is a measure of the precision of the system, since it is a maximum bound on the relative distance between two consecutive numbers. It also represents the difference between the mantissae of two successive positive numbers. In normalized floating point systems, no number that does not fit the finite format imposed by the computer can be represented.



      The total number of elements in $mathbbF$ is given by the following expression:
      beginequation
      2 (beta-1) beta^p-1 (e_textmax-e_textmin+1)+2
      endequation

      Computers can work with single- or double-precision. IEEE standard single-precision floating point numbers belong to the normalized floating point system $F(2, 24, −126, +127)$, while IEEE standard double-precision floating point numbers belong to the normalized floating point system $F(2, 53, −1022, +1023)$.






      share|cite|improve this answer











      $endgroup$















        0












        0








        0





        $begingroup$

        Due to the finite precision of the computer, numbers used in calculations must conform to the format imposed by the machine. So only real numbers with a finite number of digits can be represented. A normalized floating point system $mathbbF=F(beta,p,e_textmin,e_textmax)$ consists of a set of real numbers written in normalized floating point form $x=pm m times beta^e$, where $m$ is the mantissa of $x$ and $e$ is the exponent.



        If $x neq 0$ then the mantissa $m$ can be written as:
        beginequation
        m = a_N +a_N-1 beta^-1+...+a_-p beta^-p-N
        endequation

        with $a_N neq 0$ and $e_textmin leq e leq e_textmax$. If $x=0$ then the mantissa $m=0$ while the exponent $e$ can take any value.



        In the above expressions, $p$ is the precision of the system, $beta$ the base, and $[e_textmin,e_textmax]$ the exponent range, with $e_textmin<0$, and $e_textmax=|e_textmin|+1$.



        According to the definition the mantissa $m$ belongs to the range $[1,beta)$. The machine epsilon is $beta^1-p$ and represents the difference between the
        mantissae of two successive positive numbers.
        Now a number $x$ belong to the range $[x_textmin, x_textmax]$ where:
        beginequation
        x_textmin = beta^e_textmin
        endequation

        and
        beginequation
        x_textmax = (beta-1)(1+beta^-1+beta^-2+... + beta^-(p-1)) beta^e_textmax< beta^e_textmax+1
        endequation

        We now prove the statement above. The general representation of $x in mathbbR$ in base $beta$ is:
        beginequation
        x=pm (a_N beta^N+a_N-1 beta^N-1+...+a_1 beta+a_0+a_-1 beta^-1+...+a_-p beta^-p)= pm m times beta^e
        endequation

        When we collect the terms $beta^N$ we have:
        beginequation
        x=pm (a_N +a_N-1 beta^-1+...+a_1 beta^-N+1+a_0 beta^-N+a_-1 beta^-1-N+...+a_-p beta^-p-N) times beta^N= pm m times beta^e
        endequation

        We can identify $N$ with $e$ ($N=e$). Then:
        beginequation
        m=sum_i=-p^N a_i beta^i-N
        endequation

        The minimum value of $m$ is reached when $a_0=1$ and $a_i=0$ with $1 leq i leq p-1$. In this case $m=1$ and $x_textmin = beta^e_textmin$.
        The maximum value of $m$ is obtained when $a_i=beta-1$ for all $0 leq i leq p-1$.



        The machine epsilon is defined as $epsilon_M=beta^1-p$. It is a measure of the precision of the system, since it is a maximum bound on the relative distance between two consecutive numbers. It also represents the difference between the mantissae of two successive positive numbers. In normalized floating point systems, no number that does not fit the finite format imposed by the computer can be represented.



        The total number of elements in $mathbbF$ is given by the following expression:
        beginequation
        2 (beta-1) beta^p-1 (e_textmax-e_textmin+1)+2
        endequation

        Computers can work with single- or double-precision. IEEE standard single-precision floating point numbers belong to the normalized floating point system $F(2, 24, −126, +127)$, while IEEE standard double-precision floating point numbers belong to the normalized floating point system $F(2, 53, −1022, +1023)$.






        share|cite|improve this answer











        $endgroup$



        Due to the finite precision of the computer, numbers used in calculations must conform to the format imposed by the machine. So only real numbers with a finite number of digits can be represented. A normalized floating point system $mathbbF=F(beta,p,e_textmin,e_textmax)$ consists of a set of real numbers written in normalized floating point form $x=pm m times beta^e$, where $m$ is the mantissa of $x$ and $e$ is the exponent.



        If $x neq 0$ then the mantissa $m$ can be written as:
        beginequation
        m = a_N +a_N-1 beta^-1+...+a_-p beta^-p-N
        endequation

        with $a_N neq 0$ and $e_textmin leq e leq e_textmax$. If $x=0$ then the mantissa $m=0$ while the exponent $e$ can take any value.



        In the above expressions, $p$ is the precision of the system, $beta$ the base, and $[e_textmin,e_textmax]$ the exponent range, with $e_textmin<0$, and $e_textmax=|e_textmin|+1$.



        According to the definition the mantissa $m$ belongs to the range $[1,beta)$. The machine epsilon is $beta^1-p$ and represents the difference between the
        mantissae of two successive positive numbers.
        Now a number $x$ belong to the range $[x_textmin, x_textmax]$ where:
        beginequation
        x_textmin = beta^e_textmin
        endequation

        and
        beginequation
        x_textmax = (beta-1)(1+beta^-1+beta^-2+... + beta^-(p-1)) beta^e_textmax< beta^e_textmax+1
        endequation

        We now prove the statement above. The general representation of $x in mathbbR$ in base $beta$ is:
        beginequation
        x=pm (a_N beta^N+a_N-1 beta^N-1+...+a_1 beta+a_0+a_-1 beta^-1+...+a_-p beta^-p)= pm m times beta^e
        endequation

        When we collect the terms $beta^N$ we have:
        beginequation
        x=pm (a_N +a_N-1 beta^-1+...+a_1 beta^-N+1+a_0 beta^-N+a_-1 beta^-1-N+...+a_-p beta^-p-N) times beta^N= pm m times beta^e
        endequation

        We can identify $N$ with $e$ ($N=e$). Then:
        beginequation
        m=sum_i=-p^N a_i beta^i-N
        endequation

        The minimum value of $m$ is reached when $a_0=1$ and $a_i=0$ with $1 leq i leq p-1$. In this case $m=1$ and $x_textmin = beta^e_textmin$.
        The maximum value of $m$ is obtained when $a_i=beta-1$ for all $0 leq i leq p-1$.



        The machine epsilon is defined as $epsilon_M=beta^1-p$. It is a measure of the precision of the system, since it is a maximum bound on the relative distance between two consecutive numbers. It also represents the difference between the mantissae of two successive positive numbers. In normalized floating point systems, no number that does not fit the finite format imposed by the computer can be represented.



        The total number of elements in $\mathbb{F}$ is given by the following expression:
        \begin{equation}
        2 (\beta-1) \beta^{p-1} (e_{\text{max}}-e_{\text{min}}+1)+2
        \end{equation}
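
The first term of this count can be verified by brute force on a small system. This is a sketch under stated assumptions: the toy system $F(2, 3, -2, 3)$ and the function name `enumerate_normalized` are hypothetical. It enumerates only the nonzero normalized values, so the additive constant in the expression above is not covered.

```python
from fractions import Fraction
from itertools import product

def enumerate_normalized(beta, p, e_min, e_max):
    """Return the set of all nonzero normalized values of F(beta, p, e_min, e_max)."""
    values = set()
    for e in range(e_min, e_max + 1):
        # The leading digit is nonzero; the remaining p - 1 digits are free.
        for lead in range(1, beta):
            for tail in product(range(beta), repeat=p - 1):
                digits = (lead,) + tail
                m = sum(d * Fraction(beta) ** (-i) for i, d in enumerate(digits))
                x = m * Fraction(beta) ** e
                values.update((x, -x))   # both signs
    return values

beta, p, e_min, e_max = 2, 3, -2, 3      # hypothetical toy system
nonzero = enumerate_normalized(beta, p, e_min, e_max)
expected = 2 * (beta - 1) * beta ** (p - 1) * (e_max - e_min + 1)
print(len(nonzero), expected)            # prints: 48 48
```

The factors read off directly from the loops: 2 signs, $\beta-1$ leading digits, $\beta^{p-1}$ tails, and $e_{\text{max}}-e_{\text{min}}+1$ exponents.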

        Computers can work in single or double precision. IEEE standard single-precision floating-point numbers belong to the normalized floating-point system $F(2, 24, -126, +127)$, while IEEE standard double-precision floating-point numbers belong to the normalized floating-point system $F(2, 53, -1022, +1023)$.
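
The double-precision parameters can be compared with what Python reports. This is a minimal sketch, assuming the platform's `float` is an IEEE 754 double. Note that `sys.float_info.min_exp` and `max_exp` follow the C convention, which is offset by one from the $[1,\beta)$ mantissa convention used here, hence the shift by 1.

```python
import sys

info = sys.float_info                    # describes the platform's C double
p = info.mant_dig                        # precision in base-2 digits
# Shift by one to match the [1, 2) mantissa convention of this answer.
e_min, e_max = info.min_exp - 1, info.max_exp - 1

x_min = 2.0 ** e_min                             # beta ** e_min
x_max = (2.0 - 2.0 ** (1 - p)) * 2.0 ** e_max    # (beta - beta**(1-p)) * beta**e_max

print(p, e_min, e_max)
print(x_min == info.min, x_max == info.max, 2.0 ** (1 - p) == info.epsilon)
```

On a platform with IEEE doubles this should report `53 -1022 1023` followed by three `True` values, matching $F(2, 53, -1022, +1023)$.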















        edited Mar 14 at 8:01 by Winfield Chen
        answered Mar 4 '17 at 20:37 by Upax


























