


Examples of a statistic that is not independent of sample's distribution?

















This is the definition of a statistic on Wikipedia:




More formally, statistical theory defines a statistic as a function of a sample where the function itself is independent of the sample's distribution; that is, the function can be stated before realization of the data. The term statistic is used both for the function and for the value of the function on a given sample.




I think I understand most of this definition; however, I haven't been able to sort out the part where the function is independent of the sample's distribution.



My understanding of a statistic so far



A sample is a set of realizations of some number of independent, identically distributed (iid) random variables with distribution F (e.g. 10 realizations of a roll of a 20-sided fair die, 100 realizations of 5 rolls of a 6-sided fair die, or 100 people drawn at random from a population).



A function whose domain is that set, and whose range is the real numbers (or perhaps something else, like a vector or another mathematical object), would be considered a statistic.



When I think of examples, the mean, median, and variance all make sense in this context. They are functions on a set of realizations (e.g. blood pressure measurements from a random sample). I can also see how a linear regression model $y_i = \alpha + \beta \cdot x_i$ could be considered a statistic: is this not just a function on a set of realizations?
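To make the "function of the sample" idea concrete, here is a minimal Python sketch (not part of the original question; the sample values are made up for illustration). Each function below can be written down before any data are observed, and consumes only the observed values:

```python
import statistics

# A sample: realizations of iid random variables (hypothetical
# blood-pressure measurements). Each function applied to it below is a
# statistic: it is fixed in advance and depends only on the data.
sample = [118.0, 122.5, 130.0, 115.5, 127.0]

mean = statistics.mean(sample)      # a statistic
median = statistics.median(sample)  # a statistic
var = statistics.variance(sample)   # a statistic (sample variance)

print(mean, median, var)
```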



Where I'm confused



Assuming that my understanding above is correct, I haven't been able to see how a function could fail to be independent of the sample's distribution. I've been trying to think of an example to make sense of it, but no luck. Any insight would be much appreciated!










mathematical-statistics definition

asked Mar 11 at 9:55 by Jake Kirsch
2 Answers



















That definition is a somewhat awkward way to state it. A "statistic" is any function of the observable values. All that definition means is that a statistic is a function only of the observable values, not a function of the distribution or any of its parameters. For example, if $X_1, X_2, \ldots, X_n \sim \text{N}(\mu, 1)$ then a statistic would be any function $T(X_1, \ldots, X_n)$, whereas a function $H(X_1, \ldots, X_n, \mu)$ would not be a statistic, since it depends on $\mu$. Here are some further examples:



$$\begin{equation} \begin{aligned}
\text{Statistic} & & & & & \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i, \\[12pt]
\text{Statistic} & & & & & S_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}_n)^2, \\[12pt]
\text{Not a statistic} & & & & & D_n = \bar{X}_n - \mu, \\[12pt]
\text{Not a statistic} & & & & & p_i = \text{N}(x_i | \mu, 1), \\[12pt]
\text{Not a statistic} & & & & & Q = 10 \mu. \\[12pt]
\end{aligned} \end{equation}$$



          Every statistic is a function only of the observable values, and not of their distribution or its parameters. So there are no examples of a statistic that is a function of the distribution or its parameters (any such function would not be a statistic). However, it is important to note that the distribution of a statistic (as opposed to the statistic itself) will generally depend on the underlying distribution of the values. (This is true for all statistics other than ancillary statistics.)
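This distinction can be sketched in code (a Python illustration added here, not part of the original answer). The statistic $\bar{X}_n$ takes only the data; $D_n$ cannot even be evaluated without the unknown $\mu$. The simulation at the end illustrates the separate point that the *distribution* of the statistic still depends on the underlying distribution:

```python
import random
import statistics

def xbar(xs):
    # A statistic: uses only the observed values.
    return sum(xs) / len(xs)

def d_n(xs, mu):
    # NOT a statistic: requires the unknown parameter mu as an input.
    return xbar(xs) - mu

# The function xbar never touches mu, yet its sampling distribution
# does depend on it: samples drawn with different means concentrate
# the statistic around different values.
random.seed(0)
for mu in (0.0, 5.0):
    draws = [xbar([random.gauss(mu, 1) for _ in range(25)]) for _ in range(200)]
    print(mu, round(statistics.mean(draws), 2))
```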




What about a function where the parameters are known? In the comments below, Alecos asks an excellent follow-up question: what about a function that uses a fixed hypothesised value of the parameter? For example, what about the statistic $\sqrt{n} (\bar{x} - \mu)$ where $\mu = \mu_0$ is taken to be equal to a known hypothesised value $\mu_0 \in \mathbb{R}$? Here the function is indeed a statistic, so long as it is defined on the appropriately restricted domain. So the function $H_0: \mathbb{R}^n \rightarrow \mathbb{R}$ with $H_0(x_1, \ldots, x_n) = \sqrt{n} (\bar{x} - \mu_0)$ would be a statistic, but the function $H: \mathbb{R}^{n+1} \rightarrow \mathbb{R}$ with $H(x_1, \ldots, x_n, \mu) = \sqrt{n} (\bar{x} - \mu)$ would not be a statistic.
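The restricted-domain point maps neatly onto code (a sketch added for illustration; the names `h0` and `h` mirror $H_0$ and $H$ from the text). Fixing $\mu_0$ as a known constant leaves a function of the data alone; leaving $\mu$ as a free argument does not:

```python
import math

MU0 = 3.0  # hypothesised value, fixed before the data are seen

def h0(xs):
    # H_0: mu0 is a fixed constant, so this depends only on the
    # data -> a statistic.
    xbar = sum(xs) / len(xs)
    return math.sqrt(len(xs)) * (xbar - MU0)

def h(xs, mu):
    # H: the unknown parameter is a free argument -> not a statistic.
    xbar = sum(xs) / len(xs)
    return math.sqrt(len(xs)) * (xbar - mu)
```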














• Very helpful answer; considering the underlying statistical parameter as part of the non-statistic was particularly helpful. – Jake Kirsch, Mar 11 at 10:56

• @CarlWitthoft I don't get your point. If it's a function of the observable values, then it's a statistic. It may be a function of a smaller subset of the values; that can still be a useful thing to consider. If you want to estimate the mean and you have $10^{10}$ observations, you might still look at $(X_1 + X_2 + \dots + X_{1000})/1000$ if the cost of processing data is high and the cost of error is small. Or for some reason you might want to consider two independent estimates of the mean, and could consider $(X_1 + \dots + X_{n/2})/(n/2)$ and $(X_{n/2+1} + \dots + X_n)/(n/2)$. These are still statistics. – James Martin, Mar 11 at 14:06

• Those examples seem entirely valid to me. Are you saying the idea of dividing data into a training set and a validation set is not valid? – James Martin, Mar 11 at 14:53

• I'm a little confused by that as well. Let me attempt to describe @CarlWitthoft's point: it would still be a statistic in terms of the mathematical definition, but I could see a case where a consultant takes a 'statistic' of observations yet arbitrarily decides to remove a few results (consultants do this all the time, right?). This would be 'valid' in the sense that it's still a function on observations; however, the way that statistic is presented and interpreted likely wouldn't be valid. – Jake Kirsch, Mar 11 at 15:41

• @CarlWitthoft: With respect to the point you are making, it is important to distinguish between a statistic (which need not include all the data, and may not encompass all the information in the sample) and a sufficient statistic (which will encompass all the information with respect to some parameter). Statistical theory already has well-developed concepts like sufficiency that capture the idea that a statistic includes all relevant information in the sample. It is not necessary, or desirable, to try to build that requirement into the definition of a "statistic". – Ben, Mar 11 at 21:11



















          I interpret that as saying that you should decide before you see the data what statistic you are going to calculate. So, for instance, if you're going to take out outliers, you should decide before you see the data what constitutes an "outlier". If you decide after you see the data, then your function is dependent on the data.
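The decide-before-you-see-the-data rule can be sketched in Python (an illustrative `bounded_mean` helper added here; the cutoff values are assumptions, not from the answer). Because the outlier rule is fixed in advance, the whole function is a statistic:

```python
def bounded_mean(xs, lo=0.0, hi=200.0):
    # The outlier rule (drop values outside [lo, hi]) is specified
    # before the data are seen, so this function as a whole is a
    # statistic: a fixed function of the observable values.
    kept = [x for x in xs if lo <= x <= hi]
    return sum(kept) / len(kept)
```

Deciding which points to drop only after inspecting the data would make the "function" itself depend on the realization, which is exactly what the definition rules out.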


















• This is also helpful! So: making a decision on which observations to include in the function after knowing what observations are available, which is more or less what I was describing in my comment on the previous answer. – Jake Kirsch, Mar 11 at 19:37

• (+1) It might be worth noting that this is important because if you define a rule a priori about what constitutes a data point that will be dropped, it is (relatively) easy to derive a distribution for the statistic (i.e., truncated mean, etc.). It's really hard to derive a distribution for a measure that involves dropping data points for reasons that are not cleanly defined beforehand. – Cliff AB, Mar 11 at 23:41










          Your Answer





          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "65"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );






          Jake Kirsch is a new contributor. Be nice, and check out our Code of Conduct.









          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f396815%2fexamples-of-a-statistic-that-is-not-independent-of-samples-distribution%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          2 Answers
          2






          active

          oldest

          votes








          2 Answers
          2






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          43












          $begingroup$

          That definition is a somewhat awkward way to state it. A "statistic" is any function of the observable values. All that definition means is that a statistic is a function only of the observable values, not a function of the distribution or any of its parameters. For example, if $X_1, X_2, ..., X_n sim textN(mu, 1)$ then a statistic would be any function $T(X_1,...,X_n)$ whereas a function $H(X_1,....,X_n, mu)$ would not be a statistic, since it depends on $mu$. Here are some further examples:



          $$beginequation beginaligned
          textStatistic & & & & & barX_n = frac1n sum_i=1^n X_i, \[12pt]
          textStatistic & & & & & S_n^2 = frac1n sum_i=1^n (X_i - barX_n)^2, \[12pt]
          textNot a statistic & & & & & D_n = barX_n - mu, \[12pt]
          textNot a statistic & & & & & p_i = textN(x_i | mu, 1), \[12pt]
          textNot a statistic & & & & & Q = 10 mu. \[12pt]
          endaligned endequation$$



          Every statistic is a function only of the observable values, and not of their distribution or its parameters. So there are no examples of a statistic that is a function of the distribution or its parameters (any such function would not be a statistic). However, it is important to note that the distribution of a statistic (as opposed to the statistic itself) will generally depend on the underlying distribution of the values. (This is true for all statistics other than ancillary statistics.)




          What about a function where the parameters are known? In the comments below, Alecos asks an excellent follow-up question. What about a function that uses a fixed hypothesised value of the parameter? For example, what about the statistic $sqrtn (barx - mu)$ where $mu = mu_0$ is taken to be equal to a known hypothesised value $mu_0 in mathbbR$. Here the function is indeed a statistic, so long as it is defined on the appropriately restricted domain. So the function $H_0: mathbbR^n rightarrow mathbbR$ with $H_0(x_1,...,x_n) = sqrtn (barx - mu_0)$ would be a statistic, but the function $H: mathbbR^n+1 rightarrow mathbbR$ with $H(x_1,...,x_n, mu) = sqrtn (barx - mu)$ would not be a statistic.






          share|cite|improve this answer











          $endgroup$








          • 1




            $begingroup$
            Very helpful answer, considering the underlying statistical parameter as part of the non-statistic was particularly helpful.
            $endgroup$
            – Jake Kirsch
            Mar 11 at 10:56






          • 4




            $begingroup$
            @CarlWitthoft I don't get your point. If it's a function of the observable values, then it's a statistic. It may be a function of a smaller subset of the values; that can still be a useful thing to consider. If you want to estimate the mean and you have $10^10$ observations, you might still look at $(X_1+X_2+dots+X_1000)/1000$ if the cost of processing data is high and the cost of error is small. Or for some reason you might want to consider two independent estimates of the mean, and could consider $(X_1+dots+X_n/2)/(n/2)$ and $(X_n/2+1+dots+X_n)/(n/2)$. These are still statistics.
            $endgroup$
            – James Martin
            Mar 11 at 14:06







          • 4




            $begingroup$
            Those examples seem entirely valid to me. Are you saying the idea of dividing data into a training set and a validation set is not valid?
            $endgroup$
            – James Martin
            Mar 11 at 14:53







          • 2




            $begingroup$
            I'm a little confused by that as well. Let me attempt to describe @CarlWitthoft point. It would still be a statistic in terms of mathematical definition, but I could see a case where a consultant takes a 'statistic' of observations, but arbitrarily decides to remove a few results (consultants do this all the time right?). This would be 'valid' in the sense it's still a function on observations, however the way that statistic may be presented and interpreted likely wouldn't be valid.
            $endgroup$
            – Jake Kirsch
            Mar 11 at 15:41







          • 2




            $begingroup$
            @Carl Withhoft: With respect to the point you are making, it is important to distinguish between a statistic (which need not include all the data, and may not encompass all the information in the sample) and a sufficient statistic (which will encompass all the information with respect to some parameter). Statistical theory already has well-developed concepts like sufficiency that capture the idea that a statistic includes all relevant information in the sample. It is not necessary, or desirable, to try to build that requirement into the definition of a "statistic".
            $endgroup$
            – Ben
            Mar 11 at 21:11















          43












          $begingroup$

          That definition is a somewhat awkward way to state it. A "statistic" is any function of the observable values. All that definition means is that a statistic is a function only of the observable values, not a function of the distribution or any of its parameters. For example, if $X_1, X_2, ..., X_n sim textN(mu, 1)$ then a statistic would be any function $T(X_1,...,X_n)$ whereas a function $H(X_1,....,X_n, mu)$ would not be a statistic, since it depends on $mu$. Here are some further examples:



          $$beginequation beginaligned
          textStatistic & & & & & barX_n = frac1n sum_i=1^n X_i, \[12pt]
          textStatistic & & & & & S_n^2 = frac1n sum_i=1^n (X_i - barX_n)^2, \[12pt]
          textNot a statistic & & & & & D_n = barX_n - mu, \[12pt]
          textNot a statistic & & & & & p_i = textN(x_i | mu, 1), \[12pt]
          textNot a statistic & & & & & Q = 10 mu. \[12pt]
          endaligned endequation$$



          Every statistic is a function only of the observable values, and not of their distribution or its parameters. So there are no examples of a statistic that is a function of the distribution or its parameters (any such function would not be a statistic). However, it is important to note that the distribution of a statistic (as opposed to the statistic itself) will generally depend on the underlying distribution of the values. (This is true for all statistics other than ancillary statistics.)




          What about a function where the parameters are known? In the comments below, Alecos asks an excellent follow-up question. What about a function that uses a fixed hypothesised value of the parameter? For example, what about the statistic $sqrtn (barx - mu)$ where $mu = mu_0$ is taken to be equal to a known hypothesised value $mu_0 in mathbbR$. Here the function is indeed a statistic, so long as it is defined on the appropriately restricted domain. So the function $H_0: mathbbR^n rightarrow mathbbR$ with $H_0(x_1,...,x_n) = sqrtn (barx - mu_0)$ would be a statistic, but the function $H: mathbbR^n+1 rightarrow mathbbR$ with $H(x_1,...,x_n, mu) = sqrtn (barx - mu)$ would not be a statistic.






          share|cite|improve this answer











          $endgroup$








          • 1




            $begingroup$
            Very helpful answer, considering the underlying statistical parameter as part of the non-statistic was particularly helpful.
            $endgroup$
            – Jake Kirsch
            Mar 11 at 10:56






          • 4




            $begingroup$
            @CarlWitthoft I don't get your point. If it's a function of the observable values, then it's a statistic. It may be a function of a smaller subset of the values; that can still be a useful thing to consider. If you want to estimate the mean and you have $10^10$ observations, you might still look at $(X_1+X_2+dots+X_1000)/1000$ if the cost of processing data is high and the cost of error is small. Or for some reason you might want to consider two independent estimates of the mean, and could consider $(X_1+dots+X_n/2)/(n/2)$ and $(X_n/2+1+dots+X_n)/(n/2)$. These are still statistics.
            $endgroup$
            – James Martin
            Mar 11 at 14:06







          • 4




            $begingroup$
            Those examples seem entirely valid to me. Are you saying the idea of dividing data into a training set and a validation set is not valid?
            $endgroup$
            – James Martin
            Mar 11 at 14:53







          • 2




            $begingroup$
            I'm a little confused by that as well. Let me attempt to describe @CarlWitthoft point. It would still be a statistic in terms of mathematical definition, but I could see a case where a consultant takes a 'statistic' of observations, but arbitrarily decides to remove a few results (consultants do this all the time right?). This would be 'valid' in the sense it's still a function on observations, however the way that statistic may be presented and interpreted likely wouldn't be valid.
            $endgroup$
            – Jake Kirsch
            Mar 11 at 15:41







          • 2




            $begingroup$
            @Carl Withhoft: With respect to the point you are making, it is important to distinguish between a statistic (which need not include all the data, and may not encompass all the information in the sample) and a sufficient statistic (which will encompass all the information with respect to some parameter). Statistical theory already has well-developed concepts like sufficiency that capture the idea that a statistic includes all relevant information in the sample. It is not necessary, or desirable, to try to build that requirement into the definition of a "statistic".
            $endgroup$
            – Ben
            Mar 11 at 21:11













          43












          43








          43





          $begingroup$

          That definition is a somewhat awkward way to state it. A "statistic" is any function of the observable values. All that definition means is that a statistic is a function only of the observable values, not a function of the distribution or any of its parameters. For example, if $X_1, X_2, ..., X_n sim textN(mu, 1)$ then a statistic would be any function $T(X_1,...,X_n)$ whereas a function $H(X_1,....,X_n, mu)$ would not be a statistic, since it depends on $mu$. Here are some further examples:



$$\begin{equation} \begin{aligned}
\text{Statistic} & & & & & \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i, \\[12pt]
\text{Statistic} & & & & & S_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}_n)^2, \\[12pt]
\text{Not a statistic} & & & & & D_n = \bar{X}_n - \mu, \\[12pt]
\text{Not a statistic} & & & & & p_i = \text{N}(x_i \mid \mu, 1), \\[12pt]
\text{Not a statistic} & & & & & Q = 10 \mu.
\end{aligned} \end{equation}$$



          Every statistic is a function only of the observable values, and not of their distribution or its parameters. So there are no examples of a statistic that is a function of the distribution or its parameters (any such function would not be a statistic). However, it is important to note that the distribution of a statistic (as opposed to the statistic itself) will generally depend on the underlying distribution of the values. (This is true for all statistics other than ancillary statistics.)
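That last point is easy to check by simulation: compute the same statistic (the sample mean) under two different underlying distributions and watch its distribution shift. A minimal sketch; the normal models, sample size, and replication count are assumptions for illustration only:

```python
import random
import statistics

random.seed(0)
n, reps = 50, 2000

def sample_mean(mu):
    # bar{X}_n is a statistic: it uses only the observed values.
    xs = [random.gauss(mu, 1.0) for _ in range(n)]
    return statistics.fmean(xs)

# Same statistic, two different underlying distributions: the
# distribution of the statistic shifts with the distribution of the data.
means_mu0 = [sample_mean(0.0) for _ in range(reps)]
means_mu2 = [sample_mean(2.0) for _ in range(reps)]
```

The simulated sampling distribution of $\bar{X}_n$ centres near 0 in the first case and near 2 in the second, even though the function being applied to the data is identical.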




What about a function where the parameters are known? In the comments below, Alecos asks an excellent follow-up question: what about a function that uses a fixed hypothesised value of the parameter? For example, what about the statistic $\sqrt{n} (\bar{x} - \mu)$ where $\mu = \mu_0$ is taken to be equal to a known hypothesised value $\mu_0 \in \mathbb{R}$? Here the function is indeed a statistic, so long as it is defined on the appropriately restricted domain. So the function $H_0: \mathbb{R}^n \rightarrow \mathbb{R}$ with $H_0(x_1, \ldots, x_n) = \sqrt{n} (\bar{x} - \mu_0)$ would be a statistic, but the function $H: \mathbb{R}^{n+1} \rightarrow \mathbb{R}$ with $H(x_1, \ldots, x_n, \mu) = \sqrt{n} (\bar{x} - \mu)$ would not be a statistic.
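In programming terms, the difference between $H_0$ and $H$ is just the function's argument list. A minimal sketch (the function names mirror $H_0$ and $H$ above; the choice $\mu_0 = 0$ is an illustrative assumption):

```python
import math

MU_0 = 0.0  # fixed, hypothesised value: baked into the function, not an input

def h0(xs):
    """A statistic: depends only on the observed values x_1, ..., x_n."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(n) * (xbar - MU_0)

def h(xs, mu):
    """Not a statistic: requires the unknown parameter mu as an extra input."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(n) * (xbar - mu)
```

`h0` can be evaluated from the sample alone; `h` cannot be evaluated without being told $\mu$, which is exactly what disqualifies it.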









answered Mar 11 at 10:05, edited Mar 11 at 21:17
– Ben
• 1
Very helpful answer; considering the underlying statistical parameter as part of the non-statistic was particularly helpful.
– Jake Kirsch
Mar 11 at 10:56






• 4
@CarlWitthoft I don't get your point. If it's a function of the observable values, then it's a statistic. It may be a function of a smaller subset of the values; that can still be a useful thing to consider. If you want to estimate the mean and you have $10^{10}$ observations, you might still look at $(X_1 + X_2 + \dots + X_{1000})/1000$ if the cost of processing data is high and the cost of error is small. Or for some reason you might want to consider two independent estimates of the mean, and could consider $(X_1 + \dots + X_{n/2})/(n/2)$ and $(X_{n/2+1} + \dots + X_n)/(n/2)$. These are still statistics.
– James Martin
Mar 11 at 14:06







• 4
Those examples seem entirely valid to me. Are you saying the idea of dividing data into a training set and a validation set is not valid?
– James Martin
Mar 11 at 14:53







• 2
I'm a little confused by that as well. Let me attempt to describe @CarlWitthoft's point. It would still be a statistic in terms of the mathematical definition, but I could see a case where a consultant takes a 'statistic' of observations but arbitrarily decides to remove a few results (consultants do this all the time, right?). This would be 'valid' in the sense that it is still a function of the observations; however, the way that statistic is presented and interpreted likely wouldn't be valid.
– Jake Kirsch
Mar 11 at 15:41







• 2
@CarlWitthoft: With respect to the point you are making, it is important to distinguish between a statistic (which need not include all the data, and may not encompass all the information in the sample) and a sufficient statistic (which will encompass all the information with respect to some parameter). Statistical theory already has well-developed concepts like sufficiency that capture the idea that a statistic includes all relevant information in the sample. It is not necessary, or desirable, to try to build that requirement into the definition of a "statistic".
– Ben
Mar 11 at 21:11












          4













I interpret that as saying that you should decide before you see the data what statistic you are going to calculate. So, for instance, if you're going to take out outliers, you should decide before you see the data what constitutes an "outlier". If you decide after you see the data, then the function itself, and not just its value, depends on the data.
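This distinction can be sketched in code: a trimming rule fixed in advance is itself a function of the observations alone, and hence a statistic. A minimal illustration (the `trimmed_mean` name and the trimming fraction are assumptions for the example, not from the answer):

```python
def trimmed_mean(xs, trim_frac=0.1):
    """A statistic: the rule for dropping 'outliers' (the most extreme
    trim_frac of points at each end) is fixed before any data are seen."""
    xs = sorted(xs)
    k = int(len(xs) * trim_frac)
    kept = xs[k:len(xs) - k] if k > 0 else xs
    return sum(kept) / len(kept)
```

By contrast, "delete whichever points look wrong after inspecting them" is not expressible as one fixed function of the sample, so it is not a well-defined statistic.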







• This is also helpful! So it's making a decision on which observations to include in the function after knowing what observations are available, which is more or less what I was describing in my comment on the previous answer.
– Jake Kirsch
Mar 11 at 19:37






• 2
(+1) It might be worth noting that this is important because if you define a rule a priori about what constitutes a data point that will be dropped, it is (relatively) easy to derive a distribution for the statistic (e.g., a truncated mean). It's really hard to derive a distribution for a measure that involves dropping data points for reasons that are not cleanly defined beforehand.
– Cliff AB
Mar 11 at 23:41















answered Mar 11 at 17:29
– Acccumulation
