Bounds on skew and kurtosis of IQ

The question of whether IQ is Normally distributed, or instead follows e.g. a Pearson type IV distribution, has been debated since at least the 1910s. The quotient- and deviation-based definitions give rise to very different eras in that debate, of course. (However, the distribution of an integer-valued IQ cannot be exactly Normal, even on a deviation-based definition.) A Normal distribution is uniquely characterised by its mean $\mu$ and standard deviation $\sigma$. Its next two moments are the skew $\gamma_1=0$ and excess kurtosis $\kappa_\text{excess}=0$. To disambiguate, I've defined

$$\gamma_1=\mathbb{E}\bigg(\frac{X-\mu}{\sigma}\bigg)^3,\,\kappa_\text{excess}:=\mathbb{E}\bigg(\frac{X-\mu}{\sigma}\bigg)^4-3.$$

By contrast, a Pearson type IV distribution requires all four moments to be specified.

While we can't literally prove $\gamma_1=\kappa_\text{excess}=0$ empirically, we can constrain such quantities. Have any empirical studies provided either upper or lower bounds on these moments of the IQ distribution (or something analogous such as another quantification estimating psychometric $g$), on either the quotient or deviation definition? In the interests of keeping this question appropriate to the site, I don't care what method of defining or measuring IQ was assumed in a particular study, so there's no need to take a stance on that.


There are studies in which higher-order moments are analyzed. Just off the top of my head, see (Johnson, Carothers, Deary, 2008). The actual point of this study was to examine the Greater Male Variability Hypothesis (which the data were found to be strongly consistent with); however, they also analyzed the distributions of ability more generally. They analyze the Scottish Mental Survey data, which tested essentially all children of Scotland of a given age. They find that the distribution is definitely asymmetric, with more people below the mode. Here is the relevant part of the abstract:

… Clear analysis of the actual distribution of general intelligence based on large and appropriately population-representative samples is rare, however. Using two population-wide surveys of general intelligence in 11-year-olds in Scotland, we showed that there were substantial departures from normality in the distribution, with less variability in the higher range than in the lower. Despite mean IQ-scale scores of 100, modal scores were about 105… This is consistent with a model of the population distribution of general intelligence as a mixture of two essentially normal distributions, one reflecting normal variation in general intelligence and one reflecting normal variation in effects of genetic and environmental conditions involving mental retardation.

See the study for further discussion around kurtosis and skewness. They also reference other studies you may find valuable.


One answer is that, since $g$ does not really exist as a unidimensional biological entity (rather, it is a near infinite-dimensional soup of inherited DNA and accumulated life experiences), the question of its univariate distribution is moot.


‘s’ Possession and ‘of’ Possession

Possessive nouns are used in different ways to express different meanings. The most common uses are expressing

  1. Possession or ownership: The family’s dog.
  2. Association: Archer’s office.
  3. Action: Lana’s determination to shoot Archer.
  4. Measurement: The train’s delay.
  5. Characteristics of something: Lana’s black hair.

Often the preposition ‘of’ may serve as an alternative to ‘s’ possessives. As I have said in my previous articles, a repetitive sentence format makes your writing dull. So it’s a good idea to switch between ‘s’ possessives and ‘of’ possessives. To do that, we first need to know when ‘of’ possessives can be used correctly.

  1. Possession: To express ownership, it’s always preferable to use the ‘s’ possessive. For example, ‘Archer’s gun’ is preferred to ‘the gun of Archer’.
  2. Association: When the possessive noun is animate, always use ‘s’ possessives, e.g. Archer’s mother, Lana’s baby, Kriger’s holographic girlfriend. But when the possessive noun is inanimate, both ‘s’ possessives and ‘of’ possessives can be used, e.g. the university’s area, the area of the university.
  3. Attribution: As with association, use only ‘s’ possessives for animate nouns. For inanimate nouns, both ‘s’ possessives and ‘of’ possessives can be used.

For Example: Danny’s White Hair. (Animate Possessive Noun) The university’s garage. (Inanimate Possessive Noun) The garage of the university. (Inanimate Possessive Noun)

  4. Action & Measurement: You can use both ‘s’ possessives and ‘of’ possessives to state somebody’s action.

“Archer’s revenge rampage for his butler” or “The revenge rampage of Archer for his butler”
Similarly, both types of possessives can be used to express measurement.
“The onion’s increasing rate has made the people very upset.” (‘s’ possessive)

“The increasing rate of the onion has made the people very upset.” (‘of’ possessive)

Reference: Lester, M. (2011). Advanced English grammar for ESL learners. New York: McGraw-Hill.


Symmetry, Skewness and Kurtosis

We consider a random variable x and a data set S = <x1, x2, …, xn> of size n which contains possible values of x. The data set can represent either the population being studied or a sample drawn from the population.

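Looking at S as representing a distribution, the skewness of S is a measure of its asymmetry, while the kurtosis is a measure of how heavy the tails of the data in S are (it is often described as a measure of peakedness, but see the observation under Definition 2).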

Symmetry and Skewness

Definition 1: We use skewness as a measure of symmetry. If the skewness of S is zero then the distribution represented by S is perfectly symmetric. If the skewness is negative, then the distribution is skewed to the left, while if the skew is positive then the distribution is skewed to the right (see Figure 1 below for an example).

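Excel calculates the skewness of a sample S as follows:

$$\text{skewness}=\frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^3$$

where $\bar{x}$ is the mean and s is the standard deviation of S. To avoid division by zero, this formula requires that n > 2.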

Observation: When a distribution is symmetric, the mean = median. When the distribution is positively skewed, typically the mean > median, and when the distribution is negatively skewed, typically the mean < median.

Excel Function: Excel provides the SKEW function as a way to calculate the skewness of S, i.e. if R is a range in Excel containing the data elements in S then SKEW(R) = the skewness of S.

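Excel 2013 Function: There is also a population version of the skewness given by the formula

$$\text{skewness}=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^3$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of S, both computed treating S as the whole population (i.e. dividing by n). This version has been implemented in Excel 2013 using the function, SKEW.P.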

It turns out that for range R consisting of the data in S = <x1, …, xn>, SKEW.P(R) = SKEW(R)*(n–2)/SQRT(n(n–1)) where n = COUNT(R).

Real Statistics Function: Alternatively, you can calculate the population skewness using the SKEWP(R) function, which is contained in the Real Statistics Resource Pack.

Example 1: Suppose S = <2, 5, -1, 3, 4, 5, 0, 2>. The skewness of S = -0.43, i.e. SKEW(R) = -0.43 where R is a range in an Excel worksheet containing the data in S. Since this value is negative, the curve representing the distribution is skewed to the left (i.e. the fatter part of the curve is on the right). Also SKEW.P(R) = -0.34. See Figure 1.
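The figures in Example 1 can be checked outside Excel. The following is a minimal Python sketch, assuming the standard adjusted sample formula for SKEW and the plain third standardized moment for SKEW.P; the function names `sample_skewness` and `population_skewness` are illustrative and not part of Excel or the Real Statistics Resource Pack.

```python
import math


def sample_skewness(data):
    """Adjusted sample skewness (the form assumed here for Excel's SKEW)."""
    n = len(data)
    if n < 3:
        raise ValueError("sample skewness needs n > 2")
    mean = sum(data) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


def population_skewness(data):
    """Population skewness (the form assumed here for SKEW.P)."""
    n = len(data)
    mean = sum(data) / n
    # Population standard deviation (divides by n).
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    return sum(((x - mean) / sigma) ** 3 for x in data) / n


S = [2, 5, -1, 3, 4, 5, 0, 2]
n = len(S)
skew = sample_skewness(S)
skew_p = population_skewness(S)
# Conversion stated in the text: SKEW.P = SKEW * (n - 2) / sqrt(n * (n - 1)).
skew_p_from_skew = skew * (n - 2) / math.sqrt(n * (n - 1))

print(round(skew, 2), round(skew_p, 2))  # -0.43 -0.34
```

The two routes to the population value (direct, and via the conversion factor) agree to floating-point precision, which is a quick way to confirm which convention a given tool uses.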

Figure 1 – Examples of skewness and kurtosis

Observation: SKEW(R) and SKEW.P(R) ignore any empty cells or cells with non-numeric values.

Definition 2: Kurtosis provides a measurement about the extremities (i.e. tails) of the distribution of data, and therefore provides an indication of the presence of outliers.

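Excel calculates the kurtosis of a sample S as follows:

$$\text{kurtosis}=\frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^4-\frac{3(n-1)^2}{(n-2)(n-3)}$$

where $\bar{x}$ is the mean and s is the standard deviation of S. To avoid division by zero, this formula requires that n > 3.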

Observation: It is commonly thought that kurtosis provides a measure of peakedness (or flatness), but this is not true. Kurtosis pertains to the extremities and not to the center of a distribution.

Excel Function: Excel provides the KURT function as a way to calculate the kurtosis of S, i.e. if R is a range in Excel containing the data elements in S then KURT(R) = the kurtosis of S.

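Observation: The population kurtosis is calculated via the formula

$$\text{kurtosis}=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^4-3$$

where $\mu$ and $\sigma$ are the mean and standard deviation of S treated as a population. It can be calculated in Excel from KURT via the algebraically equivalent formula

=(KURT(R)*(n-2)*(n-3)/(n-1)-6)/(n+1)

where n = COUNT(R).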

Real Statistics Function: Excel does not provide a population kurtosis function, but you can use the following Real Statistics function for this purpose:

KURTP(R, excess) = kurtosis of the distribution for the population in range R. If excess = TRUE (default) then 3 is subtracted from the result (the usual approach so that a normal distribution has kurtosis of zero).

Example 2: Suppose S = <2, 5, -1, 3, 4, 5, 0, 2>. The kurtosis of S = -0.94, i.e. KURT(R) = -0.94 where R is a range in an Excel worksheet containing the data in S. The population kurtosis is -1.114. See Figure 1.
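As a cross-check on Example 2, here is a small Python sketch, assuming the standard bias-adjusted sample formula for KURT and the fourth standardized moment minus 3 for the population version. The function names are illustrative only.

```python
def sample_excess_kurtosis(data):
    """Bias-adjusted sample excess kurtosis (the form assumed for Excel's KURT)."""
    n = len(data)
    if n < 4:
        raise ValueError("sample kurtosis needs n > 3")
    mean = sum(data) / n
    s2 = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance
    z4 = sum((x - mean) ** 4 for x in data) / s2 ** 2  # sum of fourth powers of z-scores
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def population_kurtosis(data, excess=True):
    """Fourth standardized moment, optionally minus 3."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n  # population variance
    k = sum((x - mean) ** 4 for x in data) / n / var ** 2
    return k - 3 if excess else k


S = [2, 5, -1, 3, 4, 5, 0, 2]
kurt = sample_excess_kurtosis(S)
kurt_p = population_kurtosis(S)

print(round(kurt, 2), round(kurt_p, 3))  # -0.94 -1.114
```

Both values are negative, so on either convention this small data set has lighter tails than a normal distribution.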

Observation: KURT(R) ignores any empty cells or cells with non-numeric values.

Graphical Illustration

We now look at an example of these concepts using the chi-square distribution.

Figure 2 – Example of skewness and kurtosis

Figure 2 contains the graphs of two chi-square distributions (with different degrees of freedom df). We study the chi-square distribution elsewhere, but for now note the following values for the kurtosis and skewness:
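For reference, and independently of whichever degrees of freedom the figure used, the shape measures of a chi-square distribution with $k$ degrees of freedom have standard closed forms:

$$\text{mean}=k,\qquad \text{variance}=2k,\qquad \text{skewness}=\sqrt{8/k},\qquad \text{excess kurtosis}=\frac{12}{k}.$$

Both skewness and excess kurtosis therefore shrink toward the normal values of 0 as $k$ grows. As an illustrative value (not one taken from the figure), $k=8$ gives a skewness of 1 and an excess kurtosis of 1.5.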


heard [...] that a high positive kurtosis of residuals can be problematic for accurate hypothesis tests and confidence intervals (and therefore problems with statistical inference). Is this true and, if so, why?

For some kinds of hypothesis test, it's true.

Would a high positive kurtosis of residuals not indicate that the majority of the residuals are near the residual mean of 0 and therefore less large residuals are present?

It looks like you're conflating the concept of variance with that of kurtosis. If the variance were smaller, then a tendency to more small residuals and fewer large residuals would come together. Imagine we hold the standard deviation constant while we change the kurtosis (so we're definitely talking about changes to kurtosis rather than to variance).

Compare different variances (but the same kurtosis):

with different kurtosis but the same variance:

A high kurtosis is in many cases associated with more small deviations from the mean$^\ddagger$ -- more small residuals than you'd find with a normal distribution -- but to keep the standard deviation at the same value, we must also have more big residuals (because having more small residuals would make the typical distance from the mean smaller). To get more of both the big residuals and small residuals, you will have fewer "typical sized" residuals -- those about one standard deviation away from the mean.

$\ddagger$ It depends on how you define "smallness". You can't simply add lots of large residuals and hold variance constant; you need something to compensate for it. But for some given measure of "small" you can find ways to increase the kurtosis without increasing that particular measure. (For example, higher kurtosis doesn't automatically imply a higher peak as such.)

A higher kurtosis tends to go with more large residuals, even when you hold the variance constant.

[Further, in some cases, the concentration of small residuals may actually lead to more of a problem than the additional fraction of the largest residuals -- depending on what things you're looking at.]

Anyway, let's look at an example. Consider a one-sample t-test and a sample size of 10.

If we reject the null hypothesis when the absolute value of the t-statistic is bigger than 2.262, then when the observations are independent, identically distributed from a normal distribution, and the hypothesized mean is the true population mean, we'll reject the null hypothesis 5% of the time.

Consider a particular distribution with substantially higher kurtosis than the normal: 75% of our population have their values drawn from a normal distribution and the remaining 25% have their values drawn from a normal distribution with standard deviation 50 times as large.

If I calculated correctly, this corresponds to a kurtosis of 12 (an excess kurtosis of 9). The resulting distribution is much more peaked than the normal and has heavy tails. The density is compared with the normal density below -- you can see the higher peak, but you can't really see the heavier tail in the left image, so I also plotted the logarithm of the densities, which stretches out the lower part of the image and compresses the top, making it easier to see both the peak and the tails.
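The kurtosis figure can be checked directly from the moments of the mixture. This is a minimal sketch, assuming both components have mean zero and the narrower component has standard deviation 1 (the answer only fixes the ratio of 50 between the two standard deviations); `scale_mixture_kurtosis` is an illustrative name.

```python
def scale_mixture_kurtosis(weights, sds):
    """Kurtosis (not excess) of a mixture of zero-mean normal components.

    For a zero-mean normal with standard deviation sd, E[X^2] = sd^2 and
    E[X^4] = 3 * sd^4, and mixture moments are weighted sums of these.
    """
    second = sum(w * sd ** 2 for w, sd in zip(weights, sds))
    fourth = sum(w * 3 * sd ** 4 for w, sd in zip(weights, sds))
    return fourth / second ** 2


# 75% from N(0, 1) and 25% from N(0, 50^2); the unit scale is an assumption.
kurtosis = scale_mixture_kurtosis([0.75, 0.25], [1.0, 50.0])
print(round(kurtosis, 2))  # 11.97
```

The result rounds to the quoted kurtosis of 12, i.e. an excess kurtosis of about 9. Because kurtosis is scale-invariant, any other choice for the narrower standard deviation gives the same value as long as the ratio stays at 50.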

The actual significance level for this distribution if you carry out a "5%" one-sample t-test with $n=10$ is below 0.9%. This is pretty dramatic, and pulls down the power curve quite substantially.

(You'll also see a substantive effect on the coverage of confidence intervals.)

Note that a different distribution with the same kurtosis as that will have a different impact on the significance level.

So why does the rejection rate go down? It's because the heavier tail leads to a few large outliers, which have a slightly larger impact on the standard deviation than on the mean. This affects the t-statistic because it leads to more t-values between -1 and 1, in the process reducing the proportion of values in the critical region.

If you take a sample that looks pretty consistent with having come from a normal distribution whose mean is just far enough above the hypothesized mean that it's significant, and then you take the observation furthest above the mean and pull it even further away (that is, make the mean even larger than under $H_0$ ), you actually make the t-statistic smaller.

Let me show you. Here's a sample of size 10:

Imagine we want to test it against $H_0: \mu=2$ (a one-sample t-test). It turns out that the sample mean here is 2.68 and the sample standard deviation is 0.9424. You get a t-statistic of 2.282 -- just in the rejection region for a 5% test (p-value of 0.0484).

Now make that largest value 50:

Clearly we pull the mean up, so it should indicate a difference even more than it did before, right? Well, no, it doesn't. The t-statistic goes down. It is now 1.106, and the p-value is quite large (close to 30%). What happened? Well, we did pull the mean up (to 7.257), but the standard deviation shot up to over 15.
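The effect is easy to reproduce. The sketch below uses an illustrative sample of ten values chosen so that its summary statistics match those quoted (mean 2.68, standard deviation about 0.9424); the individual values are an assumption and are not recovered from the original post. `one_sample_t` is a hypothetical helper written with the standard library only.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """One-sample t-statistic: (mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error


# Illustrative data (assumed), consistent with the quoted mean and sd.
sample = [1.13, 1.68, 2.02, 2.30, 2.56, 2.80, 3.06, 3.34, 3.68, 4.23]
# Same sample with the largest observation replaced by 50.
with_outlier = sample[:-1] + [50.0]

t_before = one_sample_t(sample, mu0=2.0)
t_after = one_sample_t(with_outlier, mu0=2.0)

print(round(statistics.mean(sample), 2), round(statistics.stdev(sample), 2))
print(round(t_before, 2), round(t_after, 2))
```

With these values the t-statistic falls from roughly 2.28 to roughly 1.11 even though the sample mean rises to about 7.26, because the standard deviation grows by a much larger factor than the distance between the mean and the hypothesized value.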

Standard deviations are a bit more sensitive to outliers than means are -- when you put in an outlier, you tend to push the one-sample t-statistic toward 1 or -1.

If there's a chance of several outliers, much the same happens only they can sometimes be on opposite sides (in which case the standard deviation is even more inflated while the impact on the mean is reduced compared to one outlier), so the t-statistic tends to move closer to 0.

Similar stuff goes on with a number of other common tests that assume normality -- higher kurtosis tends to be associated with heavier tails, which means more outliers, which means that standard deviations get inflated relative to means and so differences you want to pick up tend to get "swamped" by the impact of the outliers on the test. That is, low power.


Discussion

The traditional standard errors for skewness and kurtosis printed by many statistics packages are very poor. While they are appropriate for normal distributions, deviations from normality, like a t distribution with df = 5 or a mixture of two normal curves with different standard deviations, produce standard errors that can be 5 times too small. Because normal distributions are rare in psychology (Micceri, 1989), the practice of encouraging the use of standard errors that are grossly in error with deviations from normality by printing them in statistics packages should be stopped. The function , written for this article, produces the bootstrap standard errors, BCa confidence intervals, and standard errors, which take into account characteristics of the distributions.

One question is whether the traditional standard errors can be used if you are using this only to test whether the distribution is normal. This approach is not recommended for at least two reasons. First, there are several better tests that are designed to test normality, such as the Shapiro and Wilk test (1965), and are available in many statistics packages. Testing skewness and kurtosis, individually, is problematic because they are dependent on each other, so rather than simply following the textbook approach, tests like D'Agostino's test (D'Agostino, Belanger, & D'Agostino, 1990) could be used. Second, because the traditional standard errors both under- and overestimate the true standard errors, you would not know whether the statistical test was liberal or conservative without guessing about the distribution. One method for guessing about the distribution is to take thousands of resamples of the observed distribution, which is what bootstrapping does.
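The resampling idea can be sketched in a few lines. The following is a plain nonparametric bootstrap standard error for the adjusted sample skewness, set beside the usual normal-theory standard error; it is not the function written for the article, it does not compute BCa intervals, and the exponential test data, the seeds, and the names `bootstrap_se` and `normal_theory_se_skewness` are all assumptions made for illustration.

```python
import math
import random


def sample_skewness(data):
    """Adjusted sample skewness (the form most packages report)."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


def bootstrap_se(data, statistic, n_resamples=2000, seed=0):
    """Standard deviation of the statistic over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    while len(estimates) < n_resamples:
        resample = rng.choices(data, k=n)
        if min(resample) == max(resample):
            continue  # skip degenerate resamples with zero variance
        estimates.append(statistic(resample))
    centre = sum(estimates) / n_resamples
    return math.sqrt(sum((e - centre) ** 2 for e in estimates) / (n_resamples - 1))


def normal_theory_se_skewness(n):
    """Traditional standard error of sample skewness, valid under normality."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Illustrative skewed data: 200 draws from an exponential distribution.
data_rng = random.Random(1)
data = [data_rng.expovariate(1.0) for _ in range(200)]

boot_se = bootstrap_se(data, sample_skewness)
trad_se = normal_theory_se_skewness(len(data))
print(f"bootstrap SE: {boot_se:.3f}, normal-theory SE: {trad_se:.3f}")
```

The normal-theory value depends only on n, whereas the bootstrap value reflects the tails of the observed data, which is the contrast the discussion above is concerned with.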

The problems with the standard errors shown in this article are due to the use of the third and fourth moments to calculate skewness and kurtosis. These are very sensitive to deviations in the tails of the distributions and are not sensitive to deviations in the peaks of the distributions (Lindsay & Basak, 2000). Seier and Bonett (2003) discussed formulae that allow the user to vary the relative influence of deviations in the tails and the peak of a distribution, but these are not commonly used. The transformation described by Anscombe and Glynn (1983) can also be used. A function is described in the Appendix that uses this, but it is also not widely used. If skewness and kurtosis are operationalized in other ways, the impact of extreme points can be lessened. Balanda and MacGillivray (1988) discussed several ways in which quartiles could be used for operationalizing these statistics that would be less influenced by extreme points. One of the most promising is L-moments (Hosking, 1992). Hosking shows how skewness and kurtosis measures based on these are more consistent with the Shapiro–Wilk test of normality. These can be adjusted, for example, by trimming, which further increases their robustness (Elamir & Seheult, 2003). In time, these alternative methods may become more popular, but dramatically changing the statistics for measuring them would also change the meaning of skewness and kurtosis. Therefore, it is unlikely that these alternatives will become widespread in the near future.

The main recommendations from this article are that researchers should be careful when using traditional measures of standard error for skewness and kurtosis. Bootstrap confidence intervals can be used, but researchers should be aware that these may still be in error. For testing whether a distribution is of a certain shape, tests designed specifically for this purpose should be used. Methods based on transformations of the moments and L-moments should be considered.


Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.

Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers. A uniform distribution would be the extreme case.

The skewness for a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative values for the skewness indicate data that are skewed left and positive values for the skewness indicate data that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is long relative to the left tail. If the data are multi-modal, then this may affect the sign of the skewness.

Some measurements have a lower bound and are skewed right. For example, in reliability studies, failure times cannot be negative.

Which definition of kurtosis is used is a matter of convention (this handbook uses the original definition). When using software to compute the sample kurtosis, you need to be aware of which convention is being followed. Many sources use the term kurtosis when they are actually computing "excess kurtosis", so it may not always be clear.

Examples

The following example shows histograms for 10,000 random numbers generated from a normal, a double exponential, a Cauchy, and a Weibull distribution.

Normal Distribution

The first histogram is a sample from a normal distribution. The normal distribution is a symmetric distribution with well-behaved tails. This is indicated by the skewness of 0.03. The kurtosis of 2.96 is near the expected value of 3. The histogram verifies the symmetry.

Double Exponential Distribution

The second histogram is a sample from a double exponential distribution. The double exponential is a symmetric distribution. Compared to the normal, it has a stronger peak, more rapid decay, and heavier tails. That is, we would expect a skewness near zero and a kurtosis higher than 3. The skewness is 0.06 and the kurtosis is 5.9.

Cauchy Distribution

The third histogram is a sample from a Cauchy distribution.

For better visual comparison with the other data sets, we restricted the histogram of the Cauchy distribution to values between -10 and 10. The full data set for the Cauchy data in fact has a minimum of approximately -29,000 and a maximum of approximately 89,000.

The Cauchy distribution is a symmetric distribution with heavy tails and a single peak at the center of the distribution. Since it is symmetric, we would expect a skewness near zero. Due to the heavier tails, we might expect the kurtosis to be larger than for a normal distribution. In fact the skewness is 69.99 and the kurtosis is 6,693. These extremely high values can be explained by the heavy tails. Just as the mean and standard deviation can be distorted by extreme values in the tails, so too can the skewness and kurtosis measures.

Weibull Distribution

The fourth histogram is a sample from a Weibull distribution with shape parameter 1.5. The Weibull distribution is a skewed distribution with the amount of skewness depending on the value of the shape parameter. The degree of decay as we move away from the center also depends on the value of the shape parameter. For this data set, the skewness is 1.08 and the kurtosis is 4.46, which indicates moderate skewness and kurtosis.

Dealing with Skewness and Kurtosis

Many classical statistical tests and intervals depend on normality assumptions. Significant skewness and kurtosis clearly indicate that data are not normal. If a data set exhibits significant skewness or kurtosis (as indicated by a histogram or the numerical measures), what can we do about it?

One approach is to apply some type of transformation to try to make the data normal, or more nearly normal. The Box-Cox transformation is a useful technique for trying to normalize a data set. In particular, taking the log or square root of a data set is often useful for data that exhibit moderate right skewness.
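A small sketch of the effect of such transformations, using only the Python standard library. The lognormal test data, the seed, and the sample size are assumptions chosen for illustration; the log corresponds to the Box-Cox power 0 and the square root to power 0.5 (up to a linear rescaling that does not change skewness).

```python
import math
import random


def skewness(data):
    """Third standardized moment of the data (population form)."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    return sum(((x - mean) / sigma) ** 3 for x in data) / n


# Illustrative right-skewed data: lognormal draws (the log of each is N(0, 1)).
rng = random.Random(42)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

skew_raw = skewness(raw)
skew_sqrt = skewness([math.sqrt(x) for x in raw])
skew_log = skewness([math.log(x) for x in raw])

print(f"raw: {skew_raw:.2f}, sqrt: {skew_sqrt:.2f}, log: {skew_log:.2f}")
```

For data of this kind the log transform removes essentially all of the skewness, while the square root removes only part of it. In practice the appropriate power depends on the data, which is what the Box-Cox procedure is designed to estimate.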


Why Is the Normal Curve So Important in Psychological Testing

In testing, the normal curve model is used in ways that parallel the distinction between descriptive and inferential statistics:

1. The normal curve model is used descriptively to locate the position of scores that come from distributions that are normal. In a process known as normalization, described in Chapter 3, the normal curve is also used to make distributions that are not normal—but approximate the normal—conform to the model, in terms of the relative positions of scores.

2. The normal curve model is applied inferentially in the areas of (a) reliability, to derive confidence intervals to evaluate obtained scores and differences between obtained scores (see Chapter 4), and (b) validity, to derive confidence intervals for predictions or estimates based on test scores (see Chapter 5).

model. The manner and extent to which they deviate from it has implications with regard to the amount of information the distributions convey. An extreme case can be illustrated by the distribution that would result if all values in a set of data occurred with the same frequency. Such a distribution, which would be rectangular in shape, would imply no difference in the likelihood of occurrence of any given value and thus would not be useful in making decisions on the basis of whatever is being measured.

A different, and more plausible, type of deviation from the normal curve model happens when distributions have two or more modes. If a distribution is bimodal, or multimodal, one needs to consider the possibility of sampling problems or special features of the sample. For example, a distribution of class grades in which the peak frequencies occur in the grades of A and D, with very few B or C grades, could mean that the students in the class are atypical in some way or that they belong to groups that differ significantly in preparation, motivation, or ability level. Naturally, information of this nature would almost invariably have important implications; in the case of this example, it might lead a teacher to divide the class into sections and use different pedagogical approaches with each.

Two other ways in which distributions may deviate from the normal curve model carry significant implications that are relevant to test data. These deviations pertain to the properties of the kurtosis and skewness of frequency distributions.

Kurtosis

This rather odd term, which stems from the Greek word for convexity, simply refers to the flatness or peakedness of a distribution. Kurtosis is directly related to the amount of dispersion in a distribution. Platykurtic distributions have the greatest amount of dispersion, manifested in tails that are more extended, and leptokurtic distributions have the least. The normal distribution is mesokurtic, meaning that it has an intermediate degree of dispersion.


Kurtosis does not measure anything about the "peak" as was historically reported. Rather, it is a measure of whether there are outliers (ie, rare extreme values) in the data. These show up in graphs as one or a very few points that are very far from the main body of the data. anon342994 July 26, 2013

Can anyone help me understand skewed distribution for my presentation. I can't seem to understand about graphs.

In my stats class today, the teacher mentioned that curves can have different levels of skewness and kurtosis. She didn't go into any real detail explaining what this is, especially kurtosis.

Could someone give me a brief explanation to help point me in the right direction as to what kurtosis means? titans62 July 14, 2011

@cardsfan27 - Great examples. I'm not sure why I couldn't think of those. I was thinking of something like population age, which would have a right skewed distribution: there are many more young people than there are old people. I guess something else could be the size of trees in a forest. There are hundreds of seedlings and saplings for every tree that is fully grown.

I did some more research on the curves with two high points. They are called bimodal distributions. All of the examples I found were similar to yours and involved height or weight. I would be interested to hear more of them, though. cardsfan27 July 13, 2011

@titans62 - I'm having the opposite problem! All I can think of are things that have a normal distribution, but are not skewed. Some of the things I'm thinking of that would be a normal bell shape are height and weight of the population. Test scores are also a stereotypical bell curve example.

Your second question had me stumped for a few minutes, but I think I've got an example. This might be a little far fetched, but maybe someone else has a better one. Suppose you took 25 men and 25 women and recorded their weights. You could assume that the men are going to weigh more on average than the women, so if you graphed the weights, you might find that the women make a peak lower on the scale, and men make a peak farther up. I hope that made sense. titans62 July 13, 2011

Can someone help me -- I understand what a bell curve looks like, but for whatever reason, I can't think of anything that would have that shape. Everything I am thinking of would have a skewed distribution where one tail was longer than the other.

Another related question - would it ever be possible for a curve to have two high spots compared to the curve going up and coming back down in a normal bell curve? Emilski July 12, 2011

It seems counter intuitive that a positively skewed distribution would have a tail going to the right. I would have guessed that the positive and negative was determined by where the highest point of the curve was.


Using Secondary Datasets to Understand Persons with Developmental Disabilities and their Families

4.1 Kurtosis and Skew

Measures of kurtosis and skew are used to determine if indicators meet normality assumptions (Kline, 2005). Measures of kurtosis help identify whether a curve is normal or abnormally shaped. A leptokurtic curve is more sharply peaked at the mean than the normal curve and has heavier tails. Platykurtic curves, on the other hand, are flatter than normal, with a lower peak and lighter tails. A skewed curve is either positively or negatively skewed. Positively skewed curves show the majority of scores below the mean, and negatively skewed curves are just the opposite. Both result in an asymmetrical curve. Both skew and kurtosis can be analyzed through descriptive statistics. Acceptable values of skewness fall between −3 and +3, and kurtosis is appropriate from a range of −10 to +10 when utilizing SEM (Brown, 2006). Values that fall above or below these ranges are suspect, but SEM is a fairly robust analytical method, so small deviations may not represent major violations of assumptions. Other types of analyses may have lower acceptable skew or kurtosis values, so researchers should investigate their planned analysis to determine data screening guidelines.


Symmetry, Skewness and Kurtosis

We consider a random variable x and a data set S = <x1, x2, …, xn> of size n which contains possible values of x. The data set can represent either the population being studied or a sample drawn from the population.

Looking at S as representing a distribution, the skewness of S is a measure of symmetry while kurtosis is a measure of the weight of the tails of the data in S.

Symmetry and Skewness

Definition 1: We use skewness as a measure of symmetry. If the skewness of S is zero then the distribution represented by S is perfectly symmetric. If the skewness is negative, then the distribution is skewed to the left, while if the skew is positive then the distribution is skewed to the right (see Figure 1 below for an example).

Excel calculates the skewness of a sample S as follows:

$$\frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^3$$

where $\bar{x}$ is the mean and s is the standard deviation of S. To avoid division by zero, this formula requires that n > 2.

Observation: When a distribution is symmetric, the mean = median, when the distribution is positively skewed the mean > median and when the distribution is negatively skewed the mean < median.

Excel Function: Excel provides the SKEW function as a way to calculate the skewness of S, i.e. if R is a range in Excel containing the data elements in S then SKEW(R) = the skewness of S.

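Excel 2013 Function: There is also a population version of the skewness given by the formula

$$\frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{\sigma}\right)^3$$

where $\sigma$ is the population standard deviation of S (computed with divisor n).

This version has been implemented in Excel 2013 using the function SKEW.P.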

It turns out that for range R consisting of the data in S = <x1, …, xn>, SKEW.P(R) = SKEW(R)*(n–2)/SQRT(n(n–1)) where n = COUNT(R).

Real Statistics Function: Alternatively, you can calculate the population skewness using the SKEWP(R) function, which is contained in the Real Statistics Resource Pack.

Example 1: Suppose S = <2, 5, -1, 3, 4, 5, 0, 2>. The skewness of S = -0.43, i.e. SKEW(R) = -0.43 where R is a range in an Excel worksheet containing the data in S. Since this value is negative, the curve representing the distribution is skewed to the left (i.e. the fatter part of the curve is on the right). Also SKEW.P(R) = -0.34. See Figure 1.
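The figures in Example 1 can be checked outside Excel. The following is a minimal sketch in plain Python; the function names `sample_skew` and `population_skew` are illustrative choices, not Excel or Real Statistics functions, and the formulas are the bias-corrected sample skewness and the population skewness described above.

```python
import math

def sample_skew(data):
    """Bias-corrected sample skewness (the quantity SKEW reports); needs n > 2."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample SD
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

def population_skew(data):
    """Population skewness (the quantity SKEW.P reports)."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)  # population SD
    return sum(((x - mean) / sigma) ** 3 for x in data) / n

S = [2, 5, -1, 3, 4, 5, 0, 2]
n = len(S)
skew = sample_skew(S)
skew_p = population_skew(S)
# Conversion between the two versions: SKEW.P = SKEW * (n - 2) / sqrt(n(n - 1))
converted = skew * (n - 2) / math.sqrt(n * (n - 1))

print(round(skew, 2), round(skew_p, 2))  # -0.43 -0.34
```

The `converted` value agrees with `skew_p` up to floating-point error, which confirms the relationship between SKEW and SKEW.P for this data set.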

Figure 1 – Examples of skewness and kurtosis

Observation: SKEW(R) and SKEW.P(R) ignore any empty cells or cells with non-numeric values.

Definition 2: Kurtosis provides a measurement about the extremities (i.e. tails) of the distribution of data, and therefore provides an indication of the presence of outliers.

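Excel calculates the kurtosis of a sample S as follows:

$$\frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^4-\frac{3(n-1)^2}{(n-2)(n-3)}$$

where $\bar{x}$ is the mean and s is the standard deviation of S. To avoid division by zero, this formula requires that n > 3.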

Observation: It is commonly thought that kurtosis provides a measure of peakedness (or flatness), but this is not true. Kurtosis pertains to the extremities and not to the center of a distribution.

Excel Function: Excel provides the KURT function as a way to calculate the kurtosis of S, i.e. if R is a range in Excel containing the data elements in S then KURT(R) = the kurtosis of S.

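Observation: The population kurtosis is calculated via the formula

$$\frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{\sigma}\right)^4-3$$

where $\sigma$ is the population standard deviation of S. This can be calculated in Excel via the formula

=(KURT(R)*(n-2)*(n-3)/(n-1)-6)/(n+1)

where n = COUNT(R).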

Real Statistics Function: Excel does not provide a population kurtosis function, but you can use the following Real Statistics function for this purpose:

KURTP(R, excess) = kurtosis of the distribution for the population in range R. If excess = TRUE (default) then 3 is subtracted from the result (the usual approach so that a normal distribution has kurtosis of zero).

Example 2: Suppose S = <2, 5, -1, 3, 4, 5, 0, 2>. The kurtosis of S = -0.94, i.e. KURT(R) = -0.94 where R is a range in an Excel worksheet containing the data in S. The population kurtosis is -1.114. See Figure 1.
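The values in Example 2 can be reproduced in the same way. This is a minimal sketch in plain Python; `sample_excess_kurtosis` and `population_kurtosis` are illustrative names, and the first function assumes the bias-corrected excess kurtosis that KURT reports.

```python
def sample_excess_kurtosis(data):
    """Bias-corrected sample excess kurtosis (the quantity KURT reports); needs n > 3."""
    n = len(data)
    mean = sum(data) / n
    s2 = sum((x - mean) ** 2 for x in data) / (n - 1)       # sample variance
    fourth = sum((x - mean) ** 4 for x in data) / s2 ** 2    # sum of ((x - mean)/s)^4
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * fourth - correction

def population_kurtosis(data, excess=True):
    """Population kurtosis m4 / m2^2, minus 3 when excess is True."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    k = m4 / m2 ** 2
    return k - 3 if excess else k

S = [2, 5, -1, 3, 4, 5, 0, 2]
kurt = sample_excess_kurtosis(S)
kurt_p = population_kurtosis(S)

print(round(kurt, 2), round(kurt_p, 3))  # -0.94 -1.114
```

Both values are negative, so this small data set has lighter tails than a normal distribution.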

Observation: KURT(R) ignores any empty cells or cells with non-numeric values.

Graphical Illustration

We now look at an example of these concepts using the chi-square distribution.

Figure 2 – Example of skewness and kurtosis

Figure 2 contains the graphs of two chi-square distributions (with different degrees of freedom df). We study the chi-square distribution elsewhere, but for now note the following values for the kurtosis and skewness:
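For reference, the theoretical skewness and excess kurtosis of a chi-square distribution with $df$ degrees of freedom follow from standard results, so both shrink toward the normal values of zero as $df$ grows:

$$\text{skewness}=\sqrt{\frac{8}{df}},\qquad \text{excess kurtosis}=\frac{12}{df}.$$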


Why Is the Normal Curve So Important in Psychological Testing

In testing, the normal curve model is used in ways that parallel the distinction between descriptive and inferential statistics:

1. The normal curve model is used descriptively to locate the position of scores that come from distributions that are normal. In a process known as normalization, described in Chapter 3, the normal curve is also used to make distributions that are not normal—but approximate the normal—conform to the model, in terms of the relative positions of scores.

2. The normal curve model is applied inferentially in the areas of (a) reliability, to derive confidence intervals to evaluate obtained scores and differences between obtained scores (see Chapter 4), and (b) validity, to derive confidence intervals for predictions or estimates based on test scores (see Chapter 5).

model. The manner and extent to which they deviate from it has implications with regard to the amount of information the distributions convey. An extreme case can be illustrated by the distribution that would result if all values in a set of data occurred with the same frequency. Such a distribution, which would be rectangular in shape, would imply no difference in the likelihood of occurrence of any given value and thus would not be useful in making decisions on the basis of whatever is being measured.

A different, and more plausible, type of deviation from the normal curve model happens when distributions have two or more modes. If a distribution is bimodal, or multimodal, one needs to consider the possibility of sampling problems or special features of the sample. For example, a distribution of class grades in which the peak frequencies occur in the grades of A and D, with very few B or C grades, could mean that the students in the class are atypical in some way or that they belong to groups that differ significantly in preparation, motivation, or ability level. Naturally, information of this nature would almost invariably have important implications; in the case of this example, it might lead a teacher to divide the class into sections and use different pedagogical approaches with each.

Two other ways in which distributions may deviate from the normal curve model carry significant implications that are relevant to test data. These deviations pertain to the properties of the kurtosis and skewness of frequency distributions.

Kurtosis

This rather odd term, which stems from the Greek word for convexity, has traditionally been described as the flatness or peakedness of a distribution, although it is governed mainly by the tails. Leptokurtic distributions have heavier tails than the normal curve, so extreme scores are more likely, and platykurtic distributions have lighter tails. The normal distribution is mesokurtic, meaning that it is intermediate between the two.


Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.

Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers. A uniform distribution would be the extreme case.

The skewness for a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative values for the skewness indicate data that are skewed left and positive values for the skewness indicate data that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is long relative to the left tail. If the data are multi-modal, then this may affect the sign of the skewness.

Some measurements have a lower bound and are skewed right. For example, in reliability studies, failure times cannot be negative.

Which definition of kurtosis is used is a matter of convention (this handbook uses the original definition). When using software to compute the sample kurtosis, you need to be aware of which convention is being followed. Many sources use the term kurtosis when they are actually computing "excess kurtosis", so it may not always be clear.

Examples

The following example shows histograms for 10,000 random numbers generated from a normal, a double exponential, a Cauchy, and a Weibull distribution.

Normal Distribution

The first histogram is a sample from a normal distribution. The normal distribution is a symmetric distribution with well-behaved tails. This is indicated by the skewness of 0.03. The kurtosis of 2.96 is near the expected value of 3. The histogram verifies the symmetry.

Double Exponential Distribution

The second histogram is a sample from a double exponential distribution. The double exponential is a symmetric distribution. Compared to the normal, it has a stronger peak, more rapid decay, and heavier tails. That is, we would expect a skewness near zero and a kurtosis higher than 3. The skewness is 0.06 and the kurtosis is 5.9.

Cauchy Distribution

The third histogram is a sample from a Cauchy distribution.

For better visual comparison with the other data sets, we restricted the histogram of the Cauchy distribution to values between -10 and 10. The full data set for the Cauchy data in fact has a minimum of approximately -29,000 and a maximum of approximately 89,000.

The Cauchy distribution is a symmetric distribution with heavy tails and a single peak at the center of the distribution. Since it is symmetric, we would expect a skewness near zero. Due to the heavier tails, we might expect the kurtosis to be larger than for a normal distribution. In fact the skewness is 69.99 and the kurtosis is 6,693. These extremely high values can be explained by the heavy tails. Just as the mean and standard deviation can be distorted by extreme values in the tails, so too can the skewness and kurtosis measures.

Weibull Distribution

The fourth histogram is a sample from a Weibull distribution with shape parameter 1.5. The Weibull distribution is a skewed distribution with the amount of skewness depending on the value of the shape parameter. The degree of decay as we move away from the center also depends on the value of the shape parameter. For this data set, the skewness is 1.08 and the kurtosis is 4.46, which indicates moderate skewness and kurtosis.

Dealing with Skewness and Kurtosis

Many classical statistical tests and intervals depend on normality assumptions. Significant skewness and kurtosis clearly indicate that data are not normal. If a data set exhibits significant skewness or kurtosis (as indicated by a histogram or the numerical measures), what can we do about it?

One approach is to apply some type of transformation to try to make the data normal, or more nearly normal. The Box-Cox transformation is a useful technique for trying to normalize a data set. In particular, taking the log or square root of a data set is often useful for data that exhibit moderate right skewness.
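The effect of a log transformation on right skew can be illustrated with a small sketch. The data below are hypothetical values chosen only to be right-skewed, and `population_skew` is an illustrative helper; the log is the special case of the Box-Cox family with parameter 0.

```python
import math

def population_skew(data):
    """Moment-based skewness: third central moment divided by sigma cubed."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data (all values must be positive for the log).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]
logged = [math.log(x) for x in raw]

skew_raw = population_skew(raw)
skew_log = population_skew(logged)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

For these values the transformed data are much closer to symmetric, although a transformation is not guaranteed to remove skewness for every data set.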


heard [...] that a high positive kurtosis of residuals can be problematic for accurate hypothesis tests and confidence intervals (and therefore problems with statistical inference). Is this true and, if so, why?

For some kinds of hypothesis test, it's true.

Would a high positive kurtosis of residuals not indicate that the majority of the residuals are near the residual mean of 0 and therefore less large residuals are present?

It looks like you're conflating the concept of variance with that of kurtosis. If the variance were smaller, then a tendency to more small residuals and fewer large residuals would come together. Imagine we hold the standard deviation constant while we change the kurtosis (so we're definitely talking about changes to kurtosis rather than to variance).

Compare different variances (but the same kurtosis):

with different kurtosis but the same variance:

A high kurtosis is in many cases associated with more small deviations from the mean$^\ddagger$ -- more small residuals than you'd find with a normal distribution ... but to keep the standard deviation at the same value, we must also have more big residuals (because having more small residuals would make the typical distance from the mean smaller). To get more of both the big residuals and small residuals, you will have fewer "typical sized" residuals -- those about one standard deviation away from the mean.

$\ddagger$ It depends on how you define "smallness"; you can't simply add lots of large residuals and hold variance constant, you need something to compensate for it -- but for some given measure of "small" you can find ways to increase the kurtosis without increasing that particular measure. (For example, higher kurtosis doesn't automatically imply a higher peak as such.)

A higher kurtosis tends to go with more large residuals, even when you hold the variance constant.

[Further, in some cases, the concentration of small residuals may actually lead to more of a problem than the additional fraction of the largest residuals -- depending on what things you're looking at.]

Anyway, let's look at an example. Consider a one-sample t-test and a sample size of 10.

If we reject the null hypothesis when the absolute value of the t-statistic is bigger than 2.262, then when the observations are independent, identically distributed from a normal distribution, and the hypothesized mean is the true population mean, we'll reject the null hypothesis 5% of the time.

Consider a particular distribution with substantially higher kurtosis than the normal: 75% of our population have their values drawn from a normal distribution and the remaining 25% have their values drawn from a normal distribution with standard deviation 50 times as large.

If I calculated correctly, this corresponds to a kurtosis of 12 (an excess kurtosis of 9). The resulting distribution is much more peaked than the normal and has heavy tails. The density is compared with the normal density below -- you can see the higher peak, but you can't really see the heavier tail in the left image, so I also plotted the logarithm of the densities, which stretches out the lower part of the image and compresses the top, making it easier to see both the peak and the tails.
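The stated kurtosis can be checked directly. As a sketch, assume both components have mean zero and the narrower component has standard deviation 1 (kurtosis does not depend on the overall scale), and use the fact that a zero-mean normal with standard deviation $s$ has fourth moment $3s^4$:

$$\sigma^2 = 0.75\cdot 1 + 0.25\cdot 50^2 = 625.75,$$

$$\mathbb{E}X^4 = 3\left(0.75\cdot 1 + 0.25\cdot 50^4\right) = 4687502.25,$$

$$\frac{\mathbb{E}X^4}{\sigma^4} = \frac{4687502.25}{625.75^2}\approx 11.97,$$

which rounds to a kurtosis of 12 and an excess kurtosis of about 9.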

The actual significance level for this distribution if you carry out a "5%" one-sample t-test with $n=10$ is below 0.9%. This is pretty dramatic, and pulls down the power curve quite substantially.

(You'll also see a substantive effect on the coverage of confidence intervals.)

Note that a different distribution with the same kurtosis as that will have a different impact on the significance level.
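The drop in the rejection rate can be explored by simulation. The following is a rough Monte Carlo sketch, not the calculation used in the answer; the seed, the number of replications, and the helper names are arbitrary choices, and the mixture is assumed to have both components centred at zero with the narrow component at standard deviation 1.

```python
import math
import random

def t_statistic(sample, mu0=0.0):
    """One-sample t-statistic for H0: mean == mu0."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu0) / (sd / math.sqrt(n))

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_mixture(rng):
    # 75% from N(0, 1), 25% from a normal with SD 50 times as large.
    scale = 50.0 if rng.random() < 0.25 else 1.0
    return rng.gauss(0.0, scale)

def rejection_rate(draw, rng, n=10, reps=20000, critical=2.262):
    """Share of simulated samples with |t| above the two-sided 5% cutoff (9 df)."""
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if abs(t_statistic(sample)) > critical:
            rejections += 1
    return rejections / reps

rng = random.Random(12345)
normal_rate = rejection_rate(draw_normal, rng)
mixture_rate = rejection_rate(draw_mixture, rng)
print(f"normal: {normal_rate:.4f}, mixture: {mixture_rate:.4f}")
```

The null hypothesis is true in both cases, so the normal rate should sit near 5% and the mixture rate should come out well below it. The exact figures depend on the seed and the number of replications.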

So why does the rejection rate go down? It's because the heavier tail leads to a few large outliers, which have a slightly larger impact on the standard deviation than on the mean. This affects the t-statistic because it leads to more t-values between -1 and 1, in the process reducing the proportion of values in the critical region.

If you take a sample that looks pretty consistent with having come from a normal distribution whose mean is just far enough above the hypothesized mean that it's significant, and then you take the observation furthest above the mean and pull it even further away (that is, make the mean even larger than under $H_0$ ), you actually make the t-statistic smaller.

Let me show you. Here's a sample of size 10:

Imagine we want to test it against $H_0: \mu=2$ (a one-sample t-test). It turns out that the sample mean here is 2.68 and the sample standard deviation is 0.9424. You get a t-statistic of 2.282 -- just in the rejection region for a 5% test (p-value of 0.0484).

Now make that largest value 50:

Clearly we pull the mean up, so it should indicate a difference even more than it did before, right? Well, no, it doesn't. The t-statistic goes down. It is now 1.106, and the p-value is quite large (close to 30%). What happened? Well, we did pull the mean up (to 7.257), but the standard deviation shot up to over 15.
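Both t-statistics follow from the quoted summary statistics with $n=10$ and $\mu_0=2$. For the original sample,

$$t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}=\frac{2.68-2}{0.9424/\sqrt{10}}\approx\frac{0.68}{0.2980}\approx 2.282.$$

For the modified sample, solving the same formula for $s$ with $t=1.106$ and $\bar{x}=7.257$ gives

$$s=\frac{(7.257-2)\sqrt{10}}{1.106}\approx 15.03,$$

which is consistent with a standard deviation of just over 15.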

Standard deviations are a bit more sensitive to outliers than means are -- when you put in an outlier, you tend to push the one-sample t-statistic toward 1 or -1.
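The same effect is easy to reproduce. The sample below is hypothetical (it is not the sample from the answer, only one built to have the same mean of 2.68), and `t_statistic` is an illustrative helper.

```python
import math

def t_statistic(sample, mu0):
    """One-sample t-statistic for H0: mean == mu0."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical sample of size 10 with mean 2.68.
sample = [1.2, 1.8, 2.1, 2.4, 2.6, 2.7, 3.0, 3.3, 3.6, 4.1]
t_before = t_statistic(sample, mu0=2)

# Replace the largest observation with an extreme value of 50.
with_outlier = sorted(sample)[:-1] + [50]
t_after = t_statistic(with_outlier, mu0=2)

print(f"t before: {t_before:.3f}, t after: {t_after:.3f}")
```

The first statistic exceeds the 5% cutoff of 2.262, while the second falls to a little above 1, even though the outlier moved the sample mean further from the hypothesized value.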

If there's a chance of several outliers, much the same happens only they can sometimes be on opposite sides (in which case the standard deviation is even more inflated while the impact on the mean is reduced compared to one outlier), so the t-statistic tends to move closer to 0.

Similar stuff goes on with a number of other common tests that assume normality -- higher kurtosis tends to be associated with heavier tails, which means more outliers, which means that standard deviations get inflated relative to means and so differences you want to pick up tend to get "swamped" by the impact of the outliers on the test. That is, low power.


‘s’ Possession and ‘of’ Possession

Possessive nouns are used in different ways to express different meanings. The most common uses are expressing

  1. Possession or ownership: The family’s dog.
  2. Association: Archer’s office.
  3. Action: Lana’s determination to shoot Archer.
  4. Measurement: The train’s delay.
  5. Characteristics of something: Lana’s black hair.

Often the preposition ‘of’ may serve as an alternative to ‘s’ possessives. As I have said in my previous articles, a repetitive sentence format makes your writing dull. So it’s a good idea to switch between ‘s’ possessives and ‘of’ possessives. To do that, we first need to know when ‘of’ possessives can be used correctly.

  1. Possession: To express ownership, it’s always preferable to use the ‘s’ possessive. For example, ‘Archer’s gun’ is preferred to ‘The gun of Archer’.
  2. Associations: When the possessive noun is animate, always use ‘s’ possessives. E.g. Archer’s Mother, Lana’s Baby, Kriger’s holographic girlfriend. But when the possessive noun is inanimate, both ‘s’ possessives and ‘of’ possessives can be used. E.g. The university’s area, Area of the university.
  3. Attribution: As with association, use only ‘s’ possessives for animate nouns. For inanimate nouns, both ‘s’ possessives and ‘of’ possessives can be used.

For Example: Danny’s White Hair. (Animate Possessive Noun) The university’s garage. (Inanimate Possessive Noun) The garage of the university. (Inanimate Possessive Noun)

  4. Action & Measurement: You can use both ‘s’ possessives and ‘of’ possessives to state somebody’s action.

“Archer’s revenge rampage for his butler” or “The revenge rampage of Archer for his butler”
Similarly, both types of possessives can be used for expressing measurement.
“The onion’s increasing rate has made the people very upset.” (‘s’ possessive)

“The increasing rate of the onion has made the people very upset.” (‘of’ possessive)

Reference:Lester, M. (2011). Advanced English grammar for ESL learners. New York: McGraw-Hill.


Discussion

The traditional standard errors for skewness and kurtosis printed by many statistics packages are very poor. While they are appropriate for normal distributions, deviations from normality, like a t distribution with df = 5 or a mixture of two normal curves with different standard deviations, produce standard errors that can be 5 times too small. Because normal distributions are rare in psychology (Micceri, 1989), statistics packages should stop printing standard errors that are grossly in error when the data deviate from normality, since printing them encourages their use. The function written for this article produces the bootstrap standard errors, BCa confidence intervals, and standard errors, which take into account characteristics of the distributions.

One question is whether the traditional standard errors can be used if you are using this only to test whether the distribution is normal. This approach is not recommended for at least two reasons. First, there are several better tests that are designed to test normality, such as the Shapiro and Wilk test (1965), and are available in many statistics packages. Testing skewness and kurtosis, individually, is problematic because they are dependent on each other, so rather than simply following the textbook approach, tests like D'Agostino's test (D'Agostino, Belanger, & D'Agostino, 1990) could be used. Second, because the traditional standard errors both under- and overestimate the true standard errors, you would not know whether the statistical test was liberal or conservative without guessing about the distribution. One method for guessing about the distribution is to take thousands of resamples of the observed distribution, which is what bootstrapping does.
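The resampling idea can be sketched in a few lines. This is not the function written for the article: it is a simplified illustration that uses a plain percentile interval rather than a BCa interval, and the scores, the seed, and the names `skewness` and `bootstrap_skewness` are all hypothetical.

```python
import math
import random

def skewness(data):
    """Moment-based skewness; returns 0.0 for a degenerate (constant) resample."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        return 0.0
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def bootstrap_skewness(data, reps=2000, seed=0):
    """Bootstrap standard error and 95% percentile interval for the skewness."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, recomputing the statistic each time.
    stats = sorted(skewness(rng.choices(data, k=n)) for _ in range(reps))
    mean = sum(stats) / reps
    se = math.sqrt(sum((s - mean) ** 2 for s in stats) / (reps - 1))
    lower = stats[int(0.025 * reps)]
    upper = stats[int(0.975 * reps) - 1]
    return se, (lower, upper)

# Hypothetical scores, used only to exercise the functions.
scores = [85, 92, 97, 99, 100, 101, 103, 104, 106, 108, 110, 112, 115, 121, 140]
se, (lower, upper) = bootstrap_skewness(scores)
print(f"skewness = {skewness(scores):.2f}, bootstrap SE = {se:.2f}, "
      f"95% percentile interval = ({lower:.2f}, {upper:.2f})")
```

Because the standard error comes from the resampled statistics themselves, it reflects the shape of the observed distribution and does not assume normality.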

The problems with the standard errors shown in this article are due to the use of the third and fourth moments to calculate skewness and kurtosis. These are very sensitive to deviations in the tails of the distributions and are not sensitive to deviations in the peaks of the distributions (Lindsay & Basak, 2000). Seier and Bonett (2003) discussed formulae that allow the user to vary the relative influence of deviations in the tails and the peak of a distribution, but these are not commonly used. The transformation described by Anscombe and Glynn (1983) can also be used. A function is described in the Appendix that uses this, but it is also not widely used. If skewness and kurtosis are operationalized in other ways, the impact of extreme points can be lessened. Balanda and MacGillivray (1988) discussed several ways in which quartiles could be used for operationalizing these statistics that would be less influenced by extreme points. One of the most promising is L-moments (Hosking, 1992). Hosking shows how skewness and kurtosis measures based on these are more consistent with the Shapiro–Wilk test of normality. These can be adjusted, for example, by trimming, which further increases their robustness (Elamir & Seheult, 2003). In time, these alternative methods may become more popular, but dramatically changing the statistics for measuring them would also change the meaning of skewness and kurtosis. Therefore, it is unlikely that these alternatives will become widespread in the near future.
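As an illustration of the L-moment alternative, sample L-skewness and L-kurtosis can be computed from probability-weighted moments of the ordered data. This is a minimal sketch based on the standard unbiased estimators; the function name `l_moment_ratios` and the example data are hypothetical, and no trimming is applied.

```python
def l_moment_ratios(data):
    """Return (l1, l2, L-skewness, L-kurtosis) from sample L-moments; needs n >= 4."""
    x = sorted(data)
    n = len(x)
    # Unbiased probability-weighted moments b0..b3, using the 0-based rank i.
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    b3 = sum(i * (i - 1) * (i - 2) * x[i] for i in range(n)) / (
        n * (n - 1) * (n - 2) * (n - 3)
    )
    # The first four L-moments are linear combinations of the b's.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2

# Symmetric data give an L-skewness of zero; right-skewed data give a positive value.
_, _, t3_sym, t4_sym = l_moment_ratios([1, 2, 3, 4, 5])
_, _, t3_skewed, t4_skewed = l_moment_ratios([1, 2, 2, 3, 3, 4, 5, 8, 13, 40])
print(f"symmetric: t3 = {t3_sym:.3f}; skewed: t3 = {t3_skewed:.3f}")
```

Because L-moments are linear in the ordered observations, a single extreme value influences them far less than it influences the third and fourth powers used in the conventional measures.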

The main recommendations from this article are that researchers should be careful when using traditional measures of standard error for skewness and kurtosis. Bootstrap confidence intervals can be used, but researchers should be aware that these may still be in error. For testing whether a distribution is of a certain shape, tests designed specifically for this purpose should be used. Methods based on transformations of the moments and L-moments should be considered.




You might also Like

Kurtosis does not measure anything about the "peak" as was historically reported. Rather, it is a measure of whether there are outliers (i.e., rare extreme values) in the data. These show up in graphs as one or a very few points that are very far from the main body of the data.

anon342994 July 26, 2013

Can anyone help me understand skewed distribution for my presentation. I can't seem to understand about graphs.

In my stats class today, the teacher mentioned that curves can have different levels of skewness and kurtosis. She didn't go into any real detail explaining what this is, especially kurtosis.

Could someone give me a brief explanation to help point me in the right direction as to what kurtosis means?

titans62 July 14, 2011

@cardsfan27 - Great examples. I'm not sure why I couldn't think of those. When I was thinking of something like population that would have a right skewed distribution. There are many more young people than there are old people. I guess something else could be the size of trees in a forest. There are hundreds of seedlings and saplings for every tree that is fully grown.

I did some more research on the curves with two high points. They are called bimodal distributions. All of the examples I found were similar to yours and involved height or weight. I would be interested to hear more of them, though.

cardsfan27 July 13, 2011

@titans62 - I'm having the opposite problem! All I can think of are things that have a normal distribution, but are not skewed. Some of the things I'm thinking of that would be a normal bell shape are height and weight of the population. Test scores are also a stereotypical bell curve example.

Your second question had me stumped for a few minutes, but I think I've got an example. This might be a little far-fetched, but maybe someone else has a better one. Suppose you took 25 men and 25 women and recorded their weights. You could assume that the men are going to weigh more on average than the women, so if you graphed the weights, you might find that the women make a peak lower on the scale, and men make a peak farther up. I hope that made sense.

titans62 July 13, 2011

Can someone help me -- I understand what a bell curve looks like, but for whatever reason, I can't think of anything that would have that shape. Everything I am thinking of would have a skewed distribution where one tail was longer than the other.

Another related question - would it ever be possible for a curve to have two high spots compared to the curve going up and coming back down in a normal bell curve?

Emilski July 12, 2011

It seems counter intuitive that a positively skewed distribution would have a tail going to the right. I would have guessed that the positive and negative was determined by where the highest point of the curve was.

