Tuesday, February 22, 2011

Dissertation Statistics Help: Part 2

Reach us at info@dissertationindia.com

The Null Hypothesis


The null hypothesis (usually symbolized H0) is one guess about the true state of affairs in the population of interest. It is contrasted with the alternative hypothesis (usually symbolized H1). The null and alternative hypotheses are usually formulated so that, as a matter of logic, they are mutually exclusive (only one of them can be true) and exhaustive (one of them must be true). In other words, if one is false, the other necessarily must be true. Thus if the null hypothesis states that it is not raining, then the alternative hypothesis states that it is raining. For example, if the null hypothesis states that the population mean is zero, then an alternative hypothesis could be that the population mean is not zero. Symbolically:

H0: μ = 0

H1: μ ≠ 0.

In this case, the alternative hypothesis commits to a mean different from zero, no matter whether that mean is smaller or larger than zero. Thus this H1 is nondirectional (sometimes called two-tailed). Another null hypothesis could state that the population mean is not greater than zero, whereas the alternative hypothesis could be that the population mean is greater than zero. Symbolically:



H0: μ ≤ 0

H1: μ > 0.

Thus this H1 is directional (sometimes called one-tailed) because it commits only to differences from zero in one direction, namely means larger than zero. Usually, the alternative hypothesis corresponds to the investigator's hunch regarding the true state of affairs; that is, it corresponds to what the investigator hopes to demonstrate (for the current example, that the money treatment has an effect). However, it is the tenability of the null hypothesis that is actually tested (that the money treatment has no effect). If the null hypothesis is found untenable, then as a matter of logic the alternative must be tenable, in which case investigators "reject the null" and, for that reason, "accept the alternative."

When encountered for the first time, this way of reasoning often seems indirect and even a bit tortured to many students. Why not simply demonstrate that the alternative hypothesis is tenable in the first place? This is partly a matter of convention, but there is good reason for it. There is a crucial difference between null and alternative hypotheses. In its typical form, a null hypothesis is exact. It commits to a particular value for a population parameter of interest. The alternative hypothesis, on the other hand, usually is inexact. It only claims that the parameter is not the value claimed by the null hypothesis but does not commit to a specific value (although, as we just saw, it may commit to a particular direction). (Exceptions to the foregoing exist, but they are usually found in textbooks working through all the implications of hypothesis testing and not in actual practice.)

In order to construct a sampling distribution for the test statistic, which is the next step in hypothesis testing, an exact value for the population parameter of interest is needed. In most cases, that value is provided by the null hypothesis, and so it is the null hypothesis that we test -- and perhaps reject. That is, we test not the hypothesis that embodies our research concern (which postulates an effect) but a substantively less interesting hypothesis (which postulates exactly no effect).

According to the no-effect or null hypothesis for the present example, exactly half of the clients should get better just by chance alone. If this null hypothesis were true, then the population parameter for the probability of improving would be exactly .5. This may or may not be the true value of the population parameter, but at least it provides a basis for predicting how probable various values of the test statistic would be, if the true value for the population parameter were indeed .5.

The Sampling Distribution

Once the null hypothesis is defined, the sampling distribution appropriate for the test statistic can be specified. Sampling distributions are theoretical constructs and as such are based on logical and formal considerations. They resemble but should not be confused with frequency histograms, which are based on data and indicate the empirical frequency for the scores in a sample. Sampling distributions are important for hypothesis testing because we can use them to derive how likely (i.e., how probable) a particular value of a test statistic would be, in theory at least, if the null hypothesis were true.
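To make this concrete, here is a minimal sketch in Python (assuming the running money-cure example: 10 clients and a null probability of improvement of .5, both taken from the text) that tabulates the theoretical sampling distribution of the number of clients who improve:

```python
from math import comb

n = 10    # number of clients in the sample (from the running example)
p = 0.5   # probability that a client improves, under the null hypothesis

# Binomial probability of exactly k improvements out of n, if H0 were true.
for k in range(n + 1):
    prob = comb(n, k) * p**k * (1 - p)**(n - k)
    print(f"P({k:2d} of {n} improve) = {prob:.4f}")
```

Note that these probabilities come from theory alone; no data have been collected yet.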

The phrase "in theory" as it applies to sampling distributions is important. The probabilities provided by a sampling distribution are accurate for the test statistic to the extent that the data and the data collection procedures meet the assumptions used to generate the theoretical sampling distribution in the first place. Assumptions for different tests vary. And in practice many assumptions can be violated without severe consequences. One assumption, however, called independence of measurements (or simply, independence), is basic to most sampling distributions and, if violated, raises serious questions about any conclusions.

This key assumption requires that, during data collection, scores are assigned to each unit in the sample (subject, dyad, family, and so forth) independently. In other words, each score must represent an independent draw from the population. For the present example, this means that the evaluation of one client cannot be linked to, or affected by, the evaluation given another client. To use the classic example, imagine an urn filled with tickets, each of which has a number printed on it. We shake the urn, reach in, remove one ticket, and note its number. We then replace the ticket and repeat the procedure until we have accumulated N numbers, the size of our sample. (If the population of tickets is large enough, it may not matter much whether we draw with, or without, replacement of tickets previously drawn.) This constitutes an independent sample because presumably tickets pulled on previous draws do not influence which ticket we pull on the next draw. We would not, however, select two tickets at a time, because then pairs of tickets would be linked and the total of all tickets drawn would not constitute an independent sample. And if the assumption of independence is violated, then we can no longer be confident that a probability value derived from a theoretical sampling distribution provides us with the correct probability value for the particular test statistic being examined.
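A quick simulation of the urn procedure may help; the sketch below (urn contents and sample size are illustrative) draws with replacement, so that no draw can influence any other:

```python
import random

urn = list(range(1, 101))   # tickets numbered 1 through 100 (illustrative)
N = 10                      # desired sample size

# Drawing with replacement: each ticket goes back before the next draw,
# so every draw is an independent pick from the same population.
sample = [random.choice(urn) for _ in range(N)]
print(sample)

# Selecting two tickets at a time would link the members of each pair,
# and the resulting sample would no longer consist of independent draws.
```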

The Alpha Level

The alpha level is the probability value used for the statistical test. By convention, it is usually set to .05 or, more stringently, to .01. If the probability of the results observed in the sample occurring by chance alone (given that the null hypothesis is true) is equal to or less than the alpha level, then we declare the null hypothesis untenable.
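Because the test statistic in the running example is discrete, setting the alpha level amounts to fixing a rejection region. A sketch, again assuming 10 clients and a one-tailed test of the null hypothesis p = .5:

```python
from math import comb

n, p_null, alpha = 10, 0.5, 0.05

def upper_tail(k, n, p):
    """P(K >= k) when K is binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Smallest k whose upper-tail probability under the null is <= alpha.
critical_k = next(k for k in range(n + 1) if upper_tail(k, n, p_null) <= alpha)
print(f"Reject H0 (one-tailed) if {critical_k} or more of {n} clients improve.")
```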

The Test Statistic

In contrast to the sampling distribution, which is based on theory, the value of the test statistic depends on the data collected. In general terms, then, a test statistic is a score computed from sample data. For example, if our null hypothesis involved the age of our clients, the average age might be used as a test statistic.
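As a trivial illustration of the age example (the ages below are made up):

```python
# Hypothetical ages for a sample of ten clients (made-up data).
ages = [34, 41, 29, 38, 45, 31, 36, 40, 27, 33]

# Here the test statistic is simply the sample mean age.
mean_age = sum(ages) / len(ages)
print(f"Test statistic (mean age) = {mean_age:.1f}")
```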

The Statistical Test

Given a particular null hypothesis, an appropriate sampling distribution, and a value for the appropriate test statistic, we are in a position to determine whether or not the result we observed in our sample would be probable if the null hypothesis is in fact true. If it turns out that our result would occur only rarely, given that the null hypothesis is true, we may decide that the null hypothesis is untenable. But how rare is rare? As noted a few paragraphs back, by convention and somewhat arbitrarily, 5% is generally accepted as a reasonable cutoff point. Certainly other percentages could be justified, but in general social scientists are willing to reject the null hypothesis and accept the alternative only if the results actually obtained would occur 5% of the time or less by chance alone if the null hypothesis were true. This process of deciding what level of risk is acceptable and, on that basis, deciding whether or not to reject the null hypothesis constitutes the statistical test. For the present example (testing the effectiveness of the money cure), the appropriate test is called a sign test or a binomial test.
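Here is a minimal sketch of the sign (binomial) test for the money-cure example. The observed count of improved clients is a made-up value for illustration; the test computes the one-tailed probability of a result at least that extreme under the null hypothesis p = .5 and compares it with the alpha level:

```python
from math import comb

n = 10           # clients in the sample
k_observed = 9   # clients who improved (made-up value for illustration)
p_null = 0.5     # probability of improvement under the null hypothesis
alpha = 0.05     # conventional alpha level

# One-tailed p-value: probability of k_observed or more improvements, if H0 were true.
p_value = sum(comb(n, k) * p_null**k * (1 - p_null)**(n - k)
              for k in range(k_observed, n + 1))
print(f"p-value = {p_value:.4f}")

if p_value <= alpha:
    print("Reject the null hypothesis: a result this extreme would be rare under H0.")
else:
    print("Do not reject the null hypothesis.")
```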

TYPE I ERROR: THE RISK OF MAKING A FALSE CLAIM

Earlier I claimed that knowledge of statistics allows us to make decisions from incomplete information. Thus we may make decisions about a population based only on a sample selected from the relevant population. For the money-cure example, only 10 subjects were examined, which falls far short of a complete survey of psychotherapy clients. Yet based on this sample of 10 subjects, we might conclude that the money cure affected so many clients in the sample positively that the null hypothesis (the hypothesis that the clients were selected from a population in which the money cure has no effect) is untenable, which leads us to conclude that, yes, the money cure does have a beneficial effect.

Basing decisions on incomplete information entails a certain amount of risk. What if, for example, in the population from which our subjects were selected, the money cure had no effect even though, in this one study, we just happened to select a high proportion of clients who got better? In this case, if we claimed an effect based on the particular sample we happened to draw, we would be wrong. We would be making what is called a type I error, which means we would have rejected the null hypothesis when in fact the null hypothesis is true. We would have made a false claim.

Given the nature of statistical inference, we can never eliminate type I errors, but at least we can control how likely they are to occur. As noted earlier, the probability cutoff point for rejecting the null hypothesis is called the alpha level. If we set our alpha level to the conventional .05, then the probability that we will reject the null hypothesis wrongly, that is, make a type I error, is also .05. After all, by setting the alpha level to .05 for a statistical test we commit ourselves to rejecting the null hypothesis if the results we obtain would occur 5% of the time or less given that the null hypothesis is true. If we did the same experiment again and again, and if in fact there is no effect in the population, over the long run 95% of the time we would correctly claim no effect. But 5% of the time, just by the luck of the draw, we would wrongly claim an effect. As noted earlier, by convention most social scientists find this level of risk acceptable.
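This long-run claim can be checked by simulation. The sketch below repeatedly runs the experiment in a world where the null hypothesis is true and records how often the one-tailed test at alpha = .05 wrongly rejects. (Because the binomial test statistic is discrete, the realized rate for n = 10 sits noticeably below the nominal .05.)

```python
import random
from math import comb

def upper_tail(k, n, p):
    """P(K >= k) when K is binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, p_null, alpha = 10, 0.5, 0.05
trials = 100_000
false_claims = 0

for _ in range(trials):
    # One simulated study in which the null is true: each client
    # improves with probability .5, by chance alone.
    k = sum(random.random() < p_null for _ in range(n))
    if upper_tail(k, n, p_null) <= alpha:
        false_claims += 1   # rejected a true null: a type I error

print(f"Simulated type I error rate: {false_claims / trials:.4f}")
```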

TYPE II ERROR: THE RISK OF MISSING A REAL EFFECT

Making a false claim is not the only error that can result from statistical decision making. If, for the population from which our subjects were selected, the money cure indeed has an effect but, based on the particular sample drawn, we claimed that there was none, we would be making what is called a type II error. That is, we would have failed to reject the null hypothesis when in fact the null hypothesis is false. We would have missed a real effect.

Under most circumstances, we do not know the exact probability of a type II error. The probability of a type II error depends on the actual state of affairs in the population (which we do not know exactly), not on the state of affairs assumed under the null hypothesis (which we define and hence know exactly). A type I error occurs when the magnitude of the effect in the population (indexed by an appropriate population parameter) is zero (or some other specific value) and yet, based on the sample selected, we claim an effect (thereby making a false claim). The probability that this will occur is determined by the alpha level (which we set) and by the sampling distribution for the test statistic under the null hypothesis (which we assume). Hence we can specify and control the probability of a type I error.

In contrast, a type II error occurs when the magnitude of the effect in the population differs from zero (or another specific value) by some unspecified amount and yet, based on the sample selected, we do not claim an effect (thereby missing a real effect). The probability of a type II error can be determined only if the exact magnitude of the effect in the population is known, which means that, under most circumstances, we cannot determine the probability of a type II error exactly. However, even if we do not know the probability of a type II error, we can affect its magnitude by changing the alpha level. If we select a more stringent alpha level (.01 instead of .05, for example), which has the effect of decreasing the probability of making a false claim, we necessarily increase the probability of missing a real effect. It is a trade-off. Decreasing the probability of a type I error increases the probability of a type II error, and vice versa. If we are less likely to make a false claim, we are more likely to miss a real effect. If we are less likely to miss a real effect, we are more likely to make a false claim.
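The trade-off can likewise be seen by simulation. The sketch below assumes a made-up true probability of improvement of .8 and estimates the type II error rate (the share of studies that fail to reject the null) at two alpha levels; the stricter level misses the real effect more often:

```python
import random
from math import comb

def upper_tail(k, n, p):
    """P(K >= k) when K is binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, p_null = 10, 0.5
p_true = 0.8         # assumed true effect in the population (made up)
trials = 100_000

for alpha in (0.05, 0.01):
    misses = 0
    for _ in range(trials):
        # One simulated study in which the effect is real.
        k = sum(random.random() < p_true for _ in range(n))
        if upper_tail(k, n, p_null) > alpha:
            misses += 1   # failed to reject a false null: a type II error
    print(f"alpha = {alpha}: simulated type II error rate = {misses / trials:.3f}")
```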


