June 19, 2024

Univariate analysis of variance. Parametric model

univariate analysis of variance. Planning the

experiment, the formulation of hypotheses and

their statistical test.

Correlation analysis. Construction of correlation

fields. Construction of the empirical regression

line. Calculation of the correlation coefficient

estimation and analysis of the significance of the

linear correlation.

Hypothesis Testing

1 INTRODUCTION

Statistics plays an important role in decision making. In statistics, one utilizes random samples to

make inferences about the population from which the samples were obtained. Statistical inference

regarding population parameters takes two forms: estimation and hypothesis testing, although both

hypothesis testing and estimation may be viewed as different aspects of the same general problem of

arriving at decisions on the basis of observed data. We already saw several estimation procedures in

earlier chapters. Hypothesis testing is the subject of this chapter. Hypothesis testing has an important

role in the application of statistics to real-life problems. Here we utilize the sampled data to make

decisions concerning the unknown distribution of a population or its parameters. Pioneering work

on the explicit formulation as well as the fundamental concepts of the theory of hypothesis testing

is due to J. Neyman and E. S. Pearson.

A statistical hypothesis is a statement concerning the probability distribution of a random variable

or population parameters that are inherent in a probability distribution. The following example

illustrates the concept of hypothesis testing. An important industrial problem is that of accepting or

rejecting lots of manufactured products. Before releasing each lot for the consumer, the manufacturer

usually performs some tests to determine whether the lot conforms to acceptable standards. Let us

say that both the manufacturer and the consumer agree that if the proportion of defectives in a lot is

less than or equal to a certain number p, the lot will be released. Very often, instead of testing every

item in the lot, we may test only a few items chosen at random from the lot and make decisions

about the proportion of defectives in the lot; that is, we make the decisions about the population

on the basis of sample information. Such decisions are called statistical decisions. In attempting to

reach decisions, it is useful to make some initial conjectures about the population involved. Such

conjectures are called statistical hypotheses. Sometimes the results from the sample may be markedly

different from those expected under the hypothesis. Then we can say that the observed differences

are significant and we would be inclined to reject the initial hypothesis. These procedures that enable

us to decide whether to accept or reject hypotheses or to determine whether observed samples differ

significantly from expected results are called tests of hypotheses, tests of significance, or rules of decision.

In any hypothesis testing problem, we formulate a null hypothesis and an alternative hypothesis such that

if we reject the null, then we have to accept the alternative. The null hypothesis usually is a statement

of either the “status quo” or “no effect.” A guideline for selecting a null hypothesis is that when the

objective of an experiment is to establish a claim, the nullification of the claim should be taken as

the null hypothesis. The experiment is often performed to determine whether the null hypothesis is

false. For example, suppose the prosecution wants to establish that a certain person is guilty. The null

hypothesis would be that the person is innocent and the alternative would be that the person is guilty.

Thus, the claim itself becomes the alternative hypothesis. Customarily, the alternative hypothesis is

the statement that the experimenter believes to be true. For example, the alternative hypothesis is

the reason a person is arrested (police suspect the person is not innocent). Once the hypotheses

have been stated, appropriate statistical procedures are used to determine whether to reject the null

hypothesis. For the testing procedure, one begins with the assumption that the null hypothesis is true.

If the information furnished by the sampled data strongly contradicts (beyond a reasonable doubt)

the null hypothesis, then we reject it in favor of the alternative hypothesis. If we do not reject the

null, then we automatically reject the alternative. Note that we always make a decision with respect

to the null hypothesis. Note that the failure to reject the null hypothesis does not necessarily mean

that the null hypothesis is true. For example, a person being judged “not guilty” does not mean the

person is innocent. This basically means that there is not enough evidence to reject the null hypothesis

(presumption of innocence) beyond “a reasonable doubt.”

We summarize the elements of a statistical hypothesis in the following.

THE ELEMENTS OF A STATISTICAL HYPOTHESIS

1. The null hypothesis, denoted by H₀, is usually the nullification of a claim. Unless evidence from the

data indicates otherwise, the null hypothesis is assumed to be true.

2. The alternate hypothesis, denoted by H_a (or sometimes denoted by H₁), is customarily the claim

itself.

3. The test statistic, denoted by TS, is a function of the sample measurements upon which the

statistical decision, to reject or not reject the null hypothesis, will be based.

4. A rejection region (or a critical region) is the region (denoted by RR) that specifies the values

of the observed test statistic for which the null hypothesis will be rejected. This is the range of

values of the test statistic that corresponds to the rejection of H₀ at some fixed level of significance,

α, which will be explained later.

5. Conclusion: If the value of the observed test statistic falls in the rejection region, the null hypothesis

is rejected and we will conclude that there is enough evidence to decide that the alternative

hypothesis is true. If the TS does not fall in the rejection region, we conclude that we cannot reject

the null hypothesis.

In practice one may have hypotheses such as H₀ : μ = μ₀ against one of the following alternatives:

H_a : μ ≠ μ₀, called a two-tailed alternative,

or H_a : μ < μ₀, called a lower (or left) tailed alternative,

or H_a : μ > μ₀, called an upper (or right) tailed alternative.

A test with a lower or upper tailed alternative is called a one-tailed test. In an applied hypothesis testing

problem, we can use the following general steps.

GENERAL METHOD FOR HYPOTHESIS TESTING

1. From the (word) problem, determine the appropriate null hypothesis, H₀, and the alternative, H_a.

2. Identify the appropriate test statistic and calculate the observed test statistic from the data.

3. Find the rejection region by looking up the critical value in the appropriate table.

4. Draw the conclusion: Reject or fail to reject the null hypothesis, H₀.

5. Interpret the results: State in words what the conclusion means to the problem we started with.

It is always necessary to state a null and an alternate hypothesis for every statistical test performed.

All possible outcomes should be accounted for by the two hypotheses.

Example 1.1

In a coin-tossing experiment, let p be the probability of heads. We start with the claim that the coin is fair,

that is, H₀ : p = 1/2. We test this against one of the following alternatives:

(a) H_a: The coin is not fair (p ≠ 1/2). This is a two-tailed alternative.

(b) H_a: The coin is biased in favor of heads (p > 1/2). This is an upper tailed alternative.

It is important to observe that the test statistic is a function of a random sample. Thus, the test statistic

itself is a random variable whose distribution is known under the null hypothesis. The value of a test

statistic when specific sample values are substituted is called the observed test statistic or simply test

statistic.

For example, consider the hypothesis H₀ : μ = μ₀ versus H_a : μ ≠ μ₀, where μ₀ is known. Assume

that the population is normal with a known variance σ². Consider X̄, an unbiased estimator of μ

based on the random sample X₁, . . . , X_n. Then Z = (X̄ − μ₀)/(σ/√n) is a function of the random

sample X₁, . . . , X_n, and has a known distribution, a standard normal, under H₀. If x₁, x₂, . . . , x_n are

specific sample values, then z = (x̄ − μ₀)/(σ/√n) is called the observed sample statistic or simply sample

statistic.
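The distinction between the random statistic Z and its observed value z can be made concrete with a short sketch. The data below are hypothetical, and the two-tailed decision rule (reject when |z| exceeds the upper α/2 quantile of the standard normal) is the usual textbook rule, assumed here rather than taken from the passage above.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_statistic(sample, mu0, sigma):
    """Observed z = (sample mean - mu0) / (sigma / sqrt(n)), with sigma known."""
    n = len(sample)
    return (mean(sample) - mu0) / (sigma / sqrt(n))


def reject_two_tailed(z, alpha):
    """Assumed rule: reject H0 (mu = mu0) when |z| exceeds the upper alpha/2 normal quantile."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


# Hypothetical measurements, for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.5, 5.3]
z = z_statistic(sample, mu0=5.0, sigma=0.3)
print(f"observed z = {z:.3f}, reject H0: {reject_two_tailed(z, alpha=0.05)}")
```

Before the data are collected, `z_statistic` describes a random variable; once a particular sample is plugged in, it returns a single number that is compared with the critical value.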

Definition 1.1 A hypothesis is said to be a simple hypothesis if that hypothesis uniquely specifies

the distribution from which the sample is taken. Any hypothesis that is not simple is called a composite

hypothesis.

Example 1.2

Refer to Example 1.1. The null hypothesis p = 1/2 is simple, because the hypothesis completely specifies

the distribution, which in this case will be a binomial with p = 1/2 and with n being the number of tosses.

The alternative hypothesis p ≠ 1/2 is composite because the distribution now is not completely specified

(we do not know the exact value of p).

Because the decision is based on the sample information, we are prone to commit errors. In a statistical

test, it is impossible to establish the truth of a hypothesis with 100% certainty. There are two possible

types of errors. On the one hand, one can make an error by rejecting H₀ when in fact it is true. On

the other hand, one can also make an error by failing to reject the null hypothesis when in fact it is

false. Because the errors arise as a result of wrong decisions, and the decisions themselves are based

on random samples, it follows that the errors have probabilities associated with them. We now have

the following definitions.

Table 1 Statistical Decision and Error Probabilities

Statistical decision    True state of null hypothesis
                        H₀ true             H₀ false
Do not reject H₀        Correct decision    Type II error (β)
Reject H₀               Type I error (α)    Correct decision

The decision and the errors are represented in Table 1.

Definition 1.2 (a) A type I error is made if H₀ is rejected when in fact H₀ is true. The probability of

type I error is denoted by α. That is,

$$P(\text{rejecting } H_0 \mid H_0 \text{ is true}) = \alpha.$$

The probability of type I error, α, is called the level of significance.

(b) A type II error is made if H₀ is accepted when in fact H_a is true. The probability of a type II error is

denoted by β. That is,

$$P(\text{not rejecting } H_0 \mid H_0 \text{ is false}) = \beta.$$

It is desirable that a test should have α = β = 0 (this can be achieved only in trivial cases), or at least

we prefer to use a test that minimizes both types of errors. Unfortunately, it so happens that for a

fixed sample size, as α decreases, β tends to increase and vice versa. There are no hard and fast rules

that can be used to make the choice of α and β. This decision must be made for each problem based

on quality and economic considerations. However, in many situations it is possible to determine

which of the two errors is more serious. It should be noted that a type II error is only an error in

the sense that a chance to correctly reject the null hypothesis was lost. It is not an error in the sense

that an incorrect conclusion was drawn, because no conclusion is made when the null hypothesis is

not rejected. In the case of type I error, a conclusion is drawn that the null hypothesis is false when,

in fact, it is true. Therefore, type I errors are generally considered more serious than type II errors.

For example, it is mostly agreed that finding an innocent person guilty is a more serious error than

finding a guilty person innocent. Here, the null hypothesis is that the person is innocent, and the

alternate hypothesis is that the person is guilty. “Not rejecting the null hypothesis” is equivalent to

acquitting a defendant. It does not prove that the null hypothesis is true, or that the defendant is

innocent. In statistical testing, the significance level α is the probability of wrongly rejecting the null

hypothesis when it is true (that is, the risk of finding an innocent person guilty). Here the type II risk

is acquitting a guilty defendant. The usual approach to hypothesis testing is to find a test procedure

that limits α, the probability of type I error, to an acceptable level while trying to lower β as much as

possible.

[Figure: distributions of the test statistic under H₀ and under H_a, with the critical value marked; Prob(type I error) = α and Prob(type II error) = β.]

The consequences of different types of errors are, in general, very different. For example, if a doctor

tests for the presence of a certain illness, incorrectly diagnosing the presence of the disease (type I

error) will cause a waste of resources, not to mention the mental agony to the patient. On the other

hand, failure to determine the presence of the disease (type II error) can lead to a serious health risk.

To formulate a hypothesis testing problem, consider the following situation. Suppose a toy store

chain claims that at least 80% of girls under 8 years old prefer dolls over other types of toys. We feel

that this claim is inflated. In an attempt to dispose of this claim, we observe the buying pattern of 20

randomly selected girls under 8 years old, and we observe X, the number of girls under 8 years old

who buy stuffed toys or dolls. Now the question is, how can we use X to confirm or reject the store’s

claim? Let p be the probability that a girl under 8 chosen at random prefers stuffed toys or dolls. The

question now can be reformulated as a hypothesis testing problem. Is p ≥ 0.8 or p < 0.8? Because we

would like to reject the store’s claim only if we are highly certain of our decision, we should choose

the null hypothesis to be H₀ : p ≥ 0.8, the rejection of which is considered to be more serious. The

alternative is then H_a : p < 0.8. In order to make the null

hypothesis simple, we will use H₀ : p = 0.8, which is the boundary value with the understanding that

it really represents H₀ : p ≥ 0.8. We note that X, the number of girls under 8 years old who prefer

stuffed toys or dolls, is a binomial random variable. Clearly a large sample value of X would favor

H₀. Suppose we arbitrarily choose to accept the null hypothesis if X > 12. Because our decision is

based on only a sample of 20 girls under 8, there is always a possibility of making errors whether

we accept or reject the store chain’s claim. In the following example, we will now formally state this

problem and calculate the error probabilities based on our decision rule.

Example 1.3

A toy store chain claims that at least 80% of girls under 8 years old prefer dolls over other types of toys.

After observing the buying pattern of many girls under 8 years old, we feel that this claim is inflated. In an

attempt to dispose of this claim, we observe the buying pattern of 20 randomly selected girls under 8 years

old, and we observe X, the number of girls who buy stuffed toys or dolls. We wish to test the hypothesis

H₀ : p = 0.8 against H_a : p < 0.8. Suppose we decide to accept the H₀ if X > 12 (that is X ≥ 13). This

means that if {X ≤ 12} (that is X < 13) we will reject H₀.

(a) Find α.

(b) Find β for p = 0.6.

(c) Find β for p = 0.4.

(d) Find the rejection region of the form {X ≤ K} so that (i) α = 0.01; (ii) α = 0.05.

(e) For the alternative H_a :p = 0.6, find β for the values of α in part (d).

Solution

The TS X is the number of girls under 8 years old who buy dolls. X follows the binomial distribution with

n = 20 and p, the unknown population proportion of girls under 8 who prefer dolls. We now calculate α

and β.

(a) For p = 0.8, the probability of type I error is

$$\alpha = P\{\text{reject } H_0 \mid H_0 \text{ is true}\} = P\{X \le 12 \mid p = 0.8\} = \sum_{x=0}^{12} \binom{20}{x} (0.8)^x (0.2)^{20-x} = 0.0321.$$

If we calculate α for any other value of p > 0.8, then we will find that it is smaller than 0.0321.

Hence, there is at most a 3.21% chance of rejecting a true null hypothesis. That is, if the store’s claim

is in fact true, then the chance that our test will erroneously reject that claim is at most 3.21%.

(b) Here p = 0.6. The probability of type II error is

$$\beta = P\{\text{accept } H_0 \mid H_0 \text{ false}\} = P\{X > 12 \mid p = 0.6\} = 1 - P\{X \le 12 \mid p = 0.6\} = 1 - 0.584 = 0.416,$$

so there is a 41.6% chance of accepting a false null hypothesis. Thus, in case the store’s claim is not

true, and the truth is that only 60% of girls under 8 years old prefer dolls over other types of toys,

then there is a 41.6% chance that our test will erroneously conclude that the store’s claim is true.

(c) If p = 0.4, then

$$\beta = P\{\text{accept } H_0 \mid H_0 \text{ false}\} = P\{X > 12 \mid p = 0.4\} = 1 - P\{X \le 12 \mid p = 0.4\} = 1 - 0.979 = 0.021.$$

That is, there is a 2.1% chance of accepting a false null hypothesis.
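The error probabilities in parts (a) to (c) can be checked without a binomial table. The following is a minimal sketch using only the Python standard library; the helper names `binom_cdf`, `type_i_error`, and `type_ii_error` are illustrative choices, not notation from the text.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


def type_i_error(k, n, p0):
    """alpha for the rule 'reject H0 if X <= k', evaluated at the null value p0."""
    return binom_cdf(k, n, p0)


def type_ii_error(k, n, p_alt):
    """beta for the same rule when the true proportion is p_alt."""
    return 1 - binom_cdf(k, n, p_alt)


n, k = 20, 12  # sample size and cutoff of the decision rule in this example
print(f"alpha          = {type_i_error(k, n, 0.8):.4f}")
print(f"beta (p = 0.6) = {type_ii_error(k, n, 0.6):.4f}")
print(f"beta (p = 0.4) = {type_ii_error(k, n, 0.4):.4f}")
```

Note that β is smaller when the true p is farther from the null value 0.8, which is why a value of p must be fixed before β can be computed.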

(d) (i) To find K such that

$$\alpha = P\{X \le K \mid p = 0.8\} = 0.01,$$

from the binomial table, K = 11. Hence, the rejection region is: Reject H₀ if {X ≤ 11}.

(ii) To find K such that

$$\alpha = P\{X \le K \mid p = 0.8\} = 0.05,$$

from the binomial table, α = 0.05 falls between K = 12 and K = 13. However, for K = 13, the

value for α is 0.087, exceeding 0.05. If we want to limit α to be no more than 0.05, we will

have to take K = 12. That is, we reject the null hypothesis if X ≤ 12, yielding an α = 0.0321

as shown in (a).

(e) (i) When α = 0.01, from (d), the rejection region is of the form {X ≤ 11}. For p = 0.6,

$$\beta = P\{\text{accept } H_0 \mid H_0 \text{ false}\} = P\{X > 11 \mid p = 0.6\} = 1 - P\{X \le 11 \mid p = 0.6\} = 1 - 0.404 = 0.596.$$

(ii) From (a) and (b), for testing the hypothesis H₀ : p = 0.8 against H_a : p < 0.8 with n = 20,

we see that when α is 0.0321, β is 0.416. From (d)(i) and (e)(i), for the same hypothesis, we

see that when α is 0.01, β is 0.596. This holds in general. Thus, we observe that for fixed n, as

α decreases, β increases and vice versa.
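Parts (d) and (e) amount to a search over cutoffs: find the largest K whose type I error does not exceed the target, then evaluate β at that K. The sketch below does this directly; `largest_cutoff` is a hypothetical helper name, and the rule "largest K with P(X ≤ K | p₀) ≤ α" is the conservative reading used in part (d)(ii).

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


def largest_cutoff(alpha, n, p0):
    """Largest K with P(X <= K | p0) <= alpha, or None if no K qualifies."""
    best = None
    for k in range(n + 1):
        if binom_cdf(k, n, p0) <= alpha:
            best = k
        else:
            break  # the CDF is nondecreasing in k, so no larger k can qualify
    return best


n, p0, p_alt = 20, 0.8, 0.6
for target in (0.01, 0.05):
    k = largest_cutoff(target, n, p0)
    alpha = binom_cdf(k, n, p0)
    beta = 1 - binom_cdf(k, n, p_alt)
    print(f"target {target}: reject if X <= {k}, alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Because X is discrete, the attained α is usually below the target rather than equal to it, and tightening the target moves the cutoff down and pushes β up.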

In the next example, we explore what happens to β as the sample size increases, with α fixed.

Example 1.4

Let X be a binomial random variable. We wish to test the hypothesis H₀ : p = 0.8 against H_a : p = 0.6. Let

α = 0.03 be fixed. Find β for n = 10 and n = 20.

Solution

For n = 10, using the binomial tables, we obtain P{X ≤ 5 | p = 0.8} = 0.03. Hence the rejection region for

the hypothesis H₀ : p = 0.8 vs. H_a : p = 0.6 is given by reject H₀ if X ≤ 5. The probability of type II error is

$$\beta = P\{\text{accept } H_0 \mid p = 0.6\} = P\{X > 5 \mid p = 0.6\} = 1 - P\{X \le 5 \mid p = 0.6\} = 0.633.$$

For n = 20, as shown in Example 1.3, if we reject H₀ for X ≤ 12, we obtain

$$P(X \le 12 \mid p = 0.8) = 0.03$$

and

$$\beta = P(X > 12 \mid p = 0.6) = 1 - P\{X \le 12 \mid p = 0.6\} = 0.416.$$

We see that for a fixed α, as n increases β decreases and vice versa. It can be shown that this result holds in

general.

In order for us to compute the value of β, it is necessary that the alternate hypothesis is simple. Now

we will discuss a three-step procedure to calculate β.

STEPS TO CALCULATE β

1. Decide an appropriate test statistic (usually this is a sufficient statistic or an estimator for the

unknown parameter, whose distribution is known under H₀).

2. Determine the rejection region using a given α, and the distribution of the test statistic (TS).

3. Find the probability that the observed test statistic does not fall in the rejection region assuming

H_a is true. This gives β. That is,

$$\beta = P(\text{TS falls in the complement of the rejection region} \mid H_a \text{ is true}).$$

Example 1.5

A random sample of size 36 from a population with known variance, σ² = 9, yields a sample mean of

x̄ = 17. Find β for testing the hypothesis H₀ : μ = 15 versus H_a : μ = 16. Assume α = 0.05.

Solution

Here n = 36, x̄ = 17, and σ² = 9. In general, to test H₀ : μ = μ₀ versus H_a : μ > μ₀, we proceed as

follows. An unbiased estimator of μ is X̄. Intuitively we would reject H₀ if X̄ is large, say X̄ > c. Now using

α = 0.05, we will determine the rejection region. By the definition of α, we have

$$P(\bar{X} > c \mid \mu = \mu_0) = 0.05,$$

that is,

$$P\left( \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > \frac{c - \mu_0}{\sigma/\sqrt{n}} \;\middle|\; \mu = \mu_0 \right) = 0.05.$$

But if μ = μ₀, because the sample size n ≥ 30, (X̄ − μ₀)/(σ/√n) ∼ N(0, 1). Therefore, the preceding

equation is equivalent to

$$P\left( Z > \frac{c - \mu_0}{\sigma/\sqrt{n}} \right) = 0.05.$$

From standard normal tables, we obtain P(Z > 1.645) = 0.05. Hence

$$\frac{c - \mu_0}{\sigma/\sqrt{n}} = 1.645 \quad \text{or} \quad c = \mu_0 + 1.645\,\frac{\sigma}{\sqrt{n}}.$$

Therefore, the rejection region is the set of all sample means x̄ such that

$$\bar{x} > \mu_0 + 1.645\,\frac{\sigma}{\sqrt{n}}.$$

Substituting μ₀ = 15 and σ = 3, we obtain

$$\mu_0 + 1.645\,\frac{\sigma}{\sqrt{n}} = 15 + 1.645\left(\frac{3}{\sqrt{36}}\right) = 15.8225.$$

The rejection region is the set of x̄ such that x̄ > 15.8225.

Then by definition,

$$\beta = P(\bar{X} \le 15.8225 \text{ when } \mu = 16).$$

Consequently, for μ = 16,

$$\beta = P\left( \frac{\bar{X} - 16}{\sigma/\sqrt{n}} \le \frac{15.8225 - 16}{3/\sqrt{36}} \right) = P(Z \le -0.36) = 0.3594.$$

That is, under the given information, there is a 35.94% chance of not rejecting a false null hypothesis.
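The three-step procedure for β can be sketched for this upper-tailed test with the standard library's `statistics.NormalDist`. The function names are illustrative, and the code uses the unrounded normal quantile instead of the table value 1.645, so its β is close to, but not identical with, the table-based 0.3594 (the text rounds z to −0.36).

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_cutoff(mu0, sigma, n, alpha):
    """c such that P(sample mean > c | mu = mu0) = alpha under the normal model."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return mu0 + z_alpha * sigma / sqrt(n)


def beta_upper_tail(mu0, mu_a, sigma, n, alpha):
    """beta = P(sample mean <= c | mu = mu_a) for the test rejecting when the mean exceeds c."""
    c = upper_tail_cutoff(mu0, sigma, n, alpha)
    return NormalDist(mu=mu_a, sigma=sigma / sqrt(n)).cdf(c)


c = upper_tail_cutoff(mu0=15, sigma=3, n=36, alpha=0.05)
beta = beta_upper_tail(mu0=15, mu_a=16, sigma=3, n=36, alpha=0.05)
print(f"c = {c:.4f}, beta = {beta:.4f}")
```

The observed sample mean does not enter the calculation: β depends only on the cutoff, the alternative value of μ, σ, and n.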

1.1 Sample Size

It is clear from the preceding example that once we are given the sample size n, an α, a simple

alternative H_a, and a test statistic, we have no control over β and it is exactly determined. Hence, for

a given sample size and test statistic, any effort to lower β will lead to an increase in α and vice versa.

This means that for a test with fixed sample size it is not possible to simultaneously reduce both α

and β. We also notice from Example 1.4 that by increasing the sample size n, we can decrease β

(for the same α) to an acceptable level. The following discussion illustrates that it may be possible to

determine the sample size for a given α and β.

Suppose we want to test H₀ : μ = μ₀ versus H_a : μ > μ₀. Given α and β, we want to find n, the

sample size, and K, the point at which the rejection begins. We know that

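$$\alpha = P(\bar{X} > K \text{ when } \mu = \mu_0) = P\left( \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > \frac{K - \mu_0}{\sigma/\sqrt{n}}, \text{ when } \mu = \mu_0 \right) = P(Z > z_\alpha) \tag{1}$$

and

$$\beta = P(\bar{X} \le K \text{ when } \mu = \mu_a) = P\left( \frac{\bar{X} - \mu_a}{\sigma/\sqrt{n}} \le \frac{K - \mu_a}{\sigma/\sqrt{n}}, \text{ when } \mu = \mu_a \right) = P(Z \le -z_\beta). \tag{2}$$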

From Equations (1) and (2),

$$z_\alpha = \frac{K - \mu_0}{\sigma/\sqrt{n}}$$

and

$$-z_\beta = \frac{K - \mu_a}{\sigma/\sqrt{n}}.$$

This gives us two equations with two unknowns (K and n), and we can proceed to solve them.

Eliminating K, we get

$$\mu_0 + z_\alpha \left( \frac{\sigma}{\sqrt{n}} \right) = \mu_a - z_\beta \left( \frac{\sigma}{\sqrt{n}} \right).$$

From this we can derive

$$\sqrt{n} = \frac{(z_\alpha + z_\beta)\,\sigma}{\mu_a - \mu_0}.$$

Thus, the sample size for an upper tail alternative hypothesis is

$$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2}.$$

The sample size increases with the square of the standard deviation and decreases with the square of

the difference between mean value of the alternative hypothesis and the mean value under the null

hypothesis. Note that in real-world problems, care should be taken in the choice of the value of μ_a

for the alternative hypothesis. It may be tempting for a researcher to take a large value of μ_a in order

to reduce the required sample size. This will seriously affect the accuracy (power) of the test. This

alternative value must be realistic within the experiment under study. Care should also be taken in

the choice of the standard deviation σ. Using an underestimated value of the standard deviation to

reduce the sample size will result in inaccurate conclusions similar to overestimating the difference

of means. Usually, the value of σ is estimated using a similar study conducted earlier. The problem

could be that the previous study may be old and may not represent the new reality. When accuracy is

important, it may be necessary to conduct a pilot study only to get some idea on the estimate of σ.

Once we determine the necessary sample size, we must devise a procedure by which the appropriate

data can be randomly obtained. This aspect of the design of experiments is discussed in Chapter 9.

Example 1.6

Let σ = 3.1 be the true standard deviation of the population from which a random sample is chosen. How

large should the sample size be for testing H₀ : μ = 5 versus H_a : μ = 5.5, in order that α = 0.01 and

β = 0.05?

Solution

We are given μ₀ = 5 and μ_a = 5.5. Also, z_α = z_0.01 = 2.33 and z_β = z_0.05 = 1.645. Hence, the

sample size

$$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} = \frac{(2.33 + 1.645)^2 (3.1)^2}{(0.5)^2} = 607.38.$$

So, n = 608 will provide the desired levels. That is, in order for us to test the foregoing hypothesis, we must

randomly select 608 observations from the given population.
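The sample size formula is easy to wrap in a small function. This is a sketch under the same assumptions as the derivation (known σ, normal test statistic, one-tailed alternative); `sample_size` and `upper_quantile` are hypothetical helper names.

```python
from math import ceil
from statistics import NormalDist


def sample_size(mu0, mu_a, sigma, z_alpha, z_beta):
    """Smallest integer n with n >= ((z_alpha + z_beta) * sigma / (mu_a - mu0)) ** 2."""
    return ceil(((z_alpha + z_beta) * sigma / (mu_a - mu0)) ** 2)


def upper_quantile(tail_prob):
    """z such that P(Z > z) = tail_prob for a standard normal Z."""
    return NormalDist().inv_cdf(1 - tail_prob)


# Table values, as in the worked example.
n_table = sample_size(5, 5.5, 3.1, z_alpha=2.33, z_beta=1.645)
# Unrounded quantiles for alpha = 0.01 and beta = 0.05.
n_exact = sample_size(5, 5.5, 3.1, upper_quantile(0.01), upper_quantile(0.05))
print(n_table, n_exact)
```

With the table values the function returns 608; with unrounded quantiles the requirement drops to 607, which shows how sensitive the result is to rounding in z. Rounding up with `ceil` guarantees that both error targets are met.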

From a practical standpoint, the researcher typically chooses α and the sample size, and β is ignored.

Because a trade-off exists between α and β, choosing a very small value of α will tend to increase β in

a serious way. A general rule of thumb is to pick reasonable values of α, possibly in the 0.05 to 0.10

range so that β will remain reasonably small.

EXERCISES 1

1.1.

An appliance manufacturer is considering the purchase of a new machine for stamping out

sheet metal parts. If μ₀ (given) is the true average of the number of good parts stamped out

per hour by their old machine and μ is the corresponding true unknown average for the

new machine, the manufacturer wants to test the null hypothesis μ = μ₀ versus a suitable

alternative. What should the alternative be if he does not want to buy the new machine

unless it is (a) more productive than the old one? (b) At least 20% more productive than the

old one?

1.2.

Formulate an alternative hypothesis for each of the following null hypotheses.

(a) H₀: Support for a presidential candidate is unchanged after the start of the use of TV

commercials.

(b) H₀: The proportion of viewers watching a particular local news channel is less

than 30%.

1.3.

It is suspected that a coin is not balanced (not fair). Let p be the probability of tossing a head.

To test H₀ : p = 0.5 against the alternative hypothesis H_a : p > 0.5, a coin is tossed 15 times.

Let Y equal the number of times a head is observed in the 15 tosses of this coin. Assume the

rejection region to be {Y ≥ 10}.

(a) Find α.

(b) Find β for p = 0.

(d) Find the rejection region of the form {Y ≥ K} for α = 0.01 and α = 0.03.

(e) For the alternative H_a : p = 0.7, find β for the values of α given in (d).

1.4.

In Exercise 1.3:

(a) Assume that the rejection region is {Y ≥ 8}. Calculate α and β if p = 0.6. Compare the

results with the corresponding values obtained in Exercise 1.3. (This gives the effect of

enlarging the rejection region on α and β.)

(b) Assume that the rejection region is {Y ≥ 8}. Calculate α and β if p = 0.6 and (i) the coin

is tossed 20 times, or (ii) the coin is tossed 25 times. (This shows the effect of increasing

the sample size on α and β for a fixed rejection region.)

1.5.

Suppose we have a random sample of size 25 from a normal population with an unknown

mean μ and a standard deviation of 4. We wish to test the hypothesis H₀ : μ = 10 vs.

H_a : μ > 10. Let the rejection region be defined by: reject H₀ if the sample mean

X̄ > 11.2.

(a) Find α.

(b) Find β for H_a : μ = 11.

1.6.

A process for making steel pipe is under control if the diameter of the pipe has mean 3.0 in.

with standard deviation of no more than 0.0250 in. To check whether the process is under

control, a random sample of size n = 30 is taken each day and the null hypothesis μ = 3.0

is rejected if X̄ is less than 2.9960 or greater than 3.0040. Find (a) the probability of type I

error; (b) the probability of type II error when μ = 3.0050 in. Assume σ = 0.0250 in.

1.7.

A bowl contains 20 balls, of which x are green and the remainder red. To test H₀ : x = 10

versus H_a : x = 15, three balls are selected at random without replacement, and H₀ is rejected

if all three balls are green. Calculate α and β for this test.

1.8.

Suppose we have a sample of size 6 from a population with pdf f(x) = (1/θ)e^(−x/θ), x > 0, θ > 0.

We wish to test H₀ : θ = 1 vs. H_a : θ > 1. Let the rejection region be defined by reject H₀ if

Σ_{i=1}^{6} X_i > 8. (a) Find α. (b) Find β for H_a : θ = 2.

1.9.

Let σ² = 16 be the variance of a normal population from which a random sample is chosen.

How large should the sample size be for testing H₀ : μ = 25 versus H_a : μ = 24, in order that

α = 0.05 and β = 0.05?

2 THE NEYMAN-PEARSON LEMMA

In practical hypothesis testing situations, there are typically many tests possible with significance level

α for a null hypothesis versus alternative hypothesis (see Project 7A). This leads to some important

questions, such as (1) how to decide on the test statistic and (2) how to know that we selected the best

rejection region. In this section, we study the answer to these questions using the Neyman-Pearson

approach.

Definition 2.1 Suppose that W is the test statistic and RR is the rejection region for a test of hypothesis

concerning the value of a parameter θ. Then the power of the test is the probability that the test rejects H₀

when the alternative is true. That is,

π = Power(θ)

= P(W in RR when the parameter value is an alternative θ).

If H₀ : θ = θ₀ and H_a : θ ≠ θ₀, then the power of the test at some θ = θ₁ ≠ θ₀ is

$$\text{Power}(\theta_1) = P(\text{reject } H_0 \mid \theta = \theta_1).$$

But β(θ₁) = P(accept H₀ | θ = θ₁). Therefore,

Power(θ₁) = 1 − β(θ₁).

A good test will have high power.
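As an illustration of power as a function of the true parameter value, the sketch below evaluates Power(μ) for the upper-tailed test of Example 1.5 (μ₀ = 15, σ = 3, n = 36, α = 0.05). Reusing that setting, and the function name `power_upper_tail`, are choices made here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail(mu, mu0, sigma, n, alpha):
    """P(reject H0 | true mean mu) for the test rejecting when the sample mean exceeds the cutoff."""
    se = sigma / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return 1 - NormalDist(mu=mu, sigma=se).cdf(cutoff)


for mu in (15.0, 15.5, 16.0, 16.5, 17.0):
    power = power_upper_tail(mu, mu0=15, sigma=3, n=36, alpha=0.05)
    print(f"mu = {mu:4.1f}: power = {power:.4f}, beta = {1 - power:.4f}")
```

At μ = μ₀ the power equals α, and it rises toward 1 as the true mean moves away from the null value in the direction of the alternative.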

Note that the power of a test of H₀ cannot be found until some true situation H_a is specified. That is,

the sampling distribution of the test statistic when H_a is true must be known or assumed. Because

β depends on the alternative hypothesis, which being composite most of the time does not specify

the distribution of the test statistic, it is important to observe that the experimenter cannot control

β. For example, the alternative H_a : μ < μ₀ does not specify the value of μ, as in the case of the null

hypothesis, H₀ : μ = μ₀.

Example 2.1

Let X₁, . . . , X_n be a random sample from a Poisson distribution with parameter λ, that is, the pdf is

given by f(x) = e^(−λ)λ^x/(x!). Then the hypothesis H₀ : λ = 1 uniquely specifies the distribution, because

f(x) = e^(−1)/(x!) and hence is a simple hypothesis. The hypothesis H_a : λ > 1 is composite, because f(x) is

not uniquely determined.

Definition 2.2 A test at a given α of a simple hypothesis H₀ versus the simple alternative H_a that has

the largest power among tests with the probability of type I error no larger than the given α is called a most

powerful test.

Consider the test of hypothesis H₀ : θ = θ₀ versus H_a : θ = θ₁. If α is fixed, then our interest is to

make β as small as possible. Because β = 1 − Power(θ₁), by minimizing β we would obtain a most

powerful test. The following result says that among all tests with given probability of type I error, the

likelihood ratio test given later minimizes the probability of a type II error, in other words, it is most

powerful.

Theorem 2.1 (Neyman-Pearson Lemma) Suppose that one wants to test a simple hypothesis H₀ :

θ = θ₀ versus the simple alternative hypothesis H_a : θ = θ₁ based on a random sample X₁, …, X_n from a

distribution with parameter θ. Let L(θ) ≡ L(θ; X₁, . . . , X_n) > 0 denote the likelihood of the sample when

the value of the parameter is θ. If there exist a positive constant K and a subset C of the sample space Rⁿ (the

Euclidean n-space) such that

1. L(θ₀)/L(θ₁) ≤ K for (x₁, x₂, …, x_n) ∈ C,

2. L(θ₀)/L(θ₁) ≥ K for (x₁, x₂, …, x_n) ∈ C′, where C′ is the complement of C, and

3. P[(X₁, . . . , X_n) ∈ C; θ₀] = α.

Then the test with critical region C will be the most powerful test for H₀ versus H_a. We call α the size of the

test and C the best critical region of size α.

Proof. We prove this theorem for continuous random variables. For discrete random variables, the

proof is identical with sums replacing the integral. Let S be some region in Rⁿ, an n-dimensional

Euclidean space. For simplicity we will use the following notation:

$$\int_S L(\theta) = \int \cdots \int_S L(\theta; x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n.$$

Note that

$$P((X_1, \ldots, X_n) \in C; \theta_0) = \int_C f(x_1, \ldots, x_n; \theta_0)\, dx_1 \cdots dx_n = \int_C L(\theta_0; x_1, \ldots, x_n)\, dx_1 \cdots dx_n.$$

Suppose that there is another critical region, say B, of size less than or equal to α, that is, ∫_B L(θ₀) ≤ α. Then

$$0 \le \int_C L(\theta_0) - \int_B L(\theta_0),$$

because ∫_C L(θ₀) = α by assumption 3.

Therefore,

∫

0 ≤ L(θ₀) − L(θ₀)

∫

= L(θ₀) +

L(θ₀) −

L(θ₀)

C∩B

C∩B^′

C∩B

C^′∩B

∫

= L(θ₀) −

L(θ₀).

C∩B^′

C^′∩B

Using assumption 1 of Theorem 2.1, KL(θ₁) ≥ L(θ₀) at each point in the region C and hence in C ∩ B′. Thus

$$\int_{C \cap B'} L(\theta_0) \le K \int_{C \cap B'} L(\theta_1).$$

By assumption 2 of the theorem, KL(θ₁) ≤ L(θ₀) at each point in C^′, and hence in C^′ ∩ B. Thus,

$$\int_{C' \cap B} L(\theta_0) \ge K \int_{C' \cap B} L(\theta_1).$$

Therefore,

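$$0 \le \int_{C \cap B'} L(\theta_0) - \int_{C' \cap B} L(\theta_0) \le K \left\{ \int_{C \cap B'} L(\theta_1) - \int_{C' \cap B} L(\theta_1) \right\}.$$

That is,

$$0 \le K \left\{ \int_{C \cap B} L(\theta_1) + \int_{C \cap B'} L(\theta_1) - \int_{C \cap B} L(\theta_1) - \int_{C' \cap B} L(\theta_1) \right\} = K \left\{ \int_C L(\theta_1) - \int_B L(\theta_1) \right\}.$$

As a result,

$$\int_C L(\theta_1) \ge \int_B L(\theta_1).$$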

Because this is true for every critical region B of size ≤ α, C is the best critical region of size α, and

the test with critical region C is the most powerful test of size α.

When testing two simple hypotheses, the existence of a best critical region is guaranteed by the

Neyman-Pearson lemma. In addition, the foregoing theorem provides a means for determining

what the best critical region is. However, it is important to note that Theorem 2.1 gives only the

form of the rejection region; the actual rejection region depends on the specific value of α.

In real-world situations, we are seldom presented with the problem of testing two simple hypotheses.

There is no general result in the form of Theorem 2.1 for composite hypotheses. However, for

hypotheses of the form H₀ : θ = θ₀ versus H_a : θ > θ₀, we can take a particular value θ₁ > θ₀ and

then find a most powerful test for H₀ : θ = θ₀ versus H_a : θ = θ₁. If this test (that is, the rejection

region of the test) does not depend on the particular value θ₁, then this test is said to be a uniformly

most powerful test for H₀ : θ = θ₀ versus H_a : θ > θ₀.

The following example illustrates the use of the Neyman-Pearson lemma.

Example 2.2

Let X₁, . . . , X_n denote an independent random sample from a population with a Poisson distribution with

mean λ. Derive the most powerful test for testing H₀ : λ = 2 versus H_a : λ = 1/2.

Solution

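Recall that the probability mass function of a Poisson variable is

$$p(x) = \begin{cases} \dfrac{e^{-\lambda} \lambda^x}{x!}, & \lambda > 0,\ x = 0, 1, 2, \ldots \\ 0, & \text{otherwise.} \end{cases}$$

Thus, the likelihood function is

$$L(\lambda) = \frac{\lambda^{\sum_{i=1}^n x_i}\, e^{-\lambda n}}{\prod_{i=1}^n (x_i!)}.$$

For λ = 2,

$$L(\theta_0) = L(\lambda = 2) = \frac{2^{\sum_{i=1}^n x_i}\, e^{-2n}}{\prod_{i=1}^n (x_i!)}$$

and for λ = 1/2,

$$L(\theta_1) = L(\lambda = 1/2) = \frac{(1/2)^{\sum_{i=1}^n x_i}\, e^{-n/2}}{\prod_{i=1}^n (x_i!)}.$$

Thus,

$$\frac{L(\theta_0)}{L(\theta_1)} = \frac{2^{\sum x_i}\, e^{-2n}}{(1/2)^{\sum x_i}\, e^{-n/2}} < K$$

which implies

$$4^{\sum x_i}\, e^{-3n/2} < K$$

or, taking natural logarithm,

$$\left(\sum x_i\right) \ln 4 - \frac{3n}{2} < \ln K.$$

Solving for Σx_i and letting K′ = [ln K + (3n/2)]/ln 4, we will reject H₀ whenever Σx_i < K′.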

A step-by-step procedure in applying the Neyman-Pearson lemma is now given.

PROCEDURE FOR APPLYING THE NEYMAN-PEARSON LEMMA

1. Determine the likelihood functions under both null and alternative hypotheses.

2. Take the ratio of the two likelihood functions to be less than a constant K .

3. Simplify the inequality in step 2 to obtain a rejection region.
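As a rough numerical companion to these steps, the sketch below finds a cutoff for the Poisson test of Example 2.2, where H₀ is rejected for small values of Σx_i. It relies on the fact that a sum of n independent Poisson(λ) variables is Poisson(nλ). The sample size n = 5 and level α = 0.05 are illustrative assumptions, and the helper names are hypothetical. Because the statistic is discrete, the attained size is generally below α.

```python
import math

def poisson_cdf(k, mean):
    """P(Y <= k) for Y ~ Poisson(mean), summed term by term."""
    term = math.exp(-mean)      # P(Y = 0)
    total = term
    for j in range(1, k + 1):
        term *= mean / j        # P(Y = j) obtained from P(Y = j - 1)
        total += term
    return total

def lower_tail_cutoff(n, lam0, alpha):
    """Largest c with P(sum X_i <= c; lam0) <= alpha, or None if none exists."""
    mean = n * lam0             # the sum of n Poisson(lam0) draws is Poisson(n * lam0)
    c = None
    k = 0
    while poisson_cdf(k, mean) <= alpha:
        c = k
        k += 1
    return c

n, lam0, alpha = 5, 2.0, 0.05   # illustrative values, not taken from the text
c = lower_tail_cutoff(n, lam0, alpha)
size = poisson_cdf(c, n * lam0)
print(f"reject H0 when sum(x) <= {c}; attained size = {size:.4f}")
```

The cutoff plays the role of K′ in the example: the likelihood ratio fixes the form of the rejection region, and α fixes the constant.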

Example 2.3

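Suppose X₁, . . . , X_n is a random sample from a normal distribution with a known mean of μ and an unknown variance of σ². Find the most powerful α-level test for testing H₀ : σ² = σ₀² versus H_a : σ² = σ₁² (σ₁² > σ₀²). Show that this test is equivalent to the χ²-test. Is the test uniformly most powerful for H_a : σ² > σ₀²?

Solution

To test H₀ : σ² = σ₀² versus H_a : σ² = σ₁², we have

$$L(\sigma_0^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma_0}\, e^{-\frac{(x_i - \mu)^2}{2\sigma_0^2}} = \frac{1}{(\sqrt{2\pi})^n \sigma_0^n}\, e^{-\frac{\sum (x_i - \mu)^2}{2\sigma_0^2}}.$$

Similarly,

$$L(\sigma_1^2) = \frac{1}{(\sqrt{2\pi})^n \sigma_1^n}\, e^{-\frac{\sum (x_i - \mu)^2}{2\sigma_1^2}}.$$

Therefore, the most powerful test is, reject H₀ if,

$$\frac{L(\sigma_0^2)}{L(\sigma_1^2)} = \left(\frac{\sigma_1}{\sigma_0}\right)^n \exp\left[-\frac{(\sigma_1^2 - \sigma_0^2)}{2\sigma_1^2 \sigma_0^2} \sum (x_i - \mu)^2\right] \le K$$

for some K.

Taking the natural logarithms, we have

$$n \ln\left(\frac{\sigma_1}{\sigma_0}\right) - \frac{(\sigma_1^2 - \sigma_0^2)}{2\sigma_1^2 \sigma_0^2} \sum (x_i - \mu)^2 \le \ln K$$

or

$$\sum (x_i - \mu)^2 \ge \left[n \ln\left(\frac{\sigma_1}{\sigma_0}\right) - \ln K\right] \left(\frac{2\sigma_1^2 \sigma_0^2}{\sigma_1^2 - \sigma_0^2}\right) = C.$$

To find the rejection region for a fixed value of α, write the region as

$$\frac{\sum (x_i - \mu)^2}{\sigma_0^2} \ge \frac{C}{\sigma_0^2} = C'.$$

Note that Σ(x_i − μ)²/σ₀² has a χ²-distribution with n degrees of freedom under H₀. Because the same rejection region (it does not depend upon the specific value of σ₁² in the alternative) would be used for any σ₁² > σ₀², the test is uniformly most powerful.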

The foregoing example shows that, in order to test for variance using a sample from a normal

distribution, we could use the chi-square table to obtain the critical value for the rejection region

given α.
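The claim that Σ(x_i − μ)²/σ₀² can be referred to a chi-square table may be checked by simulation. The following sketch draws repeated normal samples under H₀ and counts how often the statistic exceeds the tabled upper 5% point for 10 degrees of freedom (18.307). The values of μ, σ₀², and n, the seed, and the number of trials are illustrative assumptions, not part of the example.

```python
import random

def variance_statistic(sample, mu, sigma0_sq):
    """sum (x_i - mu)^2 / sigma0^2, chi-square with n d.f. under H0 (mu known)."""
    return sum((x - mu) ** 2 for x in sample) / sigma0_sq

random.seed(0)
mu, sigma0_sq, n = 0.0, 4.0, 10   # illustrative values
critical = 18.307                 # chi-square table: upper 5% point, 10 d.f.
trials = 20000

rejections = 0
for _ in range(trials):
    # Draw a sample under H0, that is, with variance sigma0_sq.
    sample = [random.gauss(mu, sigma0_sq ** 0.5) for _ in range(n)]
    if variance_statistic(sample, mu, sigma0_sq) >= critical:
        rejections += 1

rate = rejections / trials
print(f"simulated type I error rate: {rate:.3f}")
```

The simulated rejection rate should land close to the nominal 0.05, up to Monte Carlo error.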

EXERCISES 2

2.1.

Suppose X₁, . . . , X_n is a random sample from a normal distribution with a known variance

of σ² and an unknown mean of μ. Find the most powerful α-level test of H₀ : μ = μ₀ versus

H_a : μ = μ_a if (a) μ₀ > μ_a, and (b) μ_a > μ₀.

2.2.

Show that the most powerful test obtained in Example 2.1 is uniformly most powerful for

testing H₀ : μ ≤ μ₀ versus H_a : μ > μ₀, but not uniformly most powerful for testing H₀ : μ = μ₀

versus H_a : μ ≠ μ₀.

2.3.

Suppose X₁, . . . , X_n is a random sample from a U(0, θ) distribution. Find the most powerful

α-level test for testing H₀ : θ = θ₀ versus H_a : θ = θ₁, where θ₀ < θ₁.

2.4.

Let X₁, . . . , X_n be a random sample from a geometric distribution with parameter p. Find the

most powerful test of H₀ : p = p₀ versus H_a : p = p_a(> p₀). Is this uniformly most powerful

test for H₀ : p = p₀ versus H_a : p > p₀?

2.5.

Let X₁, . . . , X_n be a random sample from a distribution having a pdf of

$$f(y) = \begin{cases} \dfrac{2y}{\eta^2}\, e^{-y^2/\eta^2}, & \text{if } y > 0 \\ 0, & \text{otherwise.} \end{cases}$$

Find a uniformly most powerful test for testing H₀ : η = η₀ versus H_a : η < η₀.

2.6.

Let X be a single observation from the pdf

$$f(x) = \begin{cases} \theta x^{\theta - 1}, & 0 < x < 1 \\ 0, & \text{otherwise.} \end{cases}$$

Find the most powerful test with a level of significance α = 0.01 to test H₀ : θ = 3 versus

H_a : θ = 4.

2.7.

Let X₁, . . . , X_n be a random sample from a Bernoulli distribution with parameter p. Find the

most powerful test of H₀ : p = p₀ versus H_a : p = p_a, where p_a > p₀.

2.8.

Let X₁, . . . , X_n be a random sample from a Poisson distribution with mean λ. Find a best

critical region for testing H₀ : λ = 3 against H_a : λ = 6.

3 LIKELIHOOD RATIO TESTS

The Neyman-Pearson lemma provides a method for constructing most powerful tests for simple

hypotheses. We also have seen that in some instances when a hypothesis is not simple, it is pos-

sible to find uniformly most powerful tests. In general, uniformly most powerful (UMP) tests do

not exist for composite hypotheses. As an example, consider the two-sided hypothesis, at level α,

given by

H₀ : μ = μ₀

vs. H_a : μ ≠ μ₀

where μ is the mean of a normal population with known variance σ². If X̄ is the sample mean of a random sample of size n, then as shown earlier, we can use the test statistic

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}.$$

For H_a : μ = μ₁ > μ₀, the rejection region for the most powerful test would be

Reject H₀ if z > z_α.

On the other hand for H_a : μ = μ₂ < μ₀, the rejection region for the most powerful test would be

Reject H₀ if z < −z_α.

Thus, the rejection region depends on the specific alternative. Consequently, the two-sided hypothesis

just given has no UMP test.

In this section, we shall study a general procedure that is applicable when one or both H₀ and H_a are

composite. In fact, this procedure works for simple hypotheses as well. This method is based on the

maximum likelihood estimation and the ratio of likelihood functions used in the Neyman-Pearson

lemma. We assume that the pdf or pmf of the random variable X is f (x, θ), where θ can be one or

more unknown parameters. Let Ω represent the total parameter space, that is, the set of all possible values of the parameter θ given by either H₀ or H_a.

Consider the hypotheses

H₀ : θ ∈ Ω₀ vs. H_a : θ ∈ Ω_a = Ω − Ω₀

where θ is the unknown population parameter (or parameters) with values in Ω, and Ω₀ is a subset of Ω.

Let L(θ) be the likelihood function based on the sample X₁, . . . , X_n. Now we define the likelihood

ratio corresponding to the hypotheses H₀ and H_a. This ratio will be used as a test statistic for the

testing procedure that we develop in this section. This is a natural generalization of the ratio test used

in the Neyman-Pearson lemma when both hypotheses were simple.

Definition 3.1 The likelihood ratio λ is the ratio

$$\lambda = \frac{\max_{\theta \in \Omega_0} L(\theta; x_1, \ldots, x_n)}{\max_{\theta \in \Omega} L(\theta; x_1, \ldots, x_n)} = \frac{L_0^*}{L^*}.$$

We note that 0 ≤ λ ≤ 1. Because λ is the ratio of nonnegative functions, λ ≥ 0. Because Ω₀ is a subset of Ω, we know that $\max_{\theta \in \Omega_0} L(\theta) \le \max_{\theta \in \Omega} L(\theta)$. Hence, λ ≤ 1.

If the maximum of L in Ω₀ is much smaller than the maximum of L in Ω, that is, if λ is small, it would appear that the data X₁, . . . , X_n do not support the null hypothesis θ ∈ Ω₀. On the other hand, if λ is close to 1, one could conclude that the data support the null hypothesis, H₀. Therefore, small values of λ would result in rejection of the null hypothesis, and large values nearer to 1 will result in a decision in support of the null hypothesis.

For the evaluation of λ, it is important to note that $\max_{\theta \in \Omega} L(\theta) = L(\hat{\theta}_{ml})$, where θ̂_ml is the maximum likelihood estimator of θ ∈ Ω, and $\max_{\theta \in \Omega_0} L(\theta)$ is the likelihood function with unknown parameters replaced by their maximum likelihood estimators subject to the condition that θ ∈ Ω₀. We can summarize the likelihood ratio test as follows.

LIKELIHOOD RATIO TESTS (LRTs)

To test

H₀ : θ ∈ Ω₀ vs. H_a : θ ∈ Ω_a,

$$\lambda = \frac{\max_{\theta \in \Omega_0} L(\theta; x_1, \ldots, x_n)}{\max_{\theta \in \Omega} L(\theta; x_1, \ldots, x_n)} = \frac{L_0^*}{L^*}$$

will be used as the test statistic.

The rejection region for the likelihood ratio test is given by

Reject H₀ if λ ≤ K .

K is selected such that the test has the given significance level α.

Example 3.1

Let X₁, . . . , X_n be a random sample from an N(μ, σ²). Assume that σ² is known. We wish to test, at level

α, H₀ : μ = μ₀ vs. H_a : μ ≠ μ₀. Find an appropriate likelihood ratio test.

Solution

We have seen that for testing

H₀ : μ = μ₀

vs. H_a : μ ≠ μ₀

there is no uniformly most powerful test. The likelihood function is

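$$L(\mu) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right).$$

Here, Ω₀ = {μ₀} and Ω_a = R − {μ₀}.

Hence,

$$L_0^* = \max_{\mu = \mu_0} \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu_0)^2}{2\sigma^2}\right).$$

Similarly,

$$L^* = \max_{-\infty < \mu < \infty} \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right).$$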

Because the only unknown parameter in the parameter space is μ, −∞ < μ < ∞, the maximum of the

likelihood function is achieved when μ equals its maximum likelihood estimator, that is,

μ̂_ml = X̄.

Therefore, with a simple calculation we have

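$$\lambda = \frac{\exp\left(-\sum_{i=1}^n (x_i - \mu_0)^2 / 2\sigma^2\right)}{\exp\left(-\sum_{i=1}^n (x_i - \bar{x})^2 / 2\sigma^2\right)} = e^{-n(\bar{x} - \mu_0)^2 / 2\sigma^2}.$$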

Thus, the likelihood ratio test has the rejection region

Reject H₀

if λ ≤ K

which is equivalent to

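$$-\frac{n}{2\sigma^2}(\bar{X} - \mu_0)^2 \le \ln K \iff \frac{(\bar{X} - \mu_0)^2}{\sigma^2/n} \ge -2 \ln K \iff \left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| \ge \sqrt{-2 \ln K} = c_1, \text{ say}.$$

Note that we use the symbol ⇔ to mean "if and only if." Because 0 < K ≤ 1, the quantity −2 ln K is nonnegative. We now compute c₁. Under H₀,

$$\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1).$$

Observe that

$$\alpha = P\left\{\left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| \ge c_1\right\}$$

gives a possible value of c₁ as c₁ = z_{α/2}. Hence, the LRT for the given hypothesis is

$$\text{Reject } H_0 \text{ if } \left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| \ge z_{\alpha/2}.$$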

Thus, in this case, the likelihood ratio test is equivalent to the z-test for large random samples.
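The equivalence between the rule λ ≤ K and the two-sided z rule can be illustrated numerically. The sketch below computes λ = exp(−z²/2) and the constant K = exp(−z²_{α/2}/2) that corresponds to level α, using only the Python standard library. The function name and the input values are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def lrt_known_sigma(xbar, mu0, sigma, n, alpha):
    """Return (lambda, z, reject) for H0: mu = mu0 against a two-sided alternative."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    lam = math.exp(-z * z / 2)                     # lambda = exp(-n (xbar - mu0)^2 / (2 sigma^2))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    k = math.exp(-z_crit ** 2 / 2)                 # the constant K matching level alpha
    reject = lam <= k                              # same decision as |z| >= z_{alpha/2}
    return lam, z, reject

# Illustrative inputs (not from the text).
lam, z, reject = lrt_known_sigma(xbar=10.6, mu0=10.0, sigma=1.5, n=25, alpha=0.05)
print(f"lambda = {lam:.4f}, z = {z:.2f}, reject H0: {reject}")
```

Since λ is a decreasing function of |z|, comparing λ with K and comparing |z| with z_{α/2} always lead to the same decision.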

In fact, when both the hypotheses are simple, the likelihood ratio test is identical to the Neyman-Pearson test. We can now summarize the procedure for the likelihood ratio test, LRT.

PROCEDURE FOR THE LIKELIHOOD RATIO TEST (LRT)

1. Find the largest value of the likelihood L(θ) for any θ ∈ Ω₀ by finding the maximum likelihood estimate within Ω₀ and substituting back into the likelihood function.

2. Find the largest value of the likelihood L(θ) for any θ ∈ Ω by finding the maximum likelihood estimate within Ω and substituting back into the likelihood function.

3. Form the ratio

$$\lambda = \lambda(x_1, x_2, \ldots, x_n) = \frac{L(\theta) \text{ in } \Omega_0}{L(\theta) \text{ in } \Omega}.$$

4. Determine a K so that the test has the desired probability of type I error, α.

5. Reject H₀ if λ ≤ K .

In the next example, we find a LRT for a testing problem when both H₀ and H_a are simple.

Example 3.2

Machine 1 produces 5% defectives. Machine 2 produces 10% defectives. Ten items produced by each of

the machines are sampled randomly; X = number of defectives. Let θ be the true proportion of defectives.

Test H₀ : θ = 0.05 versus H_a : θ = 0.1. Use α = 0.05.

Solution

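We need to test H₀ : θ = 0.05 vs. H_a : θ = 0.1. Let

$$L(\theta) = \begin{cases} \binom{10}{x} (0.05)^x (0.95)^{10-x}, & \text{if } \theta = 0.05 \\ \binom{10}{x} (0.1)^x (0.90)^{10-x}, & \text{if } \theta = 0.10. \end{cases}$$

And

$$L_1 = L(0.05) = \binom{10}{x} (0.05)^x (0.95)^{10-x}$$

and

$$L_2 = L(0.1) = \binom{10}{x} (0.1)^x (0.90)^{10-x}.$$

Thus, we have

$$\frac{L_1}{L_2} = \frac{0.05^x\, (0.95)^{10-x}}{0.1^x\, (0.9)^{10-x}} = \left(\frac{1}{2}\right)^x \left(\frac{19}{18}\right)^{10-x}.$$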

The ratio

$$\lambda = \frac{L_1}{\max(L_1, L_2)}.$$

Note that if max(L₁, L₂) = L₁, then λ = 1. Because we want to reject for small values of λ, we consider the case max(L₁, L₂) = L₂, and we reject H₀ if (L₁/L₂) ≤ K or, equivalently, (L₂/L₁) ≥ K₁, where K₁ = 1/K (note that L₂/L₁ = 2^x (18/19)^{10−x}).

That is, reject H₀ if

$$2^x \left(\frac{18}{19}\right)^{10-x} \ge K_1 \iff \left(\frac{19}{9}\right)^x \ge K_2, \quad \text{where } K_2 = K_1 \left(\frac{19}{18}\right)^{10}.$$

Because (19/9)^x is increasing in x, we reject H₀ if X > C, where C is chosen so that P(X > C | θ = 0.05) ≤ 0.05.

Using the binomial tables, we have

P(X > 2 | θ = 0.05) = 0.0116

and

P(X ≥ 2 | θ = 0.05) = 0.0862.

Reject H₀ if X > 2. If we want α to be exactly 0.05, we have to use a randomized test: in addition, reject with probability (0.05 − 0.0116)/(0.0862 − 0.0116) = 0.0384/0.0746 ≈ 0.515 if X = 2.
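The cutoff and the randomization probability in this example can be recomputed from the binomial probabilities directly. The following is a minimal sketch using exact values rather than table entries, so the last digits may differ slightly from tabled figures. The variable names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p0, alpha = 10, 0.05, 0.05

# Smallest C with P(X > C; p0) <= alpha.
C = 0
while sum(binom_pmf(k, n, p0) for k in range(C + 1, n + 1)) > alpha:
    C += 1

tail = sum(binom_pmf(k, n, p0) for k in range(C + 1, n + 1))   # P(X > C)
gamma = (alpha - tail) / binom_pmf(C, n, p0)                   # rejection probability when X = C
print(f"C = {C}, P(X > C) = {tail:.4f}, gamma = {gamma:.4f}")
```

Rejecting always when X > C and with probability gamma when X = C gives a test whose size is exactly α, since tail + gamma · P(X = C) = α by construction.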

The likelihood ratio tests do not always produce a test statistic with a known probability distribu-

tion such as the z-statistic of Example 3.1. If we have a large sample size, then we can obtain an

approximation to the distribution of the statistic λ, which is beyond the level of this book.

EXERCISES 3

3.1. Let X₁, . . . , X_n be a random sample from an N(μ, σ²). Assume that σ² is unknown. We wish

to test, at level α, H₀ : μ = μ₀ vs H_a : μ < μ₀. Find an appropriate likelihood ratio test.

3.2.

Let X₁, . . . , X_n be a random sample from an N(μ, σ²). Assume that both μ and σ² are

unknown. We wish to test, at level α, H₀ : σ² = σ₀² vs. H_a : σ² > σ₀². Find an appropriate

likelihood ratio test.

3.3.

Let X₁, . . . , X_n be a random sample from an N(μ₁, σ²) and let Y₁, Y₂, . . . , Y_n be an independent sample from an N(μ₂, σ²), where σ² is unknown. We wish to test, at level α, H₀ : μ₁ = μ₂ vs. H_a : μ₁ ≠ μ₂. Find an appropriate likelihood ratio test.

3.4.

Let X₁, . . . , X_n be a sample from a Poisson distribution with parameter λ. Show that a likelihood ratio test of H₀ : λ = λ₀ vs. H_a : λ ≠ λ₀ rejects the null hypothesis if X̄ ≥ m₁ or X̄ ≤ m₂.

3.5.

Let X₁, . . . , X_n be a sample from an exponential distribution with parameter θ. Show that a likelihood ratio test of H₀ : θ = θ₀ vs. H_a : θ ≠ θ₀ rejects the null hypothesis if $\sum_{i=1}^n X_i \ge m_1$ or $\sum_{i=1}^n X_i \le m_2$.

3.6.

A clinical oncology program developed a set of guidelines for their cancer patients to follow.

It is believed that the proportion of patients who are still living after 24 months is greater

for those who follow the guidelines. Of the 40 patients who followed the guidelines, 30 are

still living after 24 months, whereas of 32 patients who did not follow the guidelines, 21 are

living after 24 months. Find a likelihood ratio test at α = 0.01 to decide whether the program

is effective.

4 HYPOTHESES FOR A SINGLE PARAMETER

In this section, we first introduce the concept of p-value. After that, we study hypothesis testing

concerning a single parameter.

4.1 The p-Value

In hypothesis testing, the choice of the value of α is somewhat arbitrary. For the same data, if the test

is based on two different values of α, the conclusions could be different. Many statisticians prefer to

compute the so-called p-value, which is calculated based on the observed test statistic. For computing

the p-value, it is not necessary to specify a value of α. We can use the given data to obtain the

p-value.

Definition

4.1 Corresponding to an observed value of a test statistic, the p-value

(or attained

significance level) is the lowest level of significance at which the null hypothesis would have been

rejected.

For example, if we are testing a given hypothesis with α = 0.05 and we make a decision to reject H₀

and we proceeded to calculate the p-value equal to 0.03, this means that we could have used an α as

low as 0.03 and still maintain the same decision, rejecting H₀.

Based on the alternative hypothesis, one can use the following steps to compute the p-value.

STEPS TO FIND THE p-VALUE

1. Let TS be the test statistic.

2. Compute the value of TS using the sample X₁, . . . , X_n . Say it is a.

3. The p-value is given by

$$p\text{-value} = \begin{cases} P(TS < a \mid H_0), & \text{if lower tail test} \\ P(TS > a \mid H_0), & \text{if upper tail test} \\ P(|TS| > |a| \mid H_0), & \text{if two tail test.} \end{cases}$$

Example 4.1

To test H₀ : μ = 0 vs. H_a : μ ≠ 0, suppose that the test statistic Z results in a computed value of 1.58. Then, the p-value = P(|Z| > 1.58) = 2(0.0571) = 0.1142. That is, we must tolerate a type I error probability of at least 0.1142 in order to reject H₀. Also, if H_a : μ > 0, then the p-value would be P(Z > 1.58) = 0.0571. In this case we must have an α of at least 0.0571 in order to reject H₀.
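The three cases in the steps for finding the p-value can be written as a small helper for a z statistic. This is a sketch using `statistics.NormalDist` from the Python standard library. The function name and the `tail` labels are hypothetical, and the values printed come from the normal cdf rather than a four-digit table.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """p-value of an observed z statistic; tail is 'lower', 'upper', or 'two'."""
    cdf = NormalDist().cdf
    if tail == "lower":
        return cdf(z)                      # P(Z < z)
    if tail == "upper":
        return 1 - cdf(z)                  # P(Z > z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))       # P(|Z| > |z|)
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

p_two = z_p_value(1.58, "two")
p_upper = z_p_value(1.58, "upper")
print(f"two-tailed: {p_two:.4f}, upper-tailed: {p_upper:.4f}")
```

For z = 1.58 the two-tailed value is twice the upper-tailed one, which is why the choice of alternative hypothesis matters when a p-value is reported.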

The p-value can be thought of as a measure of support for the null hypothesis: The lower its value,

the lower the support. Typically one decides that the support for H₀ is insufficient when the p-value

drops below a particular threshold, which is the significance level of the test.

REPORTING TEST RESULT AS p-VALUES

1. Choose the maximum value of α that you are willing to tolerate.

2. If the p-value of the test is less than the maximum value of α, reject H₀.

If the exact p-value cannot be found, one can give an interval in which the p-value can lie. For example,

if the test is significant at α = 0.05 but not significant for α = 0.025, report that 0.025 ≤ p-value ≤

0.05. So for α > 0.05, reject H₀, and for α < 0.025, do not reject H₀.

In another interpretation, 1−(p-value) is considered as an index of the strength of the evidence against

the null hypothesis provided by the data. It is clear that the value of this index lies in the interval

[0, 1]. If the p-value is 0.02, the value of index is 0.98, supporting the rejection of the null hypothesis.

Not only do p-values provide us with a yes or no answer, they provide a sense of the strength of the

evidence against the null hypothesis. The lower the p-value, the stronger the evidence. Thus, in any

test, reporting the p-value of the test is a good practice.

Because most of the outputs from statistical software used for hypothesis testing include the p-value,

the p-value approach to hypothesis testing is becoming more and more popular. In this approach,

the decision of the test is made in the following way. If the value of α is given, and if the p-value of the

test is less than the value of α, we will reject H₀. If the value of α is not given and the p-value associated

with the test is small (usually set at p-value < 0.05), there is evidence to reject the null hypothesis in

favor of the alternative. In other words, there is evidence that the value of the true parameter (such as

the population mean) is significantly different (greater, or lesser) than the hypothesized value. If the

p-value associated with the test is not small (p > 0.05), we conclude that there is not enough evidence

to reject the null hypothesis. In most of the examples in this chapter, we give both the rejection region

and p-value approaches.

Example 4.2

The management of a local health club claims that its members lose on the average 15 pounds or more

within the first 3 months after joining the club. To check this claim, a consumer agency took a random

sample of 45 members of this health club and found that they lost an average of 13.8 pounds within the

first 3 months of membership, with a sample standard deviation of 4.2 pounds.

(a) Find the p-value for this test.

(b) Based on the p-value in (a), would you reject the null hypothesis at α = 0.01?

Solution

(a) Let μ be the true mean weight loss in pounds within the first 3 months of membership in this club.

Then we have to test the hypothesis

H₀ : μ = 15 versus H_a : μ < 15

Here n = 45, x̄ = 13.8, and s = 4.2. Because n = 45 > 30, we can use the normal approximation. Hence, the test statistic is

$$z = \frac{13.8 - 15}{4.2/\sqrt{45}} = -1.9166$$

and

p-value = P(Z < −1.9166) ≃ P(Z < −1.92) = 0.0274.

Thus, we can use an α as small as 0.0274 and still reject H₀.

(b) No. Because the p-value = 0.0274 is greater than α = 0.01, one cannot reject H₀.
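The calculation in this example can be reproduced from the summary statistics alone. The sketch below assumes the large-sample normal approximation used in the solution. Because it evaluates the normal cdf at the unrounded z, the p-value differs slightly in the fourth decimal from the table value obtained at −1.92.

```python
import math
from statistics import NormalDist

n, xbar, s, mu0 = 45, 13.8, 4.2, 15.0        # summary statistics from the example
z = (xbar - mu0) / (s / math.sqrt(n))        # observed test statistic
p_value = NormalDist().cdf(z)                # lower-tail test: P(Z < z)

print(f"z = {z:.4f}, p-value = {p_value:.4f}")
for alpha in (0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```

The same p-value leads to rejection at α = 0.05 but not at α = 0.01, which is the point of part (b).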

In any hypothesis testing, after an experimenter determines the objective of an experiment and decides

on the type of data to be collected, we recommend the following step-by-step procedure for hypothesis

testing.

STEPS IN ANY HYPOTHESIS TESTING PROBLEM

1. State the alternative hypothesis, H_a (what is believed to be true).

2. State the null hypothesis, H₀ (what is doubted to be true).

3. Decide on a level of significance α.

4. Choose an appropriate TS and compute the observed test statistic.

5. Using the distribution of TS and α, determine the rejection region(s) (RR).

6. Conclusion: If the observed test statistic falls in the RR, reject H₀ and conclude that based on the

sample information, we are (1 − α)100% confident that H_a is true. Otherwise, conclude that there is

not sufficient evidence to reject H₀. In all the applied problems, interpret the meaning of your

decision.

7. State any assumptions you made in testing the given hypothesis.

8. Compute the p–value from the null distribution of the test statistic and interpret it.

4.2 Hypothesis Testing for a Single Parameter

Now we study the testing of a hypothesis concerning a single parameter, θ, based on a random sample

X₁,…,X_n. Let θ̂ be the sample statistic. First, we deal with tests for the population mean μ for large

and small samples. Next, we study procedures for testing the population variance σ². We conclude

the section by studying a test procedure for the true proportion p.

To test the hypothesis H₀ : μ = μ₀ concerning the true population mean μ, when we have a large sample (n ≥ 30) we use the test statistic Z given by

$$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

where S is the sample standard deviation and μ₀ is the claimed mean under H₀ (if the population variance is known, we replace S with σ).

For a small random sample (n < 30), the test statistic is

$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

where μ₀ is the claimed value of the true mean, and X̄ and S are the sample mean and standard deviation, respectively. Note that we are using the lowercase letters, such as z and t, to represent the observed values of the test statistics Z and T, respectively.

In practice, with raw data, it is important to verify the assumptions. For example, in the small sample

case, it is important to check for normality by using normal plots. If this assumption is not satisfied,

the nonparametric methods described in Chapter 12 may be more appropriate. In addition, because

the sample statistics such as X̄ and S will be greatly affected by the presence of outliers, drawing a box

plot to check for outliers is a basic practice we should incorporate in our analysis.

We now summarize the typical test of hypothesis for tests concerning population (true) mean.

In order to compute the observed test statistic, z in the large sample case and t in the small sample

case, calculate the values of z = (x̄ − μ₀)/(s/√n) and t = (x̄ − μ₀)/(s/√n), respectively.

SUMMARY OF HYPOTHESIS TESTS FOR μ

To test

H₀ : μ = μ₀

versus

H_a : μ > μ₀ (upper tail test), μ < μ₀ (lower tail test), or μ ≠ μ₀ (two-tailed test).

Large Sample (n ≥ 30)

Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$. Replace σ by S, if σ is unknown.

$$\text{Rejection region}: \begin{cases} z > z_\alpha, & \text{upper tail RR} \\ z < -z_\alpha, & \text{lower tail RR} \\ |z| > z_{\alpha/2}, & \text{two tail RR} \end{cases}$$

Assumption: n ≥ 30

Small Sample (n < 30)

Test statistic: $T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$

$$\text{RR}: \begin{cases} t > t_{\alpha, n-1}, & \text{upper tail RR} \\ t < -t_{\alpha, n-1}, & \text{lower tail RR} \\ |t| > t_{\alpha/2, n-1}, & \text{two tail RR} \end{cases}$$

Assumption: Random sample comes from a normal population

Decision: Reject H₀, if the observed test statistic falls in the RR and conclude that H_a is true with

(1 − α)100% confidence. Otherwise, keep H₀ so that there is not enough evidence to conclude that

H_a is true for the given α and more experiments may be needed.

Example 4.3

It is claimed that sports-car owners drive on the average 18,000 miles per year. A consumer firm believes that

the average mileage is probably lower. To check, the consumer firm obtained information from 40 randomly

selected sports-car owners that resulted in a sample mean of 17,463 miles with a sample standard deviation

of 1348 miles. What can we conclude about this claim? Use α = 0.01.

Solution

Let μ be the true population mean. We can formulate the hypotheses as H₀ : μ = 18,000 versus H_a : μ < 18,000.

The observed test statistic (for n ≥ 30, with σ replaced by s) is

$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{17{,}463 - 18{,}000}{1348/\sqrt{40}} = -2.52.$$

Rejection region is {z < −z_{0.01}} = {z < −2.33}.

Decision: Because z = −2.52 is less than −2.33, the null hypothesis is rejected at α = 0.01. There is

sufficient evidence to conclude that the mean mileage on sports cars is less than 18,000 miles per year.
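The large-sample procedure in the summary box can be sketched as one function that returns the observed statistic, the critical value, and the decision. The function name and the `tail` argument are hypothetical, and the critical value is taken from the normal quantile function instead of a table, so it is −2.326 rather than the rounded −2.33.

```python
import math
from statistics import NormalDist

def one_sample_z_test(xbar, s, n, mu0, alpha, tail):
    """Large-sample z test for a mean; tail is 'lower', 'upper', or 'two'."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    std_normal = NormalDist()
    if tail == "lower":
        crit = std_normal.inv_cdf(alpha)               # -z_alpha
        reject = z < crit
    elif tail == "upper":
        crit = std_normal.inv_cdf(1 - alpha)           # z_alpha
        reject = z > crit
    else:
        crit = std_normal.inv_cdf(1 - alpha / 2)       # z_{alpha/2}
        reject = abs(z) > crit
    return z, crit, reject

# Summary statistics from Example 4.3.
z, crit, reject = one_sample_z_test(17463, 1348, 40, 18000, 0.01, "lower")
print(f"z = {z:.2f}, critical value = {crit:.3f}, reject H0: {reject}")
```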

Example 4.4

In a frequently traveled stretch of the I-75 highway, where the posted speed is 70 mph, it is thought that

people travel on the average of at least 75 mph. To check this claim, the following radar measurements of

the speeds (in mph) is obtained for 10 vehicles traveling on this stretch of the interstate highway.

Do the data provide sufficient evidence to indicate that the mean speed at which people travel on this

stretch of highway is at most 75 mph? Test the appropriate hypothesis using α = 0.01. Draw a box plot and

normal plot for this data, and comment.

Solution

We need to test

H₀ : μ = 75 vs. H_a : μ > 75

■ FIGURE 1

Box plot of speed data.

For this sample, the sample mean is x̄ = 74.8 mph and the sample standard deviation is s = 5.9963 mph. Hence, the observed test statistic is

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{74.8 - 75}{5.9963/\sqrt{10}} = -0.1055.$$

From the t-table, t_{0.01,9} = 2.821. Hence, the rejection region is {t > 2.821}.

Because t = −0.1055 does not fall in the rejection region, we do not reject the null hypothesis at α = 0.01.

Note that we assumed that the vehicles were randomly selected and that the collected data follow the normal distribution. Because of the small sample size (n < 30), we use the t-test.

Figures 1 and 2 are the box plot and the normal plot of the data, respectively.

■ FIGURE 2

Normal probability plot for speed (ML estimates: mean 74.8, standard deviation 5.68858).

The box plot suggests that there are no outliers present. However, the normal plot indicates that the normality

assumption for this data set is not justified. Hence, it may be more appropriate to do a nonparametric test.

Example 4.5

In attempting to control the strength of the wastes discharged into a nearby river, an industrial firm has

taken a number of restorative measures. The firm believes that they have lowered the oxygen consuming

power of their wastes from a previous mean of 450 manganate in parts per million. To test this belief,

readings are taken on n = 20 successive days. A sample mean of 312.5 and the sample standard deviation

106.23 are obtained. Assume that these 20 values can be treated as a random sample from a normal

population. Test the appropriate hypothesis. Use α = 0.05.

Solution

Here we need to test the following hypothesis:

H₀ : μ = 450 vs. H_a : μ < 450

Given n = 20, x̄ = 312.5, and s = 106.23. The observed test statistic is

$$t = \frac{312.5 - 450}{106.23/\sqrt{20}} = -5.79.$$

The rejection region for α = 0.05 and with 19 degrees of freedom is the set of t-values such that

{t < −t_{0.05,19}} = {t < −1.729}.

Decision: Because t = −5.79 is less than −1.729, reject H₀. There is sufficient evidence to confirm the

firm’s belief.
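The arithmetic of this small-sample test is easy to check in code. The sketch below computes the t statistic from the summary statistics and compares it with the tabled critical value. The value 1.729 is entered by hand from the t-table, since the Python standard library has no t quantile function. The function name is hypothetical.

```python
import math

def t_statistic(xbar, s, n, mu0):
    """Observed one-sample t statistic, with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n))

t = t_statistic(xbar=312.5, s=106.23, n=20, mu0=450.0)
t_crit = 1.729                 # t_{0.05,19}, taken from the t-table
reject = t < -t_crit           # lower tail rejection region
print(f"t = {t:.2f}, reject H0: {reject}")
```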

For large random samples, the following procedure is used to perform tests of hypotheses about the

population proportion, p.

Example 4.6

A machine is considered to be unsatisfactory if it produces more than 8% defectives. It is suspected that the

machine is unsatisfactory. A random sample of 120 items produced by the machine contains 14 defectives.

Does the sample evidence support the claim that the machine is unsatisfactory? Use α = 0.01.

Solution

Let Y be the number of observed defectives. This follows a binomial distribution. However, because np₀ and

nq₀ are greater than 5, we can use a normal approximation to the binomial to test the hypothesis. So we

need to test H₀ : p = 0.08 versus H_a : p > 0.08. Let the point estimate of p be p̂ = (Y/n) = 0.117, the sample proportion. Then the value of the TS is

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.117 - 0.08}{\sqrt{(0.08)(0.92)/120}} = 1.49.$$

For α = 0.01, z_{0.01} = 2.33. Hence, the rejection region is {z > 2.33}.

Decision: Because 1.49 is not greater than 2.33, we do not reject H₀. We conclude that the evidence does

not support the claim that the machine is unsatisfactory.
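The test statistic for a proportion can be recomputed directly from the counts. The sketch below uses the unrounded sample proportion 14/120, which gives z of about 1.48. Note that the standard error uses p₀, the value under H₀, and divides p₀q₀ by n before taking the square root. The function name is hypothetical.

```python
import math

def proportion_z(y, n, p0):
    """Large-sample z statistic for H0: p = p0, given y successes in n trials."""
    p_hat = y / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error of p_hat under H0
    return (p_hat - p0) / se

z = proportion_z(y=14, n=120, p0=0.08)
z_crit = 2.33                           # z_{0.01}, from the normal table
print(f"z = {z:.2f}, reject H0: {z > z_crit}")
```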

SUMMARY OF LARGE SAMPLE HYPOTHESIS TEST FOR p

To test

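H₀ : p = p₀

versus

H_a : p > p₀ (upper tail test), p < p₀ (lower tail test), or p ≠ p₀ (two-tailed test).

Test statistic:

$$Z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}}, \quad \text{where } \sigma_{\hat{p}} = \sqrt{\frac{p_0 q_0}{n}} \text{ and } q_0 = 1 - p_0.$$

$$\text{Rejection region}: \begin{cases} z > z_\alpha, & \text{upper tail RR} \\ z < -z_\alpha, & \text{lower tail RR} \\ |z| > z_{\alpha/2}, & \text{two tail RR,} \end{cases}$$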

where z is the observed test statistic.

Assumption: n is large. A good rule of thumb is to use the normal approximation to the binomial

distribution only when np₀ and n(1 − p₀) are both greater than 5.

Decision: Reject H₀, if the observed test statistic falls in the RR and conclude that H_a is true with

(1 − α)100% confidence. Otherwise, do not reject H₀ because there is not enough evidence to

conclude that H_a is true for given α and more data are needed.

Note that this is an approximate test, and the test can be improved by increasing the sample size.

Now we give the procedure for testing the population variance when the samples come from a normal

population.

SUMMARY OF HYPOTHESIS TEST FOR THE VARIANCE σ²

To test

H₀ : σ² = σ₀²

versus

H_a : σ² > σ₀² (upper tail test), σ² < σ₀² (lower tail test), or σ² ≠ σ₀² (two-tailed test).

Test statistic:

$$\chi^2 = \frac{(n - 1)S^2}{\sigma_0^2}$$

where S² is the sample variance.

Observed value of test statistic: $\dfrac{(n - 1)s^2}{\sigma_0^2}$

$$\text{Rejection region}: \begin{cases} \chi^2 > \chi^2_{\alpha, n-1}, & \text{upper tail RR} \\ \chi^2 < \chi^2_{1-\alpha, n-1}, & \text{lower tail RR} \\ \chi^2 > \chi^2_{\alpha/2, n-1} \text{ or } \chi^2 < \chi^2_{1-\alpha/2, n-1}, & \text{two tail RR} \end{cases}$$

where χ²_{α,n−1} is such that the area under the chi-square distribution with (n − 1) degrees of freedom to its right is equal to α.

Assumption: Sample comes from a normal population.

Decision: Reject H₀, if the observed test statistic falls in the RR and conclude that H_a is true with

(1 − α)100% confidence. Otherwise, do not reject H₀ because there is not enough evidence to conclude

that H_a is true for given α and more data are needed.

Because the chi-square distribution is not symmetric, the “equal tails” used for the two-sided alter-

native may not be the best procedure. However, in real-world problems we seldom use a two tail test

for the population variance.

Example 4.7

A physician claims that the variance in cholesterol levels of adult men in a certain laboratory is at least 100.

A random sample of 25 adult males from this laboratory produced a sample standard deviation of

cholesterol levels as 12. Test the physician’s claim at 5% level of significance.

Solution

To test

H₀ : σ² = 100 versus H_a : σ² < 100

for α = 0.05, and 24 degrees of freedom, the rejection region is

RR = {χ² < χ²_{1−α,n−1}} = {χ² < χ²_{0.95,24}} = {χ² < 13.848}.

The observed value of the TS is

$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(24)(144)}{100} = 34.56.$$

Because the value of the test statistic does not fall in the rejection region, we cannot reject H₀ at 5% level

of significance. Here, we assumed that the 25 cholesterol measurements follow the normal distribution.
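The arithmetic of Example 4.7 can be checked with a short script. This is a minimal sketch, not a general-purpose routine: the function name `chi_square_variance_stat` is hypothetical, and the critical value 13.848 is the standard table entry for the chi-square distribution with 24 degrees of freedom and right-tail area 0.95, entered by hand.

```python
def chi_square_variance_stat(n, s_squared, sigma0_squared):
    """Observed chi-square statistic (n - 1) * s^2 / sigma0^2."""
    return (n - 1) * s_squared / sigma0_squared


# Example 4.7: n = 25, s = 12, hypothesized variance 100.
chi2_obs = chi_square_variance_stat(25, 12 ** 2, 100)

# Table value (assumed): chi-square_{0.95, 24}, the lower-tail cutoff at alpha = 0.05.
critical = 13.848

# Lower tail test: reject H0 only if the statistic falls below the cutoff.
reject_h0 = chi2_obs < critical
print(chi2_obs, reject_h0)
```

Because the observed statistic lies far above the lower-tail cutoff, the script reaches the same decision as the worked solution.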

EXERCISES 4

4.1.

A random sample of 50 measurements resulted in a sample mean of 62 with a sample

standard deviation 8. It is claimed that the true population mean is at least 64.

(a) Is there sufficient evidence to refute the claim at the 2% level of significance?

(b) What is the p-value?

4.2.

A machine in a certain factory must be repaired if it produces more than 12% defectives

among the large lot of items it produces in a week. A random sample of 175 items from

a week’s production contains 45 defectives, and it is decided that the machine must be

repaired.

(a) Does the sample evidence support this decision? Use α = 0.02.

(b) Compute the p-value.

4.3.

A random sample of 78 observations produced the following sums:

$$\sum_{i=1}^{78} x_i = 22.8, \qquad \sum_{i=1}^{78} (x_i - \bar{x})^2 = 2.05.$$

(a) Test the null hypothesis that μ = 0.45 against the alternative hypothesis that μ < 0.45

using α = 0.01. Also find the p-value.

(b) Test the null hypothesis that μ = 0.45 against the alternative hypothesis that μ ≠ 0.45

using α = 0.01. Also find the p-value.

4.4.

Consider the test H₀ : μ = 35 vs. H_a : μ > 35 for a population that is normally distributed.

(a) A random sample of 18 observations taken from this population produced a sample

mean of 40 and a sample standard deviation of 5. Using α = 0.025, would you reject

the null hypothesis?

(b) Another random sample of 18 observations produced a sample mean of 36.8 and a sample standard deviation of 6.9. Using α = 0.025, would you reject the null hypothesis?

4.5.

According to the information obtained from a large university, professors there earned an

average annual salary of $55,648 in 1998. A recent random sample of 15 professors from

this university showed that they earn an average annual salary of $58,800 with a sample

standard deviation of $8300. Assume that the annual salaries of all the professors in this

university are normally distributed.


(a) Suppose the probability of making a type I error is chosen to be zero. Without perform-

ing all the steps of test of hypothesis, would you accept or reject the null hypothesis

that the current mean annual salary of all professors at this university is $55,648?

(b) Using the 1% significance level, can you conclude that the current mean annual salary

of professors at this university is more than $55,648?

4.6.

A check-cashing service company found that approximately 7% of all checks submitted to the

service were without sufficient funds. After instituting a random check verification system to

reduce its losses, the service company found that only 70 were rejected in a random sample of

1125 that were cashed. Is there sufficient evidence that the check verification system reduced

the proportion of bad checks at α = 0.01? What is the p-value associated with the test? What

would you conclude at the α = 0.05 level?

4.7.

A manufacturer of washers provides a particular model in one of three colors, white, black,

or ivory. Of the first 1500 washers sold, it is noticed that 550 were of ivory color. Would

you conclude that customers have a preference for the ivory color? Justify your answer. Use

α = 0.01.

4.8.

A test of the breaking strength of six ropes manufactured by a company showed a mean

breaking strength of 6425 lb and a standard deviation of 120 lb. However, the manufacturer

claimed a mean breaking strength of 7500 lb.

(a) Can we support the manufacturer’s claim at a level of significance of 0.10?

(b) Compute the p-value. What assumptions did you make for this problem?

4.9.

A sample of 10 observations taken from a normally distributed population produced the

following data:

(a) Test the hypothesis H₀ : μ = 44 vs. H_a : μ ≠ 44 using α = 0.10. Draw a box plot

and normal plot for this data, and comment.

(b) Find a 90% confidence interval for the population mean μ.

4.10.

The principal of a charter school in Tampa believes that the IQs of its students are above

the national average of 100. From the past experience, IQ is normally distributed with a

standard deviation of 10. A random sample of 20 students is selected from this school and

their IQs are observed. The following are the observed values.

110  133  119  113  107  110  113  100  124  116  113  110  106  115  113

(a) Test for the normality of the data.

(b) Do the IQs of students at the school run above the national average at α = 0.01?

4.11.

In order to find out whether children with chronic diarrhea have the same average hemoglobin level (Hb) that is normally seen in healthy children in the same area, a random sample of 10 children with chronic diarrhea is selected and their Hb levels (g/dL) are obtained as follows.

12.3  11.4  14.2  15.3  14.8  13.8  11.1  15.1  15.8  13.2

Do the data provide sufficient evidence to indicate that the mean Hb level for children with

chronic diarrhea is less than that of the normal value of 14.6 g/dL? Test the appropriate

hypothesis using α = 0.01. Draw a box plot and normal plot for this data, and comment.

4.12.

A company that manufactures precision special-alloy steel shafts claims that the variance in

the diameters of shafts is no more than 0.0003. A random sample of 10 shafts gave a sample

variance of 0.0002. At the 5% level of significance, test whether the company’s claim can

be substantiated.

4.13.

It was claimed that the average annual expenditures per consumer unit had continued to

rise, as measured by the Consumer Price Index annual averages (Bureau of Labor Statistics

report, 1995). To test this claim, 100 consumer units were randomly selected in 1995 and

found to have an average annual expenditure of $32,277 with a standard deviation of $1200.

Assuming that the average annual expenditure of all consumer units was $30,692 in 1994,

test at the 5% significance level whether the annual expenditure per consumer unit had

really increased from 1994 to 1995.

4.14.

It is claimed that two of three Americans say that the chances of world peace are seriously

threatened by the nuclear capabilities of other countries. If in a random sample of 400

Americans, it is found that only 252 hold this view, do you think the claim is correct? Use

α = 0.05. State any assumptions you make in solving this problem.

4.15.

According to the Bureau of Labor Statistics (1996), the average price of a gallon of gasoline

in all U.S. cities in January 1996 was $1.129. A later random sample in

24 cities found the mean price to be $1.24 with a standard deviation of 0.01. Test at α = 0.05

to see whether the average price of a gallon of gas in the cities had recently changed.

4.16.

A manufacturer claims that the mean life of batteries manufactured by his company is at

least 44 months. A random sample of 40 of these batteries was tested, resulting in a sample

mean life of 41 months with a sample standard deviation of 16 months. Test at α = 0.01

whether the manufacturer’s claim is correct.

5 TESTING OF HYPOTHESES FOR TWO SAMPLES

In this section we study the hypothesis testing procedures for comparing the means and variances

of two populations. For example, suppose that we want to determine whether a particular drug is

effective for a certain illness. The sample subjects will be randomly selected from a large pool of

people with that particular illness and will be assigned randomly to the two groups. To one group

we will administer a placebo; to the other we will administer the drug of interest. After a period of

time, we measure a physical characteristic, say the blood pressure, of each subject that is an indicator

of the severity of the illness. The question is whether the drug can be considered effective on the

population from which our samples have been selected. We will consider the cases of independent

and dependent samples.


5.1 Independent Samples

Two random samples are drawn independently of each other from two populations, and the sample

information is obtained. We are interested in testing a hypothesis about the difference of the true

means. Let $X_{11}, \ldots, X_{1n_1}$ be a random sample from population 1 with mean μ₁ and variance σ₁², and $X_{21}, \ldots, X_{2n_2}$ be a random sample from population 2 with mean μ₂ and variance σ₂². Let $\bar{X}_i$, i = 1, 2, represent the respective sample means and $S_i^2$, i = 1, 2, represent the sample variances. We shall consider the following three cases in testing hypotheses about μ₁ and μ₂: (i) when σ₁² and σ₂² are known, (ii) when σ₁² and σ₂² are unknown and n₁ ≥ 30 and n₂ ≥ 30, and (iii) when σ₁² and σ₂² are unknown and n₁ < 30 or n₂ < 30. In case (iii) we have the following two possibilities: (a) σ₁² = σ₂², and (b) σ₁² ≠ σ₂².

In the large sample case, knowledge of the population variances σ₁² and σ₂² does not make much difference. If the population variances are unknown, we could replace them with sample variances as an approximation. If both n₁ ≥ 30 and n₂ ≥ 30 (large sample case), we can use normal approximation. The following box sums up the hypothesis testing procedure for the difference of means in the large sample case.

SUMMARY OF HYPOTHESIS TEST FOR μ₁ − μ₂ FOR LARGE SAMPLES (n₁ & n₂ ≥ 30)

To test

$$H_0 : \mu_1 - \mu_2 = D_0$$

versus

$$H_a : \begin{cases} \mu_1 - \mu_2 > D_0, & \text{upper tailed test} \\ \mu_1 - \mu_2 < D_0, & \text{lower tailed test} \\ \mu_1 - \mu_2 \neq D_0, & \text{two-tailed test.} \end{cases}$$

The test statistic is

$$Z = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}.$$

Replace σ_i by S_i if σ_i, i = 1, 2, are not known.

Rejection region is

$$RR : \begin{cases} z > z_\alpha, & \text{upper tail RR} \\ z < -z_\alpha, & \text{lower tail RR} \\ |z| > z_{\alpha/2}, & \text{two tail RR,} \end{cases}$$

where z is the observed test statistic given by

$$z = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}.$$

Assumption: The samples are independent and n₁ and n₂ ≥ 30.

Decision: Reject H₀, if test statistic falls in the RR and conclude that H_a is true with (1 − α)100% confidence. Otherwise, do not reject H₀ because there is not enough evidence to conclude that H_a is true for given α and more experiments are needed.

Example 5.1

In a salary equity study of faculty at a certain university, sample salaries of 50 male assistant professors and

50 female assistant professors yielded the following basic statistics.

                             Sample mean salary    Sample standard deviation
Male assistant professor     $36,400               360
Female assistant professor   $34,200               220

Test the hypothesis that the mean salary of male assistant professors is more than the mean salary of female

assistant professors at this university. Use α = 0.05.

Solution

Let μ₁ be the true mean salary for male assistant professors and μ₂ be the true mean salary for female

assistant professors at this university. To test

H₀ : μ₁ − μ₂ = 0 vs. H_a : μ₁ − μ₂ > 0

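the test statistic is

$$z = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{36{,}400 - 34{,}200}{\sqrt{\dfrac{(360)^2}{50} + \dfrac{(220)^2}{50}}} = 36.872.$$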

The rejection region for α = 0.05 is {z > 1.645}.

Because z = 36.872 > 1.645, we reject the null hypothesis at α = 0.05. We conclude that the salary of

male assistant professors at this university is higher than that of female assistant professors for α = 0.05.

Note that even though σ₁² and σ₂² are unknown, because n₁ ≥ 30 and n₂ ≥ 30, we could replace σ₁² and σ₂² by the respective sample variances. We are assuming that the salaries of male and female professors are sampled independently of each other.
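The large-sample statistic of Example 5.1 can be reproduced as follows. This is a sketch under the example's assumptions (independent samples, n₁ = n₂ = 50, sample standard deviations standing in for the unknown σ's); the helper name `two_sample_z` is hypothetical, and the critical value is taken from the standard library's normal distribution rather than from a printed table.

```python
import math
from statistics import NormalDist


def two_sample_z(x1_bar, x2_bar, s1, s2, n1, n2, d0=0.0):
    """Large-sample z statistic for H0: mu1 - mu2 = d0."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (x1_bar - x2_bar - d0) / standard_error


# Example 5.1: male vs. female assistant professor salaries.
z_obs = two_sample_z(36400, 34200, 360, 220, 50, 50)

# Upper tail critical value z_alpha for alpha = 0.05.
z_crit = NormalDist().inv_cdf(1 - 0.05)

reject_h0 = z_obs > z_crit
print(round(z_obs, 3), round(z_crit, 3), reject_h0)
```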

Given next is the procedure we follow to compare the true means from two independent normal populations when n₁ and n₂ are small (n₁ < 30 or n₂ < 30) and we can assume homogeneity in the population variances, that is, σ₁² = σ₂². In this case, we pool the sample variances to obtain a point estimate of the common variance.


COMPARISON OF TWO POPULATION MEANS, SMALL SAMPLE CASE (POOLED t-TEST)

To test

$$H_0 : \mu_1 - \mu_2 = D_0$$

versus

$$H_a : \begin{cases} \mu_1 - \mu_2 > D_0, & \text{upper tailed test} \\ \mu_1 - \mu_2 < D_0, & \text{lower tailed test} \\ \mu_1 - \mu_2 \neq D_0, & \text{two-tailed test.} \end{cases}$$

The test statistic is

$$T = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}.$$

Here the pooled sample variance is

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

Then the rejection region is

$$RR : \begin{cases} t > t_\alpha, & \text{upper tailed test} \\ t < -t_\alpha, & \text{lower tailed test} \\ |t| > t_{\alpha/2}, & \text{two-tailed test} \end{cases}$$

where t is the observed test statistic and t_α is based on (n₁ + n₂ − 2) degrees of freedom, and such that

P(T > t_α) = α.

Decision: Reject H₀, if test statistic falls in the RR and conclude that H_a is true with (1 − α)100% confidence.

Otherwise, do not reject H₀ because there is not enough evidence to conclude that H_a is true for given α.

Assumptions: The samples are independent and come from normal populations with means μ₁ and μ₂, and with unknown but equal variances, that is, σ₁² = σ₂².

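Now we shall consider the case where σ₁² and σ₂² are unknown and cannot be assumed to be equal. In such a case the following test is often used. For the hypothesis

$$H_0 : \mu_1 - \mu_2 = D_0 \quad \text{vs.} \quad H_a : \begin{cases} \mu_1 - \mu_2 > D_0 \\ \mu_1 - \mu_2 < D_0 \\ \mu_1 - \mu_2 \neq D_0 \end{cases}$$

define the test statistic T_ν as

$$T_\nu = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

where T_ν has approximately a t-distribution with ν degrees of freedom, and

$$\nu = \frac{\left[(s_1^2/n_1) + (s_2^2/n_2)\right]^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}.$$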

The value of ν will not necessarily be an integer. In that case, we will round it down to the nearest integer. This method of hypothesis testing with unequal variances is called the Smith-Satterthwaite procedure. Even though this procedure is not widely used, some simulation studies have shown that the Smith-Satterthwaite procedure performs well when variances are unequal and it gives results that are more or less equivalent to those obtained with the pooled t-test when the variances are equal. However, when the sample sizes are approximately equal, the pooled t-test may still be used. Note that in addressing the question which of the cases (iii)(a) or (iii)(b) to use in a given problem, we suggest that if the point estimates S₁² of σ₁² and S₂² of σ₂² are approximately the same, then it is logical to assume homogeneity, σ₁² = σ₂², and use (iii)(a), whereas if S₁² and S₂² are significantly different we use (iii)(b). More appropriately, we have tests that can be used to test hypotheses concerning σ₁² = σ₂² or σ₁² ≠ σ₂², known as the F-test, which we discuss at the end of this subsection.

Example 5.2

The intelligence quotients (IQs) of 17 students from one area of a city showed a sample mean of 106 with a

sample standard deviation of 10, whereas the IQs of 14 students from another area chosen independently

showed a sample mean of 109 with a sample standard deviation of 7. Is there a significant difference

between the IQs of the two groups at α = 0.02? Assume that the population variances are equal.

Solution

We test

$$H_0 : \mu_1 - \mu_2 = 0 \quad \text{vs.} \quad H_a : \mu_1 - \mu_2 \neq 0.$$

Here n₁ = 17, x̄₁ = 106, and s₁ = 10. Also, n₂ = 14, x̄₂ = 109, and s₂ = 7.

We have

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(16)(10)^2 + (13)(7)^2}{29} = 77.138.$$

The test statistic is

$$T = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{106 - 109}{\left(\sqrt{77.138}\right)\sqrt{\dfrac{1}{17} + \dfrac{1}{14}}} = -0.94644.$$

For α = 0.02, t0.01,29 = 2.462. Hence, the rejection region is t < − 2.462 or t > 2.462.

Because the observed value of the test statistic, T = −0.94644, does not fall in the rejection region, there is

not enough evidence to conclude that the mean IQs are different for the two groups. Here we assume that

the two samples are independent and taken from normal populations.
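The pooled computation in Example 5.2 can be sketched in a few lines. The second sample standard deviation is taken as 7, the value implied by the pooled-variance arithmetic of the solution; the function name `pooled_t` is hypothetical, and the critical value 2.462 is the table entry quoted in the solution.

```python
import math


def pooled_t(x1_bar, x2_bar, s1, s2, n1, n2, d0=0.0):
    """Return (pooled variance, t statistic) for the equal-variance t-test."""
    sp_squared = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (x1_bar - x2_bar - d0) / (math.sqrt(sp_squared) * math.sqrt(1 / n1 + 1 / n2))
    return sp_squared, t


# Example 5.2: IQ scores from two areas of a city (s2 = 7 is assumed, see above).
sp_squared, t_obs = pooled_t(106, 109, 10, 7, 17, 14)

# Table value quoted in the text: t_{0.01, 29} for a two-tailed test at alpha = 0.02.
t_crit = 2.462
reject_h0 = abs(t_obs) > t_crit
print(round(sp_squared, 3), round(t_obs, 5), reject_h0)
```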

Example 5.3

Assume that two populations are normally distributed with unknown and unequal variances. Two inde-

pendent samples were drawn from these populations and the data obtained resulted in the following basic

statistics:

n₁ = 18    x̄₁ = 20.17    s₁ = 4.3
n₂ = 12    x̄₂ = 19.23    s₂ = 3.8

Test at the 5% significance level whether the two population means are different.

Solution

We need to test the hypothesis

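$$H_0 : \mu_1 - \mu_2 = 0 \quad \text{versus} \quad H_a : \mu_1 - \mu_2 \neq 0.$$

Here n₁ = 18, x̄₁ = 20.17, and s₁ = 4.3. Also, n₂ = 12, x̄₂ = 19.23, and s₂ = 3.8.

The degrees of freedom for the t-distribution are given by

$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\dfrac{(4.3)^2}{18} + \dfrac{(3.8)^2}{12}\right)^2}{\dfrac{\left((4.3)^2/18\right)^2}{17} + \dfrac{\left((3.8)^2/12\right)^2}{11}} = 25.685.$$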

Hence, we have ν = 25 degrees of freedom. For α = 0.05, t0.025,25 = 2.060. Thus, the rejection region is

t < −2.060 or t > 2.060.

The test statistic is given by

$$T_\nu = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{20.17 - 19.23}{\sqrt{\dfrac{(4.3)^2}{18} + \dfrac{(3.8)^2}{12}}} = 0.62939.$$

Because the observed value of the test statistic, T_ν = 0.62939, does not fall in the rejection region, we do not

reject the null hypothesis. At α = 0.05 there is not enough evidence to conclude that the population means

are different. Note that the assumptions we made are that the samples are independent and came from two

normal populations. No homogeneity assumption is made.
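The Smith-Satterthwaite quantities of Example 5.3 can be verified numerically. The sketch below is only an illustration of the two formulas; the function names `welch_df` and `welch_t` are hypothetical, and the degrees of freedom are rounded down as the text prescribes.

```python
import math


def welch_df(s1, s2, n1, n2):
    """Smith-Satterthwaite degrees of freedom (not necessarily an integer)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def welch_t(x1_bar, x2_bar, s1, s2, n1, n2, d0=0.0):
    """t statistic that does not pool the two sample variances."""
    return (x1_bar - x2_bar - d0) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


# Example 5.3 summary statistics.
nu = welch_df(4.3, 3.8, 18, 12)
df = math.floor(nu)  # round down to the nearest integer
t_obs = welch_t(20.17, 19.23, 4.3, 3.8, 18, 12)
print(round(nu, 3), df, round(t_obs, 5))
```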

Example 5.4

Infrequent or suspended menstruation can be a symptom of serious metabolic disorders in women. In a

study to compare the effect of jogging and running on the number of menses, two independent subgroups

were chosen from a large group of women, who were similar in physical activity (aside from running),

heights, occupations, distribution of ages, and type of birth control methods being used. The first group

consisted of a random sample of 26 women joggers who jogged “slow and easy” 5 to 30 miles per week,

and the second group consisted of a random sample of 26 women runners who ran more than 30 miles per

week and combined long, slow distance with speed work. The following summary statistics were obtained

(E. Dale, D. H. Gerlach, and A. L. Wilhite, “Menstrual Dysfunction in Distance Runners,” Obstet. Gynecol. 54,

47-53, 1979).

Joggers x₁ = 10.1, s₁ = 2.1

Runners x₂ = 9.1, s₂ = 2.4

Using α = 0.05, (a) test for differences in mean number of menses for each group assuming equality of population variances, and (b) test for differences in mean number of menses for each group assuming inequality of population variances.

Solution

Here we need to test

$$H_0 : \mu_1 - \mu_2 = 0 \quad \text{versus} \quad H_a : \mu_1 - \mu_2 \neq 0.$$

Here, n₁ = 26, x̄₁ = 10.1, and s₁ = 2.1. Also, n₂ = 26, x̄₂ = 9.1, and s₂ = 2.4.

(a) Under the assumption σ₁² = σ₂², we have

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(25)(2.1)^2 + (25)(2.4)^2}{50} = 5.085.$$

The test statistic is

$$T = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{10.1 - 9.1}{\left(\sqrt{5.085}\right)\sqrt{\dfrac{1}{26} + \dfrac{1}{26}}} = 1.5989.$$

For α = 0.05, t0.025,50 ≈ 1.96. Hence, the rejection region is t < −1.96 or t > 1.96. Because T = 1.5989 does not fall in the rejection region, we do not reject the null hypothesis. At α = 0.05 there is not enough evidence to conclude that the population mean number of menses for joggers and runners are different.

(b) Under the assumption σ₁² ≠ σ₂², we have

$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\dfrac{(2.1)^2}{26} + \dfrac{(2.4)^2}{26}\right)^2}{\dfrac{\left((2.1)^2/26\right)^2}{25} + \dfrac{\left((2.4)^2/26\right)^2}{25}} = 49.134.$$

Hence, we have ν = 49 degrees of freedom. Because this value is large, the rejection region is still approximately t < −1.96 or t > 1.96. Hence, the conclusion is the same as that of part (a). In both parts (a) and (b), we assumed that the samples are independent and came from two normal populations.

Now we present the summary of the test procedure for testing the difference of two proportions,

inherent in two binomial populations. Here, again we assume that the binomial distribution is

approximated by the normal distribution and thus it is an approximate test.

SUMMARY OF HYPOTHESIS TEST FOR (p₁ − p₂) FOR LARGE SAMPLES (n_ip_i > 5 AND n_iq_i > 5, FOR i = 1, 2)

To test

$$H_0 : p_1 - p_2 = D_0$$

versus

$$H_a : \begin{cases} p_1 - p_2 > D_0, & \text{upper tailed test} \\ p_1 - p_2 < D_0, & \text{lower tailed test} \\ p_1 - p_2 \neq D_0, & \text{two-tailed test} \end{cases}$$

at significance level α, the test statistic is

$$Z = \frac{\hat{p}_1 - \hat{p}_2 - D_0}{\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}}.$$

The rejection region is

$$RR : \begin{cases} z > z_\alpha, & \text{upper tailed RR} \\ z < -z_\alpha, & \text{lower tailed RR} \\ |z| > z_{\alpha/2}, & \text{two-tailed RR} \end{cases}$$

where z is the observed value of Z.

Assumption: The samples are independent and n_ip_i > 5 and n_iq_i > 5, for i = 1, 2.

Decision: Reject H₀ if the test statistic falls in the RR and conclude that H_a is true with (1 − α)100% confidence. Otherwise, do not reject H₀, because there is not enough evidence to conclude that H_a is true for given α and more experiments are needed.

Example 5.5

Because of the impact of the global economy on a high-wage country such as the United States, it is claimed

that the domestic content in manufacturing industries fell between 1977 and 1997. A survey of 36 randomly picked U.S. companies gave the proportion of domestic content in total manufacturing in 1977 as 0.37 and in 1997 as 0.36. At the 1% level of significance, test the claim that the domestic content really fell during the period 1977-1997.

Solution

Let p₁ be the domestic content in 1977 and p₂ be the domestic content in 1997.

Given n₁ = n₂ = 36, p̂₁ = 0.37 and p̂₂ = 0.36. We need to test

H₀ : p₁ − p₂ = 0 vs. H_a : p₁ − p₂ > 0.

The test statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}} = \frac{0.37 - 0.36}{\sqrt{\dfrac{(0.37)(0.63)}{36} + \dfrac{(0.36)(0.64)}{36}}} = 0.08813.$$

For α = 0.01, z0.01 = 2.325. Hence, the rejection region is z > 2.325.

Because the observed value of the test statistic does not fall in the rejection region, at α = 0.01, there is not enough evidence to conclude that the domestic content in manufacturing industries fell between 1977 and 1997.
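As a numerical check of Example 5.5, the following sketch evaluates the two-proportion statistic with n₁ = n₂ = 36. The function name `two_proportion_z` is hypothetical. The critical value comes from the standard library's normal quantile function, which gives about 2.326 and so differs slightly from the rounded table value used in the solution; the decision is unaffected.

```python
import math
from statistics import NormalDist


def two_proportion_z(p1_hat, p2_hat, n1, n2, d0=0.0):
    """Large-sample z statistic for H0: p1 - p2 = d0 (unpooled standard error)."""
    standard_error = math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    return (p1_hat - p2_hat - d0) / standard_error


# Example 5.5: domestic content proportions from two surveys of 36 companies.
z_obs = two_proportion_z(0.37, 0.36, 36, 36)

# Upper tail critical value for alpha = 0.01.
z_crit = NormalDist().inv_cdf(1 - 0.01)

reject_h0 = z_obs > z_crit
print(round(z_obs, 5), round(z_crit, 3), reject_h0)
```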

Let $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ be two independent random samples from two normal populations with sample variances s₁² and s₂², respectively. The problem here is of testing for the equality of the variances, H₀ : σ₁² = σ₂². We have already seen in Chapter 4 that

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

follows the F-distribution with ν₁ = n₁ − 1 numerator and ν₂ = n₂ − 1 denominator degrees of freedom. Under the assumption H₀ : σ₁² = σ₂², we have

$$F = \frac{S_1^2}{S_2^2}$$

which has an F-distribution with (ν₁, ν₂) degrees of freedom. We summarize the test procedure for the equality of variances.

TESTING FOR THE EQUALITY OF VARIANCES

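To test

$$H_0 : \sigma_1^2 = \sigma_2^2$$

versus

$$H_a : \begin{cases} \sigma_1^2 > \sigma_2^2, & \text{upper tailed test} \\ \sigma_1^2 < \sigma_2^2, & \text{lower tailed test} \\ \sigma_1^2 \neq \sigma_2^2, & \text{two-tailed test} \end{cases}$$

at significance level α, the test statistic is

$$F = \frac{S_1^2}{S_2^2}.$$

The rejection region is

$$RR : \begin{cases} f > F_\alpha(\nu_1, \nu_2), & \text{upper tailed RR} \\ f < F_{1-\alpha}(\nu_1, \nu_2), & \text{lower tailed RR} \\ f > F_{\alpha/2}(\nu_1, \nu_2) \text{ or } f < F_{1-\alpha/2}(\nu_1, \nu_2), & \text{two-tailed RR} \end{cases}$$

where f is the observed test statistic given by f = s₁²/s₂².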

Decision: Reject H₀ if the test statistic falls in the RR and conclude that H_a is true with (1 − α)100%

confidence. Otherwise, keep H₀, because there is not enough evidence to conclude that H_a is true for

a given α and more experiments are needed.

Assumption:

(i) The two random samples are independent.

(ii) Both populations are normal.

Recall from Section 4.2 that in order to find $F_{1-\alpha}(\nu_1, \nu_2)$, we use the identity $F_{1-\alpha}(\nu_1, \nu_2) = 1/F_\alpha(\nu_2, \nu_1)$.


Example 5.6

Consider two independent random samples $X_1, \ldots, X_{n_1}$ from an N(μ₁, σ₁²) distribution and $Y_1, \ldots, Y_{n_2}$ from an N(μ₂, σ₂²) distribution. Test H₀ : σ₁² = σ₂² versus H_a : σ₁² ≠ σ₂² for the following basic statistics:

n₁ = 25, x̄₁ = 410, s₁² = 95, and n₂ = 16, x̄₂ = 390, s₂² = 300.

Use α = 0.20.

Solution

Test H₀ : σ₁² = σ₂² versus H_a : σ₁² ≠ σ₂². This is a two-tailed test.

Here the degrees of freedom are ν₁ = 24 and ν₂ = 15. The test statistic is

$$F = \frac{s_1^2}{s_2^2} = \frac{95}{300} = 0.317.$$

From the F-table, F0.10(24, 15) = 1.90 and F0.90(24, 15) = (1/F0.10(15, 24)) = 0.56.

Hence, the rejection region is F > 1.90 or F < 0.56. Because the observed value of the test statistic, 0.317, is less than 0.56, we reject the null hypothesis. There is evidence that the population variances are not equal.
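The decision in Example 5.6 can be retraced with a short script. This is a sketch only: the upper critical value 1.90 is the table entry quoted in the solution, and 1.78 for F0.10(15, 24) is an assumed standard table entry entered by hand so that the reciprocal identity can be applied.

```python
def f_statistic(s1_squared, s2_squared):
    """Observed F statistic: ratio of the two sample variances."""
    return s1_squared / s2_squared


# Example 5.6: s1^2 = 95 (n1 = 25), s2^2 = 300 (n2 = 16), alpha = 0.20, two-tailed.
f_obs = f_statistic(95, 300)

f_upper = 1.90        # F_{0.10}(24, 15), as quoted in the text
f_010_15_24 = 1.78    # F_{0.10}(15, 24), assumed table value
f_lower = 1 / f_010_15_24  # identity: F_{0.90}(24, 15) = 1 / F_{0.10}(15, 24)

reject_h0 = f_obs > f_upper or f_obs < f_lower
print(round(f_obs, 3), round(f_lower, 2), reject_h0)
```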

5.2 Dependent Samples

We now consider the case where the two random samples are not independent. When two samples

are dependent (the samples are dependent if one sample is related to the other), then each data

point in one sample can be coupled in some natural, nonrandom fashion with each data point in

the second sample. This situation occurs when each individual data point within a sample is paired

(matched) to an individual data point in the second sample. The pairing may be the result of the

individual observations in the two samples: (1) representing before and after a program (such as

weight before and after following a certain diet program), (2) sharing the same characteristic, (3)

being matched by location, (4) being matched by time, (5) control and experimental, and so forth.

Let (X1i, X2i), for i = 1, 2, . . . , n, be a random sample. X1i and X2j (i ≠ j) are independent. To test the significance of the difference between two population means when the samples are dependent, we first calculate for each pair of scores the difference, D_i = X1i − X2i, i = 1, 2, . . . , n, between the two scores. Let μ_D = E(D_i). Because pairs of observations form a random sample, D₁, . . . , D_n are independent and identically distributed random variables. If d₁, . . . , d_n are the observed values of D₁, . . . , D_n, then we define

$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i \quad \text{and} \quad s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2 = \frac{\sum_{i=1}^{n} d_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} d_i\right)^2}{n-1}.$$

Now the testing for these n observed differences will proceed as in the case of a single sample. If the

number of differences is large (n ≥ 30), large sample inferential methods for one sample case can

be used for the paired differences. We now summarize the hypothesis testing procedure for small

samples.


SUMMARY OF TESTING FOR MATCHED PAIRS EXPERIMENT

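To test

$$H_0 : \mu_D = d_0 \quad \text{versus} \quad H_a : \begin{cases} \mu_D > d_0, & \text{upper tail test} \\ \mu_D < d_0, & \text{lower tail test} \\ \mu_D \neq d_0, & \text{two-tailed test} \end{cases}$$

the test statistic is

$$T = \frac{\bar{D} - d_0}{S_D/\sqrt{n}}$$

(this approximately follows a Student t-distribution with (n − 1) degrees of freedom).

The rejection region is

$$\begin{cases} t > t_{\alpha,n-1}, & \text{upper tail RR} \\ t < -t_{\alpha,n-1}, & \text{lower tail RR} \\ |t| > t_{\alpha/2,n-1}, & \text{two-tailed RR} \end{cases}$$

where t is the observed test statistic.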

Assumptions: The differences are approximately normally distributed.

Decision: Reject H₀ if the test statistic falls in the RR and conclude that H_a is true with (1 − α)100%

confidence. Otherwise, do not reject H₀, because there is not enough evidence to conclude that H_a is true

for a given α and more data are needed.

Example 5.7

A new diet and exercise program has been advertised as a remarkable way to reduce blood glucose levels in

diabetic patients. Ten randomly selected diabetic patients are put on the program, and the results after 1

month are given by the following table:

Before   268  225  252  192  307  228  246  298  231  185
After    106  186  223  110  203  101  211  176  194  203

Do the data provide sufficient evidence to support the claim that the new program reduces blood glucose

level in diabetic patients? Use α = 0.05.

Solution

We need to test the hypothesis

H₀ : μ_D = 0

vs. H_a : μ_D < 0.

First we calculate the difference of each pair given in the following table.

Before                       268   225   252   192   307   228   246   298   231   185
After                        106   186   223   110   203   101   211   176   194   203
Difference (after−before)   −162   −39   −29   −82  −104  −127   −35  −122   −37    18


From the table, the mean of the differences is d̄ = −71.9 and the standard deviation s_d = 56.2.

The test statistic is

$$t = \frac{\bar{d} - d_0}{s_d/\sqrt{n}} = \frac{-71.9}{56.2/\sqrt{10}} = -4.0457 \approx -4.05.$$

From the t-table, t0.05,9 = 1.833. Because the observed value of t = −4.05 < −t0.05,9 = −1.833, we reject the null hypothesis and conclude that the sample evidence suggests that the new diet and exercise program is effective.
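The matched pairs computation of Example 5.7 can be reproduced directly from the before and after readings. This is a minimal sketch using only the standard library; the critical value 1.833 is the table entry quoted in the solution. Because the script keeps the unrounded standard deviation (about 56.16 rather than 56.2), its statistic differs from the worked value in the third decimal place.

```python
import math
from statistics import mean, stdev

before = [268, 225, 252, 192, 307, 228, 246, 298, 231, 185]
after = [106, 186, 223, 110, 203, 101, 211, 176, 194, 203]

# Paired differences, taken as after - before to match the worked solution.
diffs = [a - b for a, b in zip(after, before)]

d_bar = mean(diffs)   # sample mean of the differences
s_d = stdev(diffs)    # sample standard deviation (divisor n - 1)
t_obs = (d_bar - 0) / (s_d / math.sqrt(len(diffs)))

# Lower tail test at alpha = 0.05 with 9 degrees of freedom (table value).
t_crit = 1.833
reject_h0 = t_obs < -t_crit
print(diffs, round(d_bar, 1), round(s_d, 1), round(t_obs, 2), reject_h0)
```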

We can also obtain a (1 − α)100% confidence interval for μ_D using the formula

$$\left(\bar{D} - t_{\alpha/2}\frac{S_d}{\sqrt{n}},\; \bar{D} + t_{\alpha/2}\frac{S_d}{\sqrt{n}}\right)$$

where t_{α/2} is obtained from the t-table with (n − 1) degrees of freedom. The interpretation of the confidence interval is identical to the earlier interpretation.

Example 5.8

For the data in Example 5.7, obtain a 95% confidence interval for μ_D and interpret its meaning.

Solution

We have already calculated d̄ = −71.9 and s_d = 56.2. From the t-table, t0.025,9 = 2.262. Hence, a 95% confidence interval for μ_D is (−112.1, −31.7). That is, we are 95% confident that −112.1 ≤ μ_D ≤ −31.7. Note that μ_D = μ₁ − μ₂, and because the whole interval lies below zero we can conclude with 95% confidence that μ₂ is greater than μ₁, that is, μ₂ > μ₁.
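The interval in Example 5.8 follows from the rounded summary statistics in one step. The sketch below uses a hypothetical helper `paired_ci` and the table value 2.262 quoted in the solution.

```python
import math


def paired_ci(d_bar, s_d, n, t_crit):
    """Confidence interval d_bar -/+ t_crit * s_d / sqrt(n) for mu_D."""
    half_width = t_crit * s_d / math.sqrt(n)
    return d_bar - half_width, d_bar + half_width


# Example 5.8: d_bar = -71.9, s_d = 56.2, n = 10, t_{0.025, 9} = 2.262.
lower, upper = paired_ci(-71.9, 56.2, 10, 2.262)
print(round(lower, 1), round(upper, 1))
```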

It is interesting to compare the matched pairs test with the corresponding two independent sample

test. One of the natural questions is, why must we take paired differences and then calculate the mean

and standard deviation for the differences—why can’t we just take the difference of means of each

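sample, as we did for independent samples? The answer lies in the fact that $\sigma^2_{\bar{D}}$ need not be equal to $\sigma^2_{(\bar{X}_1 - \bar{X}_2)}$. Assume that

$$E(X_{ji}) = \mu_j, \quad \mathrm{Var}(X_{ji}) = \sigma_j^2, \quad \text{for } j = 1, 2,$$

and

$$\mathrm{Cov}(X_{1i}, X_{2i}) = \rho\sigma_1\sigma_2$$

where ρ denotes the assumed common correlation coefficient of the pair (X1i, X2i) for i = 1, 2, . . . , n. Because the values of D_i, i = 1, 2, . . . , n, are independent and identically distributed,

$$\mu_D = E(D_i) = E(X_{1i}) - E(X_{2i}) = \mu_1 - \mu_2$$

and

$$\sigma_D^2 = \mathrm{Var}(D_i) = \mathrm{Var}(X_{1i}) + \mathrm{Var}(X_{2i}) - 2\,\mathrm{Cov}(X_{1i}, X_{2i}) = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2.$$

From these calculations,

$$E(\bar{D}) = \mu_D = \mu_1 - \mu_2$$

and

$$\sigma^2_{\bar{D}} = \mathrm{Var}(\bar{D}) = \frac{1}{n}\left(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2\right).$$

Now, if the samples were independent with n₁ = n₂ = n,

$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$$

and

$$\sigma^2_{(\bar{X}_1 - \bar{X}_2)} = \frac{1}{n}\left(\sigma_1^2 + \sigma_2^2\right).$$

Hence, if ρ > 0, then $\sigma^2_{\bar{D}} < \sigma^2_{(\bar{X}_1 - \bar{X}_2)}$. As a result, we can see that the matched pairs test reduces any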

variability introduced by differences in physical factors in comparison to the independent samples

test when ρ > 0. It is also important to observe that normality assumption for the difference does not

imply that the individual samples themselves are normal. Also, in a matched pairs experiment, there

is no need to assume the equality of variances for the two populations. Matching also reduces degrees

of freedom, because in case of two independent samples, the degrees of freedom is (n₁ + n₂ − 2),

whereas for the case of two dependent samples it is only (n − 1).
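The effect of positive correlation on the variance of the mean difference can be illustrated numerically. The values σ₁ = 10, σ₂ = 12, ρ = 0.8, and n = 25 below are hypothetical and chosen only for illustration; the two functions simply evaluate the variance formulas for the paired and the independent designs.

```python
def var_mean_diff_paired(sigma1, sigma2, rho, n):
    """Variance of the mean paired difference: (s1^2 + s2^2 - 2*rho*s1*s2) / n."""
    return (sigma1 ** 2 + sigma2 ** 2 - 2 * rho * sigma1 * sigma2) / n


def var_mean_diff_independent(sigma1, sigma2, n):
    """Variance of the difference of two independent sample means, equal n."""
    return (sigma1 ** 2 + sigma2 ** 2) / n


# Hypothetical population values, for illustration only.
paired = var_mean_diff_paired(10, 12, 0.8, 25)
independent = var_mean_diff_independent(10, 12, 25)
print(paired, independent)
```

With these illustrative numbers the paired design has a much smaller variance, which is the sense in which matching removes variability when ρ > 0; with ρ = 0 the two expressions coincide.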

EXERCISES 5

5.1. Two sets of elementary school children were taught to read by different methods, 50 by each

method. At the conclusion of the instructional period, a reading test gave results y₁ = 74,

y₂ = 71, s₁ = 9, and s₂ = 10. What is the attained significance level if you wish to see if

there is evidence of a real difference between the two population means? What would you

conclude if you desired an α-value of 0.05?

5.2. The following information was obtained from two independent samples selected from two

normally distributed populations with unknown but equal variances.

Sample 1

Sample 2

Test at the 2% significance level whether μ₁ is lower than μ₂.


5.3.

In the academic year 1997-1998, two random samples of 25 male professors and 23 female

professors from a large university produced a mean salary for male professors of $58,550

with a standard deviation of $4000 and an average for female professors of $53,700 with a

standard deviation of $3200. At the 5% significance level, can you conclude that the mean

salary of all male professors for 1997-1998 was higher than that of all female professors?

Assume that the salaries of male and female professors are both normally distributed with

equal standard deviations.

5.4.

It is believed that the effects of smoking differ depending on race. The following table gives

the results of a statistical study for this question.

Number in the Average number of Number of lung

study

cigarettes per day

cancer cases

Whites

400

African

280

Americans

Do the data indicate that African Americans are more likely to develop lung cancer due to

smoking? Use α = 0.05.

5.5.

A supermarket chain is considering two sources A and B for the purchase of 50-pound bags

of onions. The following table gives the results of a study.

Source A

Source B

Number of bags weighed

100

Mean weight

105.9

100.5

Sample variance

0.21

0.19

Test at α = 0.05 whether there is a difference in the mean weights.

5.6.

In order to compare the mean Hemoglobin (Hb) levels of well-nourished and undernour-

ished groups of children, random samples from each of these groups yielded the following

summary.

                  Number of   Sample   Sample standard
                  children    mean     deviation
Well nourished                11.2     0.9
Undernourished                9.8      1.2

Test at α = 0.01 whether the mean Hb levels of well-nourished children were higher than

those of undernourished children.

5.7.

An aquaculture farm takes water from a stream and returns it after it has circulated through

the fish tanks. In order to find out how much organic matter is left in the waste water after

the circulation, some samples of the water are taken at the intake and other samples are

taken at the downstream outlet and tested for biochemical oxygen demand (BOD). BOD is

a common environmental measure of the quantity of oxygen consumed by microorganisms

during the decomposition of organic matter. If BOD increases, it can be said that the waste

matter contains more organic matter than the stream can handle. The following table gives

data for this problem.

Upstream      9.0    6.8    6.5    8.0    8.6    6.8    8.9
Downstream    10.2   9.9    11.1   9.6    8.7    9.6    9.7    10.4   8.1

Assuming that the samples come from a normal distribution,

(a) Test that the mean BOD for the downstream samples is less than for the samples

upstream at α = 0.05. Assume that the variances are equal.

(b) Test for the equality of the variances at α = 0.05.

tion is not reasonable. Assuming that the difference of each pair is approximately

normal, test that the mean BOD for the downstream samples is less than for the

upstream samples at α = 0.05.

5.8.

Suppose we want to know the effect on driving of a drug for cold and allergy, in a study

in which the same people were tested twice, once after 1 hour of taking the drug and once

when no drug is taken. Suppose we obtain the following data, which represent the number

of cones (placed in a certain pattern) knocked down by each of the nine individuals before

taking the drug and after an hour of taking the drug.

No drug

After drug

Assuming that the difference of each pair is coming from an approximately normal distribu-

tion, test if there is any difference in the individuals’ driving ability under the two conditions.

Use α = 0.05.

5.9.

Suppose that we want to evaluate the role of intravenous pulse cyclophosphamide (IVCP)

infusion in the management of nephrotic syndrome in children with steroid resistance.

Children were given a monthly infusion of IVCP in a dose of 500 to 750 mg/m². The

following data (source: S. Gulati and V. Kher, “Intravenous pulse cyclophosphamide—A new

regime for steroid resistant focal segmental glomerulosclerosis,” Indian Pediatr. 37, 2000)

represent levels of serum albumin (g/dL) before and after IVCP in 14 randomly selected

children with nephrotic syndrome.

Pre-IVCP     2.0   2.5   1.5   2.0   2.3   2.1   2.3   1.0   2.2   1.8   2.0   1.5   3.4
Post-IVCP    3.5   4.3   4.0   3.8   2.4   3.5   1.7   3.8   3.6   3.8   4.1   3.4

Assuming that the samples come from a normal distribution:

(a) Test whether the mean Pre-IVCP is less than the mean Post-IVCP at α = 0.05. Assume

that the variances are equal.

(b) Test for the equality of the variances at α = 0.05.

this assumption is not reasonable. Assuming that the difference of each pair is

approximately normal, test that the mean Pre-IVCP is less than the Post-IVCP at

α=0.05.

5.10.

Show that S_D² is an unbiased estimator of σ_D².

5.11.

Test H₀ : σ₁² = σ₂² versus H_a : σ₁² ≠ σ₂² for the following data.

n₁ = 10,x₁ = 71,s²

and n₂ = 25, x₂ = 131, s²

96.

1 =

2 =

Use α = 0.10.

5.12.

The IQs of 17 students from one area of a city showed a mean of 106 with a standard

deviation of 10, whereas the IQs of 14 students from another area showed a mean of 109 with

a standard deviation of Test for equality of variances between the IQs of the two groups at

α = 0.02.

5.13.

The following data give SAT mean scores for math by state for 1989 and 1999 for 20 randomly

selected states (source: The World Almanac and Book of Facts 2000).

State             1989   1999
Arizona           523    525
Connecticut       498    509
Alabama           539    555
Indiana           487    498
Kansas            561    576
Oregon            509    525
Nebraska          560    571
New York          496    502
Virginia          507    499
Washington        515    526
Illinois          539    585
North Carolina    469    493
Georgia           475    482
Nevada            512    517
Ohio              520    568
New Hampshire     510    518

Assuming that the samples come from a normal distribution:

(a) Test that the mean SAT score for math in 1999 is greater than that in 1989 at α = 0.05.

Assume the variances are equal.

(b) Test for the equality of the variances at α = 0.05.

6 CHI-SQUARE TESTS FOR COUNT DATA

In this section, we study several commonly used tests for count data. These are basically large sample

tests based on a χ²-approximation. Suppose that we have outcomes of a multinomial experiment that

consists of k mutually exclusive and exhaustive events A₁, . . . , A_k. Let P(A_i) = p_i, i = 1, 2, . . . , k.

Then ∑_{i=1}^{k} p_i = 1. Let the experiment be repeated n times, and let X_i (i = 1, 2, . . . , k) represent

the number of times the event A_i occurs. Then (X₁, . . . , X_k ) have a multinomial distribution with

parameters n, p₁, . . . , p_k .

Let

Q² = ∑_{i=1}^{k} (X_i − np_i)² / (np_i).

It can be shown that for large n, the random variable Q² is approximately χ²-distributed with (k − 1)

degrees of freedom. It is usual to require np_i ≥ 5 (i = 1, 2, . . . , k) for the approximation to be valid,

although the approximation generally works well as long as about 80% of the classes satisfy np_i ≥ 5

and the remaining classes (about 20%) satisfy np_i ≥ 1. This statistic was proposed by Karl Pearson

in 1900.

It should be noted that the χ²-tests that we discuss in this section are approximate tests valid for

large samples. Often X_i is called the observed frequency and is denoted by O_i (this is the observed

value in class i), and np_i is called the expected frequency and is denoted by E_i (this is the theoretical

distribution frequency under the null hypothesis). Thus, with these notations, we get

Q² = ∑_{i=1}^{k} (O_i − E_i)² / E_i.

Example 6.1

A plant geneticist grows 200 progeny from a cross that is hypothesized to result in a 3 : 1 phenotypic

ratio of red-flowered to white-flowered plants. Suppose the cross produces 170 red- to 30 white-flowered

plants. Calculate the value of Q² for this experiment.

Solution

There are two categories of data totaling n = 200. Hence, k = 2. Let i = 1 represent red-flowered and i = 2

represent white-flowered plants. Then O₁ = 170, and O₂ = 30.

Here, H₀ : The flower color population ratio is not different from 3 : 1, and the alternate is H_a : The flower

color population sampled has a flower color ratio that is not 3 red : 1 white.

Under the null hypothesis, the expected frequencies are E₁ = (200)(3/4) = 150, and E₂ = (200)(1/4) = 50.

Hence,

Q² = ∑_{i=1}^{2} (O_i − E_i)² / E_i = (170 − 150)²/150 + (30 − 50)²/50 = 2.67 + 8.00 = 10.67.

The type of calculation in Example 6.1 gives a measure of how close our observed frequencies come

to the expected frequencies and is referred to as a measure of goodness of fit. Smaller values of Q²

indicate a better fit.
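The calculation in Example 6.1 can be sketched in a few lines of Python. This is only an illustration; the helper name `pearson_q2` is hypothetical and not part of the text.

```python
def pearson_q2(observed, expected):
    """Return Q^2 = sum over classes of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Example 6.1: 200 progeny, hypothesized 3 : 1 ratio of red to white.
n = 200
observed = [170, 30]
expected = [n * 3 / 4, n * 1 / 4]  # E_1 = 150, E_2 = 50
q2 = pearson_q2(observed, expected)
print(round(q2, 2))
```

The function takes frequencies, not percentages, which mirrors the requirement that the statistic be computed from counts.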

One of the most frequent uses of the χ²-test is in the comparison of observed frequencies. The statistic

must be computed from frequencies: percentages cannot be used unless the sample size is exactly 100.

These are approximate tests. Let the random variables (X₁, . . . , X_k) have a multinomial distribution

with parameters n, p₁, . . . , p_k. Let n be known.

We will now present some important tests based on the chi-square statistic.

6.1 Testing the Parameters of Multinomial Distribution: Goodness-of-Fit Test

Let an experiment have k mutually exclusive and exhaustive outcomes A₁, A₂, . . . , A_k . We would

like to test the null hypothesis that all the p_i = P(A_i), i = 1, 2, . . . , k, are equal to known numbers

p_{i0}, i = 1, 2, …, k. We now summarize the test procedure.

TESTING THE PARAMETERS OF A MULTINOMIAL DISTRIBUTION (SUMMARY)

To test

H₀ : p₁ = p₁₀, …, p_k = p_{k0}

versus

H_a : At least one of the probabilities is different from the hypothesized value.

The test is always a one-sided upper tail test.

Let O_i be the observed frequency, E_i = np_{i0} be the expected frequency (frequency under the null

hypothesis), and k be the number of classes. The test statistic is

Q² = ∑_{i=1}^{k} (O_i − E_i)² / E_i.

The test statistic Q² has an approximate chi-square distribution with k − 1 degrees of freedom.

The rejection region is

Q² ≥ χ²_{α,k−1}.

Assumption: E_i ≥ 5 for every class. Exact methods are available. Computing the power of this test is difficult.

This test is known as the goodness-of-fit test. If the observed data are very close to the

expected data, we have a very good fit and we do not reject the null hypothesis. That is, for small Q² values,

we do not reject H₀.

Example 6.2

A TV station broadcasts a series of programs on the ill effects of smoking marijuana. After the series, the

station wants to know whether people have changed their opinion about legalizing marijuana. Given in the

following tables are the data based on a survey of 500 randomly chosen people:

Before the Series Was Shown

For legalization   Decriminalization   Existing law             No opinion
                                       (fine or imprisonment)
7%                 18%                 65%                      10%

After the Series Was Shown

For legalization   Decriminalization   Existing law             No opinion
                                       (fine or imprisonment)
39%                9%                  36%                      16%

Here, k = 4 and n = 500, and we wish to test

H₀ : p₁ = 0.07; p₂ = 0.18; p₃ = 0.65; p₄ = 0.1

versus

H_a : At least one of the probabilities is different from the hypothesized value.

The test is always an upper tail test. Test this hypothesis using α = 0.01.

Solution

We have

E₁ = (500)(0.07) = 35; E₂ = 90; E₃ = 325; E₄ = 50.

The observed frequencies are

O₁ = (500)(0.39) = 195; O₂ = 45; O₃ = 180; O₄ = 80.

The test statistic is

Q² = ∑_{i=1}^{4} (O_i − E_i)² / E_i

   = (195 − 35)²/35 + (45 − 90)²/90 + (180 − 325)²/325 + (80 − 50)²/50

   = 836.62.

From the χ²-table, χ²_{0.01,3} = 11.3449. Because the test statistic Q² = 836.62 > 11.3449, we reject H₀

at α = 0.01. Hence, the data suggest that people have changed their opinion after the series on the ill effects

of smoking marijuana was shown.
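The goodness-of-fit procedure applied in Example 6.2 can be sketched as follows. The function name `goodness_of_fit` is illustrative. The critical value is the tabulated χ²_{0.01,3} = 11.3449 quoted in the example and is passed in rather than computed.

```python
def goodness_of_fit(observed, null_probs, critical_value):
    """Pearson goodness-of-fit test against fully specified probabilities.

    Returns (Q^2, expected frequencies, reject H0?).
    """
    n = sum(observed)
    expected = [n * p for p in null_probs]  # E_i = n * p_i0
    q2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return q2, expected, q2 >= critical_value


# Example 6.2: opinions of 500 people after the series was shown,
# tested against the proportions observed before the series.
observed = [195, 45, 180, 80]
null_probs = [0.07, 0.18, 0.65, 0.10]
CHI2_0_01_3 = 11.3449  # tabulated critical value, alpha = 0.01, 3 d.f.

q2, expected, reject = goodness_of_fit(observed, null_probs, CHI2_0_01_3)
print(round(q2, 2), reject)
```

Almost all of Q² comes from the first class, (195 − 35)²/35 ≈ 731, which shows where the shift in opinion is concentrated.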

Example 6.3

A die is rolled 60 times and the face values are recorded. The results are as follows.

Up face

Frequency

Is the die balanced? Test using α = 0.05.

Solution

If the die is balanced, we must have

p₁ = p₂ = … = p₆ = 1/6,

where p_i = P(face value on the die is i), i = 1, 2, . . . , 6. This is the discrete uniform distribution.

Hence, we test

H₀ : p₁ = p₂ = … = p₆ = 1/6

versus

H_a : At least one of the probabilities is different from the hypothesized value of 1/6.

Under H₀, the expected frequencies are E₁ = np₁ = (60)(1/6) = 10, …, E₆ = 10.

We summarize the calculations in the following table:

Face value

Frequency, O_i

Expected value, E_i

The test statistic value is given by

Q² = ∑_{i=1}^{6} (O_i − E_i)² / E_i = 6.

From the chi-square table with 5 d.f., χ²_{0.05,5} = 11.070.

Because the value of the test statistic does not fall in the rejection region (6 < 11.070), we do not reject H₀.

Therefore, the data do not provide evidence that the die is unbalanced.

6.2 Contingency Table: Test for Independence

One of the uses of the χ²-statistic is in contingency (dependence) testing where n randomly selected

items are classified according to two different criteria, such as when data are classified on the basis of

two factors (row factor and column factor) where the row factor has r levels and the column factor

has c levels. The obtained data are displayed as shown in the following table, where n_ij represents

the number of data values under row i and column j. Our interest here is to test for independence of

two methods of classification of observed events. For example, we might classify a sample of students

by sex and by their grade on a statistics course in order to test the hypothesis that the grades are

dependent on sex. More generally the problem is to investigate a dependency (or contingency) between

two classification criteria.

                          Levels of column factor
                     1       2       …     c        Row total
Row levels     1     n₁₁     n₁₂     …     n_1c     n_1.
               2     n₂₁     n₂₂     …     n_2c     n_2.
               ⋮
               r     n_r1    n_r2    …     n_rc     n_r.
Column total         n_.1    n_.2    …     n_.c     N

where N = ∑_{j=1}^{c} n_.j = ∑_{i=1}^{r} n_i. = ∑_{i=1}^{r} ∑_{j=1}^{c} n_ij is the grand total.

We wish to test the hypothesis that the two factors are independent. We summarize below the procedure

for testing that the factor represented by the rows is independent of the factor

represented by the columns.

TESTING FOR THE INDEPENDENCE OF TWO FACTORS

To test

H₀ : The factors are independent

versus

H_a : The factors are dependent

the test statistic is

Q² = ∑_{i=1}^{r} ∑_{j=1}^{c} (O_ij − E_ij)² / E_ij,

where

O_ij = n_ij

and

E_ij = n_i. n_.j / N.

Then under the null hypothesis the test statistic Q² has an approximate chi-square distribution with

(r − 1)(c − 1) degrees of freedom.

Hence, the rejection region is Q² > χ²_{α,(r−1)(c−1)}.

Assumption: E_ij ≥ 5.

Example 6.4

The following table gives a classification according to religious affiliation and marital status

for 500 randomly selected individuals.

                       Religious affiliation (column j)
Marital status (i)     1      2      3      4      5 (None)   Total
1 Single               39     19     12     28     18         116
2 With spouse          172    61     44     70     37         384
Total                  211    80     56     98     55         500

For α = 0.01, test the null hypothesis that marital status and religious affiliation are independent.

Solution

We need to test the hypothesis

H₀ : Marital status and religious affiliation are independent

versus

H_a : Marital status and religious affiliation are dependent.

Here, c = 5, and r = 2. For α = 0.01, and for (c − 1)(r − 1) = 4 degrees of freedom, we have

χ²_{0.01,4} = 13.2767.

Hence, the rejection region is Q² > 13.2767.

We have E_ij = n_i. n_.j / N. Thus,

E₁₁ = (116)(211)/500 = 48.952;   E₁₂ = (116)(80)/500 = 18.56;

E₁₃ = (116)(56)/500 = 12.992;    E₁₄ = (116)(98)/500 = 22.736;

E₁₅ = (116)(55)/500 = 12.76;     E₂₁ = (384)(211)/500 = 162.05;

E₂₂ = (384)(80)/500 = 61.44;     E₂₃ = (384)(56)/500 = 43.008;

and

E₂₄ = (384)(98)/500 = 75.264;    E₂₅ = (384)(55)/500 = 42.24.

The value of the test statistic is

Q² = ∑_{i=1}^{2} ∑_{j=1}^{5} (O_ij − E_ij)² / E_ij

   = (39 − 48.952)²/48.952 + (19 − 18.56)²/18.56 + (12 − 12.992)²/12.992 + (28 − 22.736)²/22.736

   + (18 − 12.76)²/12.76 + (172 − 162.05)²/162.05 + (61 − 61.44)²/61.44 + (44 − 43.008)²/43.008

   + (70 − 75.264)²/75.264 + (37 − 42.24)²/42.24

   = 7.135.

Because the observed value of Q² does not fall in the rejection region, we do not reject the null hypothesis

at α = 0.01. Therefore, based on the observed data, there is not sufficient evidence that marital status and

religious affiliation are dependent.
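The arithmetic of Example 6.4 can be checked with the short sketch below. The observed counts are the ones that appear in the numerators of the worked solution. The helper names are hypothetical. The p-value uses the closed form of the chi-square upper tail that holds for even degrees of freedom, so it is not a general-purpose routine.

```python
import math


def independence_test(table):
    """Return (Q^2, degrees of freedom, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E_ij = (row total i) * (column total j) / N
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    q2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(table, expected)
             for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return q2, df, expected


def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x), valid only for even df (finite-sum closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Rows: single, with spouse; columns: the five affiliation categories.
table = [[39, 19, 12, 28, 18],
         [172, 61, 44, 70, 37]]
q2, df, expected = independence_test(table)
p_value = chi2_upper_tail_even_df(q2, df)
print(round(q2, 3), df, round(p_value, 3))
```

With 4 degrees of freedom the upper-tail probability reduces to exp(−Q²/2)(1 + Q²/2), which agrees with the p-value of 0.129 in the Minitab output of Example 8.4.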

6.3 Testing to Identify the Probability Distribution: Goodness-of-Fit Chi-Square Test

Another application of the chi-square statistic is using it for goodness-of-fit tests in a different context.

In hypothesis testing problems we often assume that the form of the population distribution is known.

For example, in a χ²-test for variance, we assume that the population is normal. The goodness-of-fit

tests examine the validity of such an assumption if we have a large enough sample. We now describe

the goodness-of-fit test procedure for such applications.

GOODNESS-OF-FIT TEST PROCEDURES FOR PROBABILITY DISTRIBUTIONS

Let X₁, . . . , X_n be a sample from a population with cdf F(x), which may depend on the set of unknown

parameters θ. We wish to test H₀ : F(x) = F₀(x), where F₀(x) is completely specified.

1. Divide the range of values of the random variable X into K nonoverlapping intervals I₁, I₂, . . . , I_K.

Let O_j be the number of sample values that fall in the interval I_j (j = 1, 2, . . . , K).

2. Assuming the distribution of X to be F₀(x), find P(X ∈ I_j). Let P(X ∈ I_j) = π_j. Let e_j = nπ_j be the

expected frequency.

3. Compute the test statistic Q² given by

Q² = ∑_{j=1}^{K} (O_j − e_j)² / e_j.

The test statistic Q² has an approximate χ²-distribution with (K − 1) degrees of freedom.

4. Reject H₀ if Q² ≥ χ²_{α,(K−1)}.

5. Assumptions: e_j ≥ 5, j = 1, 2, . . . , K.

If the null hypothesis does not specify F₀(x) completely, that is, if F₀(x) contains some unknown

parameters θ₁, θ₂, . . . , θ_p, we estimate these parameters by the method of maximum likelihood. Using

these estimated values we specify F₀(x) completely. Denote the estimated F₀(x) by F̂₀(x). Let

π̂_i = P{X ∈ I_i | F̂₀(x)}

and

ê_i = nπ̂_i.

The test statistic is

Q² = ∑_{i=1}^{K} (O_i − ê_i)² / ê_i.

The statistic Q² has an approximate chi-square distribution with (K − 1 − p) degrees of freedom. We

reject H₀ if Q² ≥ χ²_{α,(K−1−p)}.

We now illustrate the method of goodness-of-fit with an example.

Example 6.5

The grades of students in a class of 200 are given in the following table. Test the hypothesis

that the grades are normally distributed with a mean of 75 and a standard deviation of 8. Use

α = 0.05.

Range                 0-59    60-69   70-79   80-89   90-100
Number of students    12      36      90      44      18

Solution

We have O₁ = 12, O₂ = 36, O₃ = 90, O₄ = 44, O₅ = 18.

We now compute π_i (i = 1, 2, . . . , 5), using the continuity correction factor,

π₁ = P{X ≤ 59.5 | H₀} = P{Z ≤ (59.5 − 75)/8} = 0.0262,

π₂ = 0.2189, π₃ = 0.4722, π₄ = 0.2476, π₅ = 0.0351,

and

e₁ = 5.24, e₂ = 43.78, e₃ = 94.44, e₄ = 49.52, e₅ = 7.02.

The test statistic results in

Q² = ∑_{i=1}^{5} (O_i − e_i)² / e_i

   = (12 − 5.24)²/5.24 + (36 − 43.78)²/43.78 + (90 − 94.44)²/94.44 + (44 − 49.52)²/49.52 + (18 − 7.02)²/7.02

   = 28.10.

Q² has an approximate chi-square distribution with (5 − 1) = 4 degrees of freedom. The critical value is

χ²_{0.05,4} = 9.488.

Hence, the rejection region is Q² > 9.488. Because the observed value of Q² = 28.10 > 9.488, we reject H₀

at α = 0.05. Thus, we conclude that the grades are not normally distributed with the hypothesized mean and standard deviation.
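The steps of Example 6.5 can be sketched in Python as below. Two assumptions are made. The hypothesized standard deviation is taken to be 8, the value consistent with π₁ = 0.0262 in the solution. The critical value 9.488 is the standard tabulated χ²_{0.05,4}. Because the normal probabilities are computed exactly rather than read from a table with rounded z-values, the expected frequencies and Q² differ slightly from the hand calculation, but the decision is the same.

```python
import math


def normal_cdf(x, mean, sd):
    """CDF of the normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


mean, sd = 75.0, 8.0                # sd = 8 is an assumption (see lead-in)
cuts = [59.5, 69.5, 79.5, 89.5]     # class boundaries with continuity correction
observed = [12, 36, 90, 44, 18]     # O_1, ..., O_5 from the solution
n = sum(observed)

# Class probabilities pi_j are differences of the CDF at successive boundaries.
bounds = [0.0] + [normal_cdf(c, mean, sd) for c in cuts] + [1.0]
probs = [upper - lower for lower, upper in zip(bounds, bounds[1:])]
expected = [n * p for p in probs]

q2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
CHI2_0_05_4 = 9.488                 # tabulated critical value, 4 d.f.
reject = q2 >= CHI2_0_05_4
print([round(e, 2) for e in expected], round(q2, 2), reject)
```

Most of the statistic comes from the two tail classes, where far more students were observed than a normal model with these parameters would predict.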

EXERCISES 6

6.1.

The following table gives the opinion on collective bargaining by a random sample of 200

employees of a school system, belonging to a teachers’ union.

Opinion on Collective Bargaining by Teachers’ Union

For

Against

Undecided

Total

Staff

Faculty

100

Administration

Column totals

200

Test the hypotheses

H₀ : Opinion on collective bargaining is independent of employee classification

versus

H_a : Opinion on collective bargaining is dependent on employee classification

using α = 0.05.

6.2.

A random sample was taken of 300 undergraduate students from a university. The students

in the sample were classified according to their gender and according to the choice of their

major. The result is given in the following table.

College

Gender Arts and sciences Engineering Business Other Total

Male

205

Female

Total

120

300

Test the hypothesis that the choice of the major by undergraduate students in this university

is independent of their gender. Use α = 0.01.

6.3.

The speeds of vehicles (in mph) passing through a section of Highway 75 are recorded for a

random sample of 150 vehicles and are given below. Test the hypothesis that the speeds are

normally distributed with a mean of 70 and a standard deviation of 4. Use α = 0.01.

Range

40-55

56-65

66-75

76-85

> 85

Number

6.4.

Based on the sample data of 50 days contained in the following table, test the hypothesis that

the daily mean temperatures in the city are normally distributed with mean 77 and variance

6. Use α = 0.05.

Temperature

46-55

56-65

66-75

76-85

86-95

Number of days

6.5.

A presidential candidate advertises on TV by comparing his positions on some important

issues with those of his opponent. After a series of advertisements, a pollster wants to know

whether people have changed their opinion about the candidate. The following are the data

based on a survey of 950 randomly chosen people:

Before the Advertisement Was Shown

Support the

Oppose the

Need to know more

Undecided

candidate

about the candidate

40%

20%

35%

After the Advertisement Was Shown

Support the

Oppose the

Need to know more

Undecided

candidate

about the candidate

45%

25%

28%

Let p_i, i = 1, 2, 3, 4, represent the respective true proportions.

Test

H₀ : p₁ = 0.35;p₂ = 0.20;p₃ = 0.15;p₄ = 0.3

versus

H_a : At least one of the probabilities is different from the hypothesized value.

Test this hypothesis using α = 0.05.

6.6.

A survey of footwear preferences of a random sample of 100 undergraduate students (50

females and 50 males) from a large university resulted in the following data.

Boots Leather Sneakers Sandals Other

shoes

Female

Male

(a) Let p_i, i = 1, 2, 3, 4, 5, represent the respective true proportions of students with a

particular footwear preference, and let

H₀ : p₁ = 0.20;p₂ = 0.20;p₃ = 0.30;p₄ = 0.20;p₅ = 0.10

versus

H_a : At least one of the probabilities is different from the hypothesized value.

Test this hypothesis using α = 0.05.

(b) Test the hypothesis that the choice of footwear by undergraduate students in this

university is independent of their gender, using α = 0.05.

7 CHAPTER SUMMARY

In this chapter, we have learned various aspects of hypothesis testing. First, we dealt with hypothesis

testing for one sample where we used test procedures for testing hypotheses about true mean, true

variance, and true proportion. Then we discussed the comparison of two populations through their

true means, true variances, and true proportions. We also introduced the Neyman-Pearson lemma

and discussed likelihood ratio tests and chi-square tests for categorical data.

We now list some of the key definitions in this chapter.

■ Statistical hypotheses

■ Tests of hypotheses, tests of significance, or rules of decision

■ Simple hypothesis

■ Composite hypothesis

■ Type I error

■ Type II error

■ The level of significance

■ The p-value or attained significance level

■ The Smith-Satterthwaite procedure

■ Power of the test

■ Most powerful test

■ Likelihood ratio

In this chapter, we also learned the following important concepts and procedures:

■ General method for hypothesis testing

■ Steps to calculate β

■ Steps to find the p-value

■ Steps in any hypothesis testing problem

■ Summary of hypothesis tests for μ

■ Summary of large sample hypothesis tests for p

■ Summary of hypothesis tests for the variance σ²

■ Summary of hypothesis tests for μ₁ − μ₂ for large samples (n₁ & n₂ ≥ 30)

■ Summary of hypothesis tests for p₁ − p₂ for large samples

■ Testing for the equality of variances

■ Summary of testing for a matched pairs experiment

■ Procedure for applying the Neyman-Pearson lemma

■ Procedure for the likelihood ratio test

■ Testing the parameters of a multinomial distribution (summary)

■ Testing the independence of two factors

■ Goodness-of-fit test procedures for probability distributions

8 COMPUTER EXAMPLES

In the following examples, if the value of α is not specified, we will always take it as 0.05.

8.1 Minitab Examples

Example 8.1

(t-Test): Consider the data

Using Minitab, test H₀ : μ = 75 vs. H₁ : μ > 75.

Solution

Enter the data in C1. Then

Stat > Basic Statistics > 1-sample t. . .

> In Variables: enter C1 > choose Test Mean > enter 75 >

in Alternative: choose greater than and click OK

We obtain the following output.

T-Test of the Mean

Test of mu = 75.00 vs mu > 75.00

Variable   N    Mean    StDev   SE Mean   T       P
C1         10   74.80   6.00    1.90      −0.11   0.54

Example 8.2

For the following data:

Sample 1:

Sample 2:

Test H₀ : μ₁ = μ₂ vs. H₁ : μ₁ < μ₂. Use α = 0.02.

Solution

Enter sample 1 data in C1 and sample 2 data in C2. Then

Stat > Basic Statistics > 2-sample t. . .

> Choose Samples in different columns > in Alternative:

choose less than > in Confidence level: enter 98 > click Assumed equal variances and click OK

We obtain the following output.

Two Sample T-test and Confidence Interval

Two sample T for C1 vs C2

      N    Mean    StDev   SE Mean
C1    13   17.23   2.74    0.76
C2    11   12.18   2.40    0.72

98% CI for mu C1 − mu C2: (2.38, 7.71)

T-Test mu C1 = mu C2 (vs <): T = 4.75 P = 1.0 DF = 22

Both use Pooled StDev = 2.59

If we do not select Assumed equal variances, we obtain the following output.

Two Sample T-Test and Confidence Interval

Two sample T for C1 vs C2

      N    Mean    StDev   SE Mean
C1    13   17.23   2.74    0.76
C2    11   12.18   2.40    0.72

98% CI for mu C1 − mu C2: (2.40, 7.68)

T-Test mu C1 = mu C2 (vs <): T = 4.81 P = 1.0 DF = 21
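As a cross-check on the pooled analysis, the t-statistic can be recomputed from summary statistics. The sketch below uses the means and standard deviations from the SPSS listing of the same data in Example 8.7. The sample sizes 13 and 11 are an assumption inferred from the ratio of each standard deviation to its standard error and from DF = 22. The function name is illustrative.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled-variance two-sample t statistic from summary statistics.

    Returns (t, degrees of freedom, pooled standard deviation).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    std_error = pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return (mean1 - mean2) / std_error, df, pooled_sd


# n1 = 13 and n2 = 11 are inferred values, not printed in the source.
t_value, df, pooled_sd = pooled_t(13, 17.2308, 2.74329, 11, 12.1818, 2.40076)
print(round(t_value, 2), df, round(pooled_sd, 2))
```

Because the statistic is positive while the alternative is μ₁ < μ₂, the one-sided p-value is close to 1, which is why the output reports P = 1.0.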

Example 8.3

For the following data:

6.8    5.6    8.5    8.4    9.3    9.4    9.9    9.6    9.0    9.4
13.7   16.6   9.1    10.1   10.6   11.1   8.9    11.7   12.8   11.5
12.0   10.6   11.1   6.4    12.3   11.4   9.9    14.3   11.5   11.8
13.3   12.8   13.7   13.9   12.9   14.2   14.0   15.5   16.9   18.0
21.8   18.4   34.3

Test H₀ : μ = 12 versus H₁ : μ ≠ 12. Use α = 0.05.

Solution

Enter the data in C1. Then

Stat > Basic Statistics > 1-sample z. . . > in Variables: Type C1 > choose Test Mean and enter

12 >

choose not equal in Alternative, and Type 4.7 for sigma > Click OK

We obtain the following output.

Z-Test

Test of mu = 12.000 vs mu not = 12.000

The assumed sigma = 4.70

Variable   N    Mean     StDev   SE Mean   Z      P
C1         49   12.124   4.700   0.671     0.19   0.85

Here the test statistic is 0.19 and the p-value is 0.85, which is larger than 0.05. Hence, we cannot reject the

null hypothesis.

Example 8.4

(Contingency Table): Consider the following data with five levels and two factors. Test for dependence

of the factors.

                     Levels
Factors    1      2      3      4      5
1          39     19     12     28     18
2          172    61     44     70     37

Solution

In C1 enter the data in column 1 (39 and 172), and continue to C5. Then

Stat > Tables > Chi-Square-Test. . . > in Columns containing the table: Type C1 C2 C3 C4 C5 >

click OK

We will obtain the following output.

Chi-Square Test

Expected counts are printed below observed counts

         C1       C2       C3       C4       C5     Total
1        39       19       12       28       18       116
      48.95    18.56    12.99    22.74    12.76
2       172       61       44       70       37       384
     162.05    61.44    43.01    75.26    42.24
Total   211       80       56       98       55       500

Chi-Sq = 2.023 + 0.010 + 0.076 + 1.219 + 2.152 +

         0.611 + 0.003 + 0.023 + 0.368 + 0.650 = 7.135

DF = 4, p-value = 0.129

Example 8.5

(Paired t-Test): Consider the data of Example 5.7. Using Minitab, perform a paired t-test.

Solution

Enter sample 1 in column C1 and sample 2 in column C2. Then:

Stat > Basic Statistics > Paired t. . . > in First Sample: Type C2, and in the Second sample: Type

C1 > click options > and click less than (if α is other than 0.05, enter appropriate percentage in

Confidence level: and enter appropriate number if it is not zero in Test mean:) > click OK > OK

We obtain the following output.

Paired T-test and Confidence Interval

Paired T for C2 − C1

              N    Mean     StDev   SE Mean
C2            10   171.3    47.1    14.9
C1            10   243.2    40.1    12.7
Difference    10   −71.9    56.2    17.8

95% CI for mean difference: (−112.1, −31.7)

T-Test of mean difference = 0 (vs < 0): T-Value = −4.05   p-value = 0.001

We reject the null hypothesis because the p-value 0.001 < 0.05 = α.

8.2 SPSS Examples

Example 8.6

Consider the data

Using SPSS, test H₀ : μ = 75 vs. H₁ : μ > 75.

Solution

Use the following procedure:

1. Enter the data in column 1.

2. Click Analyze > Compare Means > One-sample t Test. . . , Move var00001

to Test Variable(s),

and change Test Value: 0 to 75. Click OK

We obtain the following output.

One-Sample Statistics

            N    Mean      Std. Deviation   Std. Error Mean
VAR00001    10   74.8000   5.99630          1.89620

One-Sample Test (Test Value = 75)

                                                             95% Confidence Interval
                                                             of the Difference
            t       df   Sig. (2-tailed)   Mean Difference   Lower      Upper
VAR00001    −.105   9    .918              −.2000            −4.4895    4.0895

For the one-sample t-test of H₀ : μ = 75 vs. H₁ : μ > 75, the t-statistic is −0.105 with 9 degrees of freedom.

SPSS reports only the two-tailed significance, .918. Because the t-statistic is negative and the alternative is

upper-tailed, the p-value is 1 − 0.918/2 ≈ 0.54 > 0.05. Hence, we will not reject the null hypothesis.

If we want the computer to calculate the p-value in the previous example, use the following procedure.

1. Enter the test statistic (−0.105) in the data editor under a variable named ‘teststat’.

2. Click Transform > Compute. . .

3. Type ‘pvalue’ in the box called Target Variable. In the box called Functions: scroll and click on

CDF.T(q,df) and move it to Numeric Expression.

4. CDF.T(q,df) will appear as CDF.T(?,?) in the Numeric Expression box. Replace q with teststat and df

with 9 (the degrees of freedom in this example). CDF.T gives the lower-tail probability, so for the

upper-tailed alternative edit the expression to read 1 − CDF.T(teststat,9). Click OK

We obtain the p-value as 0.54.

Example 8.7

For the following data

Sample 1:

Sample 2:

Test H₀ : μ₁ = μ₂ vs. H₁ : μ₁ < μ₂. Use α = 0.02.

Solution

In column 1, under the title ‘‘group’’ enter 1s to identify the sample 1 data and 2s to identify sample 2 data.

In column C2, under the title ‘‘data’’ enter the data corresponding to samples 1 and 2. Then:

Analyze > Compare Means > Independent Samples t–test. . . > bring Data to Test Variable(s): and

group to Grouping Variable:, click Define Groups. . . , and enter 1 for sample 1, 2 for sample 2 >

click continue > click Options

Enter 98 in Confidence interval: > click continue > OK

We obtain the following output.

Group Statistics

        GROUP   N    Mean      Std. Deviation   Std. Error Mean
DATA    1.00    13   17.2308   2.74329          .76085
        2.00    11   12.1818   2.40076          .72386

Independent Samples Test

Levene’s Test for Equality of Variances

                                       F      Sig.
DATA   Equal variances assumed         .975   .334

t-test for Equality of Means

                                                                                            98% Confidence Interval
                                                                  Mean         Std. Error   of the Difference
                                       t       df       Sig.      Difference   Difference   Lower      Upper
                                                        (2-tailed)
DATA   Equal variances assumed         4.753   22       .000      5.0490       1.06237      2.38419    7.71372
       Equal variances not assumed     4.808   21.963   .000      5.0490       1.05017      2.41443    7.68347

The two-tailed significance values are .000, but the alternative here is lower-tailed (H₁ : μ₁ < μ₂) and the

observed t-values are positive. The one-sided p-value is therefore approximately 1, which is greater than

α = 0.02, and we do not reject the null hypothesis.

Example 8.8

(Paired t-Test) For the data of Example 5.7, use SPSS to test whether the data provide sufficient

evidence for the claim that the new program reduces blood glucose level in diabetic patients. Use α = 0.05.

Solution

Enter after data in column C1 and before data in column C2. Then:

Analyze > Compare Means > Paired-Sample T-Test > bring after and before to Paired Variables:

so that it will look after-before > click OK

We obtain the following output.

Paired Samples Statistics

                   Mean       N    Std. Deviation   Std. Error Mean
Pair 1   AFTER     171.3000   10   47.11228         14.89821
         BEFORE    243.2000   10   40.12979         12.69015

Paired Samples Correlations

                           N    Correlation   Sig.
Pair 1   AFTER & BEFORE    10   .179          .621

Paired Samples Test

                           Paired Differences
                                                                     95% Confidence Interval
                                                                     of the Difference
                           Mean       Std. Deviation   Std. Error    Lower       Upper       t        df   Sig. (2-tailed)
                                                       Mean
Pair 1   AFTER - BEFORE    −71.9000   56.15544         17.75791      −112.0712   −31.7288    −4.049   9    .003

Because the two-tailed significance for the test is 0.003 (so the one-sided p-value is about 0.0015), which is less

than α = 0.05, we reject the null hypothesis.
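The paired analysis can be reproduced from the raw data. The sketch below uses the ten before/after blood glucose pairs that appear in the SAS listing of Example 8.11 and relies only on the Python standard library. It computes the test statistic, not the p-value.

```python
import math
import statistics

# Before/after pairs, taken from the SAS datalines of Example 8.11.
before = [268, 225, 252, 192, 307, 228, 246, 298, 231, 185]
after = [106, 186, 223, 110, 203, 101, 211, 176, 194, 203]

# A paired t-test is a one-sample t-test on the within-pair differences.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)      # sample standard deviation (n - 1)
std_error = sd_diff / math.sqrt(n)
t_value = mean_diff / std_error        # tests H0: mean difference = 0

print(round(mean_diff, 1), round(sd_diff, 2), round(t_value, 3), n - 1)
```

The statistic has n − 1 = 9 degrees of freedom, matching the paired-sample rule quoted earlier in the chapter, and its value agrees with the Minitab and SPSS outputs.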

8.3 SAS Examples

To conduct a hypothesis test using SAS, we could use proc ttest, or proc means with option of

computing the t-value and corresponding probability. However, to use this, we need a hypothesis

of the form H₀ : μ = 0. For testing nonzero values, H₀ : μ = μ₀, we must create a new variable

by subtracting μ₀ from each observation, and then use the test procedure for this new variable. The

following example illustrates this concept.

Example 8.9

(t-Test): The following radar measurements of speed (in miles per hour) are obtained for 10 vehicles

traveling on a stretch of interstate highway.

Do the data provide sufficient evidence to indicate that the mean speed at which people travel on this

stretch of highway is at least 75 mph? Test using α = 0.01. Use an SAS procedure to do the analysis.

Solution

In the SAS editor, type in the following commands.

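data speed;

title 'Test on highway speed';

input X @@;

/* shift by the hypothesized mean so that H0: mu = 75 becomes H0: mean of Y = 0 */

Y=X-75;

datalines;

66 74 79 80 69 77 78 65 79 81

;

/* with no VAR statement, PROC TTEST analyzes both X and Y */

PROC TTEST data=speed;

run;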

We obtain the following output.

Test on highway speed

The TTEST Procedure

Statistics

Lower CL

Upper

Lower

Upper

Variable N

Mean

Std

Dev

Err

70.511

74.8

79.089

4.1245

5.9963

10.947

1.8962

−4.489

−0.2

4.0895

4.1245

5.9963

10.947

T-Tests

Variable

t Value

Pr > t

39.45

<.0001

−0.11

0.9183

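To test H₀ : μ = 75, we need to look at the row for Y = X − 75. The corresponding t-value is −0.11, with a two-sided p-value of 0.9183. Because this is a one-sided test, we halve 0.9183 to obtain the tail area 0.45915. This is the p-value for the lower-tailed alternative μ < 75; for the upper-tailed alternative μ > 75 the p-value is 1 − 0.45915 = 0.54085, because the t-value is negative. In either case the p-value is larger than α = 0.01, so we cannot reject the null hypothesis.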
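The shift Y = X − 75 can be checked outside SAS with a minimal Python sketch. The tenth speed, 81, is an assumption: it is the value implied by the reported N = 10 and mean of 74.8 in the SAS output. The critical value t(0.01, 9) = 2.821 is taken from a standard t table, and the helper name t_statistic is ours.

```python
import math
import statistics

# Assumption: the tenth value, 81, is inferred from N = 10 and mean 74.8.
speeds = [66, 74, 79, 80, 69, 77, 78, 65, 79, 81]
MU0 = 75


def t_statistic(values, mu0=0.0):
    """One-sample t statistic for H0: mean = mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(values) - mu0) / se


# Testing X against 75 is the same as testing Y = X - 75 against 0.
shifted = [x - MU0 for x in speeds]
t_x = t_statistic(speeds, MU0)
t_y = t_statistic(shifted)

# Assumption: upper-tail critical value t(0.01, 9) from a standard table.
T_CRIT = 2.821
reject = t_y > T_CRIT  # upper-tailed test, H_a: mu > 75

print(f"t on X vs 75: {t_x:.4f}, t on Y vs 0: {t_y:.4f}, reject H0: {reject}")
```

Both statistics agree, and neither comes close to the critical value, which matches the decision not to reject H₀.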


One of the easier ways to conduct large sample hypothesis testing using SAS procedures is through

the computation of the p-value. The following example illustrates the procedure.

Example 8.10

(z-Test): It is claimed that the average miles driven per year for sports cars is at least 18,000 miles. To check

the claim, a consumer firm tests 40 of these cars randomly and obtains a mean of 17,463 miles with standard

deviation of 1348 miles. What can it conclude if α = 0.01?

Solution

Here we will find the p-value and compare that with α to test the hypothesis. We use the following SAS

procedure:

data ex888;
  z = (17463 - 18000) / (1348 / sqrt(40));   /* large-sample test statistic */
  pval = probnorm(z);                         /* lower-tail area P(Z <= z) */
run;

proc print data=ex888;
  title 'Test of mean, large sample';
run;

We obtain the following output.

Test of mean, large sample

Obs    z           pval
1      −2.51950    .005876079

Because the p-value of 0.005876079 is less than α = 0.01, we reject the null hypothesis. There is sufficient evidence to conclude that the mean miles driven per year for sports cars is less than 18,000.

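Note that the previous example was a lower-tailed test with a negative value of z, so the p-value was the lower-tail area probnorm(z). For an upper-tailed test, where z is expected to be positive, use pval=probnorm(-z); to obtain the upper-tail area. For a two-sided hypothesis, we need to double the smaller tail area, so use pval=2*probnorm(-abs(z)); to obtain the p-value.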
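As a cross-check outside SAS, the same tail areas can be computed with the standard normal distribution in Python's standard library. This is a minimal sketch: the function name z_test_p_values is ours, and NormalDist().cdf plays the role of probnorm.

```python
import math
from statistics import NormalDist


def z_test_p_values(xbar, mu0, s, n):
    """Large-sample z statistic with lower-, upper-, and two-tailed p-values."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    phi = NormalDist().cdf  # standard normal CDF
    return {
        "z": z,
        "lower": phi(z),                  # H_a: mu < mu0
        "upper": 1.0 - phi(z),            # H_a: mu > mu0
        "two_sided": 2.0 * phi(-abs(z)),  # H_a: mu != mu0
    }


# Example 8.10: n = 40, sample mean 17,463, standard deviation 1348, mu0 = 18,000.
result = z_test_p_values(17463, 18000, 1348, 40)
for name, value in result.items():
    print(f"{name}: {value:.6f}")
```

The lower-tailed value reproduces the SAS p-value. The two-sided value is twice as large and exceeds 0.01, so the decision at α = 0.01 depends on which alternative is being tested.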

Example 8.11

(Paired t-Test): For the data of Example 5.7, use SAS to test whether the data provide sufficient evidence

for the claim that the new program reduces blood glucose level in diabetic patients. Use α = 0.05.

Solution

We can use the following commands.


data dietexr;
  input before after;
  diff = after - before;   /* negative values indicate a reduction */
  datalines;
268 106
225 186
252 223
192 110
307 203
228 101
246 211
298 176
231 194
185 203
;
run;

proc means data=dietexr t prt;   /* t statistic and two-sided p-value for H0: mean = 0 */
  var diff;
  title 'Test of mean, Paired difference';
run;

We obtain the following output.

Test of mean, Paired difference

The MEANS Procedure

Analysis Variable : diff

t Value    Pr > |t|
−4.05      0.0029

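The two-sided p-value reported by proc means is 0.0029. Because the alternative is one-sided (mean difference less than 0) and the t-value is negative, the one-sided p-value is 0.0029/2 = 0.00145. Because this is less than α = 0.05, we reject the null hypothesis and conclude that the new program reduces blood glucose level.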
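The paired analysis can be reproduced from the raw data with a short Python sketch that forms the differences and computes the t statistic directly. The one-sided critical value t(0.05, 9) = 1.833 is an assumption taken from a standard t table; the variable names are ours.

```python
import math
import statistics

# Data from the SAS listing: glucose level before and after the program.
before = [268, 225, 252, 192, 307, 228, 246, 298, 231, 185]
after = [106, 186, 223, 110, 203, 101, 211, 176, 194, 203]

# A paired t-test is a one-sample t-test on the differences.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)        # sample standard deviation (n - 1 divisor)
t = mean_d / (sd_d / math.sqrt(n))    # test statistic for H0: mean difference = 0

# Assumption: lower-tailed 5% critical value of t with 9 degrees of freedom.
T_CRIT = 1.833
reject = t < -T_CRIT                  # H_a: mean difference < 0 (a reduction)

print(f"mean diff = {mean_d:.1f}, sd = {sd_d:.5f}, t = {t:.3f}, reject H0: {reject}")
```

The statistic agrees with the t Value in the proc means output, and the critical-value comparison leads to the same rejection.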

PROJECTS FOR

7A. Testing on Computer-Generated Samples

(a) Small sample test:

Generate a sample of size 20 from a normal population with μ = 10, and σ² = 4.

(i) Perform a t-test for the test H₀ : μ = 10 versus H_a : μ ≠ 10 at level α = 0.05.

(ii) Perform the test H₀ : σ² = 4 versus H_a : σ² ≠ 4 at level α = 0.05.

Repeat the procedure 10 times, and comment on the results.

(b) Large sample test:


Generate a sample of size 50 from a normal population with μ = 10, and σ² = 4. Perform a z-test

for the test H₀ : μ = 10 versus H_a : μ ≠ 10 at level α = 0.05. Repeat the procedure 10 times and

comment on the results.
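One possible way to set up Project 7A is sketched below in Python, using only the standard library. The seed, the function names, and the tabulated critical values (t(0.025, 19) = 2.093, and the chi-square points 8.907 and 32.852 for 19 degrees of freedom) are assumptions introduced for illustration; any statistical package could be used instead.

```python
import math
import random
import statistics

MU0, SIGMA2_0 = 10.0, 4.0

# Assumptions: two-sided 5% critical values from standard tables.
T_CRIT_19 = 2.093                        # t(0.025) with 19 degrees of freedom
CHI2_LO_19, CHI2_HI_19 = 8.907, 32.852   # chi-square 0.975 and 0.025 points, 19 df
Z_CRIT = statistics.NormalDist().inv_cdf(0.975)  # about 1.96


def small_sample_tests(sample):
    """Part (a): t-test for the mean and chi-square test for the variance (n = 20)."""
    n = len(sample)
    if n != 20:
        raise ValueError("critical values above are tabulated for n = 20 only")
    s2 = statistics.variance(sample)
    t = (statistics.mean(sample) - MU0) / math.sqrt(s2 / n)
    chi2 = (n - 1) * s2 / SIGMA2_0
    reject_mean = abs(t) > T_CRIT_19
    reject_var = chi2 < CHI2_LO_19 or chi2 > CHI2_HI_19
    return reject_mean, reject_var


def large_sample_test(sample):
    """Part (b): two-sided z-test for the mean using the sample standard deviation."""
    n = len(sample)
    z = (statistics.mean(sample) - MU0) / (statistics.stdev(sample) / math.sqrt(n))
    return abs(z) > Z_CRIT


def run_project(repeats=10, seed=0):
    """Generate the samples and repeat each test the requested number of times."""
    rng = random.Random(seed)
    sigma = math.sqrt(SIGMA2_0)
    small = [small_sample_tests([rng.gauss(MU0, sigma) for _ in range(20)])
             for _ in range(repeats)]
    large = [large_sample_test([rng.gauss(MU0, sigma) for _ in range(50)])
             for _ in range(repeats)]
    return small, large


small, large = run_project()
print("rejections of H0: mu = 10 (n = 20):", sum(m for m, _ in small))
print("rejections of H0: sigma^2 = 4 (n = 20):", sum(v for _, v in small))
print("rejections of H0: mu = 10 (n = 50):", sum(large))
```

Because every sample is generated with H₀ true, each rejection is a type I error. With α = 0.05 we expect few rejections in 10 repetitions, which is a useful starting point for the requested comments.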

7B. Conducting a Statistical Test with Confidence Interval

Let θ be any population parameter. Consider the three tests of hypotheses

H₀ : θ = θ₀ vs. H_a : θ > θ₀     (1)

H₀ : θ = θ₀ vs. H_a : θ < θ₀     (2)

H₀ : θ = θ₀ vs. H_a : θ ≠ θ₀     (3)

The following procedure can be exploited to test a statistical hypothesis utilizing the confidence

intervals.

Procedure to Use Confidence Interval for Hypothesis Testing

Let θ be any population parameter.

(a) For test (1), that is,

H₀ : θ = θ₀ vs. H_a : θ > θ₀

choose a value for α. From a random sample, compute a confidence interval for θ using

a confidence coefficient equal to 1 − 2α. Let L be the lower end point of this confidence

interval.

Reject H₀ if θ₀ < L.

That is, we will reject the null hypothesis if the confidence interval is completely to the right

of θ₀.

(b) For test (2), that is,

H₀ : θ = θ₀ vs. H_a : θ < θ₀

choose a value for α. From a random sample, compute a confidence interval for θ using

a confidence coefficient equal to 1 − 2α. Let U be the upper end point of this confidence

interval.

Reject H₀ if U < θ₀.

That is, we will reject the null hypothesis if the confidence interval is completely to the

left of θ₀.

(c) For test (3), that is,

H₀ : θ = θ₀ vs. H_a : θ ≠ θ₀

choose a value for α. From a random sample, compute a confidence interval for θ using a

confidence coefficient equal to 1 − α. Let L be the lower end point and U be the upper end

point of this confidence interval.

Reject H₀ if θ₀ < L or U < θ₀.

That is, we will reject the null hypothesis if the confidence interval does not contain θ₀.

(i) For any large data set, conduct all three of these hypothesis tests using a confidence

interval for the population mean.

(ii) For any small data set, conduct all three of these hypothesis tests using a confidence

interval for the population mean.
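For part (i), the procedure can be sketched in Python for a large-sample confidence interval for the mean. This is a minimal illustration under stated assumptions: the function name ci_test and its arguments are ours, the interval is the usual z-interval (sample mean plus or minus z times s/√n), and the numbers are the summary statistics of Example 8.10 rather than a new data set.

```python
import math
from statistics import NormalDist


def ci_test(xbar, s, n, theta0, alpha, alternative):
    """Test H0: theta = theta0 with a large-sample confidence interval for the mean.

    alternative is 'greater' (test 1), 'less' (test 2), or 'two-sided' (test 3).
    Returns the interval end points and the decision.
    """
    if alternative not in ("greater", "less", "two-sided"):
        raise ValueError("unknown alternative")
    # One-sided tests use confidence coefficient 1 - 2*alpha, two-sided use 1 - alpha.
    tail = alpha / 2 if alternative == "two-sided" else alpha
    z = NormalDist().inv_cdf(1 - tail)
    se = s / math.sqrt(n)
    lower, upper = xbar - z * se, xbar + z * se
    if alternative == "greater":
        reject = theta0 < lower                    # interval entirely to the right of theta0
    elif alternative == "less":
        reject = upper < theta0                    # interval entirely to the left of theta0
    else:
        reject = theta0 < lower or upper < theta0  # interval does not contain theta0
    return lower, upper, reject


# Summary statistics of Example 8.10 with alpha = 0.01.
for alt in ("greater", "less", "two-sided"):
    lo, hi, reject = ci_test(17463, 1348, 40, 18000, 0.01, alt)
    print(f"{alt:>9}: CI = ({lo:.1f}, {hi:.1f}), reject H0: {reject}")
```

Only the lower-tailed test rejects H₀ here, in agreement with the p-value approach of Example 8.10. For part (ii), the same logic applies with the z quantile replaced by the appropriate t quantile.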
