## Description

1. Table 1 presents a subset of data collected by Väisänen and Järvinen (1977) on bird species in the Krunnit Islands archipelago of Finland. In particular, they reported on the bird species found on each of the islands in 1949 and how many of those bird species were extinct by 1970. It is of interest to understand whether the area of the island (in km²) is associated with species' survival. The data corresponding to Table 1 are available in the Excel file Extinction.xlsx.

| Island | Area (X), km² | Extinct: Yes | Extinct: No |
|---|---|---|---|
| Ulkokrunni | 185.80 | 5 | 70 |
| Maakrunni | 105.80 | 3 | 64 |
| Ristikari | 30.70 | 10 | 56 |
| Isonkivenletto | 8.50 | 6 | 45 |
| Hietakraasukka | 4.80 | 3 | 25 |
| Kraasukka | 4.50 | 4 | 16 |
| Länsiletto | 4.30 | 8 | 35 |

Table 1: Extinction of bird species from 1949 to 1970 on seven islands in the Krunnit Islands archipelago, Finland.

Fit the logistic regression model

$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X,$$

where $X$ denotes island area and $p(X)$ denotes the probability of extinction.
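As a cross-check on the SAS output, the model above can also be fitted by maximising the binomial likelihood directly. The following is a minimal sketch, assuming a plain Newton-Raphson iteration on the grouped data of Table 1. The function name `fit_grouped_logistic` and the fixed iteration count are illustrative choices rather than part of the assignment, and SAS's own algorithm and convergence criteria may differ.

```python
import math

# Table 1: island area (km^2), species extinct by 1970, species not extinct.
areas = [185.80, 105.80, 30.70, 8.50, 4.80, 4.50, 4.30]
extinct = [5, 3, 10, 6, 3, 4, 8]
not_extinct = [70, 64, 56, 45, 25, 16, 35]
totals = [y + s for y, s in zip(extinct, not_extinct)]


def inv_logit(eta):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_grouped_logistic(x, y, n, iterations=25):
    """Newton-Raphson for logit(p) = b0 + b1 * x with binomial counts y out of n.

    Hypothetical helper for illustration; returns (b0, b1, se_b0, se_b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [inv_logit(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - ni * pi for yi, ni, pi in zip(y, n, p))
        g1 = sum(xi * (yi - ni * pi) for xi, yi, ni, pi in zip(x, y, n, p))
        # Fisher information matrix [[a, b], [b, c]].
        w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p)]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        # Newton step: solve the 2x2 system information * step = score.
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    # Standard errors from the inverse information of the last iteration
    # (the final step is negligible once the iteration has converged).
    return b0, b1, math.sqrt(c / det), math.sqrt(a / det)


b0, b1, se_b0, se_b1 = fit_grouped_logistic(areas, extinct, totals)
print(f"b0 = {b0:.5f} (SE {se_b0:.5f}), b1 = {b1:.5f} (SE {se_b1:.5f})")
```

Because the model contains an intercept, the fitted extinct counts at the maximum likelihood estimate sum to the observed total, which gives a quick check that the iteration has converged.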

Figure 1 shows relevant SAS output for the logistic regression model.

(a) Carry out an appropriate goodness-of-fit test to determine whether the model provides a good fit to the data. State the hypotheses, and give the test statistic and the p-value of the test. What do you conclude at the α = 0.05 significance level?

(b) Give estimates of β0 and β1 (to 5 decimal places).

(c) Interpret the association between island area and extinction using the odds ratio. Demonstrate how the odds ratio is calculated from Figure 1. Additionally, provide a 95% confidence interval for the odds ratio.

Figure 1: Summary output for the logistic regression model $\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$.

(d) Find the predicted probability of extinction for an island with an area of 50 km² (to 4 decimal places).

(e) Find the fitted count of extinct bird species on the island of Ulkokrunni (to 2 decimal places). Also find the fitted count of non-extinct bird species on Ulkokrunni (to 2 decimal places).

(f) Test

$$H_0: \beta_1 = 0 \quad \text{versus} \quad H_1: \beta_1 \neq 0$$

using the Wald statistic. Give the test statistic and the p-value of the test. What do you conclude at the α = 0.05 significance level?
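The calculations in parts (c) to (f) all follow from the two coefficient estimates and the standard error of the slope. Below is a minimal sketch of those formulas, assuming the estimates are read off Figure 1 and that a Wald-type confidence interval is wanted. The function names are illustrative, and the numbers in the demonstration are placeholders, not values from the SAS output.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def predicted_probability(b0, b1, x):
    """Fitted p(x) from logit(p) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fitted_counts(b0, b1, x, n):
    """Fitted (extinct, not extinct) counts for an island with n species."""
    p = predicted_probability(b0, b1, x)
    return n * p, n * (1.0 - p)


def odds_ratio_with_ci(b1, se_b1, z=Z_975):
    """Odds ratio for a one-unit increase in x, with Wald confidence limits."""
    return math.exp(b1), math.exp(b1 - z * se_b1), math.exp(b1 + z * se_b1)


def wald_test(b1, se_b1):
    """Wald chi-square statistic (1 df) for H0: b1 = 0 and its p-value."""
    statistic = (b1 / se_b1) ** 2
    # For 1 df, P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2.0))


# Placeholder inputs for illustration only; replace with the Figure 1 estimates.
b0_demo, b1_demo, se_demo = -1.0, -0.01, 0.005
print(predicted_probability(b0_demo, b1_demo, 50.0))
print(fitted_counts(b0_demo, b1_demo, 185.80, 75))  # Ulkokrunni: 5 + 70 species
print(odds_ratio_with_ci(b1_demo, se_demo))
print(wald_test(b1_demo, se_demo))
```

The odds ratio here refers to a 1 km² increase in area; for a larger increment d, the corresponding odds ratio is exp(d · β1).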

2. Consider data reported by Gilbert (1981) on the relationship between pre-marital sex (i.e., sexual intercourse before marriage), extra-marital sex (i.e., sexual intercourse with someone other than a spouse whilst married), and whether the person had been divorced for a random sample of heterosexual men and women who had been married at least once. These data are presented in Table 2 and are available in the Excel file Divorce.xlsx.

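| Gender (W) | Pre-marital sex (X) | Extra-marital sex (Y) | Divorced (Z): No | Divorced (Z): Yes |
|---|---|---|---|---|
| Woman | Yes | Yes | 4 | 17 |
| Woman | Yes | No | 25 | 54 |
| Woman | No | Yes | 4 | 36 |
| Woman | No | No | 322 | 214 |
| Man | Yes | Yes | 11 | 28 |
| Man | Yes | No | 42 | 60 |
| Man | No | Yes | 4 | 17 |
| Man | No | No | 130 | 68 |

Table 2: Data on reported pre-marital sex, extra-marital sex, and divorce for a random sample of heterosexual men and women.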

First, use the backward model selection method to find the simplest model that provides a good fit to the data. Start from the following model, which we will denote by M2,

$$\log\left(\frac{p_{ijk}}{1 - p_{ijk}}\right) = \beta_0 + \beta^{W}_{i} + \beta^{X}_{j} + \beta^{Y}_{k} + \beta^{WX}_{ij} + \beta^{WY}_{ik} + \beta^{XY}_{jk} + \beta^{WXY}_{ijk},$$

where $p_{ijk}$ is the probability of divorce when the gender (W) is at level $i$, pre-marital sex status (X) is at level $j$, and extra-marital sex status (Y) is at level $k$.
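One way to reason about the size of M2 is to compare its number of free parameters with the number of distinct covariate patterns in Table 2. The sketch below assumes each of W, X and Y is a two-level factor coded as a 0/1 indicator under a reference-level parametrisation; the coding and the helper names `design_row` and `rank` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Model terms of M2: intercept, main effects, two-way and three-way interactions.
terms = ["1", "W", "X", "Y", "WX", "WY", "XY", "WXY"]


def design_row(w, x, y):
    """Design-matrix row for one (W, X, Y) combination under 0/1 coding."""
    return [1, w, x, y, w * x, w * y, x * y, w * x * y]


def rank(matrix):
    """Matrix rank by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


# One binomial observation (divorced yes/no counts) per covariate pattern.
patterns = list(product([0, 1], repeat=3))
design = [design_row(*pattern) for pattern in patterns]

n_observations = len(design)
n_parameters = len(terms)
print(n_observations, n_parameters, rank(design))
```

When the number of free parameters equals the number of binomial observations and the design matrix has full rank, the model can reproduce the observed proportions exactly.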

Figure 2 shows relevant summary output from SAS.

(a) Is model M2 a saturated model? Why or why not?

(b) What information does Step 1 provide in the SAS output? Write down the test hypotheses. What do you conclude?

Figure 2: Summary output for the backward selection method applied to the logit model $\log\left(\frac{p_{ijk}}{1 - p_{ijk}}\right) = \beta_0 + \beta^{W}_{i} + \beta^{X}_{j} + \beta^{Y}_{k} + \beta^{WX}_{ij} + \beta^{WY}_{ik} + \beta^{XY}_{jk} + \beta^{WXY}_{ijk}$.

(c) What is the final model?

Now consider the logit model, which we will denote by M1,

$$\log\left(\frac{p_{ijk}}{1 - p_{ijk}}\right) = \beta_0 + \beta^{W}_{i} + \beta^{X}_{j} + \beta^{Y}_{k} + \beta^{XY}_{jk},$$

which uses a reference level parametrisation for all factors.
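Because M1 contains no interaction involving W, the gender effect on the log-odds is the same at every combination of X and Y. As a short worked step, subtracting the model equation for gender $i'$ from that for gender $i$ at the same $(j, k)$ cancels every term except the W main effect:

$$\log\left(\frac{p_{ijk}/(1 - p_{ijk})}{p_{i'jk}/(1 - p_{i'jk})}\right) = \beta^{W}_{i} - \beta^{W}_{i'}, \qquad \text{so} \qquad \text{OR} = \exp\left(\beta^{W}_{i} - \beta^{W}_{i'}\right).$$

Under the reference level parametrisation one of the two W parameters is fixed at zero, so the odds ratio reduces to the exponential of the single estimated W coefficient (or its reciprocal, depending on which gender SAS treats as the reference level, which must be read from the output). Assuming a Wald interval, the 95% limits are $\exp\left(\hat\beta^{W} \pm 1.96\,\text{SE}(\hat\beta^{W})\right)$.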

Figure 3 shows relevant summary output from SAS.

(d) Carry out an appropriate goodness-of-fit test to determine whether model M1 provides a good fit to the data. State the hypotheses, and give the test statistic and the p-value of the test. What do you conclude at the α = 0.05 significance level?

(e) Compare the odds of divorce for men with the odds of divorce for women using an odds ratio, and interpret this odds ratio. Give a 95% confidence interval for the odds ratio.

Figure 3: Summary output for the logit model $\log\left(\frac{p_{ijk}}{1 - p_{ijk}}\right) = \beta_0 + \beta^{W}_{i} + \beta^{X}_{j} + \beta^{Y}_{k} + \beta^{XY}_{jk}$.
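A deviance goodness-of-fit test, such as the one asked for in part (d), compares the fitted model with the saturated model and refers the deviance to a chi-square distribution whose degrees of freedom equal the number of binomial observations minus the number of fitted parameters. The following is a minimal sketch of that calculation using only the Python standard library. The helper names are illustrative, and the fitted probabilities for M1 would have to come from the SAS output, which is not reproduced here. As a self-check, the demonstration plugs in the observed proportions from Table 2 (the saturated fit), for which the deviance is zero by construction.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite correction series.
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    tail = math.erfc(math.sqrt(x / 2.0))
    return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * total


def binomial_deviance(successes, totals, fitted_probs):
    """Deviance: 2 * sum of observed * log(observed / fitted) over both outcomes."""
    g2 = 0.0
    for y, n, p in zip(successes, totals, fitted_probs):
        for observed, fitted in ((y, n * p), (n - y, n * (1.0 - p))):
            if observed > 0:  # a zero cell contributes nothing
                g2 += 2.0 * observed * math.log(observed / fitted)
    return g2


# Table 2 rows: Woman then Man; within each gender,
# (X, Y) = (Yes, Yes), (Yes, No), (No, Yes), (No, No).
divorced = [17, 54, 36, 214, 28, 60, 17, 68]
not_divorced = [4, 25, 4, 322, 11, 42, 4, 130]
totals = [d + m for d, m in zip(divorced, not_divorced)]

saturated_probs = [d / n for d, n in zip(divorced, totals)]
g2 = binomial_deviance(divorced, totals, saturated_probs)
print(g2)                    # deviance of the saturated fit, zero up to rounding
print(chi2_sf(11.0705, 5))   # close to 0.05, the usual 5 df critical value
```

Under a reference-level coding with two-level factors, M1 has five free parameters (the intercept, one each for W, X and Y, and one for the XY interaction), so with eight binomial observations the deviance would be compared with a chi-square distribution on 8 − 5 = 3 degrees of freedom.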