## Description

## 1. Model identification using PCA

Consider the flow process shown in Fig. 1, consisting of five streams whose flow rates are all measured. A data set (flowdata3.mat) consisting of 1000 samples corresponding to different steady states has been obtained.

(a) Apply PCA to identify the linear constraint model relating the variables (assume that the number of linear relations between the variables is known). To verify whether your constraint model is good, choose F3 and F5 as independent variables. Use your estimated constraint model to obtain the relationship between the dependent and independent variables (the regression form of the model). Then find the maximum absolute difference (maxdiff) between the estimated and the true regression model coefficients. Report the eigenvalues and the maxdiff value.
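
The steps in (a) can be sketched in Python/NumPy rather than MATLAB. The sketch runs on a made-up five-stream network: `A_true`, the flow ranges and the noise level are assumptions for illustration, not the Fig. 1 process or flowdata3.mat. The rows of the constraint matrix are the eigenvectors of the sample covariance matrix that belong to its m smallest eigenvalues. The regression form then follows from partitioning A into dependent and independent columns, $x_D = -A_D^{-1} A_I x_I$.

```python
import numpy as np

def pca_constraints(Y, m):
    """Estimate an m x n constraint matrix A (A @ x ~ 0) from N x n data Y.

    Rows of A are the eigenvectors of the sample covariance matrix belonging
    to its m smallest eigenvalues. Eigenvalues are returned in descending order.
    """
    S = np.cov(Y, rowvar=False)             # n x n sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh sorts eigenvalues ascending
    A = eigvecs[:, :m].T                    # m smallest -> constraint rows
    return eigvals[::-1], A

def regression_form(A, indep):
    """Solve A_D x_D + A_I x_I = 0 for x_D = R @ x_I (0-based indices)."""
    dep = [j for j in range(A.shape[1]) if j not in indep]
    R = -np.linalg.solve(A[:, dep], A[:, indep])
    return dep, R

# Hypothetical network for illustration only (NOT Fig. 1):
# F1 = F3, F2 = F3 + F5, F4 = F5, i.e. three constraints.
A_true = np.array([[1., -1.,  0.,  0.,  1.],
                   [0.,  1., -1., -1.,  0.],
                   [0.,  0.,  0.,  1., -1.]])
rng = np.random.default_rng(0)
N = 1000
F3, F5 = rng.uniform(1, 10, N), rng.uniform(1, 10, N)
X = np.column_stack([F3, F3 + F5, F3, F5, F5])     # noise-free flows
Y = X + 0.1 * rng.standard_normal(X.shape)         # equal error variances

eigvals, A_est = pca_constraints(Y, m=3)
indep = [2, 4]                                     # F3 and F5
_, R_est = regression_form(A_est, indep)
_, R_true = regression_form(A_true, indep)
maxdiff = np.max(np.abs(R_est - R_true))
print(eigvals, maxdiff)
```

PCA returns only some basis of the constraint space, so `A_est` need not resemble `A_true` entry by entry. The regression matrix R is the same for every basis, which is why it is used to compare the two models.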

(b) Apply IPCA to estimate the diagonal error variances and identify the linear steady-state model relating the flow variables (again, assume that the number of linear relations between the variables is known). Report the estimated variances, the eigenvalues and the maxdiff value.
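
The following is a simplified IPCA-style iteration in Python/NumPy, given only as a sketch. It alternates between PCA of the data scaled by the current error standard deviations and an update of the diagonal variances. For the update it uses a least-squares moment match, $A S A^T \approx A\,\mathrm{diag}(v)\,A^T$, where the published IPCA algorithm solves a maximum-likelihood problem instead, so treat it as an approximation. The function name `ipca_sketch`, the network and the noise levels are all hypothetical. The variances can only be identified when $m(m+1)/2 \ge n$.

```python
import numpy as np

def ipca_sketch(Y, m, max_iter=200, tol=1e-8):
    """Simplified IPCA-style estimate of diagonal error variances (sketch).

    Alternates (i) PCA of data scaled by the current error SDs and
    (ii) a least-squares fit of diag(v) to A S A^T = A diag(v) A^T.
    Step (ii) is a moment-matching simplification, not the ML update.
    """
    n = Y.shape[1]
    S = np.cov(Y, rowvar=False)
    iu = np.triu_indices(m)                  # unique entries of an m x m matrix
    v = np.ones(n)                           # start from equal variances

    def constraints(v):
        L_inv = np.diag(1.0 / np.sqrt(v))    # Sigma_e^(-1/2), diagonal
        eigvals, eigvecs = np.linalg.eigh(L_inv @ S @ L_inv)   # ascending
        return eigvals[::-1], eigvecs[:, :m].T @ L_inv         # A in original units

    for _ in range(max_iter):
        _, A = constraints(v)
        # (A diag(v) A^T)_kl = sum_j v_j A_kj A_lj is linear in v
        M = np.column_stack([np.outer(A[:, j], A[:, j])[iu] for j in range(n)])
        v_new, *_ = np.linalg.lstsq(M, (A @ S @ A.T)[iu], rcond=None)
        v_new = np.maximum(v_new, 1e-12 * np.max(v_new))   # keep variances positive
        done = np.max(np.abs(v_new - v)) <= tol * np.max(v_new)
        v = v_new
        if done:
            break
    eigvals, A = constraints(v)              # eigenvalues of the scaled covariance
    return v, eigvals, A

# Same hypothetical network as before: F1 = F3, F2 = F3 + F5, F4 = F5
rng = np.random.default_rng(1)
N = 50000
F3, F5 = rng.uniform(1, 10, N), rng.uniform(1, 10, N)
X = np.column_stack([F3, F3 + F5, F3, F5, F5])
sd_true = np.array([0.10, 0.15, 0.20, 0.12, 0.18])   # assumed error SDs
Y = X + sd_true * rng.standard_normal(X.shape)
v_est, eigvals, A_est = ipca_sketch(Y, m=3)
print(v_est, sd_true ** 2)
print(eigvals)
```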

(c) Apply IPCA assuming, incorrectly, that there are four constraints. Report the eigenvalues obtained. Can you determine from the eigenvalues that the number of constraints has been guessed incorrectly? Give reasons for your answer.
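
It may help to recall where the eigenvalue pattern in IPCA comes from. Assume the usual errors-in-variables model: measurements $y = x + e$, true values satisfying $A x = 0$ with $m$ independent constraints, and errors with diagonal covariance $\Sigma_e$ that are uncorrelated with $x$. Assume also that the true flows vary in all $n - m$ free directions. Then

$$
\begin{aligned}
\Sigma_y &= \Sigma_x + \Sigma_e, \qquad \operatorname{rank}(\Sigma_x) = n - m,\\
\Sigma_e^{-1/2}\,\Sigma_y\,\Sigma_e^{-1/2} &= \Sigma_e^{-1/2}\,\Sigma_x\,\Sigma_e^{-1/2} + I_n .
\end{aligned}
$$

The first term on the right is positive semidefinite with rank $n - m$. Its eigenvalues are therefore $\mu_1 \ge \dots \ge \mu_{n-m} > 0$ and $\mu_{n-m+1} = \dots = \mu_n = 0$, and adding $I_n$ shifts every eigenvalue by one. With a correctly estimated $\Sigma_e$, the scaled covariance thus has $m$ eigenvalues equal to 1 (close to 1 for a finite sample) and $n - m$ eigenvalues greater than 1. Checking whether the eigenvalues reported in (c) follow this pattern is one possible basis for the answer.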

(d) Using the constraint model identified in (b), suggest a procedure (a measure) by which a set of independent variables for the process can be determined. Based on your proposed measure, determine the best and the worst possible choices of independent variable set for this system. Justify whether these inferences (obtained from data) are consistent with the physical process.
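
One candidate measure, offered here as an assumption rather than the expected answer, is the condition number of the dependent-variable submatrix $A_D$. A singular $A_D$ means the chosen independent set cannot determine the remaining flows. A large condition number means the computed dependent flows are sensitive to errors. The sketch below (the function name is hypothetical) first orthonormalises the rows of A, so the measure does not depend on which basis of the constraint space the identification step returned. It then scores every possible set of $n - m$ independent variables. The network is the same made-up example as above, not Fig. 1.

```python
import itertools
import numpy as np

def rank_independent_sets(A, tol=1e-10):
    """Rank candidate independent-variable sets by cond(A_D), best first.

    A is an m x n constraint matrix (e.g. the one identified from data).
    Returns (condition_number, independent_indices) pairs; a singular A_D
    gets cond = inf.
    """
    m, n = A.shape
    Q, _ = np.linalg.qr(A.T)                 # n x m, orthonormal basis of A's row space
    A_orth = Q.T                             # basis-independent constraint matrix
    results = []
    for indep in itertools.combinations(range(n), n - m):
        dep = [j for j in range(n) if j not in indep]
        s = np.linalg.svd(A_orth[:, dep], compute_uv=False)
        cond = np.inf if s[-1] <= tol * s[0] else s[0] / s[-1]
        results.append((cond, indep))
    results.sort(key=lambda r: r[0])
    return results

# Hypothetical network: F1 = F3, F2 = F3 + F5, F4 = F5
A_true = np.array([[1., -1.,  0.,  0.,  1.],
                   [0.,  1., -1., -1.,  0.],
                   [0.,  0.,  0.,  1., -1.]])
for cond, indep in rank_independent_sets(A_true):
    print([f"F{j + 1}" for j in indep], cond)
```

In this made-up network, {F1, F3} and {F4, F5} come out singular because each pair is tied by a constraint (F1 = F3, F4 = F5). Neither pair can serve as an independent set, which agrees with the physical picture.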

## 2. Multivariate calibration model using PCA

Multivariate calibration of spectral measurements is a chemometrics technique for developing a model that relates spectral measurements (obtained using instruments such as UV, FIR, NIR or MS spectrophotometers) to properties of species (usually liquids or gases), such as concentration. The application considered here is a model relating UV absorbance spectra to the compositions (concentrations) of mixtures. Such a model is useful for online monitoring of chemical and biochemical reactions.

Twenty-six samples with different concentrations of a mixture of Co, Cr and Ni ions in dilute nitric acid were prepared in a laboratory. Their spectra were recorded over the range 300-650 nm using an HP 8452 UV diode array spectrophotometer (data in Inorfull.mat). (Water and ethanol are generally used as solvents because they do not absorb in the UV range. Nitrate ions also do not absorb in the UV range, so an aqueous solution of nitric acid is used to dissolve the metals in this experiment.) Five replicates were obtained for each mixture. The measurements were made at 2 nm intervals, giving an absorbance matrix of size 130 x 176 (26 mixtures x 5 replicates = 130 rows; (650 - 300)/2 + 1 = 176 wavelengths). The concentrations of the 26 samples, a 26 x 3 matrix, are also given in the data file.

To predict the concentrations of a mixture from absorbance measurements, a calibration model relating the concentrations of mixtures to their absorbance spectra must be built. According to the Beer-Lambert law, the absorbance spectrum of a dilute mixture is a linear (weighted) combination of the pure-component spectra, with weights corresponding to the concentrations of the species in the mixture.
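
The Beer-Lambert mixture model can be written in matrix form as follows. The symbols are introduced here only for illustration: $N$ mixtures, $n_w$ wavelengths (nw) and $n_s$ species (ns).

$$
a_{ik} = \sum_{j=1}^{n_s} c_{ij}\,k_{jk} + e_{ik}
\quad\Longleftrightarrow\quad
\mathbf{A} = \mathbf{C}\,\mathbf{K} + \mathbf{E}
$$

Here $a_{ik}$ is the absorbance of mixture $i$ at wavelength $\lambda_k$. $\mathbf{A}$ ($N \times n_w$) holds the absorbances, $\mathbf{C}$ ($N \times n_s$) the concentrations, $\mathbf{K}$ ($n_s \times n_w$) the pure-component spectra per unit concentration (path length included), and $\mathbf{E}$ the measurement errors. Because $\mathbf{A} - \mathbf{E} = \mathbf{C}\mathbf{K}$, the noise-free absorbance matrix has rank at most $n_s$, and exactly $n_s$ when $\mathbf{C}$ and $\mathbf{K}$ have full rank.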

If absorbances are measured at only the minimum number of wavelengths, OLS can be used to build a calibration model. For example, if a mixture contains ns non-reacting species, absorbances need to be measured at ns wavelengths. Typically, these are chosen as the wavelengths of maximum absorbance of the individual species.
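
A minimal sketch of this minimum-wavelength OLS calibration uses the inverse form $C \approx A_{sel} B$, where $A_{sel}$ holds the absorbances at the ns selected wavelengths. This is one possible formulation; a classical one that first estimates the pure spectra would also work. The Gaussian pure-component spectra, band centres and concentrations are invented and do not represent the Co/Cr/Ni data.

```python
import numpy as np

def ols_calibration(A_sel, C):
    """Inverse calibration C ~ A_sel @ B using absorbances at ns wavelengths."""
    B, *_ = np.linalg.lstsq(A_sel, C, rcond=None)
    return B

# Invented example: 3 species, 176 wavelengths, 26 mixtures
rng = np.random.default_rng(1)
n_s, n_w, N = 3, 176, 26
wl = np.linspace(300, 650, n_w)                     # 2 nm grid
centres = np.array([400.0, 500.0, 600.0])           # made-up absorption maxima
K = np.exp(-((wl[None, :] - centres[:, None]) / 30.0) ** 2)   # ns x nw spectra
C = rng.uniform(0.1, 1.0, (N, n_s))
A = C @ K                                           # noise-free Beer-Lambert mixtures
peaks = [int(np.argmax(k)) for k in K]              # maximum-absorbance wavelength per species
B = ols_calibration(A[:, peaks], C)
C_hat = A[:, peaks] @ B
print(wl[peaks], np.abs(C_hat - C).max())
```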

However, if absorbances are measured at nw > ns wavelengths, the (noise-free) absorbance matrix will not have full column rank. In this case, Principal Component Regression (PCR) can be used to develop a multivariate calibration model. In this method, PCA is first applied to the absorbance matrix to obtain the scores corresponding to the different mixtures. In the second step, OLS is used to fit a regression model relating the concentrations to the scores (treating the concentrations as the dependent variables).

To use this model to predict the concentrations of a mixture whose absorbance spectrum is given, we first obtain its scores and then use the OLS regression model to predict the concentrations. Note that the true rank of the absorbance matrix is equal to the number of species in the mixture.
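
The two PCR steps and the prediction step can be sketched in NumPy as follows. The spectra are not mean-centred here, matching a zero-intercept Beer-Lambert model; this is an assumption, and centring is a common alternative. Function names and the synthetic data are hypothetical.

```python
import numpy as np

def pcr_fit(A, C, k):
    """Two-step PCR sketch: PCA via SVD, then OLS of concentrations on scores."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)   # step 1: PCA of spectra
    P = Vt[:k].T                                       # nw x k loadings
    B, *_ = np.linalg.lstsq(A @ P, C, rcond=None)      # step 2: OLS on scores A @ P
    return P, B

def pcr_predict(A_new, P, B):
    """Compute scores of new spectra, then apply the OLS model."""
    return (A_new @ P) @ B

# Invented three-species data (not Inorfull.mat)
rng = np.random.default_rng(2)
wl = np.linspace(300, 650, 176)
K = np.exp(-((wl[None, :] - np.array([[420.0], [510.0], [580.0]])) / 40.0) ** 2)
C = rng.uniform(0.1, 1.0, (26, 3))
A = C @ K + 0.002 * rng.standard_normal((26, 176))
P, B = pcr_fit(A[:20], C[:20], k=3)                    # calibrate on 20 mixtures
C_pred = pcr_predict(A[20:], P, B)                     # predict the other 6
print(np.abs(C_pred - C[20:]).max())
```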

The quality of the linear calibration model is evaluated using leave-one-sample-out cross-validation (LOOCV), computing the root mean square error (RMSE) in predicting the concentrations of the left-out sample. Pick the first replicate of each mixture to obtain a data matrix of size 26 x 176, and use it for each of the multivariate calibration methods below. For each method, report the LOOCV RMSE in a table for the number of PCs chosen between 1 and 5. Based on the RMSE values, are you able to estimate the number of species correctly?
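
A LOOCV loop for PCR might look like the sketch below. It pools the errors of all left-out samples and all species into one RMSE; that pooling is an assumption, and per-species values are an equally reasonable way to fill the table. The data are invented, so the printed values say nothing about Inorfull.mat.

```python
import numpy as np

def loocv_rmse(A, C, k):
    """Leave-one-sample-out RMSE of a k-component PCR model (sketch)."""
    N = A.shape[0]
    errors = np.empty(C.shape)
    for i in range(N):
        keep = np.arange(N) != i                      # leave sample i out
        _, _, Vt = np.linalg.svd(A[keep], full_matrices=False)
        P = Vt[:k].T                                  # loadings without sample i
        B, *_ = np.linalg.lstsq(A[keep] @ P, C[keep], rcond=None)
        errors[i] = A[i] @ P @ B - C[i]               # error for the left-out sample
    return np.sqrt(np.mean(errors ** 2))              # pooled over samples and species

# Invented 26-mixture, 3-species example (not Inorfull.mat)
rng = np.random.default_rng(3)
wl = np.linspace(300, 650, 176)
K = np.exp(-((wl[None, :] - np.array([[420.0], [510.0], [580.0]])) / 40.0) ** 2)
C = rng.uniform(0.1, 1.0, (26, 3))
A = C @ K + 0.002 * rng.standard_normal((26, 176))
rmse = {k: loocv_rmse(A, C, k) for k in range(1, 6)}
for k, r in rmse.items():
    print(f"{k} PCs: RMSE = {r:.4f}")
```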

(a) Develop a multivariate calibration model using PCR.

(b) The absorbances are very noisy near the ends of the instrument's wavelength range. Estimate the standard deviation of the errors in the absorbance measurements from the five replicates, for each wavelength and each mixture. Assume that the error standard deviations vary significantly with wavelength but are almost the same for all mixtures (verify this by plotting the estimated standard deviations with respect to wavelength and mixture).

Then compute the average error standard deviation for each wavelength. Use these standard deviations to scale the absorbance measurements at each wavelength before applying PCR to develop the calibration model (known as scaled PCR).
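
The replicate-based estimate and the scaling step can be sketched as below. The reshape assumes the rows of the 130 x 176 matrix are grouped by mixture (five consecutive replicates per mixture). That layout is an assumption to check against Inorfull.mat, and the same assumption is behind picking the first replicate with `A_all[::5]`. The noise profile, larger at both ends of the range, is invented.

```python
import numpy as np

def replicate_sd(A_all, n_mix, n_rep):
    """Per-mixture, per-wavelength error SDs from replicate spectra.

    ASSUMPTION: rows are grouped by mixture (rows 0..n_rep-1 are the
    replicates of mixture 1, and so on).
    """
    A3 = A_all.reshape(n_mix, n_rep, -1)          # mixture x replicate x wavelength
    sd = A3.std(axis=1, ddof=1)                   # n_mix x nw sample SDs
    return sd, sd.mean(axis=0)                    # plus the average over mixtures

def scaled_pcr_fit(A, C, k, sd_wl):
    """PCR on absorbances divided column-wise by the per-wavelength error SD."""
    As = A / sd_wl                                # broadcasting scales each column
    _, _, Vt = np.linalg.svd(As, full_matrices=False)
    P = Vt[:k].T
    B, *_ = np.linalg.lstsq(As @ P, C, rcond=None)
    return P, B                                   # predict with (A_new / sd_wl) @ P @ B

# Invented replicate data: noise SD grows towards both ends of 300-650 nm
rng = np.random.default_rng(4)
wl = np.linspace(300, 650, 176)
K = np.exp(-((wl[None, :] - np.array([[420.0], [510.0], [580.0]])) / 40.0) ** 2)
C = rng.uniform(0.1, 1.0, (26, 3))
sd_true = 0.002 + 0.02 * ((wl - 475.0) / 175.0) ** 4
A_all = np.repeat(C @ K, 5, axis=0) + sd_true * rng.standard_normal((130, 176))
sd, sd_wl = replicate_sd(A_all, n_mix=26, n_rep=5)
A_first = A_all[::5]                               # first replicate of each mixture
P, B = scaled_pcr_fit(A_first, C, k=3, sd_wl=sd_wl)
```

Averaging the standard deviations themselves follows the wording above. Averaging the variances and taking a square root is a common alternative that gives slightly larger values.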

(c) Use IPCA to estimate the error variances with respect to wavelength in step 1 of PCR, and use them to develop the calibration model (known as IPCR).

(d) If the error variances vary with respect to both mixtures and wavelengths, Maximum Likelihood PCA (MLPCA), proposed by Wentzell et al. (1997), can be used to reduce the rank of the absorbance matrix. OLS is then used to develop the calibration model (known as MLPCR).

Write a MATLAB function to implement MLPCA given a data matrix, the corresponding matrix of error standard deviations, and the number of factors (or PCs). The function should return the scores matrix. Use this function, together with the error standard deviations for each wavelength and mixture estimated directly from the replicate measurements, to develop the calibration model using MLPCR.
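
The sheet asks for a MATLAB function. The following is only a NumPy sketch of the idea that could be translated. It assumes uncorrelated errors, so the maximum likelihood rank-p estimate minimises $\sum_{ij} (x_{ij} - \hat{x}_{ij})^2 / \sigma_{ij}^2$. That objective is minimised by alternating weighted least squares over rows and columns, starting from the ordinary SVD. This follows the alternating-projection idea of Wentzell et al. (1997) but is not a transcription of their published algorithm. The name `mlpca_sketch`, its arguments and the test data are hypothetical.

```python
import numpy as np

def mlpca_sketch(X, S, p, max_iter=500, tol=1e-10):
    """Rank-p MLPCA sketch for uncorrelated errors with SD matrix S.

    Returns scores (m x p) and orthonormal loadings (n x p) such that
    scores @ loadings.T is the weighted-least-squares rank-p estimate of X.
    """
    W = 1.0 / S ** 2                                      # inverse error variances
    P = np.linalg.svd(X, full_matrices=False)[2][:p].T    # initial loadings (n x p)
    obj_old = np.inf
    for _ in range(max_iter):
        # Row step: weighted projection of each row onto span(P)
        T = np.array([np.linalg.solve(P.T @ (W[i][:, None] * P), P.T @ (W[i] * X[i]))
                      for i in range(X.shape[0])])
        # Column step: weighted projection of each column onto span(T)
        P = np.array([np.linalg.solve(T.T @ (W[:, j][:, None] * T), T.T @ (W[:, j] * X[:, j]))
                      for j in range(X.shape[1])])
        Xhat = T @ P.T
        obj = np.sum(W * (X - Xhat) ** 2)                 # weighted sum of squares
        if obj_old - obj <= tol * (1.0 + obj):            # objective stopped decreasing
            break
        obj_old = obj
        P = np.linalg.qr(P)[0]                            # same span, better conditioning
    U, s, Vt = np.linalg.svd(Xhat, full_matrices=False)   # re-express the rank-p fit
    return U[:, :p] * s[:p], Vt[:p].T

# Invented rank-3 data with error SDs that vary by row and column
rng = np.random.default_rng(5)
X_true = rng.uniform(0, 1, (26, 3)) @ rng.uniform(0, 1, (3, 40))
S = 0.01 * (1 + 4 * rng.uniform(size=X_true.shape))
X = X_true + S * rng.standard_normal(X_true.shape)
scores, loadings = mlpca_sketch(X, S, p=3)
```

Starting from the SVD solution, each half-step can only lower the weighted objective. The result is therefore never worse than ordinary PCA by that measure. For MLPCR, a new spectrum x with its own error SDs could be projected the same way as in the row step before applying the OLS model.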