CS 6320 Project 1: Recognizing Textual Entailment


Problem Definition and Data

For Project 1, you will implement a deep learning model that recognizes the textual entailment relation between two sentences. We are given two sentences: the premise, denoted by t, and the hypothesis, denoted by h. We say that the premise entails the hypothesis, i.e. t → h, if the meaning of h can be inferred from the meaning of t [1]. To motivate the problem, consider the examples below:

(1) t: Eating lots of foods that are a good source of fiber may keep your blood glucose from rising too fast after you eat.
    h: Fiber improves blood sugar control.
    t → h, as the meaning of h can be inferred from t.

(2) t: Scientists at the Genome Institute of Singapore (GIS) have discovered the complete genetic sequence of a coronavirus isolated from a Singapore patient with SARS.
    h: Singapore scientists reveal that SARS virus has undergone genetic changes.
    t ↛ h, as the meaning of h cannot be inferred from t.

The task of textual entailment is set up as a binary classification problem: given the premise and the hypothesis, the goal is to classify the relation between them as Entails or Not Entails.

For conducting your experiments, you will use the RTE-1 dataset [2], which is provided as an addendum to this homework. The dataset contains two XML files: a train file and a test file. The entailment relations are contained within pair tags in both files; examples are shown in the "Input file" listing under Task 1. As you can observe, each pair tag contains the premise t and the hypothesis h within t and h tags respectively. The value attribute of the pair tag is a boolean indicating whether t → h or not.

1 Model Architecture

The architecture of our model for textual entailment is fairly simple. It contains the following layers:

1. Embedding layer: transforms the integer-encoded representations of the sentences into dense vectors.
2. Recurrent layer: a stacked bi-directional LSTM that takes in the vector representations from the embedding layer and outputs another vector.
3. Fully connected layer: transforms the output of the RNN into a vector of 2 dimensions (one corresponding to each label, i.e. Entails and Not Entails).

A schematic showing the architecture of the model is provided below.

We define the forward pass of our network as follows. Let t = {t_1, t_2, ..., t_n} denote the premise and h = {h_1, h_2, ..., h_m} denote the hypothesis. We first obtain dense vector representations for both sentences by passing them through the same embedding layer. Let e_t = {e_t1, e_t2, ..., e_tn} denote the vector representations for the premise and e_h = {e_h1, e_h2, ..., e_hm} denote the vector representations for the hypothesis, where each vector e_ti and e_hi has dimension d1. Next, we pass these vector representations through the same LSTM to obtain the temporal sequences r_t and r_h respectively, each vector having dimension d2. The vectors r_t and r_h are concatenated together to obtain the vector r_th. Finally, this concatenated representation r_th is passed through the fully connected layer to get the vector f_th, with dimension d3 = 2 (because there are 2 labels, as discussed previously).

Implementation and Execution Framework

You are free to use any API such as PyTorch, TensorFlow, DyNet, Caffe, etc. (with Python) for implementing this model. We recommend you use either TensorFlow or PyTorch, as there is a lot of help available online for writing code in these frameworks.
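As a point of reference, a minimal PyTorch sketch of the architecture described above might look like the following. The layer sizes, the class and variable names, and the use of the last time step as the sentence representation are illustrative assumptions, not part of the specification:

import torch
import torch.nn as nn

class EntailmentModel(nn.Module):
    """Embedding -> shared stacked bi-LSTM -> concatenation -> fully connected layer."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_layers=2):
        super().__init__()
        # d1 = embed_dim, d2 = 2 * hidden_dim (bi-directional), d3 = 2 labels
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(4 * hidden_dim, 2)  # r_t and r_h concatenated

    def encode(self, tokens):
        # tokens: (batch, seq_len) integer-encoded, zero-padded
        embedded = self.embedding(tokens)    # (batch, seq_len, embed_dim)
        outputs, _ = self.lstm(embedded)     # (batch, seq_len, 2 * hidden_dim)
        return outputs[:, -1, :]             # last time step as the sentence vector

    def forward(self, premise, hypothesis):
        r_t = self.encode(premise)           # (batch, 2 * hidden_dim)
        r_h = self.encode(hypothesis)        # (batch, 2 * hidden_dim)
        r_th = torch.cat([r_t, r_h], dim=1)  # concatenated representation
        return self.fc(r_th)                 # (batch, 2) logits f_th

How you reduce the LSTM outputs to the vectors r_t and r_h (last time step, final hidden states, or pooling) is a design choice worth discussing in your report. If you reserve 0 for padding as in the example under Task 1, remember that vocab_size should be the number of distinct tokens plus one.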
For executing your code, you will use Google Colab, a virtual environment provided by Google that allows you to edit and run your code in your browser. Additionally, Colab provides free access to GPUs, making it convenient for you to train and test your model quickly.

2 Tasks to be performed

Here, we outline the tasks to be performed for this project.

Task – 1: Prepare dataset (15 points)

Write a method that takes the path to the (train or test) XML file as input and outputs three lists: one containing the lists of tokens for the premises, one containing the lists of tokens for the hypotheses, and one containing the labels. For example, consider the given input and expected output (an illustrative code sketch for this task is provided after Task 4).

Input file:

<pair value="TRUE">
<t>Oracle had fought to keep the forms from being released</t>
<h>Oracle released a confidential document</h>
</pair>

<pair value="FALSE">
<t>iTunes software has seen strong sales in Europe</t>
<h>Poor sales for iTunes in Europe</h>
</pair>

Output:

# premises list of lists
p = [['oracle', 'had', 'fought', 'to', 'keep', 'the', 'forms', 'from', 'being', 'released'],
     ['itunes', 'software', 'has', 'seen', 'strong', 'sales', 'in', 'europe']]
# hypotheses list of lists
h = [['oracle', 'released', 'a', 'confidential', 'document'],
     ['poor', 'sales', 'for', 'itunes', 'in', 'europe']]
# list of labels/classes
labels = ['TRUE', 'FALSE']

Next, write a function to integer-encode the premises and hypotheses. In other words, replace each word in both lists with a unique integer (you may want to save this mapping as a dictionary to be used later). Likewise, integer-encode the labels. For the same example given previously, the output will be:

# premises list of lists
p = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     [11, 12, 13, 14, 15, 16, 17, 18]]
# hypotheses list of lists
h = [[1, 10, 19, 20, 21],
     [22, 16, 23, 11, 17, 18]]
# list of labels/classes
labels = [1, 2]

Neural networks work only when all inputs are of uniform length, so pad zeros at the end of the premise and hypothesis lists to make them uniform in length. For example, if the maximum allowed length is set to 10, the lists will change to:

# premises list of lists
p = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     [11, 12, 13, 14, 15, 16, 17, 18, 0, 0]]
# hypotheses list of lists
h = [[1, 10, 19, 20, 21, 0, 0, 0, 0, 0],
     [22, 16, 23, 11, 17, 18, 0, 0, 0, 0]]

Task – 2: Preparing the inputs for training/testing (10 points)

Look into how to create batches for training and testing your model. For example, if you are using PyTorch, you can look into the TensorDataset and DataLoader classes for effective training and testing. Note that for training you will use the RandomSampler class, and for testing you will use the SequentialSampler class. (A sketch illustrating this task is also provided after Task 4.)

Task – 3: Define the model (20 points)

Create the model, following the architectural specifications provided in the previous section. Be careful when defining the parameters of each layer in the model. You may want to use the token-to-integer mapping dictionary saved previously to define the size of the embedding layer.

Task – 4: Train and test the model (25 points)

Look into how the model can be trained and tested. Define a suitable loss function and optimizer. Choose suitable values for hyper-parameters such as the learning rate, number of epochs, and batch size. To test the model, you may use scikit-learn's classification report to get the precision, recall, F-score, and accuracy values. Additionally, report the throughput of your model (in seconds) at inference time.
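For Task 1, a minimal parsing and encoding sketch is given below. It assumes Python's xml.etree.ElementTree and simple whitespace tokenization after lowercasing; the function names are illustrative, while the pair, t, and h tags and the value attribute follow the dataset description above:

import xml.etree.ElementTree as ET

def load_rte(path):
    """Parse an RTE XML file into token lists for premises, hypotheses, and labels."""
    premises, hypotheses, labels = [], [], []
    root = ET.parse(path).getroot()
    for pair in root.iter('pair'):
        # simple whitespace tokenization after lowercasing (you may use a real tokenizer)
        premises.append(pair.find('t').text.lower().split())
        hypotheses.append(pair.find('h').text.lower().split())
        labels.append(pair.get('value'))
    return premises, hypotheses, labels

def build_vocab(sentences):
    """Map each distinct token to a unique integer; 0 is reserved for padding."""
    vocab = {}
    for sent in sentences:
        for tok in sent:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab

def encode_and_pad(sentences, vocab, max_len):
    """Integer-encode each sentence, then truncate and zero-pad it to max_len."""
    encoded = []
    for sent in sentences:
        ids = [vocab.get(tok, 0) for tok in sent][:max_len]
        encoded.append(ids + [0] * (max_len - len(ids)))
    return encoded

You would typically build the vocabulary from the training premises and hypotheses together (e.g. build_vocab(premises + hypotheses)) and reuse the same dictionary when encoding the test file.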
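For Task 2, assuming PyTorch, the padded integer lists can be wrapped into batches as sketched below. The helper name and batch size are illustrative, and the labels are assumed to be integer-encoded already:

import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler

def make_loader(premises, hypotheses, labels, batch_size=32, train=True):
    """Wrap the padded integer lists in tensors and return a batched DataLoader."""
    dataset = TensorDataset(torch.tensor(premises),
                            torch.tensor(hypotheses),
                            torch.tensor(labels))
    # RandomSampler shuffles for training; SequentialSampler preserves order for testing
    sampler = RandomSampler(dataset) if train else SequentialSampler(dataset)
    return DataLoader(dataset, sampler=sampler, batch_size=batch_size)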
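For Task 4, a bare-bones training and evaluation sketch is given below. The choice of cross-entropy loss, the Adam optimizer, the learning rate, and the number of epochs are assumptions you are free to change; note also that nn.CrossEntropyLoss with two output units expects labels 0 and 1, so the 1/2 encoding shown in the Task 1 example would need to be shifted:

import time
import torch
import torch.nn as nn
from sklearn.metrics import classification_report

def train(model, loader, epochs=5, lr=1e-3):
    """Minimal training loop: cross-entropy loss with the Adam optimizer."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for epoch in range(epochs):
        for premise, hypothesis, label in loader:
            optimizer.zero_grad()
            logits = model(premise, hypothesis)
            loss = criterion(logits, label)  # labels assumed encoded as 0 / 1
            loss.backward()
            optimizer.step()

def evaluate(model, loader):
    """Run inference, print precision/recall/F-score, and report inference time."""
    model.eval()
    predictions, gold = [], []
    start = time.time()
    with torch.no_grad():
        for premise, hypothesis, label in loader:
            logits = model(premise, hypothesis)
            predictions.extend(logits.argmax(dim=1).tolist())
            gold.extend(label.tolist())
    print('Inference time: %.2f seconds' % (time.time() - start))
    print(classification_report(gold, predictions))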
Task – 5: Prepare a report (10 points)

Prepare a report summarizing your results. Specifically, observe the effect of hyper-parameters such as the number of LSTM layers, the embedding dimension, the hidden dimension of the LSTM layers, etc. on model performance and throughput. To get a better understanding of how results are analyzed and summarized, consult the reference paper provided on the webpage.

To Submit

Submit the following:

1. Your source code (either as a Python notebook or a regular Python file)
2. Instructions on how to run the code
3. Report

External Links

1. How to use Google Colab
2. PyTorch documentation
3. TensorFlow documentation
4. A very simple TensorFlow tutorial
5. A very simple PyTorch tutorial
6. Textual Entailment with TensorFlow
7. Textual Entailment with PyTorch

References

[1] Daniel Z. Korman, Eric Mack, Jacob Jett, and Allen H. Renear. Defining textual entailment. Journal of the Association for Information Science and Technology, 69(6):763–772, 2018.

[2] Ido Dagan, Oren Glickman, and Bernardo Magnini. The PASCAL Recognising Textual Entailment Challenge. In Machine Learning Challenges Workshop, pages 177–190. Springer, 2005.