Description


Paths for project 3
Defining the data sets to analyze yourself
For project 3, you can propose your own data sets that relate to your research interests, or simply use existing data sets from, for example,

Kaggle
The University of California at Irvine (UCI) with its machine learning repository.
The approach to the analysis of these new data sets should follow to a large extent what you did in projects 1 and 2. That is:
Whether you end up with a regression or a classification problem, you should employ at least two of the methods we have discussed: linear regression (including Ridge and Lasso), logistic regression, neural networks, convolutional neural networks, recurrent neural networks, support vector machines, and decision trees, random forests, bagging and boosting. You could, for example, explore all of the approaches from decision trees, via bagging and voting classifiers, to random forests, boosting and finally XGBoost. If you wish to venture into convolutional neural networks or recurrent neural networks, or extensions of neural networks, feel free to do so.
For boosting, also feel free to write your own code.
For project 3, you should feel free to use your own code from projects 1 and 2, possibly write your own code for SVMs and/or decision trees, random forests, bagging and boosting, or use the available functionality of Scikit-Learn, TensorFlow, etc.
The measures you used and tested in projects 1 and 2 should also be included, that is, the R2 score, MSE, confusion matrix, accuracy score, information gain, ROC and cumulative gains curves, and others, together with cross-validation and/or the bootstrap where these are relevant.
Similarly, feel free to explore various activation functions in deep learning and various approaches to stochastic gradient descent.
If possible, you should link the data sets with existing research and analyses thereof. Scientific articles which have used machine learning algorithms to analyze the data are highly welcome. Perhaps you can improve previous analyses and even publish a new article?
You also need to include a critical assessment of the methods, with corresponding perspectives and recommendations.
All in all, the report should follow the same pattern as the two previous ones, with abstract, introduction, methods, code, results, conclusions, etc.
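The regression and classification measures listed above can be sketched with plain NumPy as follows. This is a minimal illustration, not a prescribed implementation: the helper names (`mse`, `r2_score`, `accuracy`, `confusion_matrix`, `ridge_fit`, `kfold_mse`) and the synthetic data are assumptions made for the example, and Scikit-Learn offers equivalent functionality.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))


def confusion_matrix(y_true, y_pred, n_classes):
    """Entry [i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def ridge_fit(X, y, lam):
    """Closed-form Ridge solution of (X^T X + lam I) beta = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def kfold_mse(X, y, lam, k=5, seed=0):
    """Average test MSE of Ridge regression over k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = ridge_fit(X[train], y[train], lam)
        scores.append(mse(y[test], X[test] @ beta))
    return float(np.mean(scores))


if __name__ == "__main__":
    # Synthetic data (hypothetical, for illustration only): y = 1 + 2x + noise
    rng = np.random.default_rng(42)
    x = rng.uniform(-1, 1, 100)
    X = np.column_stack([np.ones_like(x), x])
    y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=100)
    print("5-fold CV MSE:", kfold_mse(X, y, lam=1e-3))
    print(confusion_matrix([0, 1, 1, 0], [0, 1, 0, 0], n_classes=2))
```

Writing the resampling loop by hand makes it easy to check that no test data leak into the fit; the same loop can wrap any of the other methods by swapping out `ridge_fit`.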
We also propose an alternative to the above: a project on applying machine learning methods (mainly neural networks) to the solution of ordinary and partial differential equations, with a final twist on how to diagonalize a symmetric matrix with neural networks.

This field has attracted considerable interest recently, with applications ranging from studies of turbulence in fluid mechanics and meteorology to the solution of quantum mechanical systems. As background reading you can use the slides from week 43 and/or the textbook by Yadav et al.

The basic structure of your project
The following outlines how to structure your report and analyze the data you have chosen.

Part a)
The first part deals with structuring and reading the data, much along the same lines as done in projects 1 and 2. Explain how the data are produced and place them in a proper context.

Part b)
You need to include at least two central algorithms or, as an alternative, explore methods from decision trees to bagging, random forests and boosting. Explain the basics of the methods you have chosen to work with. This would be your theory part.

Part c)
Then describe your algorithm, its implementation and the tests you have performed.

Part d)
Then present your results and findings, and link them with the existing literature.

Part e)
Finally, present a critical assessment of the methods you have studied and link your results with the existing literature.

Solving partial differential equations with neural networks
For this variant of project 3, we will assume that you have some background in the solution of partial differential equations using finite difference schemes. We will solve the diffusion equation in one dimension with a standard explicit scheme and then use neural networks to solve the same equation.

For the explicit scheme, you can consult, for example, chapter 10 of the lecture notes in Computational Physics or alternative sources. For the solution of ordinary and partial differential equations using neural networks, the lectures by Kristine Baluka Hein in this course are highly recommended.

For the machine learning part you can use your own code from project 2 or the functionality of, for example, TensorFlow/Keras.
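As a rough sketch of how a neural network can enter the solution of the one-dimensional diffusion equation, the example below builds a trial solution that satisfies the initial and boundary conditions by construction and evaluates the squared residual of u_xx = u_t as a cost function. Several things here are assumptions made for illustration: the boundary conditions u(0,t) = u(1,t) = 0, the initial profile sin(πx) on a rod of unit length, the particular trial form, the network layout, and all function names. Derivatives are approximated with central finite differences as a stand-in for automatic differentiation, and no training loop is included.

```python
import numpy as np


def init_params(n_hidden, seed=0):
    """Random weights for a network with inputs (x, t), one hidden layer, one output."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(size=(2, n_hidden)),
        "b1": rng.normal(size=n_hidden),
        "W2": rng.normal(size=(n_hidden, 1)),
        "b2": rng.normal(size=1),
    }


def network(params, x, t):
    """Forward pass N(x, t) with a sigmoid hidden layer and a linear output."""
    inputs = np.stack([x, t], axis=-1)
    hidden = 1.0 / (1.0 + np.exp(-(inputs @ params["W1"] + params["b1"])))
    return (hidden @ params["W2"] + params["b2"])[..., 0]


def trial_solution(params, x, t):
    """g(x, t) = (1 - t) sin(pi x) + x (1 - x) t N(x, t).

    For any weights, g(x, 0) = sin(pi x) and g(0, t) = g(1, t) = 0.
    """
    return (1.0 - t) * np.sin(np.pi * x) + x * (1.0 - x) * t * network(params, x, t)


def pde_residual(u, x, t, h=1e-3):
    """u_xx - u_t from central finite differences (stand-in for autodiff)."""
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h**2
    u_t = (u(x, t + h) - u(x, t - h)) / (2.0 * h)
    return u_xx - u_t


def cost(params, x, t):
    """Mean squared PDE residual of the trial solution at the points (x, t)."""
    residual = pde_residual(lambda a, b: trial_solution(params, a, b), x, t)
    return float(np.mean(residual**2))


if __name__ == "__main__":
    params = init_params(n_hidden=10)
    x, t = np.meshgrid(np.linspace(0.1, 0.9, 9), np.linspace(0.1, 0.9, 9))
    print("cost at random weights:", cost(params, x.ravel(), t.ravel()))
```

Minimizing `cost` with respect to the weights, for instance with the gradient descent code from project 2 or with TensorFlow's automatic differentiation, would then drive the trial solution towards a solution of the equation.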

Part a), setting up the problem
The physical problem can be that of the temperature gradient in a rod of length L=1 at x=0 and x=1. We are looking at a one-dimensional problem
$$ \frac{\partial^2 u(x,t)}{\partial x^2} = \frac{\partial u(x,t)}{\partial t}, \quad t > 0, \quad x \in [0, L] $$
or
$$ u_{xx} = u_t, $$
with initial conditions, i.e., the conditions at t=0,
$$ u(x,0) = \sin(\pi x), \quad 0 < x < L $$
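A minimal sketch of a standard explicit (forward Euler in time, centered in space) scheme for this problem is given below. The boundary conditions u(0,t) = u(1,t) = 0 are an assumption, chosen because they are consistent with the initial profile sin(πx); the function names and the default ratio Δt/Δx² = 0.4 are likewise illustrative choices. With these conditions the analytic solution is u(x,t) = exp(-π²t) sin(πx), which the example uses as a check.

```python
import numpy as np


def explicit_diffusion(n_x, t_end, alpha=0.4):
    """Explicit forward-Euler solution of u_t = u_xx on x in [0, 1].

    n_x   : number of grid intervals, so dx = 1 / n_x
    alpha : dt / dx**2; the scheme is stable only for alpha <= 1/2
    """
    if alpha > 0.5:
        raise ValueError("explicit scheme requires dt/dx^2 <= 1/2")
    dx = 1.0 / n_x
    dt = alpha * dx**2
    n_t = int(round(t_end / dt))
    x = np.linspace(0.0, 1.0, n_x + 1)
    u = np.sin(np.pi * x)  # initial condition u(x, 0) = sin(pi x)
    u[0] = u[-1] = 0.0     # assumed boundary conditions u(0, t) = u(1, t) = 0
    for _ in range(n_t):
        # u_i^{n+1} = u_i^n + alpha (u_{i+1}^n - 2 u_i^n + u_{i-1}^n)
        u[1:-1] = u[1:-1] + alpha * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return x, u, n_t * dt


def exact_solution(x, t):
    """Analytic solution for the initial and boundary conditions above."""
    return np.exp(-np.pi**2 * t) * np.sin(np.pi * x)


if __name__ == "__main__":
    for n_x in (10, 100):
        x, u, t = explicit_diffusion(n_x, t_end=0.1)
        err = np.max(np.abs(u - exact_solution(x, t)))
        print(f"dx = {1 / n_x:.2f}, t = {t:.3f}, max abs error = {err:.2e}")
```

Because the time step is tied to the grid spacing through Δt = αΔx², refining the grid by a factor of ten increases the number of time steps by a factor of one hundred, which is the main cost of the explicit scheme.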

Project 3 Data Analysis and Machine Learning FYS-STK3155/FYS4155
