CS416 Project 4: Tiny File System using FUSE Library

$30.00

Category: You will Instantly receive a download link for .zip solution file upon Payment || To Order Original Work Click Custom Order?

Description

5/5 - (4 votes)

In project 3, you successfully designed a memory management library with user-level page tables and you are now an expert in virtual memory management. However, we have not thought about persistent storage. In this project, your task is to design a user-level file system with the FUSE file system library. In addition, we have provided a simple test case (simple_test.c) and a detailed test case (test_case.c) to help you debug and guide you as you develop your code. We have also provided some crucial debugging tips for FUSE. Finally, FUSE-based file systems are used in several real-world applications including Android-based phones. You are building a real user-level file system; so write clean code, modularize your code, and add code comments where possible. Please take some time to read the instructions carefully before getting started with implementation. You should approach your implementation like any seasoned software developer would; list down things you understand, list down things you do not understand, and take time to come up with some high-level thoughts on how you will go about designing the solution. 1. Description File systems provide the ‘file’ abstraction. Standing between the user and the disk, they organize data and indexing information on the disk so that metadata like ‘filename’, ‘size’, and ‘permissions’ can be mapped to a series of disk blocks that correspond to a ‘file’. File systems also handle the movement of data in and out of the series of disk blocks that represent a ‘file’ in order to support the abstractions of ‘reading’ and ‘writing’ from/to a ‘file’. You will be implementing a fairly simple file system called Tiny File System (TFS), building on top of the FUSE library. 2. FUSE Library FUSE is a kernel module that redirects system calls to your file system from the OS to the code you write at the user level. While working with an actual disk through a device driver is a worthy endeavor, we would prefer you concentrate more on your file system’s logic and organization instead of building one while also learning how to operate a disk directly. You will, however, have to emulate a working disk by storing all data in a flat file that you will access like a block device. All user data as well as all indexing blocks, management information, or metadata about files must be stored in that single flat file. All your FUSE calls will only reference this ‘virtual disk’ and will provide the ‘file’ abstraction by fetching segments from the ‘disk’ in a logical manner. For example, if a call is made to read from a given file descriptor, your library would use the request to lookup the index information to find out where the ‘file’ is located in your ‘disk’, determine which block would correspond to the offset indicated by the file handle, read in that disk block, and write the requested segment of the data block to the pointer given to you by the user. To give your file system implementation some context and scope, stub files will be provided. The iLab machines are already running the FUSE driver. So, when you run TFS, the current code registers important file system functionalities that must be handled with the FUSE driver. How is that done? Look in the tfs_ope structure in tfs.c, which registers with the OS FUSE driver the functions you will be handling. The .init, .destroy, and other structure variables are generic stubs that any new file system must implement. More details on FUSE can be found in the Resources section (Section 6). static struct fuse_operations tfs_ope = { .init = tfs_init, 1 .destroy = tfs_destroy, .getattr = tfs_getattr, .readdir = tfs_readdir, … … } 3. Tiny File System Framework The skeleton code of TFS is structured as follows: • code/block.c : Basic block operations, acts as a disk driver reading blocks from disk. Also configures disk size via DISK_SIZE • code/block.h : Block layer headers and configures the block size via BLOCK_SIZE • code/tfs.c : User-facing file system operations • code/tfs.h : Contains inode, superblock, and dirent structures. Also, provides functions for bitmap operations. • code/benchmark/simple_test.c : Simple test case for initial development. • code/benchmark/test_cases.c : More comprehensive test cases. • Makefile : in both code/ and code/benchmark/ . Similar to VM project, first compile TFS code and then the benchmark code. 3.1 Block I/O layer block.h/block.c specifies low-level block I/O operations. Since we’re using a flat file to simulate a real block device (HDD or SSD), we use Linux read()/write() system call as our low-level block read and write operation. You will be using the following two functions to implement TFS functions on top of them. We have provided the implementation of bio_write and bio_read functions. int bio_read(const int block_num, void *buf); Reads a block at block_num from flat file (our ‘disk’) to buf int bio_write(const int block_num, const void *buf); Writes a block at block_num from buf to flat file (our ‘disk’) Note there are also three other functions to help with your flat file ‘disk’: void dev_init(const char* diskfile_path) Creates a flat file at the given path. The file will be of size DISK_SIZE bytes and will serve as your ‘disk’. This should be called if the file hasn’t been created yet. int dev_open(const char* diskfile_path) Simply opens the flat file that will serve as the ‘disk’. This should be called upon mounting the file system, so the the ‘disk’ could be read from and written to using bio_read() and bio_write() void dev_close() Simply closes the disk file. This should be called when the file system is being unmounted. Note: There is a global variable called diskfile_path in tfs.c. This variable is set in the main() function within tfs.c, setting the path to /DISKFILE. You should use this path for your disk file and pass this global variable into dev_init() and dev_open(). It is done this way to avoid hardcoding a path for the disk file, making tfs more portable. You should not set diskfile_path yourselves. 2 3.2 Bitmap In tfs.h, you are given the following three bitmap functions: set_bitmap(bitmap_t b, int i) Set the ith bit of bitmap b. unset_bitmap(bitmap_t b, int i) Unset the ith bit of bitmap b. get_bitmap(bitmap_t b, int i) Get the value of ith bit of bitmap b. In a traditional file system, both the inode area and the data block area needs a bitmap to indicate whether an inode or a data block is available or not (similar to virtual or physical page bitmap in project 3). When adding or removing a block, traverse the bitmap to find an available inode or a data block. Here are the two functions you need to implement: int get_avail_ino() Traverse the inode bitmap to find an available inode, set this inode number in the bitmap and return this inode number. int get_avail_blkno() Traverse the data block bitmap to find an available data block, set this data block in the bitmap and return this data block number. 3.3 Inode An inode is the meta-data of a file or directory that stores important information such as inode number, file size, data block pointers; you could see the definition of struct inode in tfs.h. In TFS, you only need to implement the following two inode operations: int readi(uint16_t ino, struct inode *inode) This function uses the inode number as input, reads the corresponding on-disk inode from inode area of the flat file (our ‘disk’) to an in-memory inode (struct inode). int writei(uint16_t ino, struct inode *inode) This function uses the inode number as input, writes the in-memory inode struct to disk inode in the inode area of the flat file (our ‘disk’). 3.4 Directory and Namei Directories, in any file system, is an essential component. Similar to Linux file systems, TFS also organizes files and directories in a tree-like structure. To lookup a file or directory, your implementation of TFS must follow each part of the path name until the terminal point is found. For example, to lookup “/foo/bar/a.txt”, you will start at the root directory (“/”), look through the directory entries stored in the data blocks of the root directory to get the inode number of “foo”, then look through the directory entries stored in the data blocks of the directory “foo” to get the inode number of “bar”, and then finally look through the directory entries stored in the data blocks of the directory “bar” to get the inode number of “a.txt”, the terminal point of the path. In tfs.h, you will find a directory entry structure called struct dirent. This struct describes the mapping of every file and sub-directory in the current directory. Thus, to create a 3 file/sub-directory in the current directory, you will need to add a directory entry for created file/sub-directory in the current directory’s data blocks. Likewise, to delete a file/sub-directory from the current directory, you will have to remove/invalidate the directory entry of the file/sub-directory in the data blocks of the current directory. The following are the directory and namei functions you will have to implement: int dir_find(uint16_t ino, const char *fname, size_t name_len, struct dirent *dirent) This function takes the inode number of the current directory, the file or sub-directory name and the length you want to lookup as inputs, and then reads all direct entries of the current directory to see if the desired file or sub-directory exists. If it exists, then put it into struct dirent dirent*. int dir_add(struct inode dir_inode, uint16_t f_ino, const char *fname, size_t name_len) In this function, you would add code to add a new directory entry. This function takes as input the current directory’s inode stucture, the inode number to put in the directory entry, as well as the name to put into the directory entry. The function then writes a new directory entry with the given inode number and name in the current directory’s data blocks. Look at the code skeleton in tfs.c for more hints. int dir_remove(struct inode dir_inode, const char *fname, size_t name_len) In this function, you would add code to remove a directory entry from a directory. This function takes as input the current directory’s inode structure and the name of the directory entry you want to remove. To remove a directory entry, you would look in the data blocks of the given directory for the directory entry with the corresponding name, then mark the directory entry as invalid. int get_node_by_path(const char *path, uint16_t ino, struct inode *inode) This is the actual namei function which follows a pathname until a terminal point is found. To implement this function use the path, the inode number of the root of this path as input, then call dir_find() to lookup each component in the path, and finally read the inode of the terminal point to “struct inode *inode”. 3.5 FUSE-based File System Handlers As described in Section 2, the FUSE kernel module will intercept and redirect file system calls back to your implementation. In tfs.c, you will find a struct called struct fuse_operations tfs_ope which specifies file system handler functions for each file system operation (e.g., .mkdir = tfs_mkdir). After you mount the FUSE filesystem in the mount path, the kernel module starts redirecting basic filesystem operations (e.g., mkdir()) to TFS handler (tfs_mkdir). For example, when you mount TFS under “/tmp/mountdir”, and you type the following command “mkdir /tmp/mountdir/testdir”, the FUSE module would redirect the call to tfs_mkdir instead of the default mkdir system call inside the OS. Here are the Tiny File System operations (handlers) you will implement in this project: static void *tfs_init(struct fuse_conn_info *conn) This function is the initialization function of TFS. In this function, you will open a flat file (our ‘disk’, remember the virtual memory setup) and read a superblock into memory. If the flat file does not exist (our ‘disk’ is not formatted), it will need to call tfs_mkfs() to format our “disk” (partition the flat file into superblock region, inode region, bitmap region, and data block region). You must also allocate any in-memory file system data structures that you may need. static void tfs_destroy(void *userdata) This function is called when your TFS is unmounted. In this function, de-allocate in-memory file system data structures, and close the flat file (our “disk”). static int tfs_getattr(const char *path, struct stat *stbuf) This function is called when accessing a file or directory and provides the stats of your file, such as inode permission, size, number of references, and other inode information. It takes the path of a file or directory as 4 an input. To implement this function, use the input path to find the inode, and for a valid path (inode), fill information inside “struct stat *stbuf”. We have shown a sample example in the skeleton code. On success, the return value must be 0; otherwise, return the right error code. Note 1: If you do not implement tfs_getattr(), a cd command into TFS mountpoint will show errors. Other parameters to fill include: st_uid, st_gid, st_nlink, st_size, st_mtime, and st_mode. The template code already shows how to do it for some parameters. You would have to do it for all files and all directories including the root. For setting st_uid and st_gid, you could use getuid() and getgid(). For other parameters, think! Note 2: tfs_getattr() must also be called before creating a file or directory to check whether the file or the directory you want to create already exists. So, think about what return value you would expect from tfs_getattr(). static int tfs_opendir(const char *path, struct fuse_file_info *fi) This function is called when accessing a directory (e.g., cd command). It takes the path of the directory as an input. To implement this function, find and read the inode, and if the path is valid, return 0 or return a negative value. static int tfs_readdir(const char *path, void *buffer, fuse_fill_dir_t filler, off_t offset, struct fuse_file_info *fi) This function is called when reading a directory (e.g., ls command). It takes the path of the file or directory as an input. To implement this function, read the inode and see if this path is valid, read all directory entries of the current directory into the input buffer. You might be confused about how to fill this buffer. Don’t worry, in Section 7, we will give you some online resource as a reference. static int tfs_mkdir(const char *path, mode_t mode) This function is called when creating a directory (mkdir command). It takes the path and mode of the directory as an input. This function will first need to separate the directory name and base name of the path. (e.g., for the path “/foo/bar/tmp”, the directory name is “/foo/bar”, the base name is “tmp”). It should then read the inode of the directory name and traverse its directory entries to see if there’s already a directory entry whose name is base name, and if true, return a negative value; otherwise, the base name must be added as a directory. The next step is to add a new directory entry to the current directory, allocate an inode, and also update the bitmaps (more detailed hints could be found in the skeleton code). static int tfs_rmdir(const char *path) This function is called when removing a directory (rmdir command). It takes the path directory as an input. Similar to mkdir, this function will first need to separate the directory name and base name of the path. (e.g. for path “/foo/bar/tmp”, the directory name is “/foo/bar”, the base name is “tmp”). It then reads the inode of the directory name, and traverses the directory entries to see if there’s a directory entry whose name is base name, if so, removes this directory from the current directory, reclaims its inode, data blocks, and updates the bitmaps (more detailed hints can be found in the skeleton code). If the directory you want to delete does not exist, just return a negative value. static int tfs_create(const char *path, mode_t mode, struct fuse_file_info *fi) This function is called when creating a file (e.g., touch command). It takes the path and mode of a file as an input. This function should first separate the directory name and base name of the path. (e.g. for path “/foo/bar/a.txt”, the directory name is “/foo/bar”, the base name is “a.txt”). It should then read the inode of the directory name, and traverse its directory entries to see if there’s already a directory entry whose name is base name, if so, then it should return a negative value. Otherwise, base name is a valid file name to be added. The next step is to add a new directory entry (“a.txt”) using dir_add() to the current directory, allocate an inode, and update the bitmaps (more detailed steps could be found in skeleton code). static int tfs_open(const char *path, struct fuse_file_info *fi) 5 This function is called when accessing a file. It takes the path of the file as an input. It should read the inode, and if this path is valid, return 0, else return -1. static int tfs_read(const char *path, char *buffer, size_t size, off_t offset, struct fuse_file_info *fi) This function is the read operation’s call handler. It takes the path of the file, read size and offset as input. To implement this function, read the inode of this file from the path input, get the inode, and the data blocks using the inode. Copy size bytes from the inodes data blocks starting at offset to the memory area pointed to by buffer. static int tfs_write(const char *path, const char *buffer, size_t size, off_t offset, struct fuse_file_info *fi) This function is the write call handler. It takes the path of the file, write size, and offset as an input. To perform a write operation, read the inode using the file path and using the inode, locate the data blocks and then copy size bytes from the memory area pointed by buffer to the file’s data blocks starting at offset. Note 3: All operations in the TFS (and any file system) are done in the granularity of “BLOCK_SIZE.” So, if you want to update only a few bytes of a block (say 5 bytes in inode blocks or bitmaps or superblock), you must read the entire disk block to a memory buffer, apply your change, and write the in-memory buffered block to the disk block. Note 4: You must handle unaligned write/read (i.e., writing to or reading from an offset that is not aligned with the BLOCK_SIZE). static int tfs_unlink(const char *path) This function is called when removing a file (rm command). It takes the path directory as an input. First, separate the directory name and base name of the path. (e.g. for path “/foo/bar/a.txt”, the directory name is “/foo/bar”, the base name is “a.txt”). Next, read the inode of the directory name, and traverse its directory entries to see if there’s a directory entry whose name is base name, if so, then remove/invalidate this directory entry in the data blocks of the current directory, reclaim the inode, data blocks of the inode, and update the appropriate bitmaps (more detailed hints could be found in the skeleton code). If the file you want to delete does not exist, just return a negative value; 3.6 Large file support (Extra Credit +15 pts) We know an inode stores locations of data blocks for a particular file/directory. Different file systems use different ways to store these locations. For example, FAT32 uses a file allocation table, Ext3 uses direct and indirect pointers, and Ext4 uses extents. In TFS, we use a direct pointer-based mechanism (look at the definition of struct inode in tfs.h). However, the number of direct pointers in each inode is limited. To support large files, you can make use of indirect pointers, more specifically, singly indirect pointers. A singly indirect pointer allows more data blocks to be referenced compared to a direct pointer. Instead of pointing directly to a file’s data block like a direct pointer would, a singly indirect pointer would point to a data block that stores direct pointers. These direct pointers within this data block would then point to the file’s actual data blocks. Remember, this is extra credit. You are not required to support indirect pointers, but you should try and implement this if you finish the basic requirements of the project and time allows. 4. Run TFS The code skeleton of TFS is already well-structured. You will need to build and run TFS on iLab machines. Unfortunately, not all iLab machines have the FUSE library installed. The following ilab machines have been tested and known to have the FUSE library already installed: 6 • cd.cs.rutgers.edu • cp.cs.rutgers.edu • ls.cs.rutgers.edu • kill.cs.rutgers.edu Here are the steps to build and run TFS. 1. Login to one the iLab machines listed above 2. Make a directory under /tmp/ to mount your file system to $ mkdir /tmp// $ mkdir /tmp//mountdir 3. Compile TFS by naviagating to your code and running make $ cd code $ make 4. Mount TFS to the directory you just created $ ./tfs -s /tmp//mountdir 5. Check if TFS is mounted successfully using findmnt $ findmnt If mounted successfully, you could see the following information in the output of findmnt: /tmp//mountdir tfs fuse.tfs rw,nosuid,nodev,relatime,… If you want to stop the Tiny File System, you can exit and umount TFS by running the following: $ fusermount -u /tmp//mountdir 5. Testing Tiny File System Functionality To test the functionality of TFS, you could just simply enter the mountpoint (In this case, it should be “/tmp/[your NetID]/mountdir”), and perform some commands (ls, touch, cat, mkdir, rmdir, rm . . . ). In addition, we also provide a simple benchmarks (In the Benckmark folder in your skeletion code) to test your implementation of Tiny File System. To run the benchmarks: 1. Make sure TFS is mounted and running. 2. Ensure to benchmarks know where to test the file system by configuring TESTDIR in each benchmark to point your file system’s mount point (ex. /tmp/[your NetID]/mountdir) 3. Compile the benchmarks using the provided Makefile 4. Run the benchmarks 6. Debugging Tips for FUSE 1. Debugging in the FUSE library is not an easy step because we cannot simply use GDB to debug TFS always. However, FUSE provides -d options; when you run TFS with the -d option, it will print all debug information and trace on the terminal window. Therefore, the best way is to add print statements to debug in combination with GDB for debugging your functions. 2. The “-d” parameter is supposed to run TFS continuously, which gives a notion that your file system is hanging. So, when you debug using “-d,” open a SEPARATE terminal window and run your commands from there. The Fuse system with “-d” also has another caveat that if you CTRL-C out of the -d “hang,” it also unmounts the file system, which is why you will not see your file system mounted (when using “findmnt”). 7 3. If you want to run TFS in the foreground, use -f option if (2) is insufficient. For getting more help on commands use, $ ./tfs –help Note: FUSE might return some errors (e.g., Input/Output error or Transport Endpoint is not connected); this is because some FUSE file handlers are not fully implemented in the skeleton code. For example, if you have not implemented tfs_getattr(), a cd command into the TFS mount point will show errors. 7. Resources FUSE library API documentation • https://libfuse.github.io/doxygen/index.html A FUSE file system tutorial: • http://www.maastaar.net/fuse/linux/filesystem/c/2016/05/21/writing-a-simple-filesystem-usingfuse/ Two very useful tutorials: • https://www.cs.nmsu.edu/~pfeiffer/fuse-tutorial/ • https://www.cs.hmc.edu/~geoff/classes/hmc.cs135.201109/homework/fuse/fuse_doc.html 8. Suggested steps for designing a solution Given the nature of the project it is quite hard to have a suggested steps. Its hard because in order to start testing out things, using commands like “cd,mkdir,ls,touch” etc.. all the functions have to be implemented for the most part. However we will say the following steps: 1. tfs_mkfs, tfs_init, and tfs_destroy should probably be the first place to start as you a “disk” to even start 2. You should probably implement all the helper functions as you will make good use of them when implementing the rest of the tfs functions. 3. The bitmap operations are the easiest to implement 4. The inode helper operations (readi and writei) are the second easiest 5. The directory operations are the hardest one 6. Once you implement some of the helper functions to can start to use them within the rest of the tfs functions you have to implement. Solving the project requires understanding file system concepts and understanding what functions are used where. If you do not have a firm understanding of how the file system is laid out or how directories work, you will have trouble trying to implement because you may not know at the start what the purpose of this function is or when it will be used. One thing we would suggest is to take a paper and pen and go through the operations that need to be done to do a certain operation for example, finding a file within a particular directory or creating a simple file called “a.txt” within the root directory. 8 9. Submission and Report Submit the following files to Sakai as is (Do not compress them): 1. tfs.c 2. tfs.h 3. Makefile (if modified) 4. Any other supporting source files you created 5. A report in .pdf format The report should contain the following things: – Both partners names and netids – Details on the total number of blocks used when running the sample benchmark, the time to run the benchmark, and briefly explain how you implemented the code – Any additional steps that we should follow to compile your code – Any difficulties and issues you faced when completing the project – If you had issues you were not able to resolve, describe what you think went wrong and how you think you could have solved them 10. Ideas and thought on TFS In this project, you will implement a workable file system all on your own. Please think about the following questions: 1. How many times is bio_read() called when reading a file “/foo/bar/a.txt” from our ‘disk’ (flat file)? 2. How can you reduce the number of bio_read() calls? 3. Besides storing meta-data structures in memory, what else could you do to improve performance? 4. In each API function of the skeleton code, we have provided several steps. Think about what would happen if a crash occurs between any of these steps? How would you improve the crash-consistency of TFS? You don’t have to submit anything about these questions. Just think about them when you finish your project. Remember: • Your grade will be reduced if your submission does not follow the above instruction. • Borrow ideas from online tutorials, but DO NOT copy code and solutions, we will find out. Created by CS416 staff. 9