Title: 分散式檔案系統存取行為之分析
The Access Pattern Analysis of Distributed File System
Authors: 蘇元祺
Su Yeng Chee
張明峰
Dr. Ming-Feng Chang
Institute of Computer Science and Engineering
Keywords: trace; distributed file system; access pattern analysis; simulator
Issue Date: 1993
Abstract: A distributed file system (DFS), which enables machines to share storage, is a key component of a distributed system. DFS implementations such as NFS and AFS are widely used. However, there is no common performance testbed that reflects real workloads. Our goal is to build a trace-driven evaluation environment for DFS. The environment consists of three parts: a file I/O trace collector, an access pattern analyzer, and a trace-driven DFS simulator. This thesis describes the design of the file I/O trace collector and the analysis of file access patterns. The trace collector records the file access system calls issued by user processes on a running system. The collected trace is used to analyze user process activity, per-file access patterns, and file open times and lifetimes. The trace collector is implemented by modifying the SunOS kernel so that selected system calls are redirected to our own routines, which execute the requested function and also record the event in a log file. The trace collector is installed on workstations in the computer center of CSIE, NCTU. Its correctness has been verified by cross-comparing the trace log with the output of a simulated file access generator.
Two trace logs were collected on two machines over 24 hours, representing the typical workload of a working day. The analysis leads to four conclusions. First, most file accesses are bursty: user processes may issue a large number of file accesses in a short time. Second, most files are read-only and accessed sequentially, although the amount of randomly accessed data grows sharply because of certain ill-behaved programs. Third, most sequential accesses are 1 KByte or 8 KBytes long, because these are the buffer sizes of the high-level C library; since a large share of sequential runs contain only one read/write event, a considerable proportion of read-ahead data is wasted. Fourth, most files are open for a very short time, and temporary files are also short-lived: if the system delays writes by 60 seconds, more than 85% of temporary files would be deleted before being written.
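The file lifetime finding in the abstract can be illustrated with a minimal sketch. It assumes a simplified trace format that is not taken from the thesis: the `TraceEvent` record, the event names `"create"` and `"unlink"`, and the function `fraction_deleted_within` are all hypothetical. The sketch computes the fraction of created files that are deleted within a given window, which is the quantity a delayed-write policy would exploit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    """One record of a hypothetical, simplified trace (not the thesis format)."""
    time: float  # seconds since the start of the trace
    op: str      # "create" or "unlink" (assumed event names)
    path: str


def fraction_deleted_within(events, window=60.0):
    """Return the fraction of created files unlinked within `window` seconds."""
    created_at = {}   # path -> creation time of files not yet deleted
    total = 0         # number of create events seen
    short_lived = 0   # creations followed by an unlink within the window
    for ev in sorted(events, key=lambda e: e.time):
        if ev.op == "create":
            created_at[ev.path] = ev.time
            total += 1
        elif ev.op == "unlink" and ev.path in created_at:
            if ev.time - created_at.pop(ev.path) <= window:
                short_lived += 1
    return short_lived / total if total else 0.0


# Made-up sample data, for illustration only.
trace = [
    TraceEvent(0.0, "create", "/tmp/a"),
    TraceEvent(5.0, "unlink", "/tmp/a"),
    TraceEvent(10.0, "create", "/tmp/b"),
    TraceEvent(200.0, "unlink", "/tmp/b"),
    TraceEvent(20.0, "create", "/home/u/c"),
    TraceEvent(30.0, "create", "/tmp/d"),
    TraceEvent(45.0, "unlink", "/tmp/d"),
]
print(fraction_deleted_within(trace))  # 0.5
```

In this made-up trace, two of the four created files are removed within 60 seconds, so a system that delays writes by 60 seconds would never need to write their data to disk.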
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT820392027
http://hdl.handle.net/11536/57831
Appears in Collections: Thesis