Title: 導入關聯式資料庫系統應用於HBase
Migrating Relational Database Applications to HBase
Authors: 楊濬仲
Yang, Chun-Chung
黃俊龍
Huang, Jiun-Long
Institute of Computer Science and Engineering
Keywords: Cloud Computing; Hadoop MapReduce; HBase
Issue Date: 2011
Abstract: MapReduce has become central to distributed systems. Data mining, web data processing, and scientific computing on massive data sets can all achieve good results and scalability with MapReduce and the file system that accompanies it. On the storage side, traditional relational databases offer a convenient query syntax, so programmers can develop applications without handling low-level file storage and data organization themselves. However, traditional database systems store data row by row, which requires more storage space, and their performance degrades as the data grows. Companies must invest heavily to provide a full database system for their engineers, so each expansion of a large database system has to be evaluated carefully. HBase is a column-based distributed storage system for structured data. Unlike a traditional relational database, it is scalable and highly reliable, and it can grow to very large sizes by storing data as a distributed, persistent, multidimensional sorted map indexed by a row key, a column key, and a timestamp.
This thesis studies how to apply HBase and MapReduce to existing relational database applications, using HBase's ease of expansion and a large number of machines to provide the functionality of the existing database system. We find that replacing the relational database with HBase outright is impractical for existing products: legacy code cannot work with HBase, and the cost and risk of rewriting it are more than a company can bear. We therefore propose a software architecture that lets engineers integrate a cloud system into the legacy database system and replace the relational database step by step. We also propose a solution for relational operations that HBase does not support, so that engineers using our system do not have to implement them by hand. In our experiments, the new architecture handles large-scale data with good processing performance.
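The data model mentioned in the abstract, a sorted map indexed by a row key, a column key, and a timestamp, can be illustrated with a minimal in-memory sketch. This is a toy model only, not the HBase client API and not the architecture proposed in the thesis. The class name `SortedCellMap`, its methods, and the sample row and column names are hypothetical and chosen purely for illustration.

```python
class SortedCellMap:
    """Toy model of a (row key, column key, timestamp) -> value map.

    Hypothetical illustration only: it is not distributed or persistent,
    and it does not use the real HBase API.
    """

    def __init__(self):
        # row key -> column key -> {timestamp: value}
        self._rows = {}

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self._rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def scan(self, start_row, stop_row):
        """Yield (row, {column: latest value}) for start_row <= row < stop_row, in key order."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                columns = self._rows[row]
                yield row, {col: vers[max(vers)] for col, vers in sorted(columns.items())}


if __name__ == "__main__":
    table = SortedCellMap()
    table.put("user#002", "info:name", 1, "Bob")
    table.put("user#001", "info:name", 1, "Alice")
    table.put("user#001", "info:name", 2, "Alicia")  # newer version of the same cell
    table.put("order#001", "info:total", 1, "30")

    print(table.get("user#001", "info:name"))               # newest version
    print(table.get("user#001", "info:name", timestamp=1))  # version as of timestamp 1
    for row, columns in table.scan("user#", "user$"):       # range scan over sorted row keys
        print(row, columns)
```

Because rows are kept in key order, a range scan over a shared key prefix returns related rows together. Choosing row keys so that related records sort next to each other is one common way to stand in for some lookups that a relational database would answer with an index. Whether the thesis takes this approach is not stated in the abstract.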
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079755576
http://hdl.handle.net/11536/45921
Appears in Collections: Thesis