Title: Design and Deployment of Big Data Warehouses - A Case Study of a Semiconductor Packaging and Testing Company
Design and Deployment of Big Data Warehouses - A Case Study of a Packaging and Testing Company
Authors: Liu, Shan-Cheng (劉姍澄)
Liu, Duen-Ren (劉敦仁)
College of Management, Information Management Program
Keywords: Data Warehouse; Big Data; YARN; HDFS; Hive
Issue Date: 2016
Abstract: Collecting, aggregating, and analyzing data are the central challenges enterprises face in big data analytics. Traditional data warehousing techniques cannot process and analyze big data effectively, so designing and deploying a big data warehouse has become an important issue for enterprises that want to use big data analytics to strengthen their competitiveness.
Conventional data warehouses support real-time queries and complex computation, whereas distributed computing is well suited to storing large volumes of data, including unstructured data. This research uses the Hadoop Distributed File System (HDFS) and Hive to store large amounts of historical data and to query and analyze the different kinds of stored data. It also investigates and compares Hadoop YARN, Spark SQL, and Drill for querying big data.
The data sources are the production process database and the work order database of the case company. To simplify implementation and reduce transition costs, the company's existing enterprise data warehouse architecture is retained and a new big data warehouse is deployed alongside it, so that users querying the data warehouse system can carry out operations and maintenance work faster and more accurately. The research focuses on organizing and aggregating big data and on obtaining analysis results quickly, and it proposes a big data warehouse system architecture. The results can serve as a reference for enterprises planning to deploy a big data warehousing system.
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070363404
http://hdl.handle.net/11536/139087
Appears in Collections: Thesis