Title: | Study of Job Execution Performance, Reliability, Energy Consumption, and Fault Tolerance in the MapReduce Framework (MapReduce 工作執行效能、可靠性、能源耗費與容錯之研究) |
Authors: | Lin, Jia-Chun (林佳純); Chen, Ying-ping (陳穎平); Leu, Fang-Yie (呂芳懌); Institute of Computer Science and Engineering (資訊科學與工程研究所) |
Keywords: | MapReduce; Hadoop; job completion reliability; job turnaround time; job energy consumption; single-point-of-failure; fault tolerance |
Issue Date: | 2015 |
Abstract: | Node/machine failure is the norm rather than the exception in a large-scale MapReduce cluster. To prevent jobs from being interrupted by machine/node failures, MapReduce employs several policies, such as the task re-execution policy, the intermediate-data replication policy, and the reduce-task assignment policy. However, the impacts of these policies on MapReduce jobs are not clear, especially in terms of Job Completion Reliability (JCR), Job Turnaround Time (JTT), and Job Energy Consumption (JEC). In this dissertation, JCR is the reliability with which a MapReduce job can be completed by a MapReduce cluster, JTT is the period from the moment the job is submitted to the cluster until the cluster completes it, and JEC is the energy consumed by the cluster to complete the job. To achieve a more reliable and energy-efficient computing environment than the current MapReduce infrastructure, it is essential to understand the impacts of these policies. In addition, the MapReduce master servers suffer from a single-point-of-failure problem: when a master server fails, MapReduce operations and filesystem services might be interrupted.
To study how the above policies influence the performance of MapReduce jobs, in this dissertation we formally derive and analyze the JCR, JTT, and JEC of a MapReduce job under each of these policies. In addition, to mitigate the single-point-of-failure problem and improve the service quality of MapReduce master servers, we propose a hybrid takeover scheme called PAReS (Proactive and Adaptive Redundant System). The analyses in this dissertation enable MapReduce managers to understand the influence of these policies on MapReduce jobs, help them choose appropriate policies for their MapReduce clusters, and allow MapReduce designers to propose better policies. Furthermore, our extensive experimental results indicate that, compared with the current redundancy schemes on Hadoop, PAReS mitigates the single-point-of-failure problem and improves the service quality of MapReduce master servers. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079755818 http://hdl.handle.net/11536/125846 |
Appears in Collections: | Thesis |