BigExplorer: A Configuration Recommendation System for Big Data Platform

Full metadata record

DC Field	Value	Language
dc.contributor.author	Yeh, Chao-Chun	en_US
dc.contributor.author	Zhou, Jiazheng	en_US
dc.contributor.author	Chang, Sheng-An	en_US
dc.contributor.author	Lin, Xuan-Yi	en_US
dc.contributor.author	Sun, Yichiao	en_US
dc.contributor.author	Huang, Shih-Kun	en_US
dc.date.accessioned	2018-08-21T05:56:50Z	-
dc.date.available	2018-08-21T05:56:50Z	-
dc.date.issued	2016-01-01	en_US
dc.identifier.issn	2376-6816	en_US
dc.identifier.uri	http://hdl.handle.net/11536/146723	-
dc.description.abstract	Given the complexity of big data platform architectures, data engineers provide the infrastructure, with its computation and storage resources, for data scientists and data analysts. With this support, data scientists can focus on their domain problems and design the intelligence modules (e.g., prepare the data, select/train/tune the machine learning modules, and validate the results). However, there is still a gap between the system engineering team and the data scientist/engineer team. System engineers have no knowledge of the application domain or the purpose of the analytic program. Data scientists and data engineers do not know the configuration of the computation system, file system, and database. Some application performance issues are related to system configuration, yet data scientists and data engineers lack information and knowledge about the system properties. In this paper, we propose a configuration layer for the current big data platform (i.e., Hadoop) and build a configuration recommendation system that collects and pre-processes data. Based on the processed data, we use semi-automatic feature engineering to provide features for data engineers and build the performance model with three different machine learning algorithms (i.e., random forest, gradient boosting machine, and support vector regression). On the same two benchmarks (i.e., wordcount and terasort), our recommended configuration achieves a remarkable improvement over the rule-of-thumb configuration and is better than their improvements.	en_US
dc.language.iso	en_US	en_US
dc.subject	big data platform	en_US
dc.subject	machine learning	en_US
dc.subject	configuration optimization	en_US
dc.title	BigExplorer: A Configuration Recommendation System for Big Data Platform	en_US
dc.type	Proceedings Paper	en_US
dc.identifier.journal	2016 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI)	en_US
dc.citation.spage	228	en_US
dc.citation.epage	234	en_US
dc.contributor.department	資訊工程學系	zh_TW
dc.contributor.department	Department of Computer Science	en_US
dc.identifier.wosnumber	WOS:000406594200031	en_US
Appears in Collections:	Conference Papers