Full metadata record
DC Field | Value | Language
dc.contributor.author | 黃郁珊 | zh_TW
dc.contributor.author | 楊千 | zh_TW
dc.contributor.author | 陳安斌 | zh_TW
dc.contributor.author | Huang, Yu-Shan | en_US
dc.contributor.author | Yang, Chyan | en_US
dc.contributor.author | Chen, An-Pin | en_US
dc.date.accessioned | 2018-01-24T07:39:43Z | -
dc.date.available | 2018-01-24T07:39:43Z | -
dc.date.issued | 2017 | en_US
dc.identifier.uri | http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070463423 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/140763 | -
dc.description.abstract | 在這篇論文中,我們使用了機器學習的二元分類法來偵測重覆的廣告。對於購物網站來說,同一個產品的重覆廣告,會在許多層面上造成損傷,包含惡化買家使用者體驗,及增加網站營運成本。重覆廣告偵測的目標在於,給定兩個廣告,是否能判別所廣告的是相同的商品,而這是一個二元分類問題。 我們在一個公開的機器學習競賽網站Kaggle.com上,得到一份由俄國公司Avito所公開的訓練資料,並以此做為我們研究的資料集。透過架設分散式的Spark框架,我們使用了決策樹、隨機森林、單純貝氏、邏輯回歸、支援向量機及類神經網路來解決這個問題。我們使用了獨熱編碼及word2vec技術做特徵擷取。藉由接收者操作特徵下的曲面面積,我們得以驗證這些做法的有效性。 | zh_TW
dc.description.abstract | In this thesis, we use machine-learning binary classification algorithms to detect duplicate advertisements. For shopping websites, duplicate advertisements for the same product harm both buyers and website owners, for example by degrading the user experience and increasing the site's operating costs. The goal of duplicate advertisement detection is to determine whether a given pair of advertisements promotes the same product, which is a binary classification problem. We obtained our dataset from Kaggle.com, a public machine learning competition website, where the Russian company Avito released its training data. Running on a distributed Spark framework, we apply decision trees, random forests, naive Bayes, logistic regression, support vector machines, and artificial neural networks to this problem. We extract features using one-hot encoding and word2vec. Results evaluated by the area under the ROC curve indicate the validity of these methods. | en_US
dc.language.iso | zh_TW | en_US
dc.subject | 二元分類 | zh_TW
dc.subject | 重複廣告偵測 | zh_TW
dc.subject | 分散式機器學習 | zh_TW
dc.subject | Binary classification | en_US
dc.subject | duplicate advertisement detection | en_US
dc.subject | distributed machine learning | en_US
dc.title | 以Apache Spark平台為基礎之重複廣告分析 | zh_TW
dc.title | Duplicate Advertisements Detection on Apache Spark Platform | en_US
dc.type | Thesis | en_US
dc.contributor.department | 管理學院資訊管理學程 | zh_TW
Appears in Collections: Thesis