Title: 利用中介類別來提升購物網站產品分類和產品比對
Enhancing Product Classification and Matching for B2C Websites by Middle Category
Authors: 張益洲
Yi-Chou Chang
吳毅成
I-Chen Wu
Degree Program of Computer Science, College of Computer Science
Keywords: Vector Space Model (VSM); Category; Matching; Similarity
Issue Date: 2005
Abstract: The directory style is a common data model for websites. On shopping websites, for example, classifying products by category improves the efficiency of product data processing. Product matching is one of the fundamental problems in website data processing: its purpose is to find out which products on two websites are the same product. Because only products under related categories on the two websites can be the same product, finding the category similarity between websites improves both the accuracy and the efficiency of product matching.

This thesis proposes an improved method for category correspondence. The traditional VSM (Vector Space Model) cannot accurately compare the category similarity of two websites: when a category contains too few products, or when product titles are inconsistent, categories that are in fact similar can have completely different feature vectors. To address this problem, the thesis proposes the concept of a Middle Category. A middle website is a directory website with generalized features, and matching categories against the middle website raises the accuracy of category matching between other websites.

The thesis discusses the characteristics that a Middle Category must have and demonstrates that the information it provides increases the category relatedness between websites. The experimental results show that, after category similarity matching through the Middle Category, product matching within the matched categories using LMCS (Longest and Most Common Segments) achieves recall, precision, and similarity above 88%.
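The problem the abstract describes can be illustrated with a small Python sketch. It is not the thesis's implementation. The product titles, the category names, and the helper functions `category_vector`, `cosine`, and `best_middle_category` are all hypothetical, and mapping each category to its single most similar middle category is an assumption about how a Middle Category might be used. The sketch only shows how two equivalent categories can have a direct VSM similarity of zero and still map to the same middle category.

```python
import math
from collections import Counter


def category_vector(titles):
    """Build a term-frequency vector from the product titles of one category."""
    return Counter(word.lower() for title in titles for word in title.split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_middle_category(vector, middle):
    """Return the name of the middle category most similar to `vector`."""
    return max(middle, key=lambda name: cosine(vector, middle[name]))


# Hypothetical data: one small category on each of two shopping sites.
a_vec = category_vector(["IBM ThinkPad T43 notebook"])
b_vec = category_vector(["Acer TravelMate 4000 laptop"])

# Hypothetical middle website whose categories carry more general vocabulary.
middle = {
    "Portable computers": category_vector(
        ["notebook laptop ThinkPad TravelMate", "laptop notebook computer"]
    ),
    "Digital cameras": category_vector(
        ["digital camera lens zoom", "camera memory card"]
    ),
}

# The two site categories share no terms, so direct VSM similarity is zero.
direct = cosine(a_vec, b_vec)

# Each category still overlaps with the same middle category.
a_mid = best_middle_category(a_vec, middle)
b_mid = best_middle_category(b_vec, middle)
print(direct, a_mid, b_mid)
```

In this toy data the two site categories have no title words in common, yet both select "Portable computers" on the middle site. Under the stated assumptions, that is the kind of indirect evidence a Middle Category could contribute.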
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009267596
http://hdl.handle.net/11536/77769
Appears in Collections: Theses