Full metadata record
DC Field | Value | Language
dc.contributor.author | 林志青 | zh_TW
dc.contributor.author | LIN JA-CHEN | en_US
dc.date.accessioned | 2016-03-28T08:17:52Z | -
dc.date.available | 2016-03-28T08:17:52Z | -
dc.date.issued | 2015 | en_US
dc.identifier.govdoc | MOST103-2221-E009-119-MY3 | zh_TW
dc.identifier.uri | http://hdl.handle.net/11536/130502 | -
dc.identifier.uri | https://www.grb.gov.tw/search/planDetail?id=11269553&docId=454777 | en_US
dc.description.abstract | The purpose of this three-year project is to design clustering methods for different types of data and for different ways of storing data. Year 1 covers "acceleration of k-means clustering" and "a progressive adjustable classifier". Year 2 designs "a clustering method for mixed text-and-numeric data that does not require the number of clusters to be given in advance". Year 3 develops a "distributed data clustering method" that allows the data to be scattered across multiple sites and uses the clustering resources available at each site. When the clustering results are integrated, homogeneous and heterogeneous data will be handled by separately designed approaches. In Year 1, the acceleration of k-means applies suitable modifications of Hölder's inequality and Minkowski's inequality, combined with preprocessing, to make the widely used k-means method faster. The second subtopic of that year, the progressive adjustable classifier, shortens the time needed to classify data after clustering. The Year 2 clustering method for mixed text-and-numeric data without a preset number of clusters is motivated by the growing number of organizations whose data contain both numbers and text, such as banks, insurance companies, governments, online communities, matchmaking agencies, and hospitals; it will help these organizations with data analysis and customer development. The Year 3 distributed clustering allows the data to be scattered across multiple sites, with each site clustering its own data using its own clustering method, while a central site integrates the individual results. The integration must take into account whether the data at the different sites are homogeneous or heterogeneous, with a different design for each case. Distributed clustering is needed because central government data are often collected in the first place by local governments. | zh_TW (translated)
dc.description.abstract | This is a three-year project whose goal is to design clustering techniques for different kinds of data and for distributed databases. Year 1 covers the acceleration of k-means clustering and a progressive adaptive classifier. In Year 2, we will design a clustering method for data containing both categorical and numeric attributes; the method will not require users to supply the number of clusters as input. In Year 3, we will design a distributed method in which the data are drawn separately from different databases and each local data set is clustered with a local clustering method; we will then design how the local clustering results are integrated, treating same-property (homogeneous) data and distinct-property (heterogeneous) data separately. The Year 1 acceleration of k-means will use, among other techniques, suitably adjusted forms of Hölder's inequality and Minkowski's inequality. The second topic of Year 1 is a progressive adaptive classifier that reduces the time needed to classify new data. The Year 2 method, which clusters mixed-type data composed of categorical and numeric attributes without knowing the number of clusters in advance, is motivated by how much more often such data now occur in banks, insurance companies, government, social networks, dating agencies, hospitals, and similar organizations; it will benefit them in data analysis, data mining, customer acquisition, and related tasks. Finally, the Year 3 distributed clustering method allows the data to be stored in, or drawn from, different places, and allows different sites to use different local clustering methods. Such a design is increasingly needed because data volumes keep growing and much data is originally created locally; for example, the data held by a central government is largely the union of many local data sets originally created by cities throughout the country. | en_US
dc.description.sponsorship | Ministry of Science and Technology (MOST) | zh_TW (translated)
dc.language.iso | zh_TW | en_US
dc.subject | Mixed text-and-numeric data | zh_TW (translated)
dc.subject | Distributed data | zh_TW (translated)
dc.subject | Fast clustering of data | zh_TW (translated)
dc.subject | Data mining | zh_TW (translated)
dc.subject | Data classification | zh_TW (translated)
dc.subject | Mixed-type data (categorical vs. numeric) | en_US
dc.subject | Distributed data | en_US
dc.subject | fast clustering of data | en_US
dc.subject | data mining | en_US
dc.subject | data classification | en_US
dc.title | Clustering of Mixed Text-and-Numeric Data and Distributed Databases | zh_TW (translated)
dc.title | Clustering of Mixed-Type Data and Data from Distributed Databases | en_US
dc.type | Plan | en_US
dc.contributor.department | Department (and Graduate Institute) of Computer Science and Information Engineering, National Chiao Tung University | zh_TW (translated)
Appears in Collections: Research Plans
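The Year 1 abstract above names the ingredients of the k-means speed-up (adjusted Hölder and Minkowski inequalities plus preprocessing) but not the method itself. The sketch below is a hedged illustration of that general idea, not the project's actual algorithm. It assumes one standard consequence of Minkowski's (triangle) inequality, ‖x − c‖ ≥ |‖x‖ − ‖c‖| (the same bound also follows from the Cauchy-Schwarz case p = q = 2 of Hölder's inequality). With norms precomputed as a preprocessing step, it skips exact distance computations for centers that provably cannot be nearer than the best one found so far. The function names `assign_with_norm_bound` and `kmeans` and their parameters are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Exact Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def norm(a: Point) -> float:
    """Euclidean norm of a point (its distance from the origin)."""
    return math.sqrt(sum(x * x for x in a))


def assign_with_norm_bound(
    points: List[Point], centers: List[Point]
) -> Tuple[List[int], int]:
    """Assign each point to its nearest center (hypothetical helper).

    Minkowski's inequality gives ||x - c|| >= | ||x|| - ||c|| |, so a center
    whose lower bound is already >= the best exact distance so far cannot win
    and is skipped. Returns the labels and the number of exact distances computed.
    """
    # Preprocessing: norms are computed once per call, not once per pair.
    point_norms = [norm(p) for p in points]
    center_norms = [norm(c) for c in centers]
    labels: List[int] = []
    exact = 0
    for p, p_norm in zip(points, point_norms):
        best_j, best_d = -1, math.inf
        for j, (c, c_norm) in enumerate(zip(centers, center_norms)):
            if abs(p_norm - c_norm) >= best_d:
                continue  # lower bound rules this center out
            d = euclidean(p, c)
            exact += 1
            if d < best_d:  # strict '<' keeps the first nearest center on ties
                best_j, best_d = j, d
        labels.append(best_j)
    return labels, exact


def kmeans(
    points: List[Point],
    k: int,
    iterations: int = 50,
    seed: int = 0,
    initial_centers: Optional[List[Point]] = None,
) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means whose assignment step uses the norm bound."""
    if initial_centers is None:
        initial_centers = random.Random(seed).sample(points, k)
    centers = [list(c) for c in initial_centers]
    labels: List[int] = []
    for _ in range(iterations):
        labels, _ = assign_with_norm_bound(points, centers)
        new_centers: List[List[float]] = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's old center
        if new_centers == centers:
            break  # centers unchanged, so the labels above are final
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    data = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
    labels, centers = kmeans(data, k=2)
    print(labels, centers)
```

The pruning never changes which center a point is assigned to; it only avoids exact distance computations whose result could not win. How much it saves depends on how spread out the point and center norms are, so this particular bound is only one of many possible choices.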