Clustering of Mixed-Types Data and Data from Distributed Databases (文字數字相混型資料與分散式資料庫之分群)

Full metadata record

DC Field	Value	Language
dc.contributor.author	林志青	zh_TW
dc.contributor.author	LIN JA-CHEN	en_US
dc.date.accessioned	2016-03-28T08:17:52Z	-
dc.date.available	2016-03-28T08:17:52Z	-
dc.date.issued	2015	en_US
dc.identifier.govdoc	MOST103-2221-E009-119-MY3	zh_TW
dc.identifier.uri	http://hdl.handle.net/11536/130502	-
dc.identifier.uri	https://www.grb.gov.tw/search/planDetail?id=11269553&docId=454777	en_US
dc.description.abstract	這個三年期計畫之目的是針對不同類型的資料，或不同類型的資料儲存方式，做各型資料的分群設計。第一年是「k-means 分群技術之加速」與「漸進可調式的分類器」; 第二年是設計一種「群數不必先給的處理文字數字相混型資料之分群法」。第三年則是允許資料散在多處且使用各資料所在處之分散式分群資源的「分散式資料分群法」。分群結果整合時，同質資料與異質資料會以不同方式分別設計。第一年的「k-means 分群法之加速」利用Holder 不等式與Minkowski 不等式之適當修改，再搭配前處理，以使k-means 這種大眾熟悉的方法能更快。該年子題二「漸進可調式的分類器」則可加速分群後的資料分類時間。第二年的「群數不必先給的處理文數相混型資料之分群法」則是因有愈來愈多的機構，其資料同時出現數字及文字，例如銀行、保險公司、政府、社群、婚友社、醫院之資料。該分群法會有益機構之資料分析或客戶開發。第三年的「分散式分群」則是允許資料散在多處，且各處使用自己的分群法做分群。中央則負責整合各分群結果。整合須考慮各處資料之同質與異質，以不同方式設計。分散式分群是因中央政府的資料本來就常是地方政府搜集來的。	zh_TW
dc.description.abstract	This is a three-year project. The goal is to design clustering techniques for different kinds of data and for distributed databases. In Year 1, the work covers the acceleration of k-means clustering and a progressively adaptive classifier. In Year 2, we will design a clustering method for data containing both categorical and numeric attributes; the method does not require the user to supply the number of clusters as input. In Year 3, we will design a distributed clustering method in which the data are held in separate databases and each site clusters its own local data using its own local clustering method. The local clustering results are then integrated, and the integration is designed separately for homogeneous (same-property) data and heterogeneous (distinct-property) data. The acceleration of k-means in Year 1 will use, but is not limited to, suitable modifications of Hölder’s inequality and Minkowski’s inequality. The second topic of Year 1 is a progressively adaptive classifier that reduces the time needed to classify new data. The Year 2 method targets mixed-type data because such data are increasingly common in banks, insurance companies, government agencies, social networks, dating agencies, hospitals, and similar organizations. This kind of clustering will benefit these organizations in data analysis, data mining, and customer acquisition. Finally, the Year 3 method allows the data to be stored in, or retrieved from, different places, and the local clustering methods at different sites can differ. Such a design is often needed because data sets keep growing and much data is originally created locally. The data held by the central government, for example, is often just the union of many data sets that are local in nature, i.e., originally collected by cities throughout the country.	en_US
dc.description.sponsorship	科技部	zh_TW
dc.language.iso	zh_TW	en_US
dc.subject	文字數字相混型資料	zh_TW
dc.subject	分散式資料	zh_TW
dc.subject	資料之快速分群	zh_TW
dc.subject	資料探勘	zh_TW
dc.subject	資料分類	zh_TW
dc.subject	Mixed-type data (categorical vs. numeric)	en_US
dc.subject	Distributed data	en_US
dc.subject	fast clustering of data	en_US
dc.subject	data mining	en_US
dc.subject	data classification	en_US
dc.title	文字數字相混型資料與分散式資料庫之分群	zh_TW
dc.title	Clustering of Mixed-Types Data and Data from Distributed Databases	en_US
dc.type	Plan	en_US
dc.contributor.department	國立交通大學資訊工程學系（所）	zh_TW
Appears in Collections:	Research Plans