The Characteristic Feature Selection Based on User-Centric Adaptation for Digital Forensic, Image Retrieval And Content Service

Title:	以使用者為中心的特徵值選擇應用在數位鑑識、影像檢索與內容服務 (The Characteristic Feature Selection Based on User-Centric Adaptation for Digital Forensic, Image Retrieval And Content Service)
Author:	蔡銘箴 TSAI MIN-JEN, Institute of Information Management, National Chiao Tung University
Keywords:	Digital forensic; Image retrieval; Image indexing; Feature selection; Machine learning
Issue date:	2012
Abstract:	Advances in digitization and the wide reach of networks have made digital content easy to obtain and quick to transmit and share around the world. Against this background, digital forensics, information retrieval, and content services have become important research topics. In digital forensics, tracing a piece of digital media back to its original source requires finding characteristic features in its content, so that the content can be authenticated effectively and its copyright protected. Digital images and documents output by printers can be misused for forgery; if such printed copies are to serve as courtroom evidence in the way traditional photographs do, then protecting them and demonstrating the trustworthiness, reliability, and authenticity of the source device become key issues for the court and the original author. In addition, searching for and delivering digital content are closely tied to user operations and needs, so the correctness of the retrieved data is a central research concern.
Although these topics belong to different fields, the applicant has found that they can all be addressed through “characteristic feature selection” as the main design attribute, approached from a user-centric perspective, so the techniques developed in this proposal can be applied systematically to the subjects above. The applicant therefore plans a three-year research project. In the first year, the literature will be reviewed and the factors affecting digital image formation will be analyzed to find representative features. The construction of these features will be analyzed and explored for the digital forensics of color printers. To further understand the effect of printer mechanical properties, the Chinese and English characters contained in the digital content will be used for printer source identification, in order to improve the identification rate and speed up the whole process. In the second year, the project will introduce the concept of adaptive learning for image retrieval in order to develop a system architecture for digital image search, which should both find the best features of the digital content effectively and increase retrieval accuracy. Adaptive here means that, beyond training on the features of digital output in general, the system also learns from the image formation process of different digital devices, their mechanical properties, their output products, and user tag information. This relevance feedback will be fed back into the retrieval system interactively to achieve a higher hit rate. In the third year, the adaptive digital content retrieval architecture will provide flexible content services based on requests from different digital devices. The process will use machine learning to continually evaluate the relationship between digital devices and digital content, learning from users and feeding the results back to the search system. The user-centric focus is maintained throughout, to obtain adaptive feature selection and improve service quality.
Hence, we will carefully examine which features within the feature groups correlate more strongly with the source content and, through experimental verification and theoretical discussion, explain why they do. Through continued training and testing, characteristic feature selection is expected to steadily improve the accuracy of the retrieved information.
Official document #:	NSC101-2410-H009-006-MY2
URI:	http://hdl.handle.net/11536/98815 https://www.grb.gov.tw/search/planDetail?id=2548213&docId=387471
Appears in collections:	Research Plans