Corpus-based Topic Derivation and Timestamp-based Popular Hashtag Prediction in Twitter

doi:10.6688/JISE.201905_35(3).0011

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kumar, Sharath B. R.	en_US
dc.contributor.author	Wang, Kuochen	en_US
dc.contributor.author	Shen, Shi-Min	en_US
dc.date.accessioned	2019-06-03T01:08:38Z	-
dc.date.available	2019-06-03T01:08:38Z	-
dc.date.issued	2019-05-01	en_US
dc.identifier.issn	1016-2364	en_US
dc.identifier.uri	http://dx.doi.org/10.6688/JISE.201905_35(3).0011	en_US
dc.identifier.uri	http://hdl.handle.net/11536/151988	-
dc.description.abstract	With the use of the Internet, mobile platforms, online commerce, and social media services, the footprints of human behavior can be easily recorded in the digital world, which generates data on an extremely large scale. Twitter, as a big data social network, has become one of the most important sources for capturing up-to-date events happening in the world. Deriving topics from Twitter is important for various applications, such as situation awareness, market analysis, content filtering, and recommendations. However, topic derivation with high purity in Twitter is hard to achieve because tweets are limited to 140 characters. Previous works on topic derivation in Twitter suffer from low purity. In this paper, we propose a corpus-based topic derivation (CTD) approach that combines a Twitter corpus and LF-LDA, a text processing model, to identify topics and clusters of similar hashtags. We use asymmetric topic LF-LDA to obtain better purity of topics. Compared to intJNMF, a representative related work, the purity (F-measure) of our proposed CTD increases from 5.26% (27.81%) to 11.32% (34.28%) for 20 to 100 topics. We also propose a timestamp-based popular hashtag prediction (TPHP) approach by creating trending hashtag lists (THLs), which are lists of hashtags used by many users, and by making use of the timestamps in tweets. We use the edit distance to find the difference between consecutive THLs. The difference can then be used to calculate volatility, which indicates how people react to real-world events. Compared to Hybrid+, a representative related work, the mean average precision (MAP) of our TPHP increases by 19.45% (week-day), 15.08% (week-week) and 16.95% (month-week).	en_US
dc.language.iso	en_US	en_US
dc.subject	corpus	en_US
dc.subject	popular hashtag prediction	en_US
dc.subject	timestamp	en_US
dc.subject	topic derivation	en_US
dc.subject	twitter	en_US
dc.title	Corpus-based Topic Derivation and Timestamp-based Popular Hashtag Prediction in Twitter	en_US
dc.type	Article	en_US
dc.identifier.doi	10.6688/JISE.201905_35(3).0011	en_US
dc.identifier.journal	JOURNAL OF INFORMATION SCIENCE AND ENGINEERING	en_US
dc.citation.volume	35	en_US
dc.citation.issue	3	en_US
dc.citation.spage	675	en_US
dc.citation.epage	696	en_US
dc.contributor.department	資訊工程學系	zh_TW
dc.contributor.department	Department of Computer Science	en_US
dc.identifier.wosnumber	WOS:000467782400012	en_US
dc.citation.woscount	0	en_US
Appears in Collections:	Articles
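
Illustrative note on the abstract: the abstract states that the edit distance between consecutive trending hashtag lists (THLs) is used as the input to a volatility measure. The following is a minimal Python sketch of that first step only, assuming each THL is a ranked list of hashtag strings and that each hashtag is treated as a single symbol in a standard Levenshtein distance. The function names and the sample lists are hypothetical, and the record does not say how the paper turns these distances into volatility, so that step is not shown.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two ranked lists.

    Assumption: each hashtag counts as one symbol, so one insertion,
    deletion, or substitution of a hashtag costs 1.
    """
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def consecutive_distances(thls: Sequence[Sequence[str]]) -> List[int]:
    """Edit distance between each THL and the one that follows it."""
    return [edit_distance(p, q) for p, q in zip(thls, thls[1:])]


# Hypothetical THLs for three consecutive time windows.
sample_thls = [
    ["#news", "#sports", "#music"],
    ["#news", "#sports", "#music"],
    ["#news", "#music", "#sports"],
]
distances = consecutive_distances(sample_thls)
```

A larger distance between two consecutive windows means the ranked list changed more, which is the kind of signal the abstract describes feeding into the volatility calculation.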