Abstract
This paper compares Support Vector Machine (SVM) classification with a number of clustering approaches for separating human from non-human users on Twitter, in order to identify normal human activity. Both approaches achieve similar F1 scores of about 90%, and both have difficulty classifying human users who behave abnormally. A second-stage classification step was then used to further separate non-human users into brands, celebrities and promoters/information, achieving an average F1 score of 74%. These scores were achieved by reducing the size of the feature space using stepwise feature selection, and by category balancing informed by manual inspection of the classification results.
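As an illustration of the metric reported above, the following minimal sketch computes a per-class F1 score and an unweighted (macro) average across classes. It is not the authors' code: the function names `f1_for_class` and `macro_f1` and all labels in the toy data are hypothetical, and reading the reported "average F1" as a macro average is an assumption.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 positive: Hashable) -> float:
    """F1 for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy labels, not data from the paper.
# Stage one: human vs. non-human.
stage1_true = ["human", "human", "human", "human", "non-human", "non-human"]
stage1_pred = ["human", "human", "human", "non-human", "non-human", "non-human"]
stage1_f1 = f1_for_class(stage1_true, stage1_pred, positive="human")

# Stage two: non-human users split into three categories.
stage2_true = ["brand", "brand", "celebrity", "celebrity", "promoter", "promoter"]
stage2_pred = ["brand", "celebrity", "celebrity", "celebrity", "promoter", "brand"]
stage2_macro_f1 = macro_f1(stage2_true, stage2_pred)

print(f"stage 1 F1 (human): {stage1_f1:.3f}")
print(f"stage 2 macro F1:   {stage2_macro_f1:.3f}")
```

A macro average weights each category equally regardless of how many users it contains, so it is sensitive to category balance.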
Original language | English |
---|---|
Pages (from-to) | 224-231 |
Number of pages | 8 |
Journal | Procedia Computer Science |
Volume | 53 |
Issue number | 1 |
Publication status | Published - 2015 |
Event | INNS Conference on Big Data 2015 - San Francisco, United States; Duration: 8 Aug 2015 → 10 Aug 2015 |
Keywords
- Clustering
- Human
- SVM
ASJC Scopus subject areas
- General Computer Science