Summarization of Online Discussions using Topic Extraction and Agreement Estimation
In this thesis, we propose a novel method for opinion mining and summarization of web forum discussions that contain debates. We create summaries by extracting a small percentage of the posts, aiming to maximize topic coverage and to bring out the different viewpoints of the participants. To achieve this, we proceed in two main steps. First, we identify all the sub-topics of the discussion and divide it into sub-discussions. This enables us to cluster posts into groups that together cover all the topics discussed. Second, we form groups of agreeing users and identify the disagreements between these groups. This enables us to create summaries from which a reader can quickly identify the different groups of users, their opinions, their arguments, and the points of friction between them. All summaries include metadata that can be used for searching with traditional keyword matching algorithms. Initial experiments with randomly chosen web forum discussions and human evaluators have given encouraging results and indicate the potential of both the overall approach and the specific algorithms.
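As an illustration only, the extraction stage described above can be sketched as a greedy selection over posts that have already been labeled with a sub-topic and an agreement group. This is not the thesis's algorithm: the `Post` type, the `summarize` function, the `fraction` parameter, and the scoring rule are all hypothetical, and the sketch assumes that the labels from the two steps are already available.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Post:
    post_id: int
    topic: str  # hypothetical sub-topic label (output of step one)
    group: str  # hypothetical agreement-group label (output of step two)
    text: str


def summarize(posts, fraction=0.2):
    """Select roughly `fraction` of the posts, preferring posts that add
    a new sub-topic or a new (sub-topic, group) viewpoint to the summary."""
    budget = max(1, ceil(len(posts) * fraction)) if posts else 0
    selected = []
    covered_topics = set()
    covered_pairs = set()
    remaining = list(posts)

    while remaining and len(selected) < budget:
        def gain(p):
            # One point for an uncovered topic, one for an uncovered viewpoint.
            return (p.topic not in covered_topics) + (
                (p.topic, p.group) not in covered_pairs
            )

        best = max(remaining, key=gain)  # ties resolve to the earliest post
        if gain(best) == 0:
            break  # nothing left would add coverage
        selected.append(best)
        covered_topics.add(best.topic)
        covered_pairs.add((best.topic, best.group))
        remaining.remove(best)

    # Return the extracted posts in their original discussion order.
    return sorted(selected, key=lambda p: p.post_id)


posts = [
    Post(1, "A", "g1", "first post on A"),
    Post(2, "A", "g1", "same side, same topic"),
    Post(3, "A", "g2", "opposing view on A"),
    Post(4, "B", "g1", "first post on B"),
    Post(5, "B", "g2", "opposing view on B"),
    Post(6, "A", "g1", "repeat"),
    Post(7, "B", "g2", "repeat"),
    Post(8, "A", "g2", "repeat"),
]
summary_ids = [p.post_id for p in summarize(posts, fraction=0.5)]
```

In this toy discussion, the selection keeps one post per sub-topic and viewpoint before the budget is exhausted, which mirrors the stated goals of topic coverage and exposing disagreements.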