Product Characterization from Customer Reviews

Abstract

We built a data processing pipeline that extracts product characteristics, such as quality or performance, from user reviews. Our approach uses part-of-speech tagging to retrieve basic impressions of products and extracts qualities by identifying the best bigram collocations according to pointwise mutual information and likelihood-ratio scores. The characteristics are then segmented into positivity classes by analyzing user sentiment. The pipeline is applicable to any type of product and has direct real-world applications.
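
To make the collocation step concrete, here is a minimal sketch of ranking bigrams by pointwise mutual information, using only the Python standard library. It is an illustration under stated assumptions, not the pipeline's actual code: the function name `pmi_bigrams`, the `min_count` threshold, the whitespace tokenization, and the sample reviews are all hypothetical. PMI is computed as log2(P(w1, w2) / (P(w1) P(w2))), with probabilities estimated from raw counts.

```python
import math
from collections import Counter


def pmi_bigrams(sentences, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    `sentences` is a list of token lists. Bigrams are counted within each
    sentence only, so no pair spans two different reviews.
    `min_count` is a hypothetical frequency threshold (an assumption).
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_tokens = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # PMI overrates rare pairs, so drop them
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring collocations first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample reviews, for illustration only
reviews = [
    "great battery life and great screen",
    "battery life is short but the screen is sharp",
    "love the battery life",
]
sentences = [review.lower().split() for review in reviews]
ranked = pmi_bigrams(sentences)

for pair, score in ranked:
    print(pair, round(score, 2))
```

The frequency threshold matters because PMI tends to favor pairs of words that each occur only once; a likelihood-ratio score is less sensitive to this, which is one plausible reason to use both measures together.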

Read the report

Browse the IPython Notebook

Browse the source code