The Internet is home to countless forums in which users trade fierce blows over supposedly neutral topics. As a private individual, you ignore most of these comments, usually rightly so. From a business perspective, however, both negative and positive statements can be useful, provided they relate to a product or service.
For example, a software vendor could analyze social media channels after a major update to find out whether users rate it predominantly positively or negatively. Sentiment analysis can yield valuable insights here. However, the statements to be analyzed exist as unstructured text, which developers cannot feed directly into machine learning, because the algorithms perform calculations and therefore need numerical data. Before ML algorithms can be let loose on the data, a few preparatory steps are necessary. A central one is vectorization.
Vectorization is the conversion of texts into vectors that represent them in numerical form. This article uses Reddit comments about Microsoft as an example to show how such data can be prepared and vectorized so that its sentiment, i.e. the mood it expresses, can then be analyzed. The focus is on the individual steps, especially vectorization. The complete code for preparing the data and performing the analyses is in two Jupyter notebooks (download via GitHub).
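To make the idea of vectorization concrete, here is a minimal sketch of one common approach, a bag-of-words count vector, written in plain Python. It is not the code from the article's notebooks. The sample comments and the helper names `tokenize`, `build_vocabulary` and `vectorize` are invented for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Assign each distinct token a column index, in alphabetical order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return a count vector for one text; words outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


# Hypothetical sample comments, standing in for real Reddit data.
comments = [
    "The new update is great",
    "The update broke my laptop",
    "Great update, great performance",
]

vocabulary = build_vocabulary(comments)
vectors = [vectorize(comment, vocabulary) for comment in comments]

for comment, vector in zip(comments, vectors):
    print(vector, "<-", comment)
```

Each comment becomes a list of integers of the same length, one position per vocabulary word, which is a form an ML algorithm can compute with. In practice a library vectorizer, possibly with a weighting scheme such as TF-IDF, would usually replace these hand-written helpers.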