NikolaNews

Introduction to Authorship Analysis as a Text Classification/Clustering Problem

September 2, 2019
in Data Science

Introduction:

Authorship analysis is the art and science of discriminating between the writing styles of authors by identifying characteristics of each author's persona and examining the texts they have written. It aims to determine characteristics of an individual, such as age, gender, native language and personality traits, from the “available information” pertaining to that individual.

In this article, “available information” refers to textual data only. In general, however, the information used in authorship analysis could go beyond text and include multi-modal observations. Multi-modal observations capture characteristic features such as voice, intonation, gestures, body posture and other physical behavioral aspects of an individual. A combination of all these characteristics reflects the persona of an individual and consequently helps in profiling that individual. In most cases, multi-modal data are sourced from videos, which are then converted into a machine-readable, processable format.

Application Areas:

Why is authorship analysis important? It plays a crucial role in forensic analysis and crime investigation. In addition, social media and open web resources have given rise to a wide range of cyber crimes: fake profile creation, fake reviews by bots, plagiarism, dark web sites facilitating networked and organised terror, terrorist proclamations, and harassment and intimidation through social media messaging, to name a few. [1]

Understanding consumer profiles and analysing feedback are paramount to market analysis, where authorship analysis is used to infer the demographics of the authors of anonymous feedback. The source of the raw texts could be blogs, online product reviews or social media forums. [3]

Other application areas include resolving disputes in authorship of novels, plagiarism detection, document dating, examining socio-economic factors and mental health examination.

Text Classification Tasks involved in Authorship Analysis:

Authorship analysis comprises several tasks that work towards a common goal. The three major tasks are Author Attribution, Author Verification and Author Profiling.

i) Author Attribution: Author attribution determines, after investigating a collection of texts of undisputed authorship from multiple authors, whether a previously unseen text was written by a particular one of those authors. This is ideally a closed-set multi-class text classification problem. [2]
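As a rough illustration of the closed-set, multi-class framing, the sketch below builds one character trigram profile per candidate author and attributes an unseen text to the author whose profile is most similar. The choice of character trigrams and cosine similarity is an assumption made for this example, not a method prescribed by the article or its references, and the authors and texts are invented toy data.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(known_texts):
    """Merge each author's known texts into a single n-gram profile."""
    profiles = {}
    for author, texts in known_texts.items():
        profile = Counter()
        for text in texts:
            profile.update(char_ngrams(text))
        profiles[author] = profile
    return profiles


def attribute(text, profiles):
    """Closed-set decision: always return one of the known authors."""
    query = char_ngrams(text)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


# Hypothetical toy corpus; real attribution needs far more text per author.
known_texts = {
    "alice": ["the quick brown fox jumps over the lazy dog",
              "the lazy dog sleeps under the old tree"],
    "bob": ["stock prices rallied sharply after earnings",
            "earnings reports lifted stock markets"],
}
profiles = build_profiles(known_texts)
print(attribute("the dog jumps over the tree", profiles))
```

Because the problem is closed-set, `attribute` never answers "none of the above"; handling texts by unknown authors would need an additional rejection rule.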

ii) Author Verification: This task determines whether an individual has authored a piece of text or not by studying a corpus of texts by the same author. This is a binary, single-label text classification problem. Although this task seems easy, author verification is far more complicated in practice.
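The binary framing of verification can be sketched as a similarity test against a threshold. In the example below, the short function-word list, the cosine measure and the threshold of 0.8 are all illustrative assumptions; a real system would tune them on held-out data, which is part of what makes verification hard in practice.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list for illustration only.
FUNCTION_WORDS = ("the", "a", "an", "of", "and", "to", "in",
                  "that", "it", "is", "was", "but", "not", "with")


def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def verify(known_texts, questioned_text, threshold=0.8):
    """Binary decision: True if the questioned text is accepted as the author's."""
    known = style_vector(" ".join(known_texts))
    questioned = style_vector(questioned_text)
    return cosine(known, questioned) >= threshold


# Toy usage: one known text by the author, two questioned texts.
known = ["It was the best of times, it was the worst of times."]
print(verify(known, "It was the best of times, it was the worst of times."))
print(verify(known, "Buy cheap watches now"))
```

Unlike attribution, there is only one candidate author here, so the decision rests entirely on where the threshold is placed rather than on a comparison between authors.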

iii) Author Profiling: Author profiling can also be described as identifying the personality of an author by studying the texts they have written. This involves predicting demographic features like gender, age, native language and personality traits of an author by examining their writing style [1]. Author profiling can be viewed as both a multi-class, multi-label text classification problem and a clustering problem. It is a potential clustering problem because we aim to identify homogeneous writing styles in the given corpus and cluster them together for similarity analysis.
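The clustering view of profiling can be sketched with a small k-means over two simple style features. This covers only the clustering side, not the multi-label classification side. The features (average word length and average sentence length), the farthest-first initialisation and the toy documents are assumptions chosen to keep the example short and deterministic; they are not taken from the cited surveys.

```python
import math
import re


def style_features(text):
    """Return (average word length, average words per sentence).

    Assumes the text contains at least one word and one sentence.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return (sum(len(w) for w in words) / len(words),
            len(words) / len(sentences))


def kmeans(points, k, iterations=10):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels


# Hypothetical documents in two visibly different styles.
documents = [
    "I like cats. I like dogs. We go out.",
    "Considerable experimental evidence demonstrates remarkable stylistic "
    "consistency throughout extended professional correspondence.",
    "He is here. She is gone. It is sad.",
    "Comprehensive statistical investigation frequently establishes "
    "distinguishing characteristics underlying individual composition.",
]
labels = kmeans([style_features(d) for d in documents], k=2)
print(labels)
```

The two features are left unscaled here because they happen to have similar ranges in the toy data; with more varied features, standardising them before clustering would usually be advisable.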

Each of these tasks is extensible depending on the kind of real-world problem it is applied to. Sometimes the objectives of these tasks overlap.

Automatic authorship analysis is not limited to English. Computerized applications have also been developed for other languages such as Greek, French, Dutch, Spanish and Italian. [2, 3]

References:

[1] Reddy, T. Raghunadha, B. Vishnu Vardhan, and P. Vijaypal Reddy. “A survey on authorship profiling techniques.” International Journal of Applied Engineering Research 11.5 (2016): 3092-3102.

[2] Stamatatos, Efstathios, et al. “Overview of the author identification task at PAN 2014.” CLEF 2014 Evaluation Labs and Workshop Working Notes Papers, Sheffield, UK, 2014.

[3] Stamatatos, Efstathios, et al. “Overview of the PAN/CLEF 2015 evaluation lab.” International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, Cham, 2015.

[4] Rangel, Francisco, et al. “Overview of the author profiling task at PAN 2013.” CLEF Conference on Multilingual and Multimodal Information Access Evaluation. CELCT, 2013.


Credit: Data Science Central By: Nabanita Roy
