Saturday, March 6, 2021
NikolaNews

Pangeanic: Over 10Bn Alignments for Machine Learning in 84 Languages

June 19, 2020
in Machine Learning

Language technology and human translation firm Pangeanic announced today that it has crossed the mark of 10 billion aligned data segments in 84 languages, propelling the company forward in its mission to build and train new machine learning technologies.

The company reached a new milestone last week when it confirmed it had successfully clocked the 10,200,054th segment, boosting its research and development capabilities for machine translation and Natural Language Processing (NLP) technologies.

Manuel Herranz, Pangeanic CEO, stated: “In an increasingly data-centric society, the value of companies is often derived by the quality of the data they manage, structure and produce. In order to be cutting-edge in machine translation, and in many other NLP disciplines, the value of human-approved data is essential. The best algorithm is worthless if it does not have millions of segments to learn from. Our automated data acquisition pipelines make our repositories a goldmine for data scientists.”

Pangeanic has carved a name for itself in the language technology space by developing cutting-edge algorithms, infrastructures and toolkits, and by leading data-focused European projects, most recently spearheading a Europe-wide anonymization project in 2020 built with state-of-the-art NLP tools.

Pangeanic and its sister division PangeaMT have gathered a diverse pool of training data from different sources, including open-source data, human-produced data, anonymized data from public sources and data crawled from websites, and have even created near-human, highly scalable in-domain synthetic data.

Pangeanic’s Chief Research Scientist Mercedes Garcia said: “Having reached this milestone is a great step forward for us because it means that we can automatically obtain high-quality translations in many languages and domains.”

“Machine learning is an area of AI where data is the basic ingredient. Without data you can’t generate or build an automatic model or system. This is really the value of the company, having access to all this data.”

Pangeanic’s tech team uses this rich bank of data to train AI algorithms that partners, companies and institutions can benefit from. NTEU, the company’s recent European Commission-funded project, sees Pangeanic implementing automatic translation across the public administrations of EU Member States.

NTEU, along with other Pangeanic projects, is based on neural machine translation engines that require large volumes of quality data, which the company gathers daily to build a proprietary data repository.

Pangeanic's Ms Garcia

Ms Garcia said: “Neural networks imitate the behavior of a brain. Therefore, large amounts of data along with examples of sentences or segments are needed when training a neural model.” 

“Models based on machine learning learn by examples fed to them through data collected in datasets. Good results rely on high quality data, and domain specific data for particular applications.”

She explained that Pangeanic’s data achieves high quality after it is selected and rigorously cleaned by the team and edited by expert in-house translators, who maintain, improve and grow the quality of the data to obtain “really near human results… sometimes scaringly human-like!”

The company also boasts a huge archive of in-domain data, that is, specialised data for defined areas such as finance, banking, robotics, dialogs, social media and entertainment, and the medical and legal fields.

Ms Garcia said: “Acquiring in-domain data is extremely important in order to produce quality translations in specific areas. For example, having quality medical data is crucial for us to develop automatic systems for that specific field.”

“This is part of our competitive advantage, we are specialized in adapting systems to specific areas.” 

Pangeanic's Alex Kohan

Pangeanic programmer and data analyst Alex Kohan agreed, noting that the company’s capability to adapt language to different fields also applies to adapting language styles and variants.

He said: “If we would like Portuguese to sound more Brazilian for example, then we can build processes to adapt the data by including Brazilian-specific data.”
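Mr Kohan’s example can be pictured as a data-mixing step. The Python sketch below is a hypothetical illustration, not Pangeanic’s pipeline: the function `build_adaptation_mix`, its `variant_share` parameter and the toy European/Brazilian Portuguese pairs are all assumptions. It samples a training mix in which a chosen share of segments comes from variant-specific data.

```python
import random
from typing import List, Tuple

# An aligned segment: (source sentence, target sentence).
Pair = Tuple[str, str]


def build_adaptation_mix(
    general: List[Pair],
    variant_specific: List[Pair],
    variant_share: float = 0.5,
    size: int = 1000,
    seed: int = 0,
) -> List[Pair]:
    """Sample a training mix where `variant_share` of the pairs are variant data.

    Hypothetical sketch: real adaptation pipelines may differ substantially.
    """
    if not 0.0 <= variant_share <= 1.0:
        raise ValueError("variant_share must be between 0 and 1")
    rng = random.Random(seed)
    n_variant = round(size * variant_share)
    n_general = size - n_variant
    # Sample with replacement so a small variant corpus can be oversampled.
    mix = rng.choices(variant_specific, k=n_variant) + rng.choices(general, k=n_general)
    rng.shuffle(mix)  # interleave general and variant pairs
    return mix


# Toy data: generic (European Portuguese) vs Brazilian-specific translations.
general = [("the bus", "o autocarro"), ("the train", "o comboio")]
brazilian = [("the bus", "o ônibus"), ("the train", "o trem")]

mix = build_adaptation_mix(general, brazilian, variant_share=0.8, size=10)
print(len(mix), sum(pair in brazilian for pair in mix))  # 10 8
```

With `variant_share=0.8`, eight of the ten sampled pairs come from the Brazilian set. Sampling with replacement is a simple design choice that lets a small variant corpus be oversampled; a real system might instead fine-tune a general model on the variant data.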

Mr Kohan explained that Pangeanic’s trained segments also include under-resourced languages, for which the company’s team built data in-house through automatic data-gathering processes.

He said: “You need voluminous amounts of data samples to obtain quality machine translation, because when you clean data you may lose several thousands of segments. Although a percentage of some stock data may come from open repositories, it is usually not trustable because of the noise it may contain. Having the assurance and confidence that a dataset is X% reliable adds to better processes”.
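To see why cleaning costs segments, consider a few common heuristic filters for aligned data. The sketch below assumes filters chosen for illustration (empty sides, untranslated copies, over-long segments, implausible length ratios, exact duplicates); the function name `clean_parallel_corpus` and its thresholds are hypothetical and do not describe Pangeanic’s actual process.

```python
from typing import Iterable, List, Tuple

# An aligned segment: (source sentence, target sentence).
Segment = Tuple[str, str]


def clean_parallel_corpus(
    pairs: Iterable[Segment],
    max_len_ratio: float = 2.0,
    max_tokens: int = 200,
) -> List[Segment]:
    """Apply simple heuristic filters to aligned segment pairs (illustrative only)."""
    seen = set()
    kept: List[Segment] = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        # Drop pairs where either side is empty.
        if not src or not tgt:
            continue
        # Drop untranslated pairs (target copied verbatim from source).
        if src == tgt:
            continue
        src_len, tgt_len = len(src.split()), len(tgt.split())
        # Drop overly long segments, often alignment errors or boilerplate.
        if src_len > max_tokens or tgt_len > max_tokens:
            continue
        # Drop pairs whose token counts differ too much, a common sign of misalignment.
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_len_ratio:
            continue
        # Drop exact duplicates.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


raw = [
    ("The contract is signed.", "El contrato está firmado."),
    ("The contract is signed.", "El contrato está firmado."),  # duplicate
    ("Hello", ""),  # empty target
    ("Click here", "Click here"),  # untranslated copy
    ("Yes", "Sí, estamos de acuerdo con todas las condiciones del contrato."),  # length mismatch
]
clean = clean_parallel_corpus(raw)
print(f"kept {len(clean)} of {len(raw)} segments")  # kept 1 of 5 segments
```

Even on this tiny sample, four of the five pairs are discarded, which illustrates why large raw volumes are needed to end up with enough reliable segments.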

“We build synthetic data where there is less source data available. This occurs when working with under-resourced languages such as Maltese or Irish Gaelic, as there is less original data available on the internet.”
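The article does not say how Pangeanic builds its synthetic data. One widely used technique is back-translation: authentic text in the under-resourced language is machine-translated into the other language, and the resulting pair keeps the human-written sentence on the target side. The sketch below assumes that technique; `back_translate` and the toy Irish-to-English lexicon standing in for a trained reverse model are hypothetical.

```python
from typing import Callable, List, Tuple

# A translator maps one sentence to its translation (here: target -> source).
Translator = Callable[[str], str]


def back_translate(
    monolingual_target: List[str],
    target_to_source: Translator,
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from target-language-only text."""
    pairs: List[Tuple[str, str]] = []
    for tgt in monolingual_target:
        tgt = tgt.strip()
        if not tgt:
            continue  # skip blank lines
        # The machine-translated sentence becomes the (noisier) source side;
        # the authentic human-written sentence stays on the target side.
        synthetic_src = target_to_source(tgt)
        pairs.append((synthetic_src, tgt))
    return pairs


# Hypothetical stand-in for a trained Irish -> English model.
TOY_GA_EN = {"Dia duit": "Hello", "Go raibh maith agat": "Thank you"}


def toy_ga_to_en(sentence: str) -> str:
    return TOY_GA_EN.get(sentence, sentence)


pairs = back_translate(["Dia duit", "Go raibh maith agat", ""], toy_ga_to_en)
print(pairs)  # [('Hello', 'Dia duit'), ('Thank you', 'Go raibh maith agat')]
```

Keeping the authentic text on the target side means an English-to-Irish model trained on these pairs learns to produce fluent Irish, even though its inputs are machine-generated.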

Beyond language data, Mr Kohan said Pangeanic will expand its data-gathering efforts in 2020 by widening the range of data it collects to train AI-based systems.

He said: “We have some exciting projects coming up, we’re looking at collecting different types of data, like voice for speech language translation and pictures and videos to automatically categorize them on a large-scale.”

Credit: Google News

© 2019 NikolaNews.com - Global Tech Updates