Thursday, April 15, 2021
NikolaNews

The new Cloudera-Hortonworks Hadoop: 100 percent open source, 50 percent boring

March 25, 2019
in Big Data

Credit: ZDnet

Hadoop is the operating system for big data in the enterprise. So when Cloudera and Hortonworks, the two leading Hadoop distributions and vendors, merged, that was big news in and of itself. Last week’s DataWorks Summit Europe was the first big public event for the new Cloudera after the merger, and it was not short of interesting news, on both the technology and the business front.

First off, in case you’re wondering, it’s all Cloudera from now on. That’s the name the new company will go by, and there’s a new-ish logo and branding to go with this too. DataWorks historically was a Hortonworks event, and a few people noted they will miss those Hadoop elephants.

If anecdotal evidence is anything to go by, however, keeping Cloudera as the new name may be a good choice business-wise: In a quick poll we did with people beyond the Hadoop scene, more of them seemed to be familiar with Cloudera as a brand than Hortonworks. 

Now, there are many aspects in every merger, and many hard decisions to be made. The brand name decision may have been made in favor of Cloudera, but if you think that means Cloudera has set the tone in the new company, well, maybe not so fast.

Cloudera drops the bomb: From open core to 100 percent open source

For any open source company these days, how to go about their business model and licensing is probably the most important decision to be made. As we have argued, and as fellow ZDNet contributor and Ovum analyst Tony Baer recently noted, too, open source is becoming the new default business model for enterprise software. It has proven to be a better model, for a number of reasons.

As Baer and a number of others have pointed out in the past, enterprise software vendors based on a 100 percent open source model find it very hard to scale. It essentially means the only viable pathway for revenue is services. This is why, in the cloud era, open source enterprise software vendors practically have to choose between two strategies.

The first one is to go open core. That is, to not open source the entirety of their software, but to keep some parts of it proprietary and charge for those. The second one is to keep all the software open source, but rely on offering it as a service in the cloud for revenue. Cloudera used to be a firm believer in open core. That’s not the case anymore, so let’s ponder what this means, and how it could play out.

During the initial briefing on the analyst day organized at DataWorks before the event kicked off officially, Cloudera executives made some statements on new developments, strategy, etc. As part of those, they repeatedly referred to a “100 percent open source platform.”

Had this been the old Hortonworks days, nobody would have batted an eyelash. But as Cloudera has historically been a strong proponent of the open core strategy, asking for clarification was in order. So, we had the fortune of hearing the bombshell news first: the new platform will be 100 percent open source. Does this mean we’ll see a Commons Clause in its license? No comment on that from Cloudera executives.

If there is no Commons Clause, what’s to stop AWS from appropriating Cloudera’s codebase, as it has done with others before it? If there is one, what does it mean for the shifting open source licensing battleground? This does warrant further analysis, and we will embark on it. Teaser for open source enthusiasts: hold your horses. But let’s first see what this new Cloudera platform will be, exactly.

The new Cloudera platform: It will get complicated before it gets boring

Many people we spoke to at DataWorks were of the opinion that the merger made a lot of sense, a view we share. Cloudera executives themselves pointed out that there was something like 75 percent overlap in the clients the two companies were competing for, as well as in the codebase they were developing. But that does not necessarily mean integration will be easy.

There were a lot of Xs in Cloudera’s marchitecture slides. And in that case, X did not mark the spot, nor did it stand for some piece of the platform targeted for obsolescence. The idea is that no component will be thrown out of the new platform. Current users will continue to get support for their distribution, be it Cloudera or Hortonworks.

The goal is for the new, merged platform to be available in Q2 2019. When this happens, customers will be offered a clear migration path. But they will also have the option to keep using their current distribution. Eventually, the idea is that the codebase will merge and everyone will be on the same platform. But that’s going to be complicated. 

Image caption: Merging codebases, and catering to things such as data management, governance, and security, may come across as boring. But this is what it takes to be the data fabric for the enterprise. (Credit: ZDNet)

Let’s take one example to see how this would work. To access data stored in Hadoop using SQL today, Cloudera and Hortonworks users rely on Impala and Hive, respectively. In the short term, the new Cloudera platform will integrate and support both. In the mid-to-long term, the goal is to have one solution there. That explains all those Xs: not even the names of the new, integrated components have been figured out yet.

What seems to have been figured out, however, is the focus of the new company: It’s all about the enterprise. According to Cloudera executives, Cloudera is not interested in getting your local bank in its clientèle. It’s only interested in its parent bank, or holding company, which it most likely has in its clientèle already. In other words, Cloudera is going boring. 

The “make AI boring” notion was something Hilary Mason, Cloudera’s GM of Machine Learning, shared with ZDNet’s Andrew Brust before sharing with the DataWorks crowd. The essence of Mason’s plea is something others have argued for as well: the cool machine learning algorithms are just a part of the so-called AI stack, and not even a big part for that matter.

To make this work, you also need the infrastructure to collect the data used to train the algorithms and to deploy them in production: things like data management, governance, and security. These may sound boring, but they are the substrate that cool algorithms need to work in production environments. This is what Cloudera is aiming for, and managing Hadoop, the platform on which tons of data in the enterprise live, is at least 50 percent of the job.

Hadoop is passé: It’s all about the enterprise data cloud operating system for AI and the cloud era

Ah, Hadoop. Not many people use the “H word” much these days. Perhaps having a legacy, and code to go with it, is part of being boring. Hadoop certainly has both. To recap: Hadoop’s original main premises seem increasingly less relevant today. Cloudera promotes its platform not as Hadoop, but as an enterprise data cloud, making the point that it is in a position to leverage both on-premises and cloud resources. This is a common argument data platform vendors use to differentiate themselves from cloud vendors.

Hadoop was built to deal with chunks of data on premise, organized in big files. To do this, the idea was to co-locate compute and storage, organizing storage around HDFS and compute around MapReduce, based on a batch processing abstraction. 
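For readers who have not worked with it, the batch abstraction behind MapReduce can be sketched in a few lines of plain Python. This is a minimal, single-process illustration of the map, shuffle, and reduce steps, not Hadoop's actual Java API; all function names here (`map_phase`, `shuffle`, `reduce_phase`, `run_word_count`) are hypothetical and chosen for this example only.

```python
from collections import defaultdict


def word_count_mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    """Reduce step: collapse all values seen for one key into a total."""
    return sum(values)


def map_phase(records, mapper):
    """Apply the mapper to every input record and stream out key/value pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, reducer):
    """Run the reducer once per key over that key's grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_word_count(lines):
    """Run the whole batch job end to end over an in-memory list of lines."""
    pairs = map_phase(lines, word_count_mapper)
    return reduce_phase(shuffle(pairs), sum_reducer)


counts = run_word_count(["big data", "Big files"])
```

In a real cluster the map and reduce steps run in parallel on the machines that hold the data, which is where co-locating compute and storage comes in. The sketch also hints at why the model is batch-oriented: nothing is reduced until the whole input has been mapped and grouped.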

Today, data seems to be organized in many small files, often originating and/or stored in the cloud (think S3). MapReduce has proven to be a rather cumbersome API, although lots of higher-level APIs have been built on top of it. Batch processing has not gone away, but increasingly, real-time processing is becoming the norm.

Hadoop was built to disrupt data warehouses, dealing with their inefficiencies. It did that, bringing a wave of innovation which also transformed data warehouses, at least in part. Data warehouses are still around, but this time it’s Hadoop that’s being disrupted by the cloud, AI, and real-time processing. The question is: Can Hadoop react fast enough to avoid being the data warehouse of the future — i.e. still around, but not as relevant in a few years? 

Image caption: Hadoop is moving forward, reinventing its core premises. (Credit: ZDNet)

Unlike data warehouses, Hadoop is in a better position to deal with disruption. Its key strengths are its open source nature and its decoupled architecture. Open source means the pace of innovation can be faster, and decoupled architecture means its components can change in parts, while the whole can remain in place. The parts of Hadoop at the forefront of this race today are Ozone, Submarine, and the push toward a cloud-native platform.

Ozone is the codename for the ongoing work to enable Hadoop to operate seamlessly across HDFS and S3. Last year, it was Sanjay Radia, one of Hortonworks’ co-founders, who introduced Ozone. Now, it was Marton Elek, lead software engineer at Hortonworks, who presented the latest in Ozone. We observed that Ozone seems to be essentially recreating S3 on premises. Elek concurred, adding that Ozone will be consistent, not eventually consistent.
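The difference between consistent and eventually consistent is easier to see in a toy model. The sketch below is an illustration only: it does not reflect how Ozone or S3 are implemented, and the class names (`EventuallyConsistentStore`, `ConsistentStore`) and the `replicate` method are hypothetical, introduced just for this example.

```python
class EventuallyConsistentStore:
    """Toy store: writes hit a primary copy and reach the replica only later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet copied to the replica

    def put(self, key, value):
        # The write is acknowledged before the replica has seen it.
        self.primary[key] = value
        self.pending.append((key, value))

    def get(self, key):
        # Reads are served from the replica, which may lag behind.
        return self.replica.get(key)

    def replicate(self):
        # Background step that eventually brings the replica up to date.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


class ConsistentStore:
    """Toy store: a write is acknowledged only after every copy has it."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def put(self, key, value):
        self.primary[key] = value
        self.replica[key] = value

    def get(self, key):
        return self.replica.get(key)


eventual = EventuallyConsistentStore()
eventual.put("report.csv", "v1")
stale = eventual.get("report.csv")   # replica has not caught up yet
eventual.replicate()
fresh = eventual.get("report.csv")   # now visible

strong = ConsistentStore()
strong.put("report.csv", "v1")
immediate = strong.get("report.csv")  # visible right after the write
```

In the eventually consistent toy, a reader can miss a write that has already been acknowledged; in the consistent one, it cannot. For analytics jobs that list and read files right after they are written, that difference matters.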

Submarine is the codename for the ongoing work to make distributed deep learning and machine learning applications easy to launch, manage, and monitor in Hadoop. Support for GPUs in Hadoop was already there; now Submarine is working on improvements such as container-DNS support, scheduling on YARN, etc.

YARN, Hadoop’s job scheduler, is another key area for Hadoop’s future. YARN is mature and battle-tested, but if Hadoop wants to be cloud-native, it will have to adapt to using Kubernetes, which comes with its own scheduler. We briefly discussed this with Tristan Zajonc, Cloudera’s CTO for Machine Learning. Zajonc presented future directions for Cloudera Data Science Workbench, and much of this revolved around Kubernetes.

Our takeaway is that those projects will likely move faster than the core Hadoop codebase. They are the front runners needed to bring Hadoop into the AI and cloud era. At this point, Hadoop’s core value proposition seems to be this “boring” middle layer, unifying data access in the enterprise. That’s no small feat, and should be enough to grant the new Cloudera a spot in the enterprise.

But to stay relevant in the long term, the platform formerly known as Hadoop should be able to make rapid progress on those fronts. We’ll be here to keep track.

© 2019 NikolaNews.com - Global Tech Updates
