.NET for Apache Spark brings enterprise coders and big data pros to the same table

April 15, 2020
in Big Data

Enterprise software development and open source big data analytics technologies have largely existed in separate worlds. This is especially true for developers in the Microsoft .NET ecosystem. The reasons for this are many, including .NET’s Windows heritage and the open source analytics stack’s allegiance to Linux.  

But Microsoft’s .NET Core, already in its third major version, is cross-platform, running not just on Windows but also on Linux and macOS. And Apache Spark, which largely eclipsed Hadoop as the open source analytics poster child, has made its way into numerous Microsoft platforms, including its flagship SQL Server database and Azure Synapse Analytics, Redmond’s latest gambit in the cloud data warehouse wars. Despite these developments, coders on the Spark platform have largely stuck with Scala, Python, R and Java. What had been missing was something that connected the dots between .NET and Spark.

Casting a .NET

All this changed a year ago, when, at the Spark and AI Summit, Microsoft introduced the preview of its .NET for Apache Spark framework, which provides bindings for developers using the C# and F# languages on the .NET platform. And that plot thickened a couple of weeks ago, when Microsoft extended .NET for Apache Spark to support in-memory .NET DataFrames, something Brigit Murtaugh, Program Manager for .NET for Apache Spark, announced in a blog post.

I’ve been involved with .NET since it was still in its Alpha days 20 years ago. And I’ve been involved in the big data world for almost half that time. I’ve wanted to see these two worlds converge and have argued for such a union. That aside, I hadn’t really investigated the .NET for Apache Spark framework (hereafter, Spark.NET) until now, choosing instead to hobble along mostly in Python. Having now examined the framework more carefully, I like what I see and wanted to report back on it. The good news: Spark.NET works well and, beyond integrating the two technologies, makes their respective programming paradigms dovetail very nicely.

Getting started 

Microsoft has worked hard to make the Spark.NET barrier-to-entry quite low. Case in point: The .NET for Apache Spark Web site provides a big white “Get Started” button that guides developers through the process of installing the framework, creating a sample wordcount application and running it. It takes the developer through the installation of all required dependencies, configuration steps, installation of .NET for Apache Spark itself, and the creation and execution of the sample application.

The entire guided procedure is designed to take 10 minutes, and assumes little more than a clean machine as the starting environment.  In large part it succeeds (with the caveat that I had to research and manually set the SPARK_LOCAL_IP environment variable to localhost to get the sample to run on my Windows machine), and I have to say it’s quite a rush to get it working.
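
For readers who hit the same snag, the workaround amounts to setting one environment variable before launching Spark. The lines below are a minimal sketch for a POSIX-style shell (for example Git Bash or WSL on Windows); in a Windows Command Prompt the equivalent would be `set SPARK_LOCAL_IP=localhost`. Whether a given machine needs this at all is an assumption that depends on its network configuration.

```shell
# Workaround described above: tell Spark to bind to the loopback address.
export SPARK_LOCAL_IP=localhost

# Confirm the value that child processes such as spark-submit will inherit.
echo "SPARK_LOCAL_IP=${SPARK_LOCAL_IP}"
```

The variable only lasts for the current shell session, so it has to be set again (or added to the shell profile) before each new run.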

Pick your environment

The tutorial is designed to do everything from the command line, including editing an input text file and the C# code, compiling the application, and running the .NET console application by calling Spark’s spark-submit utility. But experienced .NET developers who prefer to work in Visual Studio 2019 can use Spark.NET from there as well.

I verified this myself, in fact. After working through the Get Started tutorial, I created a new C# console application in Visual Studio 2019, used the NuGet package manager to add Spark.NET to my project, then replicated the coding steps in Microsoft’s command line-oriented tutorial. After compiling everything in Visual Studio, I submitted the job to Spark and everything ran just fine.
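
Submitting the job takes the same form whether the assembly was built from the command line or in Visual Studio: spark-submit starts a JVM-side runner class, which in turn launches the .NET application. The sketch below assembles that command as a dry run. The jar, assembly and input file names are hypothetical placeholders, not names from the article; the real jar ships with the Microsoft.Spark package and carries Spark and package version numbers in its file name.

```shell
# Hypothetical names: substitute the jar shipped with your Microsoft.Spark
# package, your own application assembly, and your own input file.
SPARK_DOTNET_JAR="microsoft-spark.jar"
APP_DLL="MySparkApp.dll"
INPUT_FILE="input.txt"

# Assemble the spark-submit invocation. DotnetRunner is the JVM-side entry
# point; everything after the jar is the .NET command it should launch.
SUBMIT_CMD="spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local ${SPARK_DOTNET_JAR} dotnet ${APP_DLL} ${INPUT_FILE}"

# Print the command for inspection instead of executing it.
echo "$SUBMIT_CMD"
```

Once the placeholders point at real files, running the printed command from the build output directory submits the job to a local Spark instance; `--master local` would change when targeting a cluster.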

Ready for Spark prime time

After getting things to run locally on a dev machine, you’ll want to try running on a full-fledged Spark cluster. These days, that’s likely to be in the cloud. The tricky part is that you’ll need to make sure Spark.NET is installed on the cluster before your own code can run. Microsoft says Spark clusters on Microsoft’s own Azure HDInsight service, as well as Spark pools in Synapse Analytics (currently in preview), already have Spark.NET on board.

Beyond that, though, Microsoft provides explicit instructions for deploying .NET for Apache Spark to Azure Databricks *and* to the Databricks Unified Analytics Platform service that runs on Amazon Web Services. Still not impressed? Microsoft also provides installation instructions for AWS’ ubiquitous Elastic MapReduce (EMR) service.

You can deploy your .NET assembly to your Spark cluster and run it on a batch basis from the command line if you wish. But, for C# developers, Microsoft has also enabled the very common scenario of working interactively in a Jupyter notebook. That support includes a Jupyter notebook kernel that leverages the C# REPL (read-eval-print loop) technology, which is highly innovative in and of itself. Microsoft provides an F# kernel as well.

When you combine notebook support with Microsoft’s enabling of Spark.NET-based Spark SQL UDFs (user-defined functions), support for .NET DataFrames, and that implementation’s abstraction over Apache Arrow RecordBatch objects, you can see that Microsoft has worked hard not only to bring Spark into the .NET world, but also to bring .NET into each of several Spark programming use cases. It’s also made things perform well — Apache Arrow supports the sharing of columnar data in-memory, eliminating the overhead of converting the data into and out of different formats in order to process it.

What’s the point?

Seasoned Spark developers are unlikely to switch from, say, Python to C# to do their work, and Microsoft has no illusions about that. But the number of lines of .NET code out there, created over the last 20 years, is staggering. Bringing even a small fraction of that code into the world of open source big data has a lot of value. So too does bringing the legions of .NET developers into the world of analyzing high-volume data sitting in data lakes, as well as the streaming data and machine learning use cases that Spark enables.

In other words, Microsoft’s goal here is to make the worlds of enterprise software development, analytics and data science converge. Blending those communities, use cases and skill sets, rather than leaving them in separate silos, is logical and laudable, so there’s that. But, more important, if we’re going to be serious about data-driven decision-making, pervasive data culture and digital transformation, unification of these communities and sub-disciplines must happen — doing so is critical, not discretionary.

What’s more, Microsoft is integrating the communities and the tech by elegantly blending their paradigms, rather than making one subservient to the other. That subtlety gives practitioners in each community a portal to see the wonders of the other, rather than just smooshing the two together in some contrived fashion that would make for a worst-of-both-worlds outcome. Instead, the ethos of pragmatism and platform openness that Satya Nadella has engendered at Microsoft has made its way all the way down to a developer framework. There’s nothing but upside in that.


Credit: Zdnet
