Jump-start with Linear Regression using PySpark MLlib

March 1, 2019

Credit: BecomingHuman

Deep dive: Linear Regression using PySpark MLlib

PREREQUISITE: Beginner-level knowledge of PySpark

spark.ml is a package introduced in Spark 1.2, which aims to provide a uniform set of high-level APIs that help users create and tune practical machine learning pipelines.

Do not worry about the imports for now. I will explain each package and its use.

Our data is stored in a DataFrame named data. Check the schema before proceeding, because VectorAssembler accepts only the following input column types: all numeric types, Boolean type, and vector type. If a column is not one of these types, we can cast it explicitly:

data = data.withColumn('columnName', data.columnName.cast('int'))

from pyspark import SQLContext, SparkConf, SparkContext
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator

# Run Spark locally and create the SQL entry point
conf = SparkConf().setMaster('local').setAppName('ML_learning')
sc = SparkContext(conf=conf)
sqlcontext = SQLContext(sc)

# Read the CSV with a header row and let Spark infer the column types
data = sqlcontext.read.csv(path='salary_d.csv', header=True, inferSchema=True)
data.printSchema()
Print Schema result

Two terms come up often in ML: “feature” and “label”.

In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon being observed. Choosing informative, discriminating and independent features is a crucial step for effective algorithms in pattern recognition, classification and regression. Features are usually numeric, but structural features such as strings and graphs are used in syntactic pattern recognition. The concept of “feature” is related to that of explanatory variable used in statistical techniques such as linear regression.

In simple terms, features are all the independent variables that you think will help you predict the value of the dependent variable. In many cases there is more than one independent variable, or “feature”. VectorAssembler combines all features into a single vector. Assume x1, x2, x3 are three columns with values 1, 2, 3 that you want to combine into a single feature vector called features and use to predict the dependent variable. If we set the VectorAssembler input columns to x1, x2 and x3 and the output column to features, after transformation we get the following DataFrame:

+---------------+
|features       |
+---------------+
|[1.0, 2.0, 3.0]|
+---------------+
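
To make the idea concrete, here is a minimal plain-Python sketch of what "assembling" columns into one feature vector means. It is not how VectorAssembler is implemented; the `assemble` helper and the sample rows are hypothetical and exist only for illustration.

```python
def assemble(row, input_cols):
    """Combine the named columns of one row into a single list of floats."""
    return [float(row[col]) for col in input_cols]


# Hypothetical rows standing in for DataFrame records
rows = [
    {"x1": 1, "x2": 2, "x3": 3},
    {"x1": 4, "x2": 5, "x3": 6},
]
input_cols = ["x1", "x2", "x3"]

# Add a "features" entry to a copy of each row, leaving the inputs in place
assembled = [dict(row, features=assemble(row, input_cols)) for row in rows]

for row in assembled:
    print(row["features"])
```

As in the Spark transformation, the original columns are kept and a new features column is added alongside them.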

The label is the dependent variable whose value our model predicts.

# Keep the feature column and rename Salary to the default label column name
data2 = data.select(data.Expr_yrs, data.Salary.alias('label'))
train, test = data2.randomSplit([0.7, 0.3])
assembler = VectorAssembler(inputCols=['Expr_yrs'], outputCol='features')
train01 = assembler.transform(train)
# We only need the features and label columns
train02 = train01.select("features", "label")
train02.show(truncate=False)
train02 Data

We split our data into two parts, train and test. The train data is used to fit the model, and predictions are then made on the test data. The split ratio is the programmer's choice and is open to debate; here we use 70% for training and 30% for testing.
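
The following plain-Python sketch illustrates a weighted random split with a fixed seed. It is an assumption-laden illustration, not Spark's implementation: the `random_split` helper is hypothetical, and it assigns each row independently, so the split sizes are only approximately 70/30.

```python
import random


def random_split(rows, weights, seed=None):
    """Assign each row to one split, with probability proportional to its weight."""
    rng = random.Random(seed)
    total = sum(weights)

    # Cumulative upper bounds of the normalized weights, e.g. [0.7, 1.0]
    bounds = []
    cumulative = 0.0
    for weight in weights:
        cumulative += weight / total
        bounds.append(cumulative)

    splits = [[] for _ in weights]
    for row in rows:
        u = rng.random()
        for i, bound in enumerate(bounds):
            # The last split also catches any floating-point rounding leftovers
            if u < bound or i == len(bounds) - 1:
                splits[i].append(row)
                break
    return splits


rows = list(range(100))
train, test = random_split(rows, [0.7, 0.3], seed=42)
print(len(train), len(test))
```

In PySpark, randomSplit also accepts a seed argument, for example data2.randomSplit([0.7, 0.3], seed=42), which makes the split repeatable on the same data.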

lr = LinearRegression()
# Fit the model on the training data
model = lr.fit(train02)
# Assemble the test features the same way, then predict
test01 = assembler.transform(test)
test02 = test01.select('features', 'label')
test03 = model.transform(test02)
test03.show(truncate=False)
test03 prediction

We imported LinearRegression from the ml package of PySpark and use it as the algorithm for our predictive model. Calling fit() on the train data trains the model. Calling transform() on the test data then adds a prediction column, which gives the result shown above.
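
For intuition about what fitting a line to one feature involves, here is a minimal sketch of ordinary least squares in plain Python. It uses the textbook closed-form solution, not Spark's solvers, and the function names and salary figures are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Made-up data: years of experience and salary
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [35.0, 40.0, 45.0, 50.0, 55.0]

slope, intercept = fit_simple_linear_regression(years, salary)
print(slope, intercept)
print(predict(slope, intercept, 6.0))
```

In the PySpark code above, the corresponding fitted values can be read from model.coefficients and model.intercept.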

evaluator = RegressionEvaluator()
print(evaluator.evaluate(test03, {evaluator.metricName: "r2"}))

To check how well our model predicts the label, and whether linear regression was a good choice of algorithm (that is, whether the independent variable explains the dependent variable), we use an evaluator.

RegressionEvaluator from the ml package evaluates a regression model. The metric used here is R². Other supported metrics include rmse (root mean squared error), mse (mean squared error), and mae (mean absolute error).

print(evaluator.evaluate(test03, {evaluator.metricName: "mse"}))
print(evaluator.evaluate(test03, {evaluator.metricName: "rmse"}))
print(evaluator.evaluate(test03, {evaluator.metricName: "mae"}))
Model’s performance metrics result
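
As a rough guide to what these four metrics measure, the sketch below computes them in plain Python from a list of labels and predictions using the standard textbook definitions. The `regression_metrics` helper and the sample numbers are hypothetical, and this is not RegressionEvaluator's code.

```python
import math


def regression_metrics(labels, predictions):
    """Compute mse, rmse, mae and r2 from paired labels and predictions."""
    n = len(labels)
    errors = [y - p for y, p in zip(labels, predictions)]

    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error
    mae = sum(abs(e) for e in errors) / n   # mean absolute error

    # R² compares the model's squared error with that of always predicting the mean
    mean_label = sum(labels) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((y - mean_label) ** 2 for y in labels)
    r2 = 1 - ss_res / ss_tot

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


# Made-up labels and predictions
labels = [3.0, 5.0, 7.0, 9.0]
predictions = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(labels, predictions)
for name, value in metrics.items():
    print(name, value)
```

Lower mse, rmse and mae mean smaller errors, while a higher R² means the model explains more of the variance in the label.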

R² = 0.68. The R² value depends on the train and test data, which are split randomly, so the result may vary from run to run.

An R² value close to 1 indicates a good fit. With R² = 0.68, the model explains roughly 68% of the variance in the test labels, which supports our assumption that “Years of Experience” explains “Salary”.

I hope the concepts are a bit clearer now. Feel free to share your suggestions. Thanks!

Credit: BecomingHuman By: Saurabh Chakraborty
