Machine Learning Algorithms: Linear Regression | Python and R Code Implementations

2018-10-11

Linear Regression

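Linear regression is used to estimate real values (cost of houses, number of calls, total sales, etc.) based on continuous variable(s). Here, we establish the relationship between the independent and dependent variables by fitting a best line. This best-fit line is called the regression line and is represented by the linear equation y = a*x + b.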

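The best way to understand linear regression is to relive a childhood experience. Suppose you ask a fifth-grade child to arrange the classmates in order of increasing weight, without asking them their weights. What do you think the child will do? He or she would most likely look at (visually analyze) each person's height and build, and arrange them using a combination of these visible parameters. This is linear regression in real life! The child has in effect worked out that height and build are related to weight by a relationship that looks like the equation above.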

In this equation:

y: the dependent variable

a: the slope

x: the independent variable

b: the intercept

The coefficients a and b are derived by minimizing the sum of the squared distances between the data points and the regression line.
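To make the least-squares idea concrete, here is a minimal sketch in plain Python that computes a and b from the standard closed-form solution for a single independent variable. The function name fit_line and the sample points are illustrative assumptions and do not come from the original article.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing sum((y - (a*x + b))**2) over the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - a * mean_x
    return a, b


# Illustrative points chosen to lie exactly on y = 2x + 1.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Because the sample points sit exactly on a line, the fit should recover that line's slope and intercept; with noisy data it would return the line with the smallest total squared error instead.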

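Consider the following example. Here, we have identified the best-fit line as the linear equation y = 0.2811x + 13.9. Using this equation, we can estimate a person's weight once we know their height.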
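As a quick illustration, the fitted equation can be wrapped in a small function. The article does not state the units, so the input value 160 and the function name predict_weight are assumptions made only for this sketch.

```python
def predict_weight(height):
    """Apply the article's fitted line y = 0.2811x + 13.9 (x = height, y = weight)."""
    return 0.2811 * height + 13.9


estimate = predict_weight(160)  # 160 is an arbitrary example input
print(round(estimate, 3))
```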

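Linear regression comes in two main types: simple linear regression and multiple linear regression. Simple linear regression is characterized by one independent variable. Multiple linear regression (as the name suggests) is characterized by multiple (more than one) independent variables. When looking for the best fit, you can also fit a polynomial or a curve instead of a straight line; this is known as polynomial or curvilinear regression.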
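The difference between these variants can be sketched with NumPy's least-squares routines. The toy data below is made up for illustration and generated from known coefficients without noise, so the fits should recover them; treat it as a sketch rather than a recommended workflow.

```python
import numpy as np

# Multiple linear regression: toy data from y = 1 + 2*x1 + 3*x2 (no noise).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]
# Prepend a column of ones so the first coefficient acts as the intercept.
design = np.column_stack([np.ones(len(X)), X])
multi_coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(multi_coef)  # intercept, then one coefficient per independent variable

# Polynomial regression: fit a degree-2 curve to points on y = x**2 + 1.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
poly_coef = np.polyfit(x, x**2 + 1.0, deg=2)
print(poly_coef)  # coefficients ordered from highest degree to lowest
```

Note that polynomial regression is still linear in its coefficients, which is why the same least-squares machinery applies.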

Python Code

# Import the library
# Import other necessary libraries such as pandas and numpy as needed
from sklearn import linear_model
# Load the training and test datasets
# Identify the feature and response variable(s); values must be numeric and array-like
# The right-hand names are placeholders: replace them with your own data
# x_train and x_test must be 2-D, with shape (n_samples, n_features)
x_train = input_variables_values_training_datasets
y_train = target_variables_values_training_datasets
x_test = input_variables_values_test_datasets
# Create the linear regression object
linear = linear_model.LinearRegression()
# Train the model on the training set and check the score (R^2)
linear.fit(x_train, y_train)
print('R^2 on training data: \n', linear.score(x_train, y_train))
# Equation coefficients and intercept
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
# Predict the output for the test set
predicted = linear.predict(x_test)

R Code

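# Load the training and test datasets
# Identify the feature and response variable(s); values must be numeric
# The right-hand names are placeholders: replace them with your own data
# Assumption: x_train and x_test are data frames with the same column names
x_train <- input_variables_values_training_datasets
y_train <- target_variables_values_training_datasets
x_test <- input_variables_values_test_datasets
# Combine features and response into one data frame, as lm() expects
x <- data.frame(x_train, y_train = y_train)
# Train the model on the training set and check the fit
linear <- lm(y_train ~ ., data = x)
summary(linear)
# Predict the output for the test set
predicted <- predict(linear, newdata = x_test)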
