Training a machine learning model can be a lengthy, time-consuming process. There are plenty of libraries and frameworks that can help you out here, for example PyTorch, TensorFlow, and Keras. Scikit-learn, or simply sklearn, is yet another tool you can use to do so.
Why use sklearn?
But what makes sklearn so great? The answer is simple: it eliminates the need to write a lot of code. For example, to train a simple regression model, all you need to do is hand your training data to the model and sklearn trains it in a few seconds, while also providing a great number of evaluation techniques. That is exactly what we will be doing right now. We will see how to do linear regression, polynomial regression, and multiple regression.
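To give a flavour of how little code that is, here is a minimal sketch of the fit/predict pattern every sklearn estimator follows (the toy numbers are made up purely for illustration):

from sklearn.linear_model import LinearRegression
X = [[1], [2], [3], [4]]  # 4 samples, 1 feature each
y = [2, 4, 6, 8]          # targets following y = 2x
model = LinearRegression()
model.fit(X, y)              # training is a single call
print(model.predict([[5]]))  # prints roughly [10.]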
Cons of sklearn
The difference between sklearn and libraries like PyTorch is that sklearn simplifies model training for you. In other words, you don't have 100% control over your model: you can't set every hyperparameter a model could have. sklearn also has no built-in GPU support, so it cannot be used to build very complex models.
So sklearn is preferred for situations where the task is not that complex and you need results as fast as possible, which covers maybe 80% of the cases one might encounter.
So without further ado, let's see its basic code.
If you are unfamiliar with the basics of machine learning, you might want to read these two first:
1. Linear regression using sklearn
CODE:
# STEP 1----------------------------------------------------------------------------------------------------
import matplotlib.pyplot as mp
import numpy as np
import random
x_data = np.arange(0, 10, 0.2)  # 50 x values from 0 up to (but not including) 10
y_data = x_data*2               # y = 2x, so the true slope is 2
noise = []
for i in range(0, len(y_data)):
    a = round(random.uniform(3.0, 6.0), 2)  # random offset between 3.0 and 6.0
    noise.append(a)
y_data = y_data + noise         # shift every y value by its noise
# STEP 2----------------------------------------------------------------------------------------------------
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x_data, y_data, test_size=0.33)
from sklearn import linear_model
model = linear_model.LinearRegression()
model.fit(xtrain.reshape(-1, 1), ytrain.reshape(-1, 1))
# STEP 3----------------------------------------------------------------------------------------------------
print(model.predict([[6]]))  # predict y for x = 6
# STEP 4----------------------------------------------------------------------------------------------------
test_preds = model.predict(xtest.reshape(-1, 1))  # predictions for every test x value
mp.scatter(x_data, y_data)  # the full data set as points
mp.plot(xtest, test_preds)  # the fitted regression line
mp.show()
The entire code has been divided into 4 steps.
STEP 1
In step 1, we create our data. We first create the x values from 0 to 10 with a step of 0.2, using np.arange(0, 10, 0.2); hence there are a total of 50 values. We then create the y values as 2 times the x values (2 being the slope of our equation). To give the y values some deviation, or noise, we make a list (of the same length as the x and y values) and fill it with random values between 3.0 and 6.0. Finally, we add it to the y values.
round(random.uniform(3.0, 6.0), 2) returns a random float between 3.0 and 6.0; round is a Python built-in which rounds it off to 2 decimal places.
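As a side note, the same noise list can be built in one line with numpy; this is just an equivalent alternative to the loop above:

import numpy as np
# 50 random floats in [3.0, 6.0), rounded to 2 decimal places
noise = np.round(np.random.uniform(3.0, 6.0, size=len(y_data)), 2)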
STEP 2
In step 2, we divide our data into training and testing x and y values using train_test_split from sklearn.model_selection. The 0.33 signifies the fraction of the data we want to use as test data, i.e. 33%.
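One thing worth knowing: train_test_split shuffles the data randomly, so every run produces a different split. If you want a reproducible split, you can pass its random_state parameter:

from sklearn.model_selection import train_test_split
# the same random_state always reproduces the same split
xtrain, xtest, ytrain, ytest = train_test_split(x_data, y_data, test_size=0.33, random_state=42)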
Then we import the linear model from sklearn and finally train it using the fit method.
Notice how we need to reshape xtrain into a 2D array of shape (n_samples, n_features); sklearn requires this for its inputs. (Reshaping ytrain is optional, since fit also accepts a 1D target array.)
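A quick shape check makes this concrete: reshape(-1, 1) turns our flat array into a single-feature column.

import numpy as np
x_data = np.arange(0, 10, 0.2)
print(x_data.shape)                 # (50,)  -> 1D, rejected by fit as X
print(x_data.reshape(-1, 1).shape)  # (50, 1) -> 50 samples, 1 feature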
STEP 3
Now that our model has been trained, in step 3 we try to predict a single value, which is 6 in our example.
The answer I got is 16.68, which is pretty much what we wanted: 2 times 6 is 12, and our noise, which averages around 4.5, lifts it to roughly 16.5.
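You can also inspect the parameters the model actually learned; since we generated the data ourselves, we know roughly what to expect:

print(model.coef_)       # slope, should be close to 2
print(model.intercept_)  # offset, close to 4.5 (the average of our noise)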
STEP 4
In order to see it visually, in step 4 we first make an array of predictions, test_preds, and plot it against xtest (the values those predictions were made for). We also scatter the entire data set. The graph we get looks something like this:

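Eyeballing the plot is fine, but sklearn.metrics also ships ready-made scores; for example, the mean squared error and the R² score of our test predictions take just two lines:

from sklearn.metrics import mean_squared_error, r2_score
print(mean_squared_error(ytest, test_preds))  # average squared error on the test data
print(r2_score(ytest, test_preds))            # 1.0 would be a perfect fit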
2. Polynomial regression using sklearn
CODE:
import matplotlib.pyplot as mp
import numpy as np
import random
x_data = np.arange(0, 10, 0.2)
y_data = x_data
# x_data = [1,1, 1.5, 2,2, 2.5, 3,3, 3.5, 4,4, 4.5, 5,5, 5.5, 6,6]
# y_data = [2.9, 3.1, 4.5, 6, 6.2, 7.6, 8.8, 8.9, 10.7, 12, 12, 13.6, 15, 15.3, 16.4, 17.7, 18.4 ]
noise = []
for i in range(0, len(y_data)):
    a = round(random.uniform(0.5, 2.5), 2)  # random offset between 0.5 and 2.5
    noise.append(a)
y_data = y_data + noise
y_data = 10*(y_data ** 3)  # make y non-linear in x: y = 10*(x + noise)^3
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x_data, y_data, test_size=0.33)
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
xtrain_poly = poly.fit_transform(xtrain.reshape(-1, 1))  # fit on the training data
xtest_poly = poly.transform(xtest.reshape(-1, 1))        # reuse the fitted transformer on the test data
from sklearn import linear_model
model = linear_model.LinearRegression()
model.fit(xtrain_poly, ytrain)
print(model.predict(poly.transform([[6]]))) # ----- prediction
test_preds = model.predict(xtest_poly)
mp.scatter(x_data, y_data)
mp.scatter(xtest, test_preds)
mp.show()
The code remains almost the same; only this time our y values are non-linearly related to the x values. To train the model, we need to transform the x training data using PolynomialFeatures from sklearn.preprocessing. You also need to specify the degree of your model (2 in our case), and then follow the usual steps.
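To see what the transformation actually does, you can run PolynomialFeatures on a single value; with degree 2 it expands x into [1, x, x²]:

from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
print(poly.fit_transform([[6]]))  # [[ 1.  6. 36.]] -> bias, x, x squared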
The graph looks something like this.

3. Multiple regression using sklearn
CODE:
import numpy as np
import random
x_data1 = np.arange(10, 20, 0.2)  # first feature: 50 values
x_data2 = np.arange(8, 108, 2)    # second feature: 50 values
y_data = x_data1*2 + x_data2*3    # y depends on both features
x_data = np.concatenate((x_data1.reshape(-1, 1), x_data2.reshape(-1, 1)), axis=1)  # shape (50, 2)
noise = []
for i in range(0, len(y_data)):
    a = round(random.uniform(-20.0, 20.0), 2)  # random offset between -20 and 20
    noise.append(a)
y_data = y_data + noise
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x_data, y_data, test_size=0.33)
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=1)
xtrain_poly = poly.fit_transform(xtrain)  # fit on the training data
xtest_poly = poly.transform(xtest)        # reuse the fitted transformer on the test data
from sklearn import linear_model
model = linear_model.LinearRegression()
model.fit(xtrain_poly, ytrain)
#print(model.predict(poly.transform([[10, 16]]))) # ----- prediction
test_preds = model.predict(xtest_poly)
print(ytest)
print(test_preds)
from mpl_toolkits import mplot3d  # registers the '3d' projection (needed on older matplotlib)
import matplotlib.pyplot as mp
fig = mp.figure()
ax = mp.axes(projection='3d')
ax.scatter(x_data1, x_data2, y_data)
x = []
y = []
for i in xtest:
    x.append(i[0])  # first feature of each test sample
    y.append(i[1])  # second feature
ax.scatter(x, y, test_preds)  # plot the predictions against both features
mp.show()
Here we have two independent variables instead of one, so our training/testing x arrays have shape (n_samples, 2). The syntax is no different from the other examples.
Notice how we used PolynomialFeatures anyway, with a degree of 1 (which leaves the features linear). This eliminates the need to change our code in case of a non-linear relation: we would only bump up the degree. A quick check of what the degree-1 transform produces is shown below.
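With degree=1, PolynomialFeatures only prepends the bias column, so the model stays linear:

from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=1)
print(poly.fit_transform([[10, 16]]))  # [[ 1. 10. 16.]] -> bias, x1, x2
# degree=2 would add x1^2, x1*x2 and x2^2 columns automatically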
Here is how the graph looks.
