
[Data Analysis with Pandas: mnist-fashion, SVM, decision tree] SK infosec Cloud AI Expert Training Course lab notes

gogoriver 2020. 9. 8. 09:03

mnist fashion

  1. Display the first record of the test data as an image => submit the image

  2. Fashion-mnist_train.csv (60,000 rows), fashion-mnist_test.csv (10,000 rows)

    • RandomForestClassifier
    • GradientBoostingClassifier
    • MLPClassifier
    • SVC
    • Find the optimal algorithm and parameters among these with GridSearchCV
    • Check the accuracy (with code)
    • Submit as a Jupyter notebook

1. Displaying the first record of the test data as an image

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings('ignore')
%matplotlib inline
/usr/local/lib/python3.6/dist-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
  import pandas.util.testing as tm
In [ ]:
data_file = open("/content/drive/My Drive/Colab Notebooks/ml/Day04/fashion-mnist_train.csv", 'r')
data_list = data_file.readlines()
data_file.close()

# data_list[0] is the CSV header row, so data_list[1] is the first record;
# the first column is the label, the remaining 784 columns are pixels
all_values = data_list[1].split(',')
image_array = np.asfarray(all_values[1:]).reshape((28,28))

plt.imshow(image_array, cmap='Greys', interpolation='None')
plt.show()
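Note that the cell above actually reads fashion-mnist_train.csv, while the task asks for the first test record. A minimal pandas-based sketch of the intended version (assuming the test CSV sits in the same Colab folder; the class-name list is the standard Fashion-MNIST label mapping):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# standard Fashion-MNIST label names for labels 0-9
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

test_df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/ml/Day04/fashion-mnist_test.csv')
label = int(test_df.iloc[0, 0])                     # first column is the label
pixels = test_df.iloc[0, 1:].to_numpy(dtype=float)  # remaining 784 pixel columns
plt.imshow(pixels.reshape(28, 28), cmap='Greys', interpolation='None')
plt.title(f'label {label}: {class_names[label]}')
plt.show()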
In [ ]:
scaled_input = np.asfarray(all_values[1:])/255.0*0.99+0.01
print(scaled_input)
[0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.02552941 0.01
 0.01       0.01       0.01       0.01       0.25070588 0.24682353
 0.09152941 0.12258824 0.09929412 0.208      0.538      0.24682353
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.35164706
 0.79035294 0.89517647 0.88352941 1.         0.45647059 0.25070588
 0.54188235 1.         0.92235294 0.87188235 1.         0.53411765
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.19247059 0.98835294 0.91847059 0.934      0.87964706
 0.84470588 0.84470588 0.89905882 0.42929412 0.70882353 0.81364706
 0.84082353 0.87964706 0.90682353 0.97670588 0.99611765 0.18470588
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01388235 0.01       0.01       0.84082353
 0.87188235 0.82529412 0.83694118 0.87964706 0.88352941 0.85247059
 0.86411765 0.99611765 0.91458824 0.86023529 0.868      0.85247059
 0.87576471 0.868      0.94176471 0.99611765 0.01       0.01
 0.01388235 0.01       0.01       0.01       0.01388235 0.01
 0.01       0.01       0.50694118 0.93011765 0.81364706 0.87964706
 0.87964706 0.81364706 0.84858824 0.84082353 0.82529412 0.81752941
 0.82917647 0.868      0.81752941 0.86023529 0.83694118 0.88741176
 0.82917647 0.93011765 0.59235294 0.01       0.01       0.01
 0.01       0.01       0.01       0.01776471 0.01       0.01
 0.93011765 0.87188235 0.84470588 0.81364706 0.82529412 0.83305882
 0.83694118 0.80976471 0.84082353 0.83694118 0.84082353 0.83694118
 0.82529412 0.84470588 0.84082353 0.80976471 0.78258824 0.85635294
 1.         0.06047059 0.01       0.01776471 0.01       0.01
 0.01       0.02552941 0.01       0.34       0.89517647 0.82529412
 0.85635294 0.78647059 0.82917647 0.81752941 0.79811765 0.84470588
 0.82529412 0.82141176 0.82141176 0.82529412 0.83694118 0.82917647
 0.82529412 0.85247059 0.80976471 0.83694118 0.90682353 0.68941176
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.85247059 0.87964706 0.84470588 0.80976471 0.80588235
 0.802      0.85247059 0.90294118 0.87188235 0.84470588 0.87964706
 0.91458824 0.89517647 0.91070588 0.89517647 0.87964706 0.81364706
 0.83305882 0.84470588 0.83694118 0.89905882 0.13035294 0.01
 0.02552941 0.01       0.01388235 0.01       0.09152941 0.88352941
 0.83305882 0.83305882 0.79811765 0.82917647 0.88352941 0.75929412
 0.54964706 0.538      0.76705882 0.58070588 0.61564706 0.54964706
 0.50694118 0.63894118 0.77482353 0.87576471 0.81364706 0.86411765
 0.83694118 0.91070588 0.69717647 0.01       0.01       0.01
 0.01       0.01       0.48752941 0.88741176 0.81364706 0.82917647
 0.82141176 0.80588235 0.89517647 0.62341176 0.35941176 0.40988235
 0.73211765 0.54576471 0.39823529 0.47976471 0.58070588 0.62341176
 0.72047059 0.88741176 0.81752941 0.84082353 0.82141176 0.84858824
 1.         0.06047059 0.01       0.01388235 0.01       0.01
 0.88741176 0.86023529 0.79423529 0.81752941 0.80976471 0.80588235
 0.84858824 0.72435294 0.61564706 0.59235294 0.75929412 0.67
 0.64670588 0.66223529 0.73988235 0.73211765 0.78647059 0.86023529
 0.84858824 0.83694118 0.83694118 0.82917647 0.91458824 0.58458824
 0.01       0.01       0.01       0.18470588 0.89129412 0.802
 0.84082353 0.82917647 0.85635294 0.87188235 0.868      0.90294118
 0.89905882 0.868      0.83694118 0.87964706 0.91458824 0.88741176
 0.86411765 0.86023529 0.868      0.87964706 0.87576471 0.85247059
 0.82529412 0.85635294 0.83694118 0.99611765 0.01       0.01
 0.01       0.61952941 0.88741176 0.79811765 0.81364706 0.82917647
 0.82141176 0.84470588 0.80588235 0.77870588 0.81364706 0.81752941
 0.79035294 0.79035294 0.77482353 0.79811765 0.80588235 0.82529412
 0.81364706 0.83694118 0.84082353 0.84082353 0.84082353 0.83694118
 0.81752941 0.91847059 0.42541176 0.01       0.01       0.92235294
 0.83694118 0.802      0.82917647 0.82529412 0.82141176 0.83694118
 0.79423529 0.77482353 0.802      0.84470588 0.85247059 0.83694118
 0.83305882 0.82529412 0.80976471 0.83305882 0.79811765 0.82917647
 0.85635294 0.84470588 0.84082353 0.81752941 0.82141176 0.87188235
 0.90294118 0.01       0.21188235 1.         0.81364706 0.78647059
 0.81752941 0.83694118 0.82529412 0.82529412 0.81752941 0.81364706
 0.79423529 0.79035294 0.82141176 0.84858824 0.84858824 0.84858824
 0.84858824 0.84082353 0.83305882 0.80588235 0.84470588 0.79035294
 0.89517647 0.81752941 0.84082353 0.83305882 0.85635294 0.10705882
 0.46811765 0.85247059 0.79035294 0.80976471 0.81752941 0.83694118
 0.81752941 0.80588235 0.80976471 0.82529412 0.82917647 0.79423529
 0.78258824 0.81364706 0.81752941 0.82141176 0.82529412 0.81364706
 0.82529412 0.82529412 0.96117647 0.54964706 0.472      1.
 0.79423529 0.79811765 0.92623529 0.45258824 0.67388235 0.934
 0.83305882 0.79811765 0.86411765 0.84858824 0.85247059 0.82141176
 0.81364706 0.80588235 0.82529412 0.82917647 0.80976471 0.802
 0.80976471 0.82141176 0.82917647 0.84470588 0.82529412 0.80976471
 0.868      0.94952941 0.01       0.87964706 0.91847059 0.90294118
 0.71270588 0.11094118 0.16141176 0.57294118 0.79035294 1.
 0.61952941 0.45647059 0.98058824 0.78647059 0.81364706 0.80976471
 0.81364706 0.83694118 0.84858824 0.80976471 0.80588235 0.80976471
 0.81364706 0.80976471 0.84470588 0.81364706 0.868      0.934
 0.01       0.01       0.73988235 0.34       0.01       0.01
 0.01       0.01       0.01       0.13035294 0.01       0.51082353
 0.99223529 0.74764706 0.81364706 0.81752941 0.81752941 0.81752941
 0.82141176 0.82917647 0.82917647 0.82141176 0.82141176 0.82141176
 0.83305882 0.79035294 0.88741176 0.65058824 0.01       0.01
 0.01       0.01       0.01       0.01       0.01776471 0.01
 0.01       0.01       0.01       0.35552941 0.99611765 0.78258824
 0.78258824 0.75541176 0.77094118 0.77870588 0.78258824 0.79035294
 0.79423529 0.79811765 0.802      0.79811765 0.79811765 0.78647059
 0.87188235 0.61176471 0.01       0.02164706 0.02164706 0.02164706
 0.01776471 0.01       0.01       0.01       0.01388235 0.02941176
 0.01       0.01       1.         0.85635294 0.88741176 0.91070588
 0.89517647 0.87964706 0.87188235 0.86411765 0.86023529 0.86023529
 0.85247059 0.868      0.86411765 0.83305882 0.92623529 0.37882353
 0.01       0.01776471 0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.61176471 0.76317647 0.66223529 0.67       0.67388235 0.68164706
 0.68164706 0.70494118 0.69717647 0.68941176 0.67776471 0.67388235
 0.65835294 0.63505882 0.70882353 0.01       0.01       0.01388235
 0.01       0.01388235 0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01      ]
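For reference, the expression np.asfarray(all_values[1:]) / 255.0 * 0.99 + 0.01 maps each 8-bit pixel from [0, 255] into [0.01, 1.00], so a fully black pixel becomes 0.01 rather than exactly 0; this offset is commonly used to keep inputs strictly positive when feeding simple neural networks.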

2. Fashion-mnist_train.csv (60,000 rows), fashion-mnist_test.csv (10,000 rows)

In [ ]:
import numpy as np
import pandas as pd 
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

train = pd.read_csv('/content/drive/My Drive/Colab Notebooks/ml/Day04/fashion-mnist_train.csv')
test = pd.read_csv('/content/drive/My Drive/Colab Notebooks/ml/Day04/fashion-mnist_test.csv')
In [ ]:
print("train data frame")
print(train)
print("test data frame")
print(test)
train data frame
       label  pixel1  pixel2  pixel3  ...  pixel781  pixel782  pixel783  pixel784
0          2       0       0       0  ...         0         0         0         0
1          9       0       0       0  ...         0         0         0         0
2          6       0       0       0  ...         0         0         0         0
3          0       0       0       0  ...         0         0         0         0
4          3       0       0       0  ...         0         0         0         0
...      ...     ...     ...     ...  ...       ...       ...       ...       ...
59995      9       0       0       0  ...         0         0         0         0
59996      1       0       0       0  ...         0         0         0         0
59997      8       0       0       0  ...         0         0         0         0
59998      8       0       0       0  ...         0         0         0         0
59999      7       0       0       0  ...         0         0         0         0

[60000 rows x 785 columns]
test data frame
      label  pixel1  pixel2  pixel3  ...  pixel781  pixel782  pixel783  pixel784
0         0       0       0       0  ...         0         0         0         0
1         1       0       0       0  ...         0         0         0         0
2         2       0       0       0  ...        31         0         0         0
3         2       0       0       0  ...       222        56         0         0
4         3       0       0       0  ...         0         0         0         0
...     ...     ...     ...     ...  ...       ...       ...       ...       ...
9995      0       0       0       0  ...         1         0         0         0
9996      6       0       0       0  ...        28         0         0         0
9997      8       0       0       0  ...        42         0         1         0
9998      8       0       1       3  ...         0         0         0         0
9999      1       0       0       0  ...         0         0         0         0

[10000 rows x 785 columns]
In [ ]:
datas = [train, test]
for data in datas:
    print(data.isnull().sum())
label       0
pixel1      0
pixel2      0
pixel3      0
pixel4      0
           ..
pixel780    0
pixel781    0
pixel782    0
pixel783    0
pixel784    0
Length: 785, dtype: int64
label       0
pixel1      0
pixel2      0
pixel3      0
pixel4      0
           ..
pixel780    0
pixel781    0
pixel782    0
pixel783    0
pixel784    0
Length: 785, dtype: int64
  • The missing-value check shows zero missing values in every column, so the missing-value handling step is skipped.

  • Every column holds a pixel value and is already numeric (int64), so no columns need their values converted.
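A quicker, equivalent sanity check is a one-liner per frame that collapses everything to a single boolean:

print(train.isnull().values.any())  # False: no missing values anywhere
print(test.isnull().values.any())   # False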

Training the models

In [ ]:
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
In [ ]:
X_train = train[train.columns[1:]]
Y_train = train['label']
print(X_train)
print(Y_train)
       pixel1  pixel2  pixel3  pixel4  ...  pixel781  pixel782  pixel783  pixel784
0           0       0       0       0  ...         0         0         0         0
1           0       0       0       0  ...         0         0         0         0
2           0       0       0       0  ...         0         0         0         0
3           0       0       0       1  ...         0         0         0         0
4           0       0       0       0  ...         0         0         0         0
...       ...     ...     ...     ...  ...       ...       ...       ...       ...
59995       0       0       0       0  ...         0         0         0         0
59996       0       0       0       0  ...         0         0         0         0
59997       0       0       0       0  ...         0         0         0         0
59998       0       0       0       0  ...         0         0         0         0
59999       0       0       0       0  ...         0         0         0         0

[60000 rows x 784 columns]
0        2
1        9
2        6
3        0
4        3
        ..
59995    9
59996    1
59997    8
59998    8
59999    7
Name: label, Length: 60000, dtype: int64
In [ ]:
X_test = test[test.columns[1:]]
Y_test = test['label']
print(X_test)
print(Y_test)
      pixel1  pixel2  pixel3  pixel4  ...  pixel781  pixel782  pixel783  pixel784
0          0       0       0       0  ...         0         0         0         0
1          0       0       0       0  ...         0         0         0         0
2          0       0       0       0  ...        31         0         0         0
3          0       0       0       0  ...       222        56         0         0
4          0       0       0       0  ...         0         0         0         0
...      ...     ...     ...     ...  ...       ...       ...       ...       ...
9995       0       0       0       0  ...         1         0         0         0
9996       0       0       0       0  ...        28         0         0         0
9997       0       0       0       0  ...        42         0         1         0
9998       0       1       3       0  ...         0         0         0         0
9999       0       0       0       0  ...         0         0         0         0

[10000 rows x 784 columns]
0       0
1       1
2       2
3       2
4       3
       ..
9995    0
9996    6
9997    8
9998    8
9999    1
Name: label, Length: 10000, dtype: int64

Random forest

In [ ]:
model1 = RandomForestClassifier(n_estimators=20)
rdc = model1.fit(X_train, Y_train)
output1 = rdc.predict(X_test)
In [ ]:
print("RandomForest with n_estimators = 20")
print(accuracy_score(Y_test, output1))
RandomForest with n_estimators = 20
0.8679
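As a hedged aside, fixing the seed would make this baseline reproducible across runs, e.g.:

model1 = RandomForestClassifier(n_estimators=20, random_state=42)  # illustrative seed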

Random forest (with GridSearchCV)

In [ ]:
rf = RandomForestClassifier()
## Grid Search
# limit to the first 20,001 rows (.loc slicing is inclusive) to keep the search tractable
X_train = X_train.loc[:20000]
Y_train = Y_train.loc[:20000]
rf_param_grid = {
    "max_depth": [None],
    "max_features": [1, 3, 10],
    "min_samples_split": [2, 3, 10],
    "min_samples_leaf": [1, 3, 10],
    "bootstrap": [False],
    "n_estimators": [10, 20]
}
rf_grid = GridSearchCV(rf, param_grid = rf_param_grid, scoring="accuracy", n_jobs=4, verbose=1)
rf_grid.fit(X_train, Y_train)
Fitting 5 folds for each of 54 candidates, totalling 270 fits
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:   18.2s
[Parallel(n_jobs=4)]: Done 192 tasks      | elapsed:  1.6min
[Parallel(n_jobs=4)]: Done 270 out of 270 | elapsed:  3.2min finished
Out[ ]:
GridSearchCV(cv=None, error_score=nan,
             estimator=RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
                                              class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features='auto',
                                              max_leaf_nodes=None,
                                              max_samples=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1,
                                              min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              n_estimators=100, n_jobs=None,
                                              oob_score=False,
                                              random_state=None, verbose=0,
                                              warm_start=False),
             iid='deprecated', n_jobs=4,
             param_grid={'bootstrap': [False], 'max_depth': [None],
                         'max_features': [1, 3, 10],
                         'min_samples_leaf': [1, 3, 10],
                         'min_samples_split': [2, 3, 10],
                         'n_estimators': [10, 20]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='accuracy', verbose=1)
In [ ]:
rf_best = rf_grid.best_estimator_
print(rf_grid.best_score_)
# note: printing the grid object just repeats its repr below;
# print(rf_best) would show the winning parameters instead
print(rf_grid)
0.8565066483379156
GridSearchCV(cv=None, error_score=nan,
             estimator=RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
                                              class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features='auto',
                                              max_leaf_nodes=None,
                                              max_samples=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1,
                                              min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              n_estimators=100, n_jobs=None,
                                              oob_score=False,
                                              random_state=None, verbose=0,
                                              warm_start=False),
             iid='deprecated', n_jobs=4,
             param_grid={'bootstrap': [False], 'max_depth': [None],
                         'max_features': [1, 3, 10],
                         'min_samples_leaf': [1, 3, 10],
                         'min_samples_split': [2, 3, 10],
                         'n_estimators': [10, 20]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='accuracy', verbose=1)
In [ ]:
rf_best = rf_grid.best_estimator_
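The notebook stops at the cross-validated score; a minimal sketch of the held-out check (not run in the original) would be:

rf_pred = rf_best.predict(X_test)
print(accuracy_score(Y_test, rf_pred))  # accuracy of the tuned forest on the 10,000-row test set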

GradientBoostingClassifier

In [ ]:
gb = GradientBoostingClassifier()
## Grid Search
X_train = X_train.loc[:20000]  # already sliced above, so this is a no-op
Y_train = Y_train.loc[:20000]
gb_param_grid = {
    "loss": ["deviance"],
    "n_estimators": [3, 4],      # kept very small: boosting over 784 features is slow
    "learning_rate": [0.1, 0.01],
    "max_depth": [4, 8],
    "max_features": [0.3, 0.1],
    "min_samples_leaf": [10, 15]
}
gb_grid = GridSearchCV(gb, param_grid = gb_param_grid, scoring="accuracy", n_jobs=4, verbose=1)
In [ ]:
gb_grid.fit(X_train,Y_train)

gb_best = gb_grid.best_estimator_
print(gb_grid.best_score_)
print(gb_best)
Fitting 5 folds for each of 32 candidates, totalling 160 fits
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:  7.7min
[Parallel(n_jobs=4)]: Done 160 out of 160 | elapsed: 37.5min finished
0.8412575606098475
GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                           learning_rate=0.1, loss='deviance', max_depth=8,
                           max_features=0.1, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=15, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=4,
                           n_iter_no_change=None, presort='deprecated',
                           random_state=None, subsample=1.0, tol=0.0001,
                           validation_fraction=0.1, verbose=0,
                           warm_start=False)
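Even with only 3-4 boosting stages, the winning configuration (learning_rate=0.1, max_depth=8, max_features=0.1, min_samples_leaf=15, n_estimators=4) reaches a cross-validated accuracy of about 0.841; more estimators would likely improve it further, but this 160-fit search already took 37.5 minutes.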

MLPClassifier

In [ ]:
# ===================sgd====================================
mlp_sgd = MLPClassifier(solver='sgd', hidden_layer_sizes=(100,), random_state=1)
X_train = X_train.loc[:20000]
Y_train = Y_train.loc[:20000]
# NOTE: this grid holds GradientBoosting parameters and passes `gb` (not
# `mlp_sgd`) to GridSearchCV, so the output below is really another
# gradient-boosting search; mlp_sgd is never fitted in this cell.
mlp_param_grid = {
    "loss": ["deviance"],
    "n_estimators": [3, 4],
    "learning_rate": [0.1, 0.01]
}
mlp_grid = GridSearchCV(gb, param_grid = mlp_param_grid, scoring="accuracy", n_jobs=4, verbose=1)
mlp_grid.fit(X_train,Y_train)
mlp_sgd_best = mlp_grid.best_score_
print(mlp_grid)
print(mlp_grid.best_score_)
Fitting 5 folds for each of 4 candidates, totalling 20 fits
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  20 out of  20 | elapsed: 11.7min finished
GridSearchCV(cv=None, error_score=nan,
             estimator=GradientBoostingClassifier(ccp_alpha=0.0,
                                                  criterion='friedman_mse',
                                                  init=None, learning_rate=0.1,
                                                  loss='deviance', max_depth=3,
                                                  max_features=None,
                                                  max_leaf_nodes=None,
                                                  min_impurity_decrease=0.0,
                                                  min_impurity_split=None,
                                                  min_samples_leaf=1,
                                                  min_samples_split=2,
                                                  min_weight_fraction_leaf=0.0,
                                                  n_estimators=100,
                                                  n_iter_no_change=None,
                                                  presort='deprecated',
                                                  random_state=None,
                                                  subsample=1.0, tol=0.0001,
                                                  validation_fraction=0.1,
                                                  verbose=0, warm_start=False),
             iid='deprecated', n_jobs=4,
             param_grid={'learning_rate': [0.1, 0.01], 'loss': ['deviance'],
                         'n_estimators': [3, 4]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='accuracy', verbose=1)
0.7741607598100475
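A sketch of what this cell presumably intended: searching over the MLP itself with MLP hyperparameters (the grid values below are hypothetical, chosen only for illustration):

mlp_param_grid_sgd = {
    "hidden_layer_sizes": [(50,), (100,)],
    "learning_rate_init": [0.1, 0.01]
}
mlp_sgd_grid = GridSearchCV(mlp_sgd, param_grid=mlp_param_grid_sgd,
                            scoring="accuracy", n_jobs=4, verbose=1)
# mlp_sgd_grid.fit(X_train, Y_train)  # not run in the original notebook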
In [ ]:
# ===================adam====================================
print("===================adam====================================")
mlp_adam = MLPClassifier(solver='adam', hidden_layer_sizes=(100,), random_state=1)
X_train = X_train.loc[:20000]
Y_train = Y_train.loc[:20000]
# a single-candidate grid: the "search" reduces to 5-fold cross-validation
# of the default adam MLP
mlp_param_grid_adam = {
    "activation": ["relu"]
}

mlp_a_grid = GridSearchCV(mlp_adam, param_grid=mlp_param_grid_adam, n_jobs=-1, verbose=1)

mlp_a_grid.fit(X_train, Y_train)
mlp_a_grid_best = mlp_a_grid.best_score_
print(mlp_a_grid)
print(mlp_a_grid.best_score_)
===================adam====================================
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 out of   5 | elapsed:  2.5min finished
GridSearchCV(cv=None, error_score=nan,
             estimator=MLPClassifier(activation='relu', alpha=0.0001,
                                     batch_size='auto', beta_1=0.9,
                                     beta_2=0.999, early_stopping=False,
                                     epsilon=1e-08, hidden_layer_sizes=(100,),
                                     learning_rate='constant',
                                     learning_rate_init=0.001, max_fun=15000,
                                     max_iter=200, momentum=0.9,
                                     n_iter_no_change=10,
                                     nesterovs_momentum=True, power_t=0.5,
                                     random_state=1, shuffle=True,
                                     solver='adam', tol=0.0001,
                                     validation_fraction=0.1, verbose=False,
                                     warm_start=False),
             iid='deprecated', n_jobs=4, param_grid={'activation': ['relu']},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=1)
0.807657648087978
In [ ]:
mlp_a_grid_best = mlp_a_grid.best_score_

SVC

In [ ]:
svc = SVC()  # training is far too slow on the full 60,000 rows
# Reportedly SVC slows down badly once a dataset exceeds ~20,000 samples,
# so we shrink the training data for the SVC experiments.
X_train_m = X_train.loc[:10000]  # .loc is inclusive, so this keeps 10,001 rows
Y_train_m = Y_train.loc[:10000]
X_train_m  # displaying the reduced frame produces the Out below
Out[ ]:
       pixel1  pixel2  pixel3  pixel4  ...  pixel781  pixel782  pixel783  pixel784
0           0       0       0       0  ...         0         0         0         0
1           0       0       0       0  ...         0         0         0         0
2           0       0       0       0  ...         0         0         0         0
3           0       0       0       1  ...         0         0         0         0
4           0       0       0       0  ...         0         0         0         0
...       ...     ...     ...     ...  ...       ...       ...       ...       ...
9996        0       0       0       0  ...         0         0         0         0
9997        0       0       0       2  ...         0         0         0         0
9998        0       0       0       0  ...         0         0         0         0
9999        0       0       0       0  ...         0         0         0         0
10000       0       0       0       0  ...        47         0         0         0

10001 rows × 784 columns
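As a hedged aside, for the linear kernel tried below, scikit-learn's LinearSVC (liblinear-based) usually scales much better than SVC(kernel='linear') and might have coped with far more rows:

from sklearn.svm import LinearSVC  # faster alternative for linear kernels
lin_svc = LinearSVC(C=1.0, max_iter=2000)
# lin_svc.fit(X_train_m, Y_train_m)  # illustrative, not run here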

Applying a scaler
In [ ]:
# Apply a scaler (MinMaxScaler)
# NOTE: fit_transform here is applied to the whole frame, label column
# included, and the scaler is re-fit on the test set instead of reusing
# the train fit.
scaler = MinMaxScaler()
data_train = scaler.fit_transform(train.astype(np.float32))
data_test = scaler.fit_transform(test.astype(np.float32))

# NOTE: despite the names, x_scaled/y_scaled are sliced from the *raw*
# train frame, not from data_train, so the MinMax scaling above is never
# actually used in the SVC searches below. y_scaled also becomes a
# one-column DataFrame, which triggers the DataConversionWarning later.
x_scaled = train.iloc[:,1:].values
y_scaled = train.iloc[:,0].values
# even with the scaler the run took too long, so cut down to 10,001 rows
x_scaled = pd.DataFrame(x_scaled).loc[:10000]
y_scaled = pd.DataFrame(y_scaled).loc[:10000]
x_scaled
Out[ ]:
       0   1   2   3  ...  780  781  782  783
0      0   0   0   0  ...    0    0    0    0
1      0   0   0   0  ...    0    0    0    0
2      0   0   0   0  ...    0    0    0    0
3      0   0   0   1  ...    0    0    0    0
4      0   0   0   0  ...    0    0    0    0
...   ..  ..  ..  ..  ...  ...  ...  ...  ...
9996   0   0   0   0  ...    0    0    0    0
9997   0   0   0   2  ...    0    0    0    0
9998   0   0   0   0  ...    0    0    0    0
9999   0   0   0   0  ...    0    0    0    0
10000  0   0   0   0  ...   47    0    0    0

10001 rows × 784 columns
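A minimal sketch of what actually using the scaler would look like (hypothetical variable names; fit on the training pixels only, then reuse the same fit for the test pixels):

x_scaled_px = scaler.fit_transform(train.iloc[:, 1:].astype(np.float32))  # pixels only, no label
x_sub = pd.DataFrame(x_scaled_px).loc[:10000]  # same 10,001-row subset as above
y_sub = train.iloc[:10001, 0]                  # iloc end is exclusive: 10,001 labels
x_test_scaled = scaler.transform(test.iloc[:, 1:].astype(np.float32))  # reuse the train fit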

In [ ]:
svc = SVC()
# note: gamma is ignored by the linear kernel, so the five gamma values
# below only multiply the number of fits (40 candidates instead of 8)
svc_param_grid = {'kernel':['linear'],
                  'gamma':[0.001,0.01,0.1,0.5,1],
                  'C':[0.01,0.1,1,10,50,100,200,300]}

svc_grid = GridSearchCV(svc, param_grid = svc_param_grid, scoring="accuracy", n_jobs=-1, verbose=1)
In [ ]:
svc_grid.fit(x_scaled,y_scaled)
svc_best = svc_grid.best_estimator_
print(svc_grid.best_score_)
print(svc_grid.best_params_)
Fitting 5 folds for each of 40 candidates, totalling 200 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  46 tasks      | elapsed:  9.6min
[Parallel(n_jobs=-1)]: Done 196 tasks      | elapsed: 40.3min
[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 41.1min finished
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
0.8074183408295852
{'C': 0.01, 'gamma': 0.001, 'kernel': 'linear'}
In [ ]:
svc = SVC()
svc_param_grid = {'kernel':['rbf'],
                  'gamma':[0.001,0.01,0.1,0.5,1],
                  'C':[0.01,0.1,1,10,50,100,200,300]}

svc_grid = GridSearchCV(svc, param_grid = svc_param_grid, scoring="accuracy",n_jobs=-1, verbose=1)
In [ ]:
svc_grid.fit(x_scaled,y_scaled)
svc_best = svc_grid.best_estimator_
print(svc_grid.best_score_)
print(svc_grid.best_params_)
Fitting 5 folds for each of 40 candidates, totalling 200 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  46 tasks      | elapsed: 79.5min
[Parallel(n_jobs=-1)]: Done 196 tasks      | elapsed: 329.2min
[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 335.7min finished
0.11318965517241379
{'C': 1, 'gamma': 0.001, 'kernel': 'rbf'}
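The near-chance rbf score (0.113, barely above the 10% expected from random guessing over 10 classes) is most likely a consequence of the scaling bug noted above: on raw 0-255 pixels the squared distances inside exp(-gamma * ||x - x'||^2) are enormous, so even gamma = 0.001 drives nearly all kernel values to zero. With genuinely min-max scaled inputs, an rbf SVC on Fashion-MNIST would be expected to beat the linear kernel's 0.807 rather than collapse.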

Checking the results of applying GridSearchCV