
[Data Analysis with Pandas: mnist-fashion, SVM, decision tree] SK infosec Cloud AI Expert Training Course lab notes

gogoriver 2020. 9. 8. 09:03

mnist fashion

  1. Display the first record of the test data as an image => submit the image

  2. Fashion-mnist_train.csv (60,000 rows), fashion-mnist_test.csv (10,000 rows)

    • RandomForestClassifier
    • GradientBoostingClassifier
    • MLPClassifier
    • SVC
    • Find the optimal algorithm and parameters among these with GridSearchCV
    • Check the accuracy (with code)
    • Submit as a Jupyter notebook

1. Displaying the first record of the test data as an image

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings('ignore')
%matplotlib inline
/usr/local/lib/python3.6/dist-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
  import pandas.util.testing as tm
In [ ]:
data_file = open("/content/drive/My Drive/Colab Notebooks/ml/Day04/fashion-mnist_train.csv", 'r')
data_list = data_file.readlines()
data_file.close()

# data_list[0] is the CSV header row, so data_list[1] is the first record;
# the first column is the label, the remaining 784 columns are pixels
all_values = data_list[1].split(',')
image_array = np.asfarray(all_values[1:]).reshape((28,28))

plt.imshow(image_array, cmap='Greys', interpolation='None')
plt.show()
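Note that the cell above actually reads fashion-mnist_train.csv, while the task asks for the first test record. A minimal pandas-based sketch of the intended version (assuming the test CSV sits in the same Colab folder; the class-name list is the standard Fashion-MNIST label mapping):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# standard Fashion-MNIST label names for labels 0-9
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

test_df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/ml/Day04/fashion-mnist_test.csv')
label = int(test_df.iloc[0, 0])                     # first column is the label
pixels = test_df.iloc[0, 1:].to_numpy(dtype=float)  # remaining 784 pixel columns
plt.imshow(pixels.reshape(28, 28), cmap='Greys', interpolation='None')
plt.title(f'label {label}: {class_names[label]}')
plt.show()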
In [ ]:
scaled_input = np.asfarray(all_values[1:])/255.0*0.99+0.01
print(scaled_input)
[0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.02552941 0.01
 0.01       0.01       0.01       0.01       0.25070588 0.24682353
 0.09152941 0.12258824 0.09929412 0.208      0.538      0.24682353
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.35164706
 0.79035294 0.89517647 0.88352941 1.         0.45647059 0.25070588
 0.54188235 1.         0.92235294 0.87188235 1.         0.53411765
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.19247059 0.98835294 0.91847059 0.934      0.87964706
 0.84470588 0.84470588 0.89905882 0.42929412 0.70882353 0.81364706
 0.84082353 0.87964706 0.90682353 0.97670588 0.99611765 0.18470588
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01388235 0.01       0.01       0.84082353
 0.87188235 0.82529412 0.83694118 0.87964706 0.88352941 0.85247059
 0.86411765 0.99611765 0.91458824 0.86023529 0.868      0.85247059
 0.87576471 0.868      0.94176471 0.99611765 0.01       0.01
 0.01388235 0.01       0.01       0.01       0.01388235 0.01
 0.01       0.01       0.50694118 0.93011765 0.81364706 0.87964706
 0.87964706 0.81364706 0.84858824 0.84082353 0.82529412 0.81752941
 0.82917647 0.868      0.81752941 0.86023529 0.83694118 0.88741176
 0.82917647 0.93011765 0.59235294 0.01       0.01       0.01
 0.01       0.01       0.01       0.01776471 0.01       0.01
 0.93011765 0.87188235 0.84470588 0.81364706 0.82529412 0.83305882
 0.83694118 0.80976471 0.84082353 0.83694118 0.84082353 0.83694118
 0.82529412 0.84470588 0.84082353 0.80976471 0.78258824 0.85635294
 1.         0.06047059 0.01       0.01776471 0.01       0.01
 0.01       0.02552941 0.01       0.34       0.89517647 0.82529412
 0.85635294 0.78647059 0.82917647 0.81752941 0.79811765 0.84470588
 0.82529412 0.82141176 0.82141176 0.82529412 0.83694118 0.82917647
 0.82529412 0.85247059 0.80976471 0.83694118 0.90682353 0.68941176
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.85247059 0.87964706 0.84470588 0.80976471 0.80588235
 0.802      0.85247059 0.90294118 0.87188235 0.84470588 0.87964706
 0.91458824 0.89517647 0.91070588 0.89517647 0.87964706 0.81364706
 0.83305882 0.84470588 0.83694118 0.89905882 0.13035294 0.01
 0.02552941 0.01       0.01388235 0.01       0.09152941 0.88352941
 0.83305882 0.83305882 0.79811765 0.82917647 0.88352941 0.75929412
 0.54964706 0.538      0.76705882 0.58070588 0.61564706 0.54964706
 0.50694118 0.63894118 0.77482353 0.87576471 0.81364706 0.86411765
 0.83694118 0.91070588 0.69717647 0.01       0.01       0.01
 0.01       0.01       0.48752941 0.88741176 0.81364706 0.82917647
 0.82141176 0.80588235 0.89517647 0.62341176 0.35941176 0.40988235
 0.73211765 0.54576471 0.39823529 0.47976471 0.58070588 0.62341176
 0.72047059 0.88741176 0.81752941 0.84082353 0.82141176 0.84858824
 1.         0.06047059 0.01       0.01388235 0.01       0.01
 0.88741176 0.86023529 0.79423529 0.81752941 0.80976471 0.80588235
 0.84858824 0.72435294 0.61564706 0.59235294 0.75929412 0.67
 0.64670588 0.66223529 0.73988235 0.73211765 0.78647059 0.86023529
 0.84858824 0.83694118 0.83694118 0.82917647 0.91458824 0.58458824
 0.01       0.01       0.01       0.18470588 0.89129412 0.802
 0.84082353 0.82917647 0.85635294 0.87188235 0.868      0.90294118
 0.89905882 0.868      0.83694118 0.87964706 0.91458824 0.88741176
 0.86411765 0.86023529 0.868      0.87964706 0.87576471 0.85247059
 0.82529412 0.85635294 0.83694118 0.99611765 0.01       0.01
 0.01       0.61952941 0.88741176 0.79811765 0.81364706 0.82917647
 0.82141176 0.84470588 0.80588235 0.77870588 0.81364706 0.81752941
 0.79035294 0.79035294 0.77482353 0.79811765 0.80588235 0.82529412
 0.81364706 0.83694118 0.84082353 0.84082353 0.84082353 0.83694118
 0.81752941 0.91847059 0.42541176 0.01       0.01       0.92235294
 0.83694118 0.802      0.82917647 0.82529412 0.82141176 0.83694118
 0.79423529 0.77482353 0.802      0.84470588 0.85247059 0.83694118
 0.83305882 0.82529412 0.80976471 0.83305882 0.79811765 0.82917647
 0.85635294 0.84470588 0.84082353 0.81752941 0.82141176 0.87188235
 0.90294118 0.01       0.21188235 1.         0.81364706 0.78647059
 0.81752941 0.83694118 0.82529412 0.82529412 0.81752941 0.81364706
 0.79423529 0.79035294 0.82141176 0.84858824 0.84858824 0.84858824
 0.84858824 0.84082353 0.83305882 0.80588235 0.84470588 0.79035294
 0.89517647 0.81752941 0.84082353 0.83305882 0.85635294 0.10705882
 0.46811765 0.85247059 0.79035294 0.80976471 0.81752941 0.83694118
 0.81752941 0.80588235 0.80976471 0.82529412 0.82917647 0.79423529
 0.78258824 0.81364706 0.81752941 0.82141176 0.82529412 0.81364706
 0.82529412 0.82529412 0.96117647 0.54964706 0.472      1.
 0.79423529 0.79811765 0.92623529 0.45258824 0.67388235 0.934
 0.83305882 0.79811765 0.86411765 0.84858824 0.85247059 0.82141176
 0.81364706 0.80588235 0.82529412 0.82917647 0.80976471 0.802
 0.80976471 0.82141176 0.82917647 0.84470588 0.82529412 0.80976471
 0.868      0.94952941 0.01       0.87964706 0.91847059 0.90294118
 0.71270588 0.11094118 0.16141176 0.57294118 0.79035294 1.
 0.61952941 0.45647059 0.98058824 0.78647059 0.81364706 0.80976471
 0.81364706 0.83694118 0.84858824 0.80976471 0.80588235 0.80976471
 0.81364706 0.80976471 0.84470588 0.81364706 0.868      0.934
 0.01       0.01       0.73988235 0.34       0.01       0.01
 0.01       0.01       0.01       0.13035294 0.01       0.51082353
 0.99223529 0.74764706 0.81364706 0.81752941 0.81752941 0.81752941
 0.82141176 0.82917647 0.82917647 0.82141176 0.82141176 0.82141176
 0.83305882 0.79035294 0.88741176 0.65058824 0.01       0.01
 0.01       0.01       0.01       0.01       0.01776471 0.01
 0.01       0.01       0.01       0.35552941 0.99611765 0.78258824
 0.78258824 0.75541176 0.77094118 0.77870588 0.78258824 0.79035294
 0.79423529 0.79811765 0.802      0.79811765 0.79811765 0.78647059
 0.87188235 0.61176471 0.01       0.02164706 0.02164706 0.02164706
 0.01776471 0.01       0.01       0.01       0.01388235 0.02941176
 0.01       0.01       1.         0.85635294 0.88741176 0.91070588
 0.89517647 0.87964706 0.87188235 0.86411765 0.86023529 0.86023529
 0.85247059 0.868      0.86411765 0.83305882 0.92623529 0.37882353
 0.01       0.01776471 0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.61176471 0.76317647 0.66223529 0.67       0.67388235 0.68164706
 0.68164706 0.70494118 0.69717647 0.68941176 0.67776471 0.67388235
 0.65835294 0.63505882 0.70882353 0.01       0.01       0.01388235
 0.01       0.01388235 0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01       0.01       0.01
 0.01       0.01       0.01       0.01      ]
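For reference, the expression np.asfarray(all_values[1:]) / 255.0 * 0.99 + 0.01 maps each 8-bit pixel from [0, 255] into [0.01, 1.00], so a fully black pixel becomes 0.01 rather than exactly 0; this offset is commonly used to keep inputs strictly positive when feeding simple neural networks.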

2. Fashion-mnist_train.csv (60,000 rows), fashion-mnist_test.csv (10,000 rows)

In [ ]:
import numpy as np
import pandas as pd 
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

train = pd.read_csv('/content/drive/My Drive/Colab Notebooks/ml/Day04/fashion-mnist_train.csv')
test = pd.read_csv('/content/drive/My Drive/Colab Notebooks/ml/Day04/fashion-mnist_test.csv')
In [ ]:
print("train data frame")
print(train)
print("test data frame")
print(test)
train data frame
       label  pixel1  pixel2  pixel3  ...  pixel781  pixel782  pixel783  pixel784
0          2       0       0       0  ...         0         0         0         0
1          9       0       0       0  ...         0         0         0         0
2          6       0       0       0  ...         0         0         0         0
3          0       0       0       0  ...         0         0         0         0
4          3       0       0       0  ...         0         0         0         0
...      ...     ...     ...     ...  ...       ...       ...       ...       ...
59995      9       0       0       0  ...         0         0         0         0
59996      1       0       0       0  ...         0         0         0         0
59997      8       0       0       0  ...         0         0         0         0
59998      8       0       0       0  ...         0         0         0         0
59999      7       0       0       0  ...         0         0         0         0

[60000 rows x 785 columns]
test data frame
      label  pixel1  pixel2  pixel3  ...  pixel781  pixel782  pixel783  pixel784
0         0       0       0       0  ...         0         0         0         0
1         1       0       0       0  ...         0         0         0         0
2         2       0       0       0  ...        31         0         0         0
3         2       0       0       0  ...       222        56         0         0
4         3       0       0       0  ...         0         0         0         0
...     ...     ...     ...     ...  ...       ...       ...       ...       ...
9995      0       0       0       0  ...         1         0         0         0
9996      6       0       0       0  ...        28         0         0         0
9997      8       0       0       0  ...        42         0         1         0
9998      8       0       1       3  ...         0         0         0         0
9999      1       0       0       0  ...         0         0         0         0

[10000 rows x 785 columns]
In [ ]:
datas = [train, test]
for data in datas:
    print(data.isnull().sum())
label       0
pixel1      0
pixel2      0
pixel3      0
pixel4      0
           ..
pixel780    0
pixel781    0
pixel782    0
pixel783    0
pixel784    0
Length: 785, dtype: int64
label       0
pixel1      0
pixel2      0
pixel3      0
pixel4      0
           ..
pixel780    0
pixel781    0
pixel782    0
pixel783    0
pixel784    0
Length: 785, dtype: int64
  • The missing-value check shows zero missing values in every column, so the missing-value handling step is skipped.

  • Every column holds a pixel value and is already numeric (int64), so no columns need their values converted.
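A quicker, equivalent sanity check is a one-liner per frame that collapses everything to a single boolean:

print(train.isnull().values.any())  # False: no missing values anywhere
print(test.isnull().values.any())   # False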

Training the models

In [ ]:
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
In [ ]:
X_train = train[train.columns[1:]]
Y_train = train['label']
print(X_train)
print(Y_train)
       pixel1  pixel2  pixel3  pixel4  ...  pixel781  pixel782  pixel783  pixel784
0           0       0       0       0  ...         0         0         0         0
1           0       0       0       0  ...         0         0         0         0
2           0       0       0       0  ...         0         0         0         0
3           0       0       0       1  ...         0         0         0         0
4           0       0       0       0  ...         0         0         0         0
...       ...     ...     ...     ...  ...       ...       ...       ...       ...
59995       0       0       0       0  ...         0         0         0         0
59996       0       0       0       0  ...         0         0         0         0
59997       0       0       0       0  ...         0         0         0         0
59998       0       0       0       0  ...         0         0         0         0
59999       0       0       0       0  ...         0         0         0         0

[60000 rows x 784 columns]
0        2
1        9
2        6
3        0
4        3
        ..
59995    9
59996    1
59997    8
59998    8
59999    7
Name: label, Length: 60000, dtype: int64
In [ ]:
X_test = test[test.columns[1:]]
Y_test = test['label']
print(X_test)
print(Y_test)
      pixel1  pixel2  pixel3  pixel4  ...  pixel781  pixel782  pixel783  pixel784
0          0       0       0       0  ...         0         0         0         0
1          0       0       0       0  ...         0         0         0         0
2          0       0       0       0  ...        31         0         0         0
3          0       0       0       0  ...       222        56         0         0
4          0       0       0       0  ...         0         0         0         0
...      ...     ...     ...     ...  ...       ...       ...       ...       ...
9995       0       0       0       0  ...         1         0         0         0
9996       0       0       0       0  ...        28         0         0         0
9997       0       0       0       0  ...        42         0         1         0
9998       0       1       3       0  ...         0         0         0         0
9999       0       0       0       0  ...         0         0         0         0

[10000 rows x 784 columns]
0       0
1       1
2       2
3       2
4       3
       ..
9995    0
9996    6
9997    8
9998    8
9999    1
Name: label, Length: 10000, dtype: int64

Random forest

In [ ]:
model1 = RandomForestClassifier(n_estimators=20)
rdc = model1.fit(X_train, Y_train)
output1 = rdc.predict(X_test)
In [ ]:
print("RandomForest with n_estimators = 20")
print(accuracy_score(Y_test, output1))
RandomForest with n_estimators = 20
0.8679
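As a hedged aside, fixing the seed would make this baseline reproducible across runs, e.g.:

model1 = RandomForestClassifier(n_estimators=20, random_state=42)  # illustrative seed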

Random forest (with GridSearchCV)

In [ ]:
rf = RandomForestClassifier()
## Grid Search
# limit to the first 20,001 rows (.loc slicing is inclusive) to keep the search tractable
X_train = X_train.loc[:20000]
Y_train = Y_train.loc[:20000]
rf_param_grid = {
    "max_depth": [None],
    "max_features": [1, 3, 10],
    "min_samples_split": [2, 3, 10],
    "min_samples_leaf": [1, 3, 10],
    "bootstrap": [False],
    "n_estimators": [10, 20]
}
rf_grid = GridSearchCV(rf, param_grid = rf_param_grid, scoring="accuracy", n_jobs=4, verbose=1)
rf_grid.fit(X_train, Y_train)
Fitting 5 folds for each of 54 candidates, totalling 270 fits
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:   18.2s
[Parallel(n_jobs=4)]: Done 192 tasks      | elapsed:  1.6min
[Parallel(n_jobs=4)]: Done 270 out of 270 | elapsed:  3.2min finished
Out[ ]:
GridSearchCV(cv=None, error_score=nan,
             estimator=RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
                                              class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features='auto',
                                              max_leaf_nodes=None,
                                              max_samples=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1,
                                              min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              n_estimators=100, n_jobs=None,
                                              oob_score=False,
                                              random_state=None, verbose=0,
                                              warm_start=False),
             iid='deprecated', n_jobs=4,
             param_grid={'bootstrap': [False], 'max_depth': [None],
                         'max_features': [1, 3, 10],
                         'min_samples_leaf': [1, 3, 10],
                         'min_samples_split': [2, 3, 10],
                         'n_estimators': [10, 20]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='accuracy', verbose=1)
In [ ]:
rf_best = rf_grid.best_estimator_
print(rf_grid.best_score_)
# note: printing the grid object just repeats its repr below;
# print(rf_best) would show the winning parameters instead
print(rf_grid)
0.8565066483379156
GridSearchCV(cv=None, error_score=nan,
             estimator=RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
                                              class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features='auto',
                                              max_leaf_nodes=None,
                                              max_samples=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1,
                                              min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              n_estimators=100, n_jobs=None,
                                              oob_score=False,
                                              random_state=None, verbose=0,
                                              warm_start=False),
             iid='deprecated', n_jobs=4,
             param_grid={'bootstrap': [False], 'max_depth': [None],
                         'max_features': [1, 3, 10],
                         'min_samples_leaf': [1, 3, 10],
                         'min_samples_split': [2, 3, 10],
                         'n_estimators': [10, 20]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='accuracy', verbose=1)
In [ ]:
rf_best = rf_grid.best_estimator_
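The notebook stops at the cross-validated score; a minimal sketch of the held-out check (not run in the original) would be:

rf_pred = rf_best.predict(X_test)
print(accuracy_score(Y_test, rf_pred))  # accuracy of the tuned forest on the 10,000-row test set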

GradientBoostingClassifier

In [ ]:
gb = GradientBoostingClassifier()
## Grid Search
X_train = X_train.loc[:20000]  # already sliced above, so this is a no-op
Y_train = Y_train.loc[:20000]
gb_param_grid = {
    "loss": ["deviance"],
    "n_estimators": [3, 4],      # kept very small: boosting over 784 features is slow
    "learning_rate": [0.1, 0.01],
    "max_depth": [4, 8],
    "max_features": [0.3, 0.1],
    "min_samples_leaf": [10, 15]
}
gb_grid = GridSearchCV(gb, param_grid = gb_param_grid, scoring="accuracy", n_jobs=4, verbose=1)
In [ ]:
gb_grid.fit(X_train,Y_train)

gb_best = gb_grid.best_estimator_
print(gb_grid.best_score_)
print(gb_best)
Fitting 5 folds for each of 32 candidates, totalling 160 fits
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:  7.7min
[Parallel(n_jobs=4)]: Done 160 out of 160 | elapsed: 37.5min finished
0.8412575606098475
GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                           learning_rate=0.1, loss='deviance', max_depth=8,
                           max_features=0.1, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=15, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=4,
                           n_iter_no_change=None, presort='deprecated',
                           random_state=None, subsample=1.0, tol=0.0001,
                           validation_fraction=0.1, verbose=0,
                           warm_start=False)
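Even with only 3-4 boosting stages, the winning configuration (learning_rate=0.1, max_depth=8, max_features=0.1, min_samples_leaf=15, n_estimators=4) reaches a cross-validated accuracy of about 0.841; more estimators would likely improve it further, but this 160-fit search already took 37.5 minutes.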

MLPClassifier

In [ ]:
# ===================sgd====================================
mlp_sgd = MLPClassifier(solver='sgd', hidden_layer_sizes=(100,), random_state=1)
X_train = X_train.loc[:20000]
Y_train = Y_train.loc[:20000]
# NOTE: this grid holds GradientBoosting parameters and passes `gb` (not
# `mlp_sgd`) to GridSearchCV, so the output below is really another
# gradient-boosting search; mlp_sgd is never fitted in this cell.
mlp_param_grid = {
    "loss": ["deviance"],
    "n_estimators": [3, 4],
    "learning_rate": [0.1, 0.01]
}
mlp_grid = GridSearchCV(gb, param_grid = mlp_param_grid, scoring="accuracy", n_jobs=4, verbose=1)
mlp_grid.fit(X_train,Y_train)
mlp_sgd_best = mlp_grid.best_score_
print(mlp_grid)
print(mlp_grid.best_score_)
Fitting 5 folds for each of 4 candidates, totalling 20 fits
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  20 out of  20 | elapsed: 11.7min finished
GridSearchCV(cv=None, error_score=nan,
             estimator=GradientBoostingClassifier(ccp_alpha=0.0,
                                                  criterion='friedman_mse',
                                                  init=None, learning_rate=0.1,
                                                  loss='deviance', max_depth=3,
                                                  max_features=None,
                                                  max_leaf_nodes=None,
                                                  min_impurity_decrease=0.0,
                                                  min_impurity_split=None,
                                                  min_samples_leaf=1,
                                                  min_samples_split=2,
                                                  min_weight_fraction_leaf=0.0,
                                                  n_estimators=100,
                                                  n_iter_no_change=None,
                                                  presort='deprecated',
                                                  random_state=None,
                                                  subsample=1.0, tol=0.0001,
                                                  validation_fraction=0.1,
                                                  verbose=0, warm_start=False),
             iid='deprecated', n_jobs=4,
             param_grid={'learning_rate': [0.1, 0.01], 'loss': ['deviance'],
                         'n_estimators': [3, 4]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='accuracy', verbose=1)
0.7741607598100475
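A sketch of what this cell presumably intended: searching over the MLP itself with MLP hyperparameters (the grid values below are hypothetical, chosen only for illustration):

mlp_param_grid_sgd = {
    "hidden_layer_sizes": [(50,), (100,)],
    "learning_rate_init": [0.1, 0.01]
}
mlp_sgd_grid = GridSearchCV(mlp_sgd, param_grid=mlp_param_grid_sgd,
                            scoring="accuracy", n_jobs=4, verbose=1)
# mlp_sgd_grid.fit(X_train, Y_train)  # not run in the original notebook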
In [ ]:
# ===================adam====================================
print("===================adam====================================")
mlp_adam = MLPClassifier(solver='adam', hidden_layer_sizes=(100,), random_state=1)
X_train = X_train.loc[:20000]
Y_train = Y_train.loc[:20000]
# a single-candidate grid: the "search" reduces to 5-fold cross-validation
# of the default adam MLP
mlp_param_grid_adam = {
    "activation": ["relu"]
}

mlp_a_grid = GridSearchCV(mlp_adam, param_grid=mlp_param_grid_adam, n_jobs=-1, verbose=1)

mlp_a_grid.fit(X_train, Y_train)
mlp_a_grid_best = mlp_a_grid.best_score_
print(mlp_a_grid)
print(mlp_a_grid.best_score_)
===================adam====================================
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 out of   5 | elapsed:  2.5min finished
GridSearchCV(cv=None, error_score=nan,
             estimator=MLPClassifier(activation='relu', alpha=0.0001,
                                     batch_size='auto', beta_1=0.9,
                                     beta_2=0.999, early_stopping=False,
                                     epsilon=1e-08, hidden_layer_sizes=(100,),
                                     learning_rate='constant',
                                     learning_rate_init=0.001, max_fun=15000,
                                     max_iter=200, momentum=0.9,
                                     n_iter_no_change=10,
                                     nesterovs_momentum=True, power_t=0.5,
                                     random_state=1, shuffle=True,
                                     solver='adam', tol=0.0001,
                                     validation_fraction=0.1, verbose=False,
                                     warm_start=False),
             iid='deprecated', n_jobs=4, param_grid={'activation': ['relu']},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=1)
0.807657648087978
In [ ]:
mlp_a_grid_best = mlp_a_grid.best_score_

SVC

In [ ]:
svc = SVC()  # training is far too slow on the full 60,000 rows
# Reportedly SVC slows down badly once a dataset exceeds ~20,000 samples,
# so we shrink the training data for the SVC experiments.
X_train_m = X_train.loc[:10000]  # .loc is inclusive, so this keeps 10,001 rows
Y_train_m = Y_train.loc[:10000]
X_train_m  # displaying the reduced frame produces the Out below
Out[ ]:
       pixel1  pixel2  pixel3  pixel4  ...  pixel781  pixel782  pixel783  pixel784
0           0       0       0       0  ...         0         0         0         0
1           0       0       0       0  ...         0         0         0         0
2           0       0       0       0  ...         0         0         0         0
3           0       0       0       1  ...         0         0         0         0
4           0       0       0       0  ...         0         0         0         0
...       ...     ...     ...     ...  ...       ...       ...       ...       ...
9996        0       0       0       0  ...         0         0         0         0
9997        0       0       0       2  ...         0         0         0         0
9998        0       0       0       0  ...         0         0         0         0
9999        0       0       0       0  ...         0         0         0         0
10000       0       0       0       0  ...        47         0         0         0

10001 rows × 784 columns
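As a hedged aside, for the linear kernel tried below, scikit-learn's LinearSVC (liblinear-based) usually scales much better than SVC(kernel='linear') and might have coped with far more rows:

from sklearn.svm import LinearSVC  # faster alternative for linear kernels
lin_svc = LinearSVC(C=1.0, max_iter=2000)
# lin_svc.fit(X_train_m, Y_train_m)  # illustrative, not run here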

Applying a scaler
In [ ]:
# Apply a scaler (MinMaxScaler)
# NOTE: fit_transform here is applied to the whole frame, label column
# included, and the scaler is re-fit on the test set instead of reusing
# the train fit.
scaler = MinMaxScaler()
data_train = scaler.fit_transform(train.astype(np.float32))
data_test = scaler.fit_transform(test.astype(np.float32))

# NOTE: despite the names, x_scaled/y_scaled are sliced from the *raw*
# train frame, not from data_train, so the MinMax scaling above is never
# actually used in the SVC searches below. y_scaled also becomes a
# one-column DataFrame, which triggers the DataConversionWarning later.
x_scaled = train.iloc[:,1:].values
y_scaled = train.iloc[:,0].values
# even with the scaler the run took too long, so cut down to 10,001 rows
x_scaled = pd.DataFrame(x_scaled).loc[:10000]
y_scaled = pd.DataFrame(y_scaled).loc[:10000]
x_scaled
Out[ ]:
       0   1   2   3  ...  780  781  782  783
0      0   0   0   0  ...    0    0    0    0
1      0   0   0   0  ...    0    0    0    0
2      0   0   0   0  ...    0    0    0    0
3      0   0   0   1  ...    0    0    0    0
4      0   0   0   0  ...    0    0    0    0
...   ..  ..  ..  ..  ...  ...  ...  ...  ...
9996   0   0   0   0  ...    0    0    0    0
9997   0   0   0   2  ...    0    0    0    0
9998   0   0   0   0  ...    0    0    0    0
9999   0   0   0   0  ...    0    0    0    0
10000  0   0   0   0  ...   47    0    0    0

10001 rows × 784 columns
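A minimal sketch of what actually using the scaler would look like (hypothetical variable names; fit on the training pixels only, then reuse the same fit for the test pixels):

x_scaled_px = scaler.fit_transform(train.iloc[:, 1:].astype(np.float32))  # pixels only, no label
x_sub = pd.DataFrame(x_scaled_px).loc[:10000]  # same 10,001-row subset as above
y_sub = train.iloc[:10001, 0]                  # iloc end is exclusive: 10,001 labels
x_test_scaled = scaler.transform(test.iloc[:, 1:].astype(np.float32))  # reuse the train fit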

In [ ]:
svc = SVC()
# note: gamma is ignored by the linear kernel, so the five gamma values
# below only multiply the number of fits (40 candidates instead of 8)
svc_param_grid = {'kernel':['linear'],
                  'gamma':[0.001,0.01,0.1,0.5,1],
                  'C':[0.01,0.1,1,10,50,100,200,300]}

svc_grid = GridSearchCV(svc, param_grid = svc_param_grid, scoring="accuracy", n_jobs=-1, verbose=1)
In [ ]:
svc_grid.fit(x_scaled,y_scaled)
svc_best = svc_grid.best_estimator_
print(svc_grid.best_score_)
print(svc_grid.best_params_)
Fitting 5 folds for each of 40 candidates, totalling 200 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  46 tasks      | elapsed:  9.6min
[Parallel(n_jobs=-1)]: Done 196 tasks      | elapsed: 40.3min
[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 41.1min finished
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
0.8074183408295852
{'C': 0.01, 'gamma': 0.001, 'kernel': 'linear'}
In [ ]:
svc = SVC()
svc_param_grid = {'kernel':['rbf'],
                  'gamma':[0.001,0.01,0.1,0.5,1],
                  'C':[0.01,0.1,1,10,50,100,200,300]}

svc_grid = GridSearchCV(svc, param_grid = svc_param_grid, scoring="accuracy",n_jobs=-1, verbose=1)
In [ ]:
svc_grid.fit(x_scaled,y_scaled)
svc_best = svc_grid.best_estimator_
print(svc_grid.best_score_)
print(svc_grid.best_params_)
Fitting 5 folds for each of 40 candidates, totalling 200 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  46 tasks      | elapsed: 79.5min
[Parallel(n_jobs=-1)]: Done 196 tasks      | elapsed: 329.2min
[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 335.7min finished
0.11318965517241379
{'C': 1, 'gamma': 0.001, 'kernel': 'rbf'}
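The near-chance rbf score (0.113, barely above the 10% expected from random guessing over 10 classes) is most likely a consequence of the scaling bug noted above: on raw 0-255 pixels the squared distances inside exp(-gamma * ||x - x'||^2) are enormous, so even gamma = 0.001 drives nearly all kernel values to zero. With genuinely min-max scaled inputs, an rbf SVC on Fashion-MNIST would be expected to beat the linear kernel's 0.807 rather than collapse.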

Checking the results of applying GridSearchCV