Machine Learning - Deep Learning from scratch (March 31, 2019)

*On Weight Initialization (something I had misunderstood)

The problem: when we train a neural network, sometimes it fails to learn properly (underfitting), or it learns but then does not work well on real-world data (overfitting). Underfitting …

Machine Learning - Deep Learning from scratch (March 29, 2019)

M.L (p.203)

# coding: utf-8
import numpy as np
import matplotlib.pyplot as plt


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def ReLU(x):
    return np.maximum(0, x)


def tanh(x):
    return np.tanh(x)


input_data = np.random.randn(1000, 100)  # 1000 data points
node_num = 100  # number of nodes (neurons) in each hidden layer
hidden_layer_size = 5  # five hidden layers
activations = {}  # activation results are stored here

x = input_data

for i in range(hidden_layer_size):
    if i != 0:
        x = activations[i-1]

    # Try experimenting with different initial values!
    w = np.random.randn(node_num, node_num) * 1
    #w = np.random.randn(node_num, node_num) * 0.01
    #w = np.random.randn(node_num, node_num) * np.sqrt(1.0 / node_num)
    #w = np.random.randn(node_num, node_num) * np.sqrt(2.0 / node_num)
    #w = np.zeros((node_num,node_num))
    #w = np.ones((node_num,node_num))
    #w = np.full((node_num,node_num), 5)

    a = np.dot(x, w)

    # Try swapping the activation function too!
    z = sigmoid(a)
    # z = ReLU(a)
    # z = tanh(a)

    activations[i] = z

# Draw the histograms
for i, a in activations.items():
    plt.subplot(1, len(activations), i+1)
    plt.title(str(i+1) + "-layer")
    if i != 0:
        plt.yticks([], [])
    # plt.xlim(0.1, 1)
    # plt.ylim(0, 7000)
    plt.hist(a.flatten(), 30, range=(0, 1))

plt.show()

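The code above takes several weight-initialization schemes and three activation functions and shows, as a graph, how the data ends up distributed under each set of initial values when a neural network is trained…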
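As a rough numeric companion to the histograms, the sketch below summarizes each layer's sigmoid activations by their standard deviation instead of plotting them. The helper name `layer_stds`, the fixed seed, and the choice of standard deviation as the summary are assumptions made for illustration; they are not part of the original listing.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def layer_stds(weight_scale, n_layers=5, node_num=100, seed=0):
    """Return the standard deviation of each layer's sigmoid activations.

    Hypothetical helper: same forward pass as the listing above, but it
    reports a single number per layer instead of drawing a histogram.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((1000, node_num))  # 1000 data points
    stds = []
    for _ in range(n_layers):
        w = rng.standard_normal((node_num, node_num)) * weight_scale
        x = sigmoid(x @ w)
        stds.append(float(x.std()))
    return stds


node_num = 100
for name, scale in [("std 1", 1.0),
                    ("std 0.01", 0.01),
                    ("Xavier", np.sqrt(1.0 / node_num))]:
    print(name, [round(s, 3) for s in layer_stds(scale)])
```

With sigmoid, a scale of 1 should give a wide spread because the outputs saturate near 0 and 1, while a scale of 0.01 should give a very narrow spread around 0.5. The `np.sqrt(1.0 / node_num)` scale is expected to land in between.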

Machine Learning - Deep Learning from scratch (March 29, 2019)

M.L (p.202)

Initial weight values. What is overfitting? In neural network training, it is the phenomenon where accuracy is high on the training data set but lower on the test data set. Usually, the training…

Machine Learning - Deep Learning from scratch (March 25, 2019)

M.L (p.201)

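A comparison of four training methods, Adam, SGD, AdaGrad, and Momentum, on the MNIST data set. Note that the results change with the learning rate, the network architecture, and so on. Compared with SGD, the other three methods are faster…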

Machine Learning - Deep Learning from scratch (March 25, 2019)

M.L (p.196)

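AdaGrad. When training a neural network, a learning rate that is too small means the parameters barely move, and one that is too large makes training diverge. How large a learning rate is needed for training to proceed properly…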
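A minimal sketch of AdaGrad as it is commonly formulated: each parameter accumulates the sum of its squared gradients in `h`, and its step is divided by the square root of that sum, so the effective learning rate shrinks over time. The class layout, the default `lr`, and the `1e-7` guard against division by zero are assumptions for illustration, since the excerpt above is cut off before it shows any code.

```python
import numpy as np


class AdaGrad:
    """Sketch of AdaGrad: per-element step size scaled by 1 / sqrt(sum of g^2)."""

    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None  # running sum of squared gradients, one array per parameter

    def update(self, params, grads):
        if self.h is None:
            self.h = {key: np.zeros_like(val) for key, val in params.items()}

        for key in params:
            self.h[key] += grads[key] * grads[key]
            # 1e-7 avoids division by zero when h is still 0 (assumed constant)
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)


params = {"W": np.array([1.0, -2.0])}
grads = {"W": np.array([0.5, -4.0])}
optimizer = AdaGrad(lr=0.1)
for step in range(3):
    optimizer.update(params, grads)
    print(step, params["W"])
```

Feeding the same gradient repeatedly makes each step smaller than the previous one, which is the adaptive behavior the method is named for.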

Machine Learning - Deep Learning from scratch (March 25, 2019)

M.L (p.195)

Momentum: an idea related to momentum in physics.

v ← αv - η * ∂L/∂W
W ← W + v

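W : the weight parameter to update
∂L/∂W : gradient of the loss function with respect to W
η : learning rate
v : velocity. Represents an object being accelerated by a force in the direction of the gradient
α : corresponds to friction/resistance (0.9)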
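The two update rules can be traced by hand on a small problem. The sketch below applies them to a hypothetical test function f(x, y) = x²/20 + y², starting from (-7, 2) with η = 0.1 and α = 0.9. The function, the starting point, and the helper names are assumptions chosen for illustration.

```python
import numpy as np


def momentum_step(w, v, grad, lr=0.1, alpha=0.9):
    """One update: v <- alpha*v - lr*grad, then W <- W + v."""
    v = alpha * v - lr * grad
    w = w + v
    return w, v


def loss(w):
    # Hypothetical test function: f(x, y) = x^2 / 20 + y^2
    return w[0] ** 2 / 20.0 + w[1] ** 2


def loss_grad(w):
    # Analytic gradient of the function above
    return np.array([w[0] / 10.0, 2.0 * w[1]])


w = np.array([-7.0, 2.0])
v = np.zeros_like(w)  # the velocity starts at rest
history = [loss(w)]
for _ in range(300):
    w, v = momentum_step(w, v, loss_grad(w))
    history.append(loss(w))

print("initial loss:", history[0])
print("final loss:", history[-1])
```

On the first step v is zero, so the update is the same as plain gradient descent. From the second step on, the αv term carries over part of the previous movement.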

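import numpy as np


class Momentum:
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr              # learning rate (η)
        self.momentum = momentum  # friction coefficient (α)
        self.v = None             # velocity per parameter, created on the first update

    def update(self, params, grads):
        # On the first call, start every velocity at zero with the parameter's shape
        if self.v is None:
            self.v = {}
            for key, val in params.items():
                self.v[key] = np.zeros_like(val)

        # Runs on every call: v ← αv - η * ∂L/∂W, then W ← W + v
        for key in params.keys():
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]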

This code packages Momentum as a reusable class.

d = {1:'apple', 2 :'banana',3:'mango', 4:'fafa'}
print(d)
print(type(d))
for key,val in d.items():
    print("key:"+str(key)+"val:"+str(val))

For reference, items is the dictionary method that pairs each key with its value…

Machine Learning - Deep Learning from scratch (March 25, 2019)

M.L (p.190)

Stochastic Gradient Descent method (SGD). 1. Function: W is the weight parameter, η (eta) is the learning rate, and ∂L/∂W is the gradient of the loss function. In other words, the value of W…
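With the symbols above, the SGD rule is W ← W - η * ∂L/∂W. A minimal sketch follows, assuming parameters and gradients are held in dictionaries of NumPy arrays. The class layout and the sample values are assumptions for illustration.

```python
import numpy as np


class SGD:
    """Sketch of plain SGD: W <- W - lr * dL/dW for every parameter."""

    def __init__(self, lr=0.01):
        self.lr = lr  # learning rate (η)

    def update(self, params, grads):
        for key in params:
            params[key] -= self.lr * grads[key]


# Hypothetical parameter and gradient values, only to show one update
params = {"W": np.array([1.0, -2.0, 3.0])}
grads = {"W": np.array([0.5, -0.5, 1.0])}
SGD(lr=0.1).update(params, grads)
print(params["W"])
```

Each element moves against its gradient by a distance proportional to the learning rate, so a positive gradient lowers the weight and a negative one raises it.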

Machine Learning - Deep Learning from scratch (March 23, 2019)

Relu_code

import numpy as np

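class relu:
    def __init__(self):
        self.grad = None  # boolean mask, True where the forward input was <= 0

    def Relu(self,x):
        # Forward pass: remember where x <= 0 and zero those elements
        self.grad = (x <= 0)
        out = x.copy()
        out[self.grad] = 0.0
        return out

    def Relu_grad(self,dout):
        # Backward pass: block the gradient where the input was <= 0.
        # Work on a copy so the caller's dout is not modified in place.
        dx = dout.copy()
        dx[self.grad] = 0.0
        return dx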

dout = np.array([-2.0,4,-1.1,5])
func = relu()
x = np.array([-1.0,2.1,-0.3,4.2])
print(func.Relu(x))
print(func.Relu_grad(dout))

This is the ReLU function: it outputs 0 when the input x <= 0 and x itself when x > 0. In the forward pass, for x <= 0 …
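One way to sanity-check the backward pass is to compare it with a finite-difference estimate of the slope. The sketch below is a function-style variant written for this check. The names `relu_forward` and `relu_backward` and the step size `h` are assumptions, not taken from the listing above.

```python
import numpy as np


def relu_forward(x):
    """Return ReLU(x) and the mask of elements that were let through (x > 0)."""
    mask = x > 0
    return np.where(mask, x, 0.0), mask


def relu_backward(dout, mask):
    """Pass the upstream gradient through where x > 0, block it elsewhere."""
    return np.where(mask, dout, 0.0)


x = np.array([-1.0, 2.1, -0.3, 4.2])
dout = np.array([-2.0, 4.0, -1.1, 5.0])

out, mask = relu_forward(x)
dx = relu_backward(dout, mask)

# Central-difference estimate of d ReLU / dx at each point (none is near 0)
h = 1e-4
numeric = (np.maximum(0, x + h) - np.maximum(0, x - h)) / (2 * h)

print(out)
print(dx)
print(np.allclose(dx, dout * numeric))
```

The numeric slope is 0 for the negative inputs and 1 for the positive ones, so multiplying it by `dout` should reproduce the masked gradient.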

Machine Learning - Deep Learning from scratch (March 23, 2019)

sigmoid_code

import numpy as np

x = np.array([[1,2,3],[4,5,6]])
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def sigmoid_grad(x):
    return sigmoid(x) * (1 - sigmoid(x))
def sigmoid_grad_1(x):
    # Same factors in the opposite order; * is element-wise, so the result is identical
    return (1 - sigmoid(x)) * sigmoid(x)
print(sigmoid_grad_1(x))
print(sigmoid_grad(x))

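sigmoid: forward pass: 1 / (1 + exp(-x)); backward pass: y(1 - y), where y is the sigmoid output. Because these are matrices, I thought the value would change depending on the order of the operands…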
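The question of operand order can be checked directly. In NumPy, `*` on arrays is an element-wise product, so swapping the factors cannot change the result. Only a true matrix product (`@`) depends on order. The sketch below also compares y(1 - y) with a finite-difference estimate. The small matrices `A` and `B` and the step size `h` are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
y = sigmoid(x)

# Element-wise product: the order of the factors does not matter
grad_a = y * (1 - y)
grad_b = (1 - y) * y
same_elementwise = np.array_equal(grad_a, grad_b)

# Matrix product: here the order does matter
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
same_matmul = np.array_equal(A @ B, B @ A)

# Central-difference check of the derivative y * (1 - y)
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
matches_numeric = np.allclose(grad_a, numeric)

print(same_elementwise, same_matmul, matches_numeric)
```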

Machine Learning - Deep Learning from scratch (March 23, 2019)

M.L (p.177)

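softmax-with-loss layer graph. Softmax and cross entropy error are implemented together, as a single forward pass and backward pass. The final output values of the forward pass (for the 3-layer case only) are, for each input value y1, y2, y3…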
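A minimal sketch of such a combined layer, assuming one-hot labels `t` and a batch of score rows `a`. When softmax and cross entropy error are fused, the backward pass reduces to (y - t) divided by the batch size. The class layout, the `1e-7` guard inside the logarithm, and the sample scores are assumptions for illustration.

```python
import numpy as np


def softmax(a):
    # Subtract the row maximum first so np.exp cannot overflow
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy_error(y, t):
    # t is one-hot; 1e-7 keeps log() away from log(0)
    batch_size = y.shape[0]
    return -np.sum(t * np.log(y + 1e-7)) / batch_size


class SoftmaxWithLoss:
    def __init__(self):
        self.y = None  # softmax output
        self.t = None  # one-hot labels

    def forward(self, a, t):
        self.t = t
        self.y = softmax(a)
        return cross_entropy_error(self.y, self.t)

    def backward(self, dout=1.0):
        # Gradient of the loss with respect to the scores a
        batch_size = self.t.shape[0]
        return dout * (self.y - self.t) / batch_size


a = np.array([[0.3, 2.9, 4.0]])   # hypothetical scores for 3 classes
t = np.array([[0.0, 0.0, 1.0]])   # the correct class is the third one
layer = SoftmaxWithLoss()
loss = layer.forward(a, t)
da = layer.backward()
print(loss)
print(da)
```

The gradient is the difference between the predicted probabilities and the labels, so it is negative for the correct class and positive for the others.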