In this project, you will build your first neural network and use it to predict daily bike rental ridership. We provide some of the code, but the neural network itself (most of the work) is left for you to implement. After submitting this project, feel free to explore the data and the model further.
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
A key step in building a neural network is preparing the data properly. Variables on different scales make it difficult for the network to efficiently learn the correct weights. We have provided the code to load and prepare the data below. You will learn more about this code soon!
data_path = 'Bike-Sharing-Dataset/hour.csv'
rides = pd.read_csv(data_path)
rides.head()
This data set contains the number of riders for each hour of each day from January 1, 2011 to December 31, 2012. Riders are split into casual and registered users, and the cnt column is the sum of the two. You can see the first few rows of the data above.
The plot below shows the number of riders over roughly the first 10 days of the data set (some days don't have exactly 24 entries, so it isn't exactly 10 days). You can see the hourly rentals here. This data is complicated! Ridership is lower on weekends and peaks when people commute to and from work on weekdays. The data above also includes temperature, humidity, and wind speed, all of which affect the number of riders. Your model will need to account for all of this.
rides[:24*10].plot(x='dteday', y='cnt')
Below are some categorical variables such as season, weather, and month. To include these in our model, we need to create binary dummy variables. This is easy to do with pandas' get_dummies().
dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']
for each in dummy_fields:
    dummies = pd.get_dummies(rides[each], prefix=each, drop_first=False)
    rides = pd.concat([rides, dummies], axis=1)
fields_to_drop = ['instant', 'dteday', 'season', 'weathersit',
'weekday', 'atemp', 'mnth', 'workingday', 'hr']
data = rides.drop(fields_to_drop, axis=1)
data.head()
To make training the network easier, we will standardize each of the continuous variables by shifting and scaling them so that they have a mean of 0 and a standard deviation of 1.
We save the scaling factors so that we can convert the predictions back to the original units later.
quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']
# Store scalings in a dictionary so we can convert back later
scaled_features = {}
for each in quant_features:
    mean, std = data[each].mean(), data[each].std()
    scaled_features[each] = [mean, std]
    data.loc[:, each] = (data[each] - mean)/std
We save roughly the last 21 days of data as a test set to be used after the network is trained. We will use this set to make predictions and compare them with the actual number of riders.
# Save data for approximately the last 21 days
test_data = data[-21*24:]
# Now remove the test data from the data set
data = data[:-21*24]
# Separate the data into features and targets
target_fields = ['cnt', 'casual', 'registered']
features, targets = data.drop(target_fields, axis=1), data[target_fields]
test_features, test_targets = test_data.drop(target_fields, axis=1), test_data[target_fields]
We split the remaining data into two sets: one for training and one for validating the network after it has been trained. Because this is time-series data, we train on historical data and then try to predict future data (the validation set).
# Hold out the last 60 days or so of the remaining data as a validation set
train_features, train_targets = features[:-60*24], targets[:-60*24]
val_features, val_targets = features[-60*24:], targets[-60*24:]
Below you will build your own network. We have built the overall structure and the backward pass. You will implement the forward pass of the network. You also need to set the hyperparameters: the learning rate, the number of hidden units, and the number of training passes.
The network has two layers: a hidden layer and an output layer. The hidden layer uses the sigmoid function as its activation function. The output layer has a single node and is used for regression; the output of the node is the same as its input, i.e. its activation function is $f(x) = x$. A function that takes the input signal and produces an output signal, taking a threshold into account, is called an activation function. We work through each layer of the network, calculating the outputs of each neuron. All of the outputs from one layer become the inputs to the neurons of the next layer. This process is called forward propagation.
We use weights in the neural network to propagate signals from the input layer to the output layer. We also use weights to propagate the error from the output layer back through the network in order to update the weights. This is called backpropagation.
Hint: you will need the derivative of the output activation function, $f(x) = x$, for the backpropagation implementation. If you are not familiar with calculus, this function is equivalent to the equation $y = x$. What is the slope of that equation? That is the derivative of $f(x)$.
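To make the forward pass and these error terms concrete, here is a minimal NumPy sketch for a toy 3-2-1 network (the weights are the same toy values used by the unit tests further below; this is an illustration only, not part of the project code):

import numpy as np

sigmoid = lambda x: 1 / (1 + np.exp(-x))

# Toy network: 3 input nodes, 2 hidden nodes, 1 output node
x = np.array([0.5, -0.2, 0.1])           # one input record
y = 0.4                                   # its target value
w_i_h = np.array([[0.1, -0.2],            # input-to-hidden weights (3 x 2)
                  [0.4, 0.5],
                  [-0.3, 0.2]])
w_h_o = np.array([[0.3],                  # hidden-to-output weights (2 x 1)
                  [-0.1]])

# Forward pass
hidden_inputs = np.dot(x, w_i_h)                  # signals into the hidden layer
hidden_outputs = sigmoid(hidden_inputs)           # sigmoid activation
final_outputs = np.dot(hidden_outputs, w_h_o)     # output activation is f(x) = x

# Backward pass
error = y - final_outputs                         # output error
output_error_term = error                         # f'(x) = 1, so the error term is just the error
hidden_error = np.dot(w_h_o, output_error_term)   # hidden layer's share of the error
hidden_error_term = hidden_error * hidden_outputs * (1 - hidden_outputs)
print(final_outputs, output_error_term, hidden_error_term)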
You need to complete the following tasks:

1. Implement the sigmoid function to use as the activation function. Set self.activation_function in __init__ to your sigmoid function.
2. Implement the forward pass in the train method.
3. Implement the backpropagation algorithm in the train method, including calculating the output error.
4. Implement the forward pass in the run method.

class NeuralNetwork(object):
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_input_to_hidden = np.random.normal(0.0, self.input_nodes**-0.5,
                                                         (self.input_nodes, self.hidden_nodes))
        self.weights_hidden_to_output = np.random.normal(0.0, self.hidden_nodes**-0.5,
                                                         (self.hidden_nodes, self.output_nodes))
        self.lr = learning_rate

        #### Set self.activation_function to the implemented sigmoid function ####
        #
        # Note: in Python, you can define a function with a lambda expression,
        # as shown below.
        self.activation_function = lambda x: 1 / (1 + np.exp(-x))  # sigmoid

        ### If the lambda code above is not something you're familiar with,
        # you can uncomment the following three lines and put your
        # implementation there instead.
        #
        #def sigmoid(x):
        #    return 0  # Replace 0 with your sigmoid calculation here
        #self.activation_function = sigmoid
    def train(self, features, targets):
        ''' Train the network on a batch of features and targets.

            Arguments
            ---------
            features: 2D array, each row is one data record, each column is a feature
            targets: 1D array of target values
        '''
        n_records = features.shape[0]
        delta_weights_i_h = np.zeros(self.weights_input_to_hidden.shape)
        delta_weights_h_o = np.zeros(self.weights_hidden_to_output.shape)
        for X, y in zip(features, targets):
            ### Forward pass ###
            # Hidden layer
            hidden_inputs = np.dot(X, self.weights_input_to_hidden)    # signals into hidden layer
            hidden_outputs = self.activation_function(hidden_inputs)   # signals from hidden layer

            # Output layer: the activation is f(x) = x, so do not apply the sigmoid here
            final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)  # signals into final output layer
            final_outputs = final_inputs                                          # signals from final output layer

            ### Backward pass ###
            # Output error: difference between desired target and actual output
            error = y - final_outputs

            # Output error term: because the output activation is f(x) = x,
            # its derivative is 1, so the error term is just the error itself
            output_error_term = error

            # Hidden layer's contribution to the error
            hidden_error = np.dot(self.weights_hidden_to_output, output_error_term)
            # Multiply by the sigmoid derivative, hidden_outputs * (1 - hidden_outputs)
            hidden_error_term = hidden_error * hidden_outputs * (1 - hidden_outputs)

            # Weight step (input to hidden); X[:, None] reshapes the 1D input into a column vector
            delta_weights_i_h += hidden_error_term * X[:, None]
            # Weight step (hidden to output); note this uses hidden_outputs, not final_inputs
            delta_weights_h_o += output_error_term * hidden_outputs[:, None]

        # Update the weights with the averaged gradient descent step
        self.weights_hidden_to_output += self.lr * delta_weights_h_o / n_records
        self.weights_input_to_hidden += self.lr * delta_weights_i_h / n_records
    def run(self, features):
        ''' Run a forward pass through the network with input features.

            Arguments
            ---------
            features: 1D array of feature values
        '''
        # Hidden layer
        hidden_inputs = np.dot(features, self.weights_input_to_hidden)   # signals into hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)         # signals from hidden layer

        # Output layer: the activation is f(x) = x, so do not apply the sigmoid here
        final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)  # signals into final output layer
        final_outputs = final_inputs                                          # signals from final output layer

        return final_outputs
def MSE(y, Y):
    return np.mean((y - Y)**2)
Run these unit tests to check if your network implementation is correct. This will help you ensure that the network is implemented correctly before you start training the network. These tests must be successful in order to pass this project.
import unittest
inputs = np.array([[0.5, -0.2, 0.1]])
targets = np.array([[0.4]])
test_w_i_h = np.array([[0.1, -0.2],
[0.4, 0.5],
[-0.3, 0.2]])
test_w_h_o = np.array([[0.3],
[-0.1]])
class TestMethods(unittest.TestCase):

    ##########
    # Unit tests for data loading
    ##########

    def test_data_path(self):
        # Test that file path to dataset has been unaltered
        self.assertTrue(data_path.lower() == 'bike-sharing-dataset/hour.csv')

    def test_data_loaded(self):
        # Test that data frame loaded
        self.assertTrue(isinstance(rides, pd.DataFrame))

    ##########
    # Unit tests for network functionality
    ##########

    def test_activation(self):
        network = NeuralNetwork(3, 2, 1, 0.5)
        # Test that the activation function is a sigmoid
        self.assertTrue(np.all(network.activation_function(0.5) == 1/(1+np.exp(-0.5))))

    def test_train(self):
        # Test that weights are updated correctly on training
        network = NeuralNetwork(3, 2, 1, 0.5)
        network.weights_input_to_hidden = test_w_i_h.copy()
        network.weights_hidden_to_output = test_w_h_o.copy()

        network.train(inputs, targets)
        self.assertTrue(np.allclose(network.weights_hidden_to_output,
                                    np.array([[ 0.37275328],
                                              [-0.03172939]])))
        self.assertTrue(np.allclose(network.weights_input_to_hidden,
                                    np.array([[ 0.10562014, -0.20185996],
                                              [ 0.39775194,  0.50074398],
                                              [-0.29887597,  0.19962801]])))

    def test_run(self):
        # Test correctness of run method
        network = NeuralNetwork(3, 2, 1, 0.5)
        network.weights_input_to_hidden = test_w_i_h.copy()
        network.weights_hidden_to_output = test_w_h_o.copy()

        self.assertTrue(np.allclose(network.run(inputs), 0.09998924))
suite = unittest.TestLoader().loadTestsFromModule(TestMethods())
unittest.TextTestRunner().run(suite)
Now you will set the network's hyperparameters. The strategy is to choose them so that the error on the training set is low without overfitting the data. If you train the network too long or use too many hidden nodes, it can become overly specialized to the training set and will fail to generalize to the validation set. That is, the validation loss will start to increase while the training loss keeps decreasing.
You will also train the network with stochastic gradient descent (SGD): on each training step, a random sample of the data is used instead of the entire data set. This requires more training steps than ordinary gradient descent, but each step is much faster, so overall training is more efficient. You will learn more about SGD later.
The number of iterations is the number of batches sampled from the training data while training the network. The more iterations, the closer the model fits the data; but with too many iterations the model will not generalize well to other data, which is called overfitting. Choose a number that keeps the training loss low and the validation loss moderate. Once you start overfitting, you will see the training loss continue to drop while the validation loss begins to rise.
The learning rate scales the size of the weight updates. If it is too large, the weights can blow up and the network will fail to fit the data. A good starting point is 0.1. If the network has trouble fitting the data, try reducing the learning rate. Note that the lower the learning rate, the smaller the weight-update steps and the longer the network takes to converge.
Up to a point, more hidden nodes let the model make more accurate predictions. Try different numbers of hidden nodes and see how they affect performance; you can inspect the losses dictionary for a measure of network performance. If there are too few hidden units, the model does not have enough capacity to learn; if there are too many, it has too many possible directions to take when learning. The trick is to find the right balance.
import sys
### Set the hyperparameters here; you need to change the defaults to get a better solution ###
iterations = 5000  # train_loss and val_loss have largely flattened out by about 2000 iterations, so 5000 is more than enough
learning_rate = 0.8
hidden_nodes = 15
output_nodes = 1  # we predict a single target (cnt), so one output node
N_i = train_features.shape[1]
network = NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate)
losses = {'train':[], 'validation':[]}
for ii in range(iterations):
    # Go through a random batch of 128 records from the training data set
    batch = np.random.choice(train_features.index, size=128)
    X, y = train_features.loc[batch].values, train_targets.loc[batch]['cnt']

    network.train(X, y)

    # Printing out the training progress
    train_loss = MSE(network.run(train_features).T, train_targets['cnt'].values)
    val_loss = MSE(network.run(val_features).T, val_targets['cnt'].values)
    sys.stdout.write("\rProgress: {:2.1f}".format(100 * ii/float(iterations))
                     + "% ... Training loss: " + str(train_loss)[:5]
                     + " ... Validation loss: " + str(val_loss)[:5])
    sys.stdout.flush()

    losses['train'].append(train_loss)
    losses['validation'].append(val_loss)
plt.plot(losses['train'], label='Training loss')
plt.plot(losses['validation'], label='Validation loss')
plt.legend()
_ = plt.ylim()
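One quick way to see roughly where overfitting would begin is to find the iteration with the lowest validation loss in the losses dictionary built above (a small optional check, not required by the project):

# Iteration with the lowest validation loss; training far beyond this point
# tends to fit the training set at the expense of the validation set.
best_iter = int(np.argmin(losses['validation']))
print("Lowest validation loss {:.4f} at iteration {}".format(
    losses['validation'][best_iter], best_iter))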
Use the test data to see how well the network models the data. If the predictions are completely wrong, check that every step in the network is implemented correctly.
fig, ax = plt.subplots(figsize=(8,4))
mean, std = scaled_features['cnt']
predictions = network.run(test_features).T*std + mean
ax.plot(predictions[0], label='Prediction')
ax.plot((test_targets['cnt']*std + mean).values, label='Data')
ax.set_xlim(right=len(predictions[0]))
ax.legend()
dates = pd.to_datetime(rides.loc[test_data.index]['dteday'])
dates = dates.apply(lambda d: d.strftime('%b %d'))
ax.set_xticks(np.arange(len(dates))[12::24])
_ = ax.set_xticklabels(dates[12::24], rotation=45)
Please answer the following questions about your results. How well does the model predict the data? Where does it fail? Why does it fail where it does?
Note: you can edit this text by double-clicking on the cell. To preview the text, press Control + Enter.
The model predicts the data quite well using this somewhat deep neural network.
But there is one question I still have: according to the training loss plot, the model never overfits, and I do not understand why. My guess is that we only have one hidden layer, making it impossible to overfit with 3-5 nodes. (I am working on my laptop, so I did not try any extreme values for the hyperparameters; this may be why it never overfits.)