In this project, you will build your first neural network and use it to predict daily bike rental ridership. We provide some of the code, but the neural network itself (most of the work) is left for you to implement. After submitting this project, feel free to explore the data and the model further.
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
A key step in building a neural network is preparing the data properly. Variables on different scales make it difficult for the network to efficiently learn the correct weights. We have provided the code to load and prepare the data below. You will learn more about this code soon!
data_path = 'Bike-Sharing-Dataset/hour.csv'
rides = pd.read_csv(data_path)
rides.head()
This data set contains the number of riders for each hour of each day from January 1, 2011 to December 31, 2012. Riders are split into casual and registered users, and the cnt column is the sum of the two. You can see the first few rows of the data above.
The plot below shows the number of riders over roughly the first 10 days of the data set (some days don't have exactly 24 entries, so it isn't exactly 10 days). You can see the hourly rentals here. This data is complicated! Ridership is lower on weekends and peaks when people commute to and from work on weekdays. The data above also includes temperature, humidity, and wind speed, all of which affect the number of riders. Your model will need to account for all of this.
rides[:24*10].plot(x='dteday', y='cnt')
Below are some categorical variables such as season, weather, and month. To include these in our model, we need to create binary dummy variables. This is easy to do with pandas' get_dummies().
dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']
for each in dummy_fields:
    dummies = pd.get_dummies(rides[each], prefix=each, drop_first=False)
    rides = pd.concat([rides, dummies], axis=1)
fields_to_drop = ['instant', 'dteday', 'season', 'weathersit',
'weekday', 'atemp', 'mnth', 'workingday', 'hr']
data = rides.drop(fields_to_drop, axis=1)
data.head()
To make training the network easier, we will standardize each of the continuous variables by shifting and scaling them so that they have a mean of 0 and a standard deviation of 1.
We save the scaling factors so that we can convert the predictions back to the original units later.
quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']
# Store scalings in a dictionary so we can convert back later
scaled_features = {}
for each in quant_features:
    mean, std = data[each].mean(), data[each].std()
    scaled_features[each] = [mean, std]
    data.loc[:, each] = (data[each] - mean)/std
We save roughly the last 21 days of data as a test set to be used after the network is trained. We will use this set to make predictions and compare them with the actual number of riders.
# Save data for approximately the last 21 days
test_data = data[-21*24:]
# Now remove the test data from the data set
data = data[:-21*24]
# Separate the data into features and targets
target_fields = ['cnt', 'casual', 'registered']
features, targets = data.drop(target_fields, axis=1), data[target_fields]
test_features, test_targets = test_data.drop(target_fields, axis=1), test_data[target_fields]
We split the remaining data into two sets: one for training and one for validating the network after it has been trained. Because this is time-series data, we train on historical data and then try to predict future data (the validation set).
# Hold out the last 60 days or so of the remaining data as a validation set
train_features, train_targets = features[:-60*24], targets[:-60*24]
val_features, val_targets = features[-60*24:], targets[-60*24:]
Below you will build your own network. We have built the overall structure and the backward pass. You will implement the forward pass of the network. You also need to set the hyperparameters: the learning rate, the number of hidden units, and the number of training passes.
The network has two layers: a hidden layer and an output layer. The hidden layer uses the sigmoid function as its activation function. The output layer has a single node and is used for regression; the output of the node is the same as its input, i.e. its activation function is $f(x) = x$. A function that takes the input signal and produces an output signal, taking a threshold into account, is called an activation function. We work through each layer of the network, calculating the outputs of each neuron. All of the outputs from one layer become the inputs to the neurons of the next layer. This process is called forward propagation.
We use weights in the neural network to propagate signals from the input layer to the output layer. We also use weights to propagate the error from the output layer back through the network in order to update the weights. This is called backpropagation.
Hint: you will need the derivative of the output activation function, $f(x) = x$, for the backpropagation implementation. If you are not familiar with calculus, this function is equivalent to the equation $y = x$. What is the slope of that equation? That is the derivative of $f(x)$.
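To make the forward pass and these error terms concrete, here is a minimal NumPy sketch for a toy 3-2-1 network (the weights are the same toy values used by the unit tests further below; this is an illustration only, not part of the project code):

import numpy as np

sigmoid = lambda x: 1 / (1 + np.exp(-x))

# Toy network: 3 input nodes, 2 hidden nodes, 1 output node
x = np.array([0.5, -0.2, 0.1])           # one input record
y = 0.4                                   # its target value
w_i_h = np.array([[0.1, -0.2],            # input-to-hidden weights (3 x 2)
                  [0.4, 0.5],
                  [-0.3, 0.2]])
w_h_o = np.array([[0.3],                  # hidden-to-output weights (2 x 1)
                  [-0.1]])

# Forward pass
hidden_inputs = np.dot(x, w_i_h)                  # signals into the hidden layer
hidden_outputs = sigmoid(hidden_inputs)           # sigmoid activation
final_outputs = np.dot(hidden_outputs, w_h_o)     # output activation is f(x) = x

# Backward pass
error = y - final_outputs                         # output error
output_error_term = error                         # f'(x) = 1, so the error term is just the error
hidden_error = np.dot(w_h_o, output_error_term)   # hidden layer's share of the error
hidden_error_term = hidden_error * hidden_outputs * (1 - hidden_outputs)
print(final_outputs, output_error_term, hidden_error_term)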
You need to complete the following tasks:

1. Implement the sigmoid function to use as the activation function. Set self.activation_function in __init__ to your sigmoid function.
2. Implement the forward pass in the train method.
3. Implement the backpropagation algorithm in the train method, including calculating the output error.
4. Implement the forward pass in the run method.

class NeuralNetwork(object):
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_input_to_hidden = np.random.normal(0.0, self.input_nodes**-0.5,
                                                         (self.input_nodes, self.hidden_nodes))
        self.weights_hidden_to_output = np.random.normal(0.0, self.hidden_nodes**-0.5,
                                                         (self.hidden_nodes, self.output_nodes))
        self.lr = learning_rate

        #### Set self.activation_function to the implemented sigmoid function ####
        #
        # Note: in Python, you can define a function with a lambda expression,
        # as shown below.
        self.activation_function = lambda x: 1 / (1 + np.exp(-x))  # sigmoid

        ### If the lambda code above is not something you're familiar with,
        # you can uncomment the following three lines and put your
        # implementation there instead.
        #
        #def sigmoid(x):
        #    return 0  # Replace 0 with your sigmoid calculation here
        #self.activation_function = sigmoid
    def train(self, features, targets):
        ''' Train the network on a batch of features and targets.

            Arguments
            ---------
            features: 2D array, each row is one data record, each column is a feature
            targets: 1D array of target values
        '''
        n_records = features.shape[0]
        delta_weights_i_h = np.zeros(self.weights_input_to_hidden.shape)
        delta_weights_h_o = np.zeros(self.weights_hidden_to_output.shape)
        for X, y in zip(features, targets):
            ### Forward pass ###
            # Hidden layer
            hidden_inputs = np.dot(X, self.weights_input_to_hidden)    # signals into hidden layer
            hidden_outputs = self.activation_function(hidden_inputs)   # signals from hidden layer

            # Output layer: the activation is f(x) = x, so do not apply the sigmoid here
            final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)  # signals into final output layer
            final_outputs = final_inputs                                          # signals from final output layer

            ### Backward pass ###
            # Output error: difference between desired target and actual output
            error = y - final_outputs

            # Output error term: because the output activation is f(x) = x,
            # its derivative is 1, so the error term is just the error itself
            output_error_term = error

            # Hidden layer's contribution to the error
            hidden_error = np.dot(self.weights_hidden_to_output, output_error_term)
            # Multiply by the sigmoid derivative, hidden_outputs * (1 - hidden_outputs)
            hidden_error_term = hidden_error * hidden_outputs * (1 - hidden_outputs)

            # Weight step (input to hidden); X[:, None] reshapes the 1D input into a column vector
            delta_weights_i_h += hidden_error_term * X[:, None]
            # Weight step (hidden to output); note this uses hidden_outputs, not final_inputs
            delta_weights_h_o += output_error_term * hidden_outputs[:, None]

        # Update the weights with the averaged gradient descent step
        self.weights_hidden_to_output += self.lr * delta_weights_h_o / n_records
        self.weights_input_to_hidden += self.lr * delta_weights_i_h / n_records
    def run(self, features):
        ''' Run a forward pass through the network with input features.

            Arguments
            ---------
            features: 1D array of feature values
        '''
        # Hidden layer
        hidden_inputs = np.dot(features, self.weights_input_to_hidden)   # signals into hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)         # signals from hidden layer

        # Output layer: the activation is f(x) = x, so do not apply the sigmoid here
        final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)  # signals into final output layer
        final_outputs = final_inputs                                          # signals from final output layer

        return final_outputs
def MSE(y, Y):
    return np.mean((y - Y)**2)
Run these unit tests to check if your network implementation is correct. This will help you ensure that the network is implemented correctly before you start training the network. These tests must be successful in order to pass this project.
import unittest
inputs = np.array([[0.5, -0.2, 0.1]])
targets = np.array([[0.4]])
test_w_i_h = np.array([[0.1, -0.2],
[0.4, 0.5],
[-0.3, 0.2]])
test_w_h_o = np.array([[0.3],
[-0.1]])
class TestMethods(unittest.TestCase):

    ##########
    # Unit tests for data loading
    ##########

    def test_data_path(self):
        # Test that file path to dataset has been unaltered
        self.assertTrue(data_path.lower() == 'bike-sharing-dataset/hour.csv')

    def test_data_loaded(self):
        # Test that data frame loaded
        self.assertTrue(isinstance(rides, pd.DataFrame))

    ##########
    # Unit tests for network functionality
    ##########

    def test_activation(self):
        network = NeuralNetwork(3, 2, 1, 0.5)
        # Test that the activation function is a sigmoid
        self.assertTrue(np.all(network.activation_function(0.5) == 1/(1+np.exp(-0.5))))

    def test_train(self):
        # Test that weights are updated correctly on training
        network = NeuralNetwork(3, 2, 1, 0.5)
        network.weights_input_to_hidden = test_w_i_h.copy()
        network.weights_hidden_to_output = test_w_h_o.copy()

        network.train(inputs, targets)
        self.assertTrue(np.allclose(network.weights_hidden_to_output,
                                    np.array([[ 0.37275328],
                                              [-0.03172939]])))
        self.assertTrue(np.allclose(network.weights_input_to_hidden,
                                    np.array([[ 0.10562014, -0.20185996],
                                              [ 0.39775194,  0.50074398],
                                              [-0.29887597,  0.19962801]])))

    def test_run(self):
        # Test correctness of run method
        network = NeuralNetwork(3, 2, 1, 0.5)
        network.weights_input_to_hidden = test_w_i_h.copy()
        network.weights_hidden_to_output = test_w_h_o.copy()

        self.assertTrue(np.allclose(network.run(inputs), 0.09998924))
suite = unittest.TestLoader().loadTestsFromModule(TestMethods())
unittest.TextTestRunner().run(suite)
Now you will set the network's hyperparameters. The strategy is to choose them so that the error on the training set is low without overfitting the data. If you train the network too long or use too many hidden nodes, it can become overly specialized to the training set and will fail to generalize to the validation set. That is, the validation loss will start to increase while the training loss keeps decreasing.
You will also train the network with stochastic gradient descent (SGD): on each training step, a random sample of the data is used instead of the entire data set. This requires more training steps than ordinary gradient descent, but each step is much faster, so overall training is more efficient. You will learn more about SGD later.
The number of iterations is the number of batches sampled from the training data while training the network. The more iterations, the closer the model fits the data; but with too many iterations the model will not generalize well to other data, which is called overfitting. Choose a number that keeps the training loss low and the validation loss moderate. Once you start overfitting, you will see the training loss continue to drop while the validation loss begins to rise.
The learning rate scales the size of the weight updates. If it is too large, the weights can blow up and the network will fail to fit the data. A good starting point is 0.1. If the network has trouble fitting the data, try reducing the learning rate. Note that the lower the learning rate, the smaller the weight-update steps and the longer the network takes to converge.
Up to a point, more hidden nodes let the model make more accurate predictions. Try different numbers of hidden nodes and see how they affect performance; you can inspect the losses dictionary for a measure of network performance. If there are too few hidden units, the model does not have enough capacity to learn; if there are too many, it has too many possible directions to take when learning. The trick is to find the right balance.
import sys
### Set the hyperparameters here; you need to change the defaults to get a better solution ###
iterations = 5000  # train_loss and val_loss have largely flattened out by about 2000 iterations, so 5000 is more than enough
learning_rate = 0.8
hidden_nodes = 15
output_nodes = 1  # we predict a single target (cnt), so one output node
N_i = train_features.shape[1]
network = NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate)
losses = {'train':[], 'validation':[]}
for ii in range(iterations):
    # Go through a random batch of 128 records from the training data set
    batch = np.random.choice(train_features.index, size=128)
    X, y = train_features.loc[batch].values, train_targets.loc[batch]['cnt']

    network.train(X, y)

    # Printing out the training progress
    train_loss = MSE(network.run(train_features).T, train_targets['cnt'].values)
    val_loss = MSE(network.run(val_features).T, val_targets['cnt'].values)
    sys.stdout.write("\rProgress: {:2.1f}".format(100 * ii/float(iterations))
                     + "% ... Training loss: " + str(train_loss)[:5]
                     + " ... Validation loss: " + str(val_loss)[:5])
    sys.stdout.flush()

    losses['train'].append(train_loss)
    losses['validation'].append(val_loss)
plt.plot(losses['train'], label='Training loss')
plt.plot(losses['validation'], label='Validation loss')
plt.legend()
_ = plt.ylim()
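One quick way to see roughly where overfitting would begin is to find the iteration with the lowest validation loss in the losses dictionary built above (a small optional check, not required by the project):

# Iteration with the lowest validation loss; training far beyond this point
# tends to fit the training set at the expense of the validation set.
best_iter = int(np.argmin(losses['validation']))
print("Lowest validation loss {:.4f} at iteration {}".format(
    losses['validation'][best_iter], best_iter))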
Use the test data to see how well the network models the data. If the predictions are completely wrong, check that every step in the network is implemented correctly.
fig, ax = plt.subplots(figsize=(8,4))
mean, std = scaled_features['cnt']
predictions = network.run(test_features).T*std + mean
ax.plot(predictions[0], label='Prediction')
ax.plot((test_targets['cnt']*std + mean).values, label='Data')
ax.set_xlim(right=len(predictions[0]))
ax.legend()
dates = pd.to_datetime(rides.loc[test_data.index]['dteday'])
dates = dates.apply(lambda d: d.strftime('%b %d'))
ax.set_xticks(np.arange(len(dates))[12::24])
_ = ax.set_xticklabels(dates[12::24], rotation=45)
Please answer the following questions about your results. How well does the model predict the data? Where does it fail? Why does it fail where it does?
Note: you can edit this text by double-clicking on the cell. To preview the text, press Control + Enter.
The model predicts the data quite well using this somewhat deep neural network.
But there is one question I still have: according to the training loss plot, the model never overfits, and I do not understand why. My guess is that we only have one hidden layer, making it impossible to overfit with 3-5 nodes. (I am working on my laptop, so I did not try any extreme values for the hyperparameters; this may be why it never overfits.)