guilhermedom/deep-regression-bicycle-rental

Deep Neural Network Regression Estimating Bicycle Rental

TensorFlow deep regression model predicting bicycle rental.


Problem Overview

The dataset records the number of bicycles rented per day in a community, along with related data such as the weather on that day. It has 731 instances and 15 variables. Of these, 12 are explanatory variables (7 categorical and 5 numerical). The remaining 3 (Casual, Registered, and Cnt) can be considered explained, or dependent, variables. We aim to build a model that estimates the number of bicycles rented in a day. The following table summarizes the data:

| Attribute | Summary |
| --- | --- |
| Instant | Sequential ID for each day. |
| Dteday | Date for the instance, formatted as M/D/YYYY. |
| Season | Season (1: spring; 2: summer; 3: fall; 4: winter). |
| Yr | Year (0: 2011; 1: 2012). |
| Mnth | Month (1 to 12). |
| Holiday | Whether the day is a holiday (0: not holiday; 1: holiday). |
| Weekday | Day of the week. |
| Workingday | Whether the day is a working day (0: holiday/weekend; 1: working day). |
| Weathersit | Weather type (1: clear; 2: mist; 3: light snow or light rain; 4: snow or heavy rain). |
| Temp | Normalized temperature in Celsius. |
| Hum | Normalized humidity. |
| Windspeed | Normalized wind speed. |
| Casual | Count of casual users. |
| Registered | Count of registered users. |
| Cnt | Total bike rentals, including both casual and registered users. |
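Categorical attributes such as Season and Weathersit are integer codes, not quantities, so a common way to feed them to a regressor is one-hot encoding. The sketch below is only an illustration of that idea and is not taken from the repository's code: the helper names (`one_hot`, `encode_day`), the choice of columns, and the values in `example_day` are all hypothetical.

```python
# Minimal one-hot encoding sketch (hypothetical helpers, not the repository's code).

SEASONS = [1, 2, 3, 4]   # 1: spring; 2: summer; 3: fall; 4: winter
WEATHER = [1, 2, 3, 4]   # 1: clear; 2: mist; 3: light snow/rain; 4: snow/heavy rain


def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


def encode_day(row):
    """Turn one day (a dict) into a flat numeric feature vector."""
    return (
        one_hot(row["season"], SEASONS)
        + one_hot(row["weathersit"], WEATHER)
        + [row["temp"], row["hum"], row["windspeed"]]  # already normalized
    )


# Made-up example day: fall, clear weather.
example_day = {"season": 3, "weathersit": 1, "temp": 0.5, "hum": 0.6, "windspeed": 0.2}
features = encode_day(example_day)
print(features)
```

The numerical attributes are passed through unchanged because the table describes them as already normalized.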

Linear regression estimates the value of a variable from other, correlated variables. It does so by fitting a mathematical function in which each variable has an associated coefficient. When several explanatory variables are used to predict one explained variable, the method is called multiple linear regression. With many variables, more complex regression models may produce better results. However, it is important to control the complexity of a regression model so that it remains efficient and interpretable. Fortunately, this complexity can be controlled when the regressor is built with a neural network.
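To make the idea of fitting one coefficient per explanatory variable concrete, here is a minimal multiple linear regression sketch using NumPy's least-squares solver. The data are synthetic and noise-free, so the fit recovers the coefficients exactly. The three features and all numeric values are assumptions chosen for illustration and do not come from the bicycle rental dataset.

```python
import numpy as np

# Synthetic stand-ins for three normalized explanatory variables
# (for example temperature, humidity, wind speed). Not real data.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))

# Hypothetical "true" relationship used to generate the target.
true_coef = np.array([5000.0, -2000.0, -1500.0])
true_intercept = 3000.0
y = X @ true_coef + true_intercept

# Prepend a column of ones so the intercept is fitted as well.
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of X_design @ beta ~= y.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, coef = beta[0], beta[1:]

print("intercept:", intercept)
print("coefficients:", coef)
```

Each fitted coefficient can be read directly as the change in the prediction per unit change of its variable, which is the interpretability that more complex models tend to give up.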

Analysis Introduction

Deep neural networks can learn both linear and non-linear relationships between variables, which makes it possible to capture deeper connections between the independent and dependent variables. At the same time, it is commonly held that networks with 3 hidden layers are enough to model practically all patterns in the data. This gives an upper bound on the complexity of a neural network regressor, since its complexity is determined by its number of layers and neurons.
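One simple way to quantify the complexity mentioned above is to count the trainable parameters of a fully connected network, which depends only on the number of layers and the neurons in each. The sketch below does that arithmetic. The layer sizes (12 inputs, hidden layers of 64, 32, and 16 neurons) are hypothetical and are not stated in this README.

```python
def count_dense_parameters(n_inputs, hidden_units, n_outputs=1):
    """Count weights and biases of a fully connected feed-forward network."""
    sizes = [n_inputs, *hidden_units, n_outputs]
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out  # weight matrix of the layer
        total += fan_out           # one bias per neuron
    return total


# Hypothetical architecture: 12 input features, 3 hidden layers, 1 output.
n_params = count_dense_parameters(12, [64, 32, 16])
print(n_params)  # 832 + 2080 + 528 + 17 = 3457
```

Doubling the width of every hidden layer roughly quadruples the weights between hidden layers, so capping depth at 3 layers and choosing modest widths keeps the parameter count small.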

Using a neural network regressor with 3 hidden layers, we built a model whose predictions follow a close linear relationship with the actual rental numbers:

[Figure: deep_regression_bicycle_sharing_actual_vs_predicted (actual vs. predicted bicycle rentals)]
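A regressor with 3 hidden layers, as described above, could be set up in TensorFlow roughly as follows. This is a sketch under stated assumptions, not the repository's actual model: the function name `build_regressor`, the layer widths, the ReLU activations, the Adam optimizer, and the input size of 12 features are all illustrative choices.

```python
import tensorflow as tf


def build_regressor(n_features, hidden_units=(64, 32, 16)):
    """Build a small feed-forward regressor (hypothetical architecture)."""
    model = tf.keras.Sequential([tf.keras.Input(shape=(n_features,))])
    for units in hidden_units:
        # Hidden layers with a non-linear activation.
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    # Single linear output neuron for the predicted rental count.
    model.add(tf.keras.layers.Dense(1))
    # Mean squared error as the loss, mean absolute error as a monitored metric.
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model


model = build_regressor(n_features=12)
model.summary()
```

Training would then be a call such as `model.fit(X_train, y_train, ...)` on feature and target arrays prepared from the dataset. Those arrays are not defined here.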

With an R² score of 0.89, we argue that this neural network model works well on the bicycle rental dataset. A mean absolute error (MAE) of about 5% of the dependent variable's range supports this. Additionally, the MAE amounts to roughly 10% of the average number of bicycles rented per day.
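The quantities quoted above (R², MAE, and the MAE relative to the target's range and mean) can be computed as in the following sketch. The functions are plain Python written for illustration, and the sample values in `y_true` and `y_pred` are made up. They do not reproduce the reported results.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_mae(y_true, y_pred):
    """Return the MAE as a fraction of the target's range and of its mean."""
    mae = mean_absolute_error(y_true, y_pred)
    value_range = max(y_true) - min(y_true)
    mean = sum(y_true) / len(y_true)
    return mae / value_range, mae / mean


# Made-up daily rental counts and predictions, for illustration only.
y_true = [1000, 2000, 3000, 4000, 5000]
y_pred = [1200, 1900, 3100, 3700, 5100]

mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
mae_over_range, mae_over_mean = relative_mae(y_true, y_pred)
print(mae, r2, mae_over_range, mae_over_mean)
```

Expressing the MAE relative to the range and to the mean makes the error comparable across datasets whose rental counts are on different scales.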