Grand Framework
A naive Neural Network and Tensor library, for learning purposes.
Table of contents
About
The Grand Framework is a personal learning project covering neural network architecture, mathematics, and training. My intention is to showcase what I have learned, and to deepen my understanding of machine learning and deep neural networks. The project started as a C++/CUDA-based implementation, but after falling in love with Python, it was soon converted.
The framework is built on Numpy, a popular Python library for matrices and multi-dimensional arrays that is commonly used in the field of machine learning. CUDA kernels will eventually be used to transfer training operations to the GPU for faster runtime.
Readings, resources, and design inspirations
- Machine Learning Specialization coursework - Andrew Ng
- Deep Learning with Python, Second Edition - Francois Chollet
- Neural Networks from Scratch (NNFS) - Harrison Kinsley (Sentdex)
Goals
- Design a working Tensor object with relevant utilities and functions.
- Implement CUDA kernels for running operations on an NVIDIA GPU.
- Design layer and model objects to construct a working MLP (Multilayer Perceptron) neural network, built on custom Tensor objects.
- Implement forward passing, utilizing linear transformations (weights and biases) and common activation functions.
- Design for compatibility with Tensorflow Datasets.
- Implement backpropagation, utilizing common cost/loss functions and optimizers.
- Final test: Build a functional model, running on either CPU or GPU, trained on the Fashion-MNIST dataset with loss < 5%.
- Recreate the Tensorflow Keras Classification Tutorial from the bottom up.
Research
This section will contain a breakdown of the mathematics behind a simple MLP model implementation for classification, including Tensorflow examples.
What is a neural network?
Feedforward Process
Cost/Loss Function
Backpropagation
Optimization
Development
Dependencies
- Python
- Numpy
- Python CUDA
- CUDA / NVCC
Tensors
Tensors are the backbone of any neural network framework. These objects hold the data and its dimensionality, and support complex mathematical operations.
Design
The design choices for the Tensor object were simple. I wanted the user to be able to pass any type of array, list, or np.ndarray to the constructor to initialize a new Tensor. This keeps the object intuitive and easy to use.
The Tensor object also provides built-in class methods for constructing different types of Tensors (similar to Numpy). These methods take dimensions as parameters and construct a new Tensor object with values determined by the method:
- Zeros, a Tensor initialized with zeros.
- Ones, a Tensor initialized with ones.
- Random, a Tensor initialized with random values.
- Empty, a Tensor initialized with NaN values.
Below is an example of this implementation:
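As a rough sketch, the class methods might look something like the following; the exact signatures, the default dtype, and the attribute names are assumptions rather than the framework's actual code:

```python
import numpy as np

class Tensor:
    def __init__(self, data, dtype=np.float32):
        # Accept any array-like (list, tuple, np.ndarray) and store it as a Numpy array
        self.data = np.array(data, dtype=dtype)
        self.shape = self.data.shape

    @classmethod
    def zeros(cls, *dims):
        # A Tensor initialized with zeros
        return cls(np.zeros(dims))

    @classmethod
    def ones(cls, *dims):
        # A Tensor initialized with ones
        return cls(np.ones(dims))

    @classmethod
    def random(cls, *dims):
        # A Tensor initialized with uniform random values
        return cls(np.random.rand(*dims))

    @classmethod
    def empty(cls, *dims):
        # A Tensor initialized with NaN values
        return cls(np.full(dims, np.nan))


# Example usage: set up weights and biases before training starts
weights = Tensor.random(2, 3)
biases = Tensor.zeros(1, 3)
```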
These are helpful for initializing weights, biases, and inputs in layers prior to starting training.
Operations
Tensors need to be able to perform operations with one another, like matrix multiplication and addition, so that we can compute each node's weighted sum of inputs in the neural network. This functionality can be built into the Python class through operator overloading and executed intuitively.
For example, matrix addition:
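A sketch of how element-wise addition could be overloaded on the Tensor class; only the addition overload is shown here for brevity, and the exact implementation is an assumption:

```python
import numpy as np

class Tensor:
    def __init__(self, data):
        self.data = np.array(data, dtype=np.float32)

    def __add__(self, other):
        # Overloading "+" lets two Tensors be added like plain numbers
        other = other.data if isinstance(other, Tensor) else other
        return Tensor(self.data + other)

a = Tensor([[1, 2], [3, 4]])
b = Tensor([[5, 6], [7, 8]])
c = a + b        # element-wise matrix addition
print(c.data)    # [[ 6.  8.] [10. 12.]]
```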
Testing
I created a custom testbench to verify that the Tensor operations are error-free by comparing the results against the equivalent Numpy operations.
These comparisons on the CPU are redundant, as the Tensor objects are constructed with Numpy arrays, but the testbench will support checking for GPU and CPU equivalence in the future.
The testbench constructor takes the function to be tested (the operation) as a parameter. Below is the Testbench class, a sample operation to be executed in the testbench, and the resulting output:
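A rough sketch of what such a testbench might look like, reusing the Tensor addition sketch from the Operations section; the Testbench constructor parameters, the run method, and the tensor_tensor_add helper are illustrative assumptions:

```python
import time
import numpy as np

class Testbench:
    def __init__(self, func, runs=1000):
        # func is the operation under test; it returns (tensor_result, numpy_expected)
        self.func = func
        self.runs = runs

    def run(self):
        start = time.time()
        for i in range(self.runs + 1):
            result, expected = self.func()
            status = "PASS" if np.allclose(result.data, expected) else "FAIL"
            if i % 100 == 0:
                print(f"TEST {self.func.__name__} Runs: {i}/{self.runs} {status}")
        print(f"Duration: {time.time() - start:.4f}s")

def tensor_tensor_add():
    # Add two random matrices as Tensors and also compute the Numpy reference result
    a, b = np.random.rand(4, 4), np.random.rand(4, 4)
    return Tensor(a) + Tensor(b), a + b

Testbench(tensor_tensor_add).run()
```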
```
TEST tensor_tensor_add Runs: 0/1000 PASS
TEST tensor_tensor_add Runs: 100/1000 PASS
TEST tensor_tensor_add Runs: 200/1000 PASS
TEST tensor_tensor_add Runs: 300/1000 PASS
TEST tensor_tensor_add Runs: 400/1000 PASS
TEST tensor_tensor_add Runs: 500/1000 PASS
TEST tensor_tensor_add Runs: 600/1000 PASS
TEST tensor_tensor_add Runs: 700/1000 PASS
TEST tensor_tensor_add Runs: 800/1000 PASS
TEST tensor_tensor_add Runs: 900/1000 PASS
TEST tensor_tensor_add Runs: 1000/1000 PASS
Duration: 0.2440s
```
Layers
The layer's purpose is to make the mathematical, functional connections between neurons in the network. I chose to follow the Tensorflow approach of integrating activation functions into the layer object itself, instead of creating a separate activation layer. In my opinion, this allows for easier-to-read model construction.
Layer
- 1-2D Tensor data
- Varying datatypes
- Integrated activation functions
- Weights and biases
Below is the parent Layer object that I designed. The layer's units parameter is the number of output neurons the layer will have.
The Layer object has a builder method for constructing and connecting a model's layers together upon initialization. This method takes in the previous layer's shape and constructs the weights and biases based on those dimensions:
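A minimal sketch of the parent Layer with its build method, reusing the Tensor class methods sketched earlier; the attribute names, the build signature, and the returned shape tuple are assumptions:

```python
class Layer:
    def __init__(self, units, activation=None):
        # units: number of output neurons this layer produces
        self.units = units
        self.activation = activation
        self.weights = None
        self.biases = None

    def build(self, prev_shape):
        # prev_shape: output shape of the previous layer, e.g. (batch, prev_units).
        # Every neuron in the previous layer connects to every neuron in this one.
        self.weights = Tensor.random(prev_shape[-1], self.units)
        self.biases = Tensor.zeros(1, self.units)
        # Return this layer's output shape so the next layer can build from it
        return (prev_shape[0], self.units)
```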
Dense
The dense layer is the main layer type used in Grand. It is a conventional, fully connected layer used to construct an MLP neural network model.
The forward method computes the weighted sum of inputs plus bias for each neuron in the layer. The layer's activation function is then applied to this output before it is returned and passed to the next layer:
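A sketch of how the Dense forward pass could look, building on the Layer and Tensor sketches above; the forward signature and the relu helper are illustrative assumptions:

```python
import numpy as np

class Dense(Layer):
    def forward(self, inputs):
        # Weighted sum for every neuron in the layer: z = inputs · weights + biases
        z = Tensor(inputs.data @ self.weights.data + self.biases.data)
        # Apply the layer's activation function (if any) before handing off to the next layer
        return self.activation(z) if self.activation else z


def relu(t):
    # Example activation: element-wise ReLU applied to a Tensor
    return Tensor(np.maximum(t.data, 0))
```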
Flatten
The flatten layer is a ‘filler’ layer that flattens input data to a specified shape. I designed this layer without trainable parameters, like weights and biases; it only reshapes the data.
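A sketch of how such a Flatten layer might be written on top of the Layer sketch above; the shape handling shown here is an assumption:

```python
import numpy as np

class Flatten(Layer):
    def __init__(self):
        # No weights or biases; the unit count is only known once the input shape is seen
        super().__init__(units=None)

    def build(self, prev_shape):
        # Collapse everything after the batch dimension, e.g. (batch, 28, 28) -> (batch, 784)
        self.units = int(np.prod(prev_shape[1:]))
        return (prev_shape[0], self.units)

    def forward(self, inputs):
        # Reshape only; nothing here is trainable
        return Tensor(inputs.data.reshape(inputs.data.shape[0], -1))
```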
Model
The model object is responsible for building the neural network and performing training and validation operations on training and testing data. The model will build the connections between each layer in the network and ensure the data shapes match. This object will also contain the loss and optimizer functions for backpropagation and model training.
The compile method makes the shape connections between each layer in the network. The first layer is singled out as the input layer, and will be constructed without weights and biases.
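The snippet below is a rough sketch of how compile could chain the layers' build methods together, reusing the Layer, Dense, Flatten, and relu sketches above; the Model constructor arguments, the input_shape parameter, and the example model are assumptions rather than the framework's actual API:

```python
class Model:
    def __init__(self, layers, loss=None, optimizer=None):
        self.layers = layers
        self.loss = loss
        self.optimizer = optimizer

    def compile(self, input_shape):
        # The first layer is treated as the input layer: it only establishes the
        # starting shape and builds no weights or biases (e.g. a Flatten layer)
        shape = self.layers[0].build(input_shape)
        for layer in self.layers[1:]:
            # Every following layer builds its weights and biases from the
            # previous layer's output shape
            shape = layer.build(shape)


# Example: a small Fashion-MNIST style classifier
model = Model([Flatten(), Dense(128, activation=relu), Dense(10)])
model.compile(input_shape=(None, 28, 28))
```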