Grand Framework
A naive Neural Network and Tensor library, for learning purposes.
Table of contents
About
The Grand Framework is a personal learning project covering neural network architecture, mathematics, and training. My intention is to showcase what I have learned, and to deepen my understanding of machine learning and deep neural networks. The project started as a C++/CUDA-based implementation, but after falling in love with Python, it was soon converted.
The framework is built on Numpy, a popular Python library for matrices and multi-dimensional arrays that is commonly used in the field of machine learning. CUDA kernels will eventually be used to transfer training operations to the GPU for faster runtime.
Readings, resources, and design inspirations
- Machine Learning Specialization coursework - Andrew Ng
- Deep Learning with Python, Second Edition - Francois Chollet
- Neural Networks from Scratch (NNFS) - Harrison Kinsley (Sentdex)
Goals
- Design a working Tensor object with relevant utilities and functions.
- Implement CUDA kernels for running operations on an NVIDIA GPU.
- Design layer and model objects to construct a working MLP (Multilayer Perceptron) neural network, built on custom Tensor objects.
- Implement forward passing, utilizing linear transformations (weights and biases) and common activation functions.
- Design for compatibility with Tensorflow Datasets.
- Implement backpropagation, utilizing common cost/loss functions and optimizers.
- Final test: Build a functional model, running on either CPU or GPU, trained on the Fashion-MNIST dataset with loss < 5%.
- Recreate the Tensorflow Keras Classification Tutorial from the bottom up.
Research
This section will contain a breakdown of the mathematics behind a simple MLP model implementation for classification, including Tensorflow examples.
What is a neural network?
Feedforward Process
Cost/Loss Function
Backpropagation
Optimization
Development
Dependencies
- Python
- Numpy
- Python CUDA
- CUDA / NVCC
Tensors
Tensors are the backbone of any neural network framework. These objects hold the data and its dimensionality, and support complex mathematical operations.
Design
The design choices for the Tensor object were simple. I wanted the user to be able to pass any type of array, list, or np.ndarray to the constructor to initialize a new Tensor. This keeps the object intuitive and easy to use.
The Tensor object also provides built-in class methods for constructing different types of Tensors (similar to Numpy). These methods take dimensions as parameters and construct a new Tensor object with values determined by the method:
- Zeros, a Tensor initialized with zeros.
- Ones, a Tensor initialized with ones.
- Random, a Tensor initialized with random values.
- Empty, a Tensor initialized with NaN values.
Below is an example of this implementation:
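As a rough sketch, the class methods might look something like the following; the exact signatures, the default dtype, and the attribute names are assumptions rather than the framework's actual code:

```python
import numpy as np

class Tensor:
    def __init__(self, data, dtype=np.float32):
        # Accept any array-like (list, tuple, np.ndarray) and store it as a Numpy array
        self.data = np.array(data, dtype=dtype)
        self.shape = self.data.shape

    @classmethod
    def zeros(cls, *dims):
        # A Tensor initialized with zeros
        return cls(np.zeros(dims))

    @classmethod
    def ones(cls, *dims):
        # A Tensor initialized with ones
        return cls(np.ones(dims))

    @classmethod
    def random(cls, *dims):
        # A Tensor initialized with uniform random values
        return cls(np.random.rand(*dims))

    @classmethod
    def empty(cls, *dims):
        # A Tensor initialized with NaN values
        return cls(np.full(dims, np.nan))


# Example usage: set up weights and biases before training starts
weights = Tensor.random(2, 3)
biases = Tensor.zeros(1, 3)
```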
These are helpful for initializing weights, biases, and inputs in layers prior to starting training.
Operations
Tensors need to be able to perform operations with one another, like matrix multiplication and addition, so that we can compute each node's weighted sum of inputs in the neural network. This functionality can be built into the Python class through operator overloading and executed intuitively.
For example, matrix addition:
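A sketch of how element-wise addition could be overloaded on the Tensor class; only the addition overload is shown here for brevity, and the exact implementation is an assumption:

```python
import numpy as np

class Tensor:
    def __init__(self, data):
        self.data = np.array(data, dtype=np.float32)

    def __add__(self, other):
        # Overloading "+" lets two Tensors be added like plain numbers
        other = other.data if isinstance(other, Tensor) else other
        return Tensor(self.data + other)

a = Tensor([[1, 2], [3, 4]])
b = Tensor([[5, 6], [7, 8]])
c = a + b        # element-wise matrix addition
print(c.data)    # [[ 6.  8.] [10. 12.]]
```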
Testing
I created a custom testbench to verify that the Tensor operations are error-free by comparing the results against the equivalent Numpy operations.
These comparisons on the CPU are redundant, as the Tensor objects are constructed with Numpy arrays, but the testbench will support checking for GPU and CPU equivalence in the future.
The testbench constructor takes the function to be tested (the operation) as a parameter. Below is the Testbench class, a sample operation to be executed in the testbench, and the resulting output:
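A rough sketch of what such a testbench might look like, reusing the Tensor addition sketch from the Operations section; the Testbench constructor parameters, the run method, and the tensor_tensor_add helper are illustrative assumptions:

```python
import time
import numpy as np

class Testbench:
    def __init__(self, func, runs=1000):
        # func is the operation under test; it returns (tensor_result, numpy_expected)
        self.func = func
        self.runs = runs

    def run(self):
        start = time.time()
        for i in range(self.runs + 1):
            result, expected = self.func()
            status = "PASS" if np.allclose(result.data, expected) else "FAIL"
            if i % 100 == 0:
                print(f"TEST {self.func.__name__} Runs: {i}/{self.runs} {status}")
        print(f"Duration: {time.time() - start:.4f}s")

def tensor_tensor_add():
    # Add two random matrices as Tensors and also compute the Numpy reference result
    a, b = np.random.rand(4, 4), np.random.rand(4, 4)
    return Tensor(a) + Tensor(b), a + b

Testbench(tensor_tensor_add).run()
```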
```
TEST tensor_tensor_add Runs: 0/1000 PASS
TEST tensor_tensor_add Runs: 100/1000 PASS
TEST tensor_tensor_add Runs: 200/1000 PASS
TEST tensor_tensor_add Runs: 300/1000 PASS
TEST tensor_tensor_add Runs: 400/1000 PASS
TEST tensor_tensor_add Runs: 500/1000 PASS
TEST tensor_tensor_add Runs: 600/1000 PASS
TEST tensor_tensor_add Runs: 700/1000 PASS
TEST tensor_tensor_add Runs: 800/1000 PASS
TEST tensor_tensor_add Runs: 900/1000 PASS
TEST tensor_tensor_add Runs: 1000/1000 PASS
Duration: 0.2440s
```
Layers
The layer's purpose is to make the mathematical, functional connections between neurons in the network. I chose to follow the Tensorflow approach of integrating activation functions into the layer object itself, instead of creating a separate activation layer. In my opinion, this allows for easier-to-read model construction.
Layer
- 1-2D Tensor data
- Varying datatypes
- Integrated activation functions
- Weights and biases
Below is the parent Layer object that I designed. The layer's units parameter is the number of output neurons the layer will have.
The Layer object has a builder method for constructing and connecting a model's layers together upon initialization. This method takes in the previous layer's shape and constructs the weights and biases based on those dimensions:
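A minimal sketch of the parent Layer with its build method, reusing the Tensor class methods sketched earlier; the attribute names, the build signature, and the returned shape tuple are assumptions:

```python
class Layer:
    def __init__(self, units, activation=None):
        # units: number of output neurons this layer produces
        self.units = units
        self.activation = activation
        self.weights = None
        self.biases = None

    def build(self, prev_shape):
        # prev_shape: output shape of the previous layer, e.g. (batch, prev_units).
        # Every neuron in the previous layer connects to every neuron in this one.
        self.weights = Tensor.random(prev_shape[-1], self.units)
        self.biases = Tensor.zeros(1, self.units)
        # Return this layer's output shape so the next layer can build from it
        return (prev_shape[0], self.units)
```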
Dense
The dense layer is the main layer type used in Grand. It is a conventional, fully connected layer used to construct an MLP neural network model.
The forward method computes the weighted sum of inputs plus bias for each neuron in the layer. The layer's activation function is then applied to this output before it is returned and passed to the next layer:
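A sketch of how the Dense forward pass could look, building on the Layer and Tensor sketches above; the forward signature and the relu helper are illustrative assumptions:

```python
import numpy as np

class Dense(Layer):
    def forward(self, inputs):
        # Weighted sum for every neuron in the layer: z = inputs · weights + biases
        z = Tensor(inputs.data @ self.weights.data + self.biases.data)
        # Apply the layer's activation function (if any) before handing off to the next layer
        return self.activation(z) if self.activation else z


def relu(t):
    # Example activation: element-wise ReLU applied to a Tensor
    return Tensor(np.maximum(t.data, 0))
```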
Flatten
The flatten layer is a ‘filler’ layer that flattens input data to a specified shape. I designed this layer without trainable parameters, like weights and biases; it only reshapes the data.
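A sketch of how such a Flatten layer might be written on top of the Layer sketch above; the shape handling shown here is an assumption:

```python
import numpy as np

class Flatten(Layer):
    def __init__(self):
        # No weights or biases; the unit count is only known once the input shape is seen
        super().__init__(units=None)

    def build(self, prev_shape):
        # Collapse everything after the batch dimension, e.g. (batch, 28, 28) -> (batch, 784)
        self.units = int(np.prod(prev_shape[1:]))
        return (prev_shape[0], self.units)

    def forward(self, inputs):
        # Reshape only; nothing here is trainable
        return Tensor(inputs.data.reshape(inputs.data.shape[0], -1))
```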
Model
The model object is responsible for building the neural network and performing training and validation operations on training and testing data. The model will build the connections between each layer in the network and ensure the data shapes match. This object will also contain the loss and optimizer functions for backpropagation and model training.
The compile method makes the shape connections between each layer in the network. The first layer is singled out as the input layer, and will be constructed without weights and biases.
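The snippet below is a rough sketch of how compile could chain the layers' build methods together, reusing the Layer, Dense, Flatten, and relu sketches above; the Model constructor arguments, the input_shape parameter, and the example model are assumptions rather than the framework's actual API:

```python
class Model:
    def __init__(self, layers, loss=None, optimizer=None):
        self.layers = layers
        self.loss = loss
        self.optimizer = optimizer

    def compile(self, input_shape):
        # The first layer is treated as the input layer: it only establishes the
        # starting shape and builds no weights or biases (e.g. a Flatten layer)
        shape = self.layers[0].build(input_shape)
        for layer in self.layers[1:]:
            # Every following layer builds its weights and biases from the
            # previous layer's output shape
            shape = layer.build(shape)


# Example: a small Fashion-MNIST style classifier
model = Model([Flatten(), Dense(128, activation=relu), Dense(10)])
model.compile(input_shape=(None, 28, 28))
```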