tiny_dnn
1.0.0
A header-only, dependency-free deep learning framework in C++11
This section describes how to create a new layer and integrate it with tiny-dnn. As an example, let's create a simple fully-connected layer.
Note: This document is outdated and does not match the current tiny-dnn API; it needs to be updated.
Let's define your layer. All layer operations in tiny-dnn are derived from the layer class. The layer class prepares the input/output data for your calculation; to make it do so, you must tell its constructor what you need.
For example, consider the fully-connected operation y = Wx + b. In this calculation the inputs (the right-hand side of the equation) are the data x, the weight W, and the bias b; the output is, of course, y. So its constructor should pass {data, weight, bias} as inputs and {data} as output.
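A minimal skeleton under those assumptions might look as follows. This is a sketch based on the older API this document describes; the header path, the layer base-class constructor signature, and names such as vector_type may differ in current tiny-dnn, which may also require additional overrides not shown here.

```cpp
#include "tiny_dnn/tiny_dnn.h"   // assumed header path

using namespace tiny_dnn;

// Sketch of a custom fully-connected layer. The base-class constructor is
// told which kinds of vectors this layer consumes and produces.
class fully_connected : public layer {
public:
    fully_connected(size_t in_dim, size_t out_dim)
        : layer({ vector_type::data, vector_type::weight, vector_type::bias }, // inputs: x, W, b
                { vector_type::data }),                                        // output: y
          in_dim_(in_dim), out_dim_(out_dim) {}

    // ... the methods described below go here ...

private:
    size_t in_dim_, out_dim_;
};
```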
The vector_type::data is input data passed by the previous layer, or output data consumed by the next layer. vector_type::weight and vector_type::bias represent trainable parameters. The only difference between them is the default initialization method: weight is initialized with random values, and bias is initialized with a zero vector (this behaviour can be changed by the network::weight_init method). If you need another vector for your calculation, vector_type::aux can be used.
There are 5 methods to implement. In most cases 3 of them can be written as one-liners, and the remaining 2 are the essential ones:
layer_type(): returns the name of your layer.
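A sketch of this one-liner, written as a member of the fully_connected class above (the name string is arbitrary):

```cpp
// Human-readable identifier of the layer, used e.g. when printing the network.
std::string layer_type() const override {
    return "my-fully-connected";
}
```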
in_shape() / out_shape(): return the shapes of the inputs/outputs. A shape is defined by [width, height, depth]. For example, a fully-connected layer treats input data as a 1-dimensional array, so its shape is [N, 1, 1].
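A sketch under the same assumptions; shape3d(width, height, depth) is assumed to be the shape type, and the shapes are listed in the same order as the constructor's vector_type lists:

```cpp
// Input shapes, in the order {x, W, b} declared in the constructor.
std::vector<shape3d> in_shape() const override {
    return { shape3d(in_dim_, 1, 1),          // x: in_dim-dimensional vector
             shape3d(in_dim_, out_dim_, 1),   // W: in_dim x out_dim matrix
             shape3d(out_dim_, 1, 1) };       // b: out_dim-dimensional vector
}

// Output shape, in the order {y}.
std::vector<shape3d> out_shape() const override {
    return { shape3d(out_dim_, 1, 1) };       // y: out_dim-dimensional vector
}
```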
forward_propagation(): execute the forward calculation in this method.
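Here is a sketch assuming the older per-sample interface this document describes, where vec_t is a flat vector of float_t values; current tiny-dnn passes whole batches instead, so the exact signature is an assumption:

```cpp
// Computes y = Wx + b. in_data/out_data are ordered as declared in the
// constructor: in_data = {x, W, b}, out_data = {y}.
void forward_propagation(const std::vector<vec_t*>& in_data,
                         std::vector<vec_t*>&       out_data) {
    const vec_t& x = *in_data[0];
    const vec_t& W = *in_data[1];  // row-major, out_dim_ rows x in_dim_ columns
    const vec_t& b = *in_data[2];
    vec_t&       y = *out_data[0];

    for (size_t r = 0; r < out_dim_; r++) {
        y[r] = b[r];
        for (size_t c = 0; c < in_dim_; c++) {
            y[r] += W[r * in_dim_ + c] * x[c];
        }
    }
}
```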
The in_data/out_data are arrays of input/output data, ordered as you told the layer's constructor. The implementation is simple and straightforward, isn't it?
back_propagation(): the in_data/out_data are the same as in forward_propagation, and in_grad/out_grad are their gradients. The order of the gradient values is the same as that of in_data/out_data.
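A sketch of the backward pass under the same assumptions. The gradient buffers prev_delta, dW, and db below are accumulated into with +=, never assigned, for the reasons given in the note that follows:

```cpp
// in_grad/out_grad hold gradients in the same order as in_data/out_data:
// in_grad = {dE/dx, dE/dW, dE/db}, out_grad = {dE/dy}.
void back_propagation(const std::vector<vec_t*>& in_data,
                      const std::vector<vec_t*>& out_data,
                      std::vector<vec_t*>&       out_grad,
                      std::vector<vec_t*>&       in_grad) {
    const vec_t& curr_delta = *out_grad[0];  // dE/dy, already computed by the next layer
    const vec_t& x          = *in_data[0];
    const vec_t& W          = *in_data[1];
    vec_t&       prev_delta = *in_grad[0];   // dE/dx, passed to the previous layer
    vec_t&       dW         = *in_grad[1];   // dE/dW
    vec_t&       db         = *in_grad[2];   // dE/db
    (void)out_data;                          // y itself is not needed here

    // dE/dx += W^T * dE/dy
    for (size_t c = 0; c < in_dim_; c++)
        for (size_t r = 0; r < out_dim_; r++)
            prev_delta[c] += curr_delta[r] * W[r * in_dim_ + c];

    // dE/dW += dE/dy * x^T
    for (size_t r = 0; r < out_dim_; r++)
        for (size_t c = 0; c < in_dim_; c++)
            dW[r * in_dim_ + c] += curr_delta[r] * x[c];

    // dE/db += dE/dy
    for (size_t r = 0; r < out_dim_; r++)
        db[r] += curr_delta[r];
}
```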
Note: Gradients of the weight/bias are accumulated over the mini-batch and zero-cleared automatically, so you must not use the assignment operator on these elements (the layer would forget the previous training data within the mini-batch!). As in the example above, use operator += instead. The gradient of the data (prev_delta in the example) may already contain meaningful values if two or more layers share this data, so you must not overwrite that value either.
It is always a good idea to check whether your backward implementation is correct. The network class provides a gradient_check method for this purpose. Let's add the following lines to test/test_network.h and run the tests.
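A hypothetical test sketch; the exact gradient_check arguments (input/label containers, epsilon, check mode) depend on the tiny-dnn version, so treat them as placeholders:

```cpp
#include "gtest/gtest.h"  // if not already pulled in by the test harness

TEST(network, gradient_check_my_fully_connected) {
    network<sequential> net;
    net << fully_connected(3, 2);

    // Hypothetical sample data; shapes must match the layer dimensions.
    std::vector<tensor_t>             in = { { vec_t{0.5, 0.25, 0.125} } };
    std::vector<std::vector<label_t>> t  = { { 1 } };

    EXPECT_TRUE(net.gradient_check<mse>(in, t, float_t(1e-4), GRAD_CHECK_ALL));
}
```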
Congratulations! Now you can use this new class as a tiny-dnn layer.