Siamese networks were among the first networks to use multiple inputs. The idea is to map images to a lower-dimensional space and then compare them using a standard distance metric, such as Euclidean distance. The process is similar to PCA-based approaches. The main drawback of PCA-based approaches is that they are very sensitive to transformations such as scaling and rotation. Siamese networks are able to overcome these issues. To train this type of network, you have to provide both true and false pairs of data.
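
To make the "map, then compare" idea concrete, here is a minimal sketch in plain Python. It assumes a single shared linear layer as the mapping; the names `embed`, `energy` and the weight matrix `W` are hypothetical and chosen only for illustration. A real Siamese network would use a learned, typically convolutional, mapping. The point it shows is that both inputs go through the same weights before the distance is computed.

```python
import math

def embed(x, weights):
    """Map an input vector to a lower-dimensional vector (hypothetical linear layer)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def energy(x1, x2, weights):
    """Euclidean distance between the two embeddings, using the same weights for both inputs."""
    e1 = embed(x1, weights)
    e2 = embed(x2, weights)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))

# Illustrative 4-dimensional inputs projected down to 2 dimensions.
W = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.0, 3.0, 4.0]  # identical to a
c = [4.0, 3.0, 2.0, 1.0]  # different from a

print(energy(a, b, W))  # identical inputs give zero distance
print(energy(a, c, W))  # different inputs give a positive distance
```

Because the weights are shared, swapping the two inputs gives the same distance, which is what makes the comparison symmetric.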

The authors explain the loss as follows: the loss L is designed in such a way that its minimization will decrease the energy of genuine pairs (true pairs) and increase the energy of impostor pairs (false pairs).
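
The behaviour described above can be sketched with a margin-based contrastive loss. This is one common formulation and not necessarily the exact loss used by the authors. The function name `contrastive_loss` and the `margin` parameter are assumptions made for this example, and the energy is assumed to be a non-negative distance between the two mapped inputs.

```python
def contrastive_loss(energy, is_genuine, margin=1.0):
    """Margin-based contrastive loss for one pair (illustrative, assumed form).

    Genuine pairs are penalised for high energy; impostor pairs are
    penalised for energy below the margin.
    """
    if is_genuine:
        return 0.5 * energy ** 2
    return 0.5 * max(0.0, margin - energy) ** 2

# Genuine pair: lower energy gives lower loss.
print(contrastive_loss(0.2, True), contrastive_loss(0.8, True))
# Impostor pair: higher energy gives lower loss, reaching zero past the margin.
print(contrastive_loss(0.2, False), contrastive_loss(2.0, False))
```

Minimizing this quantity therefore pulls genuine pairs together and pushes impostor pairs apart until they are at least `margin` away from each other.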