I modified Laurent Dinh’s dataset class so that it exposes the preceding and following phones and phonemes. The change is available in my fork of Laurent’s code on GitHub. By preceding phone, I mean the phone that was pronounced before the current one, not the phone being pronounced in the preceding frame, as in João’s experiment. I chose to precompute those values at loading time, hoping to save time during training. I am not sure this will prove the most efficient approach, given memory constraints. Although I have tested all the methods of the class independently, I have yet to use it to train a model. I hope this can be useful to others.
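To make the distinction between "preceding phone" and "phone in the preceding frame" concrete, here is a minimal sketch in plain Python. It is not the code from the fork: the function name `neighbouring_phones`, the `pad` argument and the toy frame labels are all made up for illustration. It also assumes that segment boundaries can be recovered by collapsing runs of identical frame labels, which would merge two consecutive occurrences of the same phone; a real implementation would more likely use the segment boundaries stored with the dataset.

```python
from itertools import groupby


def neighbouring_phones(frame_phones, pad=None):
    """Return, for every frame, the phone of the previous and next segment.

    `frame_phones` holds one phone label per frame. `pad` is the value used
    where there is no previous or next segment (start and end of utterance).
    """
    # Collapse runs of identical labels into (phone, run_length) segments.
    segments = [(phone, len(list(run))) for phone, run in groupby(frame_phones)]

    previous, following = [], []
    for i, (_, length) in enumerate(segments):
        prev_phone = segments[i - 1][0] if i > 0 else pad
        next_phone = segments[i + 1][0] if i + 1 < len(segments) else pad
        # Every frame of a segment shares the same neighbours.
        previous.extend([prev_phone] * length)
        following.extend([next_phone] * length)
    return previous, following


# Toy utterance: one (hypothetical) phone label per frame.
frames = ["h", "h", "eh", "eh", "eh", "l", "ow", "ow"]
prev, nxt = neighbouring_phones(frames)
# prev == [None, None, "h", "h", "h", "eh", "l", "l"]
# nxt  == ["eh", "eh", "l", "l", "l", "ow", None, None]
```

Note that the second "eh" frame gets "h" as its preceding phone, whereas the phone in its preceding frame is "eh" itself. Precomputing both lists once, as above, trades memory for speed: each adds one extra label per frame.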
My next step is to figure out how to use Theano’s scan function to implement an MLP whose biases are computed by another MLP.
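As a rough illustration of the idea, here is a forward-pass sketch of one MLP whose hidden-layer biases are produced by a second MLP. It is written in plain Python rather than Theano, so it says nothing about how scan would be used, and everything in it is an assumption made for the example: the layer sizes, the tanh nonlinearity, the parameter names, and the choice of feeding the second MLP a separate "context" vector (for instance an encoding of the neighbouring phones).

```python
import math
import random


def affine(weights, bias, x):
    """Compute W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_matrix(rows, cols, rng):
    """Small random weights, only meant to make the sketch runnable."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def bias_generator(context, gen):
    """Second MLP: maps a context vector to the main MLP's hidden biases."""
    hidden = [math.tanh(v) for v in affine(gen["W1"], gen["b1"], context)]
    return affine(gen["W2"], gen["b2"], hidden)


def main_mlp(x, context, main, gen):
    """Main MLP: its hidden layer has no bias of its own, it uses the generated one."""
    hidden_bias = bias_generator(context, gen)
    hidden = [math.tanh(v) for v in affine(main["W_h"], hidden_bias, x)]
    return affine(main["W_o"], main["b_o"], hidden)


# Hypothetical sizes, chosen arbitrarily for the example.
n_in, n_ctx, n_gen_hidden, n_hidden, n_out = 4, 3, 5, 6, 2
rng = random.Random(0)

gen = {"W1": random_matrix(n_gen_hidden, n_ctx, rng), "b1": [0.0] * n_gen_hidden,
       "W2": random_matrix(n_hidden, n_gen_hidden, rng), "b2": [0.0] * n_hidden}
main = {"W_h": random_matrix(n_hidden, n_in, rng),
        "W_o": random_matrix(n_out, n_hidden, rng), "b_o": [0.0] * n_out}

x = [0.2, -0.1, 0.4, 0.0]
y_a = main_mlp(x, [1.0, 0.0, 0.0], main, gen)  # same input, context A
y_b = main_mlp(x, [0.0, 1.0, 0.0], main, gen)  # same input, context B
```

The same input `x` gives different outputs under the two contexts, because the context changes the biases of the main network's hidden layer. In a trained version, gradients would flow through the generated biases into the parameters of the second MLP as well.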