Using more than samples as inputs and other aborted experiments

This week I learned more about using pylearn2 and about its internals. However, I ran into some problems that prevented me from getting any experimental results.

First, I dissected the code of Vincent Dumoulin's pylearn2 TIMIT dataset class and its associated example YAML file, as well as the example YAML file provided by Jean-Philippe Raymond. I wanted to know how to feed the model not only the sound samples but also the phones or phonemes associated with them. I discovered that the class already supported this, but I did not know how to specify it in the YAML training file.

So I went to the LISA lab and asked Vincent about it. He told me that a bug in another class, which converted phone or phoneme indexes to floats, was causing problems. Another researcher, Bart, had found a workaround for this bug for his own purposes. Bart explained the changes he had made to the FiniteDatasetIterator class (the conversion of indexes to floats). He also told me about composite layers and about specifying data sources in the constructor of the MLP model.
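Here is a minimal sketch, in plain Python, of what "samples plus phones" means as a single input. It takes a window of audio samples and concatenates it with a one-hot encoding of the phone index, cast to floats. This is not pylearn2 code and not Bart's workaround. It is independent of pylearn2's data-source and composite-layer machinery. The names `one_hot` and `build_input`, the window length and the 61-phone inventory are assumptions made for illustration.

```python
from typing import List, Sequence

# Assumption: 61 phone classes (the full TIMIT phone set); adjust as needed.
NUM_PHONES = 61


def one_hot(index: int, num_classes: int) -> List[float]:
    """Turn an integer phone index into a float one-hot vector."""
    if not 0 <= index < num_classes:
        raise ValueError(f"phone index {index} out of range [0, {num_classes})")
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec


def build_input(samples: Sequence[float], phone_index: int,
                num_phones: int = NUM_PHONES) -> List[float]:
    """Concatenate a window of audio samples with the one-hot phone code."""
    return [float(s) for s in samples] + one_hot(phone_index, num_phones)


# Hypothetical 4-sample window whose current phone has index 5.
window = [0.01, -0.02, 0.03, 0.0]
x = build_input(window, phone_index=5)
print(len(x))  # 65: 4 samples + 61 one-hot entries
```

Casting the phone index to a float directly (e.g. `5.0`) would also "work" numerically. However, it would impose an arbitrary ordering on the phones, which is one reason a one-hot code is a common choice.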

After considering the question, I decided to tackle an easier problem for now. I moved on to writing a YAML training specification to test a hypothesis about the number of neurons in each layer of the MLP. I based it on the two example files mentioned above. Specifically, I wanted to know whether it would be beneficial to have progressively fewer neurons in the deeper layers of the network. The idea was that this might force the network to learn increasingly abstract features in its deeper layers. So I wrote two YAML files: one using this idea, and one with the same number of units in every layer. I also wanted to use a monitor to stop training based on performance on the validation set.
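The two layer configurations can be compared with a short plain-Python sketch. This is not pylearn2 YAML. The helper names, the halving ratio, the input size of 160 and the first-layer width of 400 are all hypothetical, and are not necessarily the values used in the YAML files.

```python
from typing import List


def pyramid_widths(first: int, depth: int, ratio: float = 0.5) -> List[int]:
    """Hidden-layer widths that shrink geometrically with depth."""
    return [max(1, int(first * ratio ** i)) for i in range(depth)]


def constant_widths(width: int, depth: int) -> List[int]:
    """Hidden-layer widths that stay the same at every depth."""
    return [width] * depth


def param_count(n_in: int, hidden: List[int], n_out: int) -> int:
    """Number of weights plus biases in a fully connected MLP."""
    sizes = [n_in] + hidden + [n_out]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


# Hypothetical setup: 160 input samples, 3 hidden layers, 1 predicted sample.
shrinking = pyramid_widths(400, 3)   # [400, 200, 100]
flat = constant_widths(400, 3)       # [400, 400, 400]
print(param_count(160, shrinking, 1))  # 164801
print(param_count(160, flat, 1))       # 385601
```

With the same first-layer width, the shrinking network has well under half the parameters of the constant one. A fair comparison between the two ideas may therefore need to control for model size as well as layer shape.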

Unfortunately, I was not able to train the models. After a couple of hours of debugging, I still got this error message when I tried to launch training: “ValueError: Can’t convert between VectorSpaces of different sizes (100 to 1).” The two YAML files and the associated terminal logs are in my Git repository:

https://github.com/vaudrypl/ift6266-speech-synth
