Hi, I’m currently working on my first machine learning project: using neural networks to syllabify words from the Moby Hyphenator II dataset.
I am treating this as a multi-label classification problem in which words and their syllables are encoded in the following format:
t e n - s o r - f l o w
0 0 1 0 0 1 0 0 0 0
I have been padding all inputs to a length of 15 characters, so tensorflow would be encoded as a sequence of length 15.
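To make the label format concrete, here is a minimal sketch of how the labels could be built. It assumes that a 1 marks the letter immediately before a syllable break (as in the example above) and that padding positions are labeled 0; the helper name `encode_labels` is hypothetical and not part of any library.

```python
def encode_labels(hyphenated, max_len=15):
    """Turn 'ten-sor-flow' into its letters and a padded 0/1 label list."""
    letters, labels = [], []
    for ch in hyphenated:
        if ch == "-":
            # A hyphen marks the letter just before it as ending a syllable.
            if labels:
                labels[-1] = 1
        else:
            letters.append(ch)
            labels.append(0)
    if len(letters) > max_len:
        raise ValueError(f"word longer than {max_len} characters")
    # Assumption: padding positions are labeled 0.
    labels += [0] * (max_len - len(labels))
    return "".join(letters), labels


word, labels = encode_labels("ten-sor-flow")
print(word)    # tensorflow
print(labels)  # [0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```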
I need to implement a linear-chain conditional random field (CRF) as my classifier because the online guides that I have based my project on suggest that including one can greatly boost accuracy: this guide achieves 96.89% validation accuracy after hyperparameter tuning without one, but this model achieves nearly 100% accuracy with a linear-chain CRF output layer.
I have seen a guide that implements a linear-chain CRF in PyTorch, but I am unsure how to recreate this in TensorFlow. That guide also checks for special characters so that padding is excluded from the computations, but this isn’t a problem I am currently concerned with. My main problem is implementing a linear-chain CRF in TensorFlow as the final output layer.
I looked at the official TensorFlow CRF layer implementation as well as the TensorFlow Addons (TFA) module, but I do not know how to use them with the form of data that I have, nor do I understand which specific functions to use. The second example model I referenced uses this CRF implementation, but again I do not know how to use it. I tried to use it in my model as per the comment in the code:
# As the last layer of sequential layer with
# model.output_shape == (None, timesteps, nb_classes)
crf = ChainCRF()
model.add(crf)
# now: model.output_shape == (None, timesteps, nb_classes)
However, using this leads to an output shape of (None, 15, 64). This differs from my currently working dense output layer, applied after global max pooling, which has an output shape of (None, 15). I am unsure how to remedy this, as I believe the output shape needs to be (None, 15) for the model to work.
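For reference, the shape relationship I am after can be sketched without any framework. The following is only a minimal pure-Python Viterbi decoder, not the ChainCRF or TFA implementation; the function name, the two-class setup, and all scores are hypothetical. It takes per-timestep class scores of shape (15, 2) for a single word and returns one tag per timestep, i.e. a sequence of length 15.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence for one padded word.

    emissions:   list of length timesteps, each a list of n_classes scores
    transitions: n_classes x n_classes list; transitions[i][j] is the
                 score of moving from tag i to tag j
    """
    n_classes = len(transitions)
    # Best score of any path ending in each tag at the current timestep.
    scores = list(emissions[0])
    backpointers = []
    for step in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n_classes):
            # Pick the best previous tag for current tag j.
            best_i = max(range(n_classes),
                         key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_i)
            new_scores.append(scores[best_i] + transitions[best_i][j] + step[j])
        scores = new_scores
        backpointers.append(pointers)
    # Walk the backpointers from the best final tag to recover the path.
    tag = max(range(n_classes), key=lambda j: scores[j])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Hypothetical scores for one 15-step word: shape (timesteps=15, n_classes=2).
emissions = [[1.0, 0.0] for _ in range(15)]
emissions[2] = [0.0, 2.0]   # favor a break after the 3rd letter
emissions[5] = [0.0, 2.0]   # favor a break after the 6th letter
# Hypothetical transition scores that penalize two consecutive breaks.
transitions = [[0.0, 0.0], [0.0, -5.0]]

tags = viterbi_decode(emissions, transitions)
print(tags)  # one 0/1 tag per timestep
```

In this sketch the class dimension disappears only at decoding time, which suggests that the scores fed into a CRF layer would need a last dimension equal to the number of tags (2 here) rather than 64.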