I have several questions regarding running the MobileBERT model on Android.
- Does Google's MobileBERT multilingual model (from TensorFlow Hub) support Chinese as well?
- Does this model have metadata? (Android Studio shows there is no metadata, but the archive contains a separate file, keras-metadata.pb.)
- If this model is supposed to have metadata, how can I load and attach it via the Android Studio API?
- The default code Android Studio generates after model import is not correct: it operates with DataType.INT32, which the TensorFlow Lite support library's TensorBuffer does not support. I assume the IDE simply reflects whatever type the model itself uses, but could a validator be added that warns when a model is not supported yet?
- The generated code is not quite clear either. It allocates several buffers for input features, but loads the same byte buffer into each of them. If the input is the same, why does it need to be duplicated several times? I would expect just the text to classify, converted into a single tensor.
- It seems possible to work around the unsupported data type via a custom interpreter. However, I haven't figured out yet how to build it properly for my purposes, especially since the model's initial usage sample is misleading. Can you show how it should be used for Chinese (or any) text classification?
- What should be in the output array? I assume it should be a dictionary of words that the model can recognise, but the model's initial usage sample again shows something different.
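Regarding the duplicated-input question above: as far as I understand, BERT-style classifiers take three parallel INT32 tensors of the same length — token ids, an attention mask, and segment ids — rather than three copies of the text. A minimal pure-Kotlin sketch of how those three arrays relate (the toy vocabulary and special-token ids here are my own assumptions, not the model's real WordPiece vocab):

```kotlin
// Sketch with an assumed toy vocabulary; a real model uses its own
// WordPiece vocab file. Shows why a BERT classifier needs three parallel
// INT32 tensors rather than three copies of the input text.
fun encode(tokens: List<String>, vocab: Map<String, Int>, maxLen: Int): Triple<IntArray, IntArray, IntArray> {
    val ids = IntArray(maxLen)      // token ids, zero-padded
    val mask = IntArray(maxLen)     // 1 = real token, 0 = padding
    val segments = IntArray(maxLen) // all zeros for single-sentence input
    val padded = (listOf("[CLS]") + tokens + listOf("[SEP]")).take(maxLen)
    for ((i, tok) in padded.withIndex()) {
        ids[i] = vocab[tok] ?: vocab.getValue("[UNK]")
        mask[i] = 1
    }
    return Triple(ids, mask, segments)
}

fun main() {
    val vocab = mapOf("[PAD]" to 0, "[UNK]" to 1, "[CLS]" to 2, "[SEP]" to 3, "你" to 4, "好" to 5)
    val (ids, mask, seg) = encode(listOf("你", "好"), vocab, maxLen = 8)
    println(ids.joinToString())   // 2, 4, 5, 3, 0, 0, 0, 0
    println(mask.joinToString())  // 1, 1, 1, 1, 0, 0, 0, 0
    println(seg.joinToString())   // 0, 0, 0, 0, 0, 0, 0, 0
}
```

If this is right, the three generated `inputFeature` buffers presumably correspond to these three tensors, not to three copies of the text — which is what I would like confirmed.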
Update (14.08): some more investigation results and questions: https://stackoverflow.com/questions/76889244/if-mobile-bert-model-requires-datatype-int32-is-it-possible-to-run-on-android-a
Below is the sample code Android Studio shows after the model has been imported:
```kotlin
val model = MobileBertMultilingual.newInstance(context)

// Creates inputs for reference.
val inputFeature0 = TensorBuffer.createFixedSize(intArrayOf(1, 1), DataType.INT32)
inputFeature0.loadBuffer(byteBuffer)
val inputFeature1 = TensorBuffer.createFixedSize(intArrayOf(1, 1), DataType.INT32)
inputFeature1.loadBuffer(byteBuffer)
val inputFeature2 = TensorBuffer.createFixedSize(intArrayOf(1, 1), DataType.INT32)
inputFeature2.loadBuffer(byteBuffer)

// Runs model inference and gets result.
val outputs = model.process(inputFeature0, inputFeature1, inputFeature2)
val outputFeature0 = outputs.outputFeature0AsTensorBuffer
val outputFeature1 = outputs.outputFeature1AsTensorBuffer
val outputFeature2 = outputs.outputFeature2AsTensorBuffer
val outputFeature3 = outputs.outputFeature3AsTensorBuffer
val outputFeature4 = outputs.outputFeature4AsTensorBuffer
val outputFeature5 = outputs.outputFeature5AsTensorBuffer
val outputFeature6 = outputs.outputFeature6AsTensorBuffer
val outputFeature7 = outputs.outputFeature7AsTensorBuffer
val outputFeature8 = outputs.outputFeature8AsTensorBuffer
val outputFeature9 = outputs.outputFeature9AsTensorBuffer
val outputFeature10 = outputs.outputFeature10AsTensorBuffer
val outputFeature11 = outputs.outputFeature11AsTensorBuffer
val outputFeature12 = outputs.outputFeature12AsTensorBuffer
val outputFeature13 = outputs.outputFeature13AsTensorBuffer
val outputFeature14 = outputs.outputFeature14AsTensorBuffer
val outputFeature15 = outputs.outputFeature15AsTensorBuffer
val outputFeature16 = outputs.outputFeature16AsTensorBuffer
val outputFeature17 = outputs.outputFeature17AsTensorBuffer
val outputFeature18 = outputs.outputFeature18AsTensorBuffer
val outputFeature19 = outputs.outputFeature19AsTensorBuffer
val outputFeature20 = outputs.outputFeature20AsTensorBuffer
val outputFeature21 = outputs.outputFeature21AsTensorBuffer
val outputFeature22 = outputs.outputFeature22AsTensorBuffer
val outputFeature23 = outputs.outputFeature23AsTensorBuffer
val outputFeature24 = outputs.outputFeature24AsTensorBuffer
val outputFeature25 = outputs.outputFeature25AsTensorBuffer
val outputFeature26 = outputs.outputFeature26AsTensorBuffer
val outputFeature27 = outputs.outputFeature27AsTensorBuffer
val outputFeature28 = outputs.outputFeature28AsTensorBuffer
val outputFeature29 = outputs.outputFeature29AsTensorBuffer
val outputFeature30 = outputs.outputFeature30AsTensorBuffer
val outputFeature31 = outputs.outputFeature31AsTensorBuffer
val outputFeature32 = outputs.outputFeature32AsTensorBuffer
val outputFeature33 = outputs.outputFeature33AsTensorBuffer
val outputFeature34 = outputs.outputFeature34AsTensorBuffer
val outputFeature35 = outputs.outputFeature35AsTensorBuffer
val outputFeature36 = outputs.outputFeature36AsTensorBuffer
val outputFeature37 = outputs.outputFeature37AsTensorBuffer
val outputFeature38 = outputs.outputFeature38AsTensorBuffer
val outputFeature39 = outputs.outputFeature39AsTensorBuffer
val outputFeature40 = outputs.outputFeature40AsTensorBuffer
val outputFeature41 = outputs.outputFeature41AsTensorBuffer
val outputFeature42 = outputs.outputFeature42AsTensorBuffer
val outputFeature43 = outputs.outputFeature43AsTensorBuffer
val outputFeature44 = outputs.outputFeature44AsTensorBuffer
val outputFeature45 = outputs.outputFeature45AsTensorBuffer
val outputFeature46 = outputs.outputFeature46AsTensorBuffer
val outputFeature47 = outputs.outputFeature47AsTensorBuffer
val outputFeature48 = outputs.outputFeature48AsTensorBuffer
val outputFeature49 = outputs.outputFeature49AsTensorBuffer
val outputFeature50 = outputs.outputFeature50AsTensorBuffer
val outputFeature51 = outputs.outputFeature51AsTensorBuffer

// Releases model resources if no longer used.
model.close()
```
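For completeness, here is roughly how I imagine the custom-interpreter route would look, since the plain `org.tensorflow.lite.Interpreter` accepts INT32 arrays directly. This is an untested sketch: the input order (ids, mask, segments) and the meaning of output 0 are my assumptions and would need to be checked against `interpreter.getInputTensor(i).name()`.

```kotlin
import org.tensorflow.lite.Interpreter
import java.io.File

// Untested sketch: input tensor order is an assumption; verify it with
// interpreter.getInputTensor(i).name() before relying on it.
fun classify(modelFile: File, ids: IntArray, mask: IntArray, segments: IntArray): FloatArray {
    Interpreter(modelFile).use { interpreter ->
        // Batch dimension of 1: each input has shape [1, seqLen].
        val inputs = arrayOf<Any>(arrayOf(ids), arrayOf(mask), arrayOf(segments))
        // Assuming output 0 holds the class scores; a bare encoder would
        // instead expose per-layer embeddings here (which might explain
        // the 52 output tensors above).
        val scores = Array(1) { FloatArray(interpreter.getOutputTensor(0).shape().last()) }
        interpreter.runForMultipleInputsOutputs(inputs, mapOf(0 to scores))
        return scores[0]
    }
}
```

Is this the right direction, or is there a more idiomatic way to drive an INT32 model from Kotlin?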
A few words about context. I am extending a personal app that helps me learn Chinese. I want the app to split a newly entered Chinese sentence into separate words and translate each of them into English. I can already translate Chinese text automatically with the standard Google ML models, but for this kind of text classification BERT-like models seem to work best. The task is just word tokenisation, but because written Chinese has no spaces between words, the task becomes non-trivial.
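To illustrate why the task is non-trivial, here is the naive baseline I would otherwise use: greedy forward maximum matching against a dictionary. The toy dictionary is my own; a real solution needs a large lexicon or a model, which is why I am looking at BERT.

```kotlin
// Illustration only: greedy forward maximum-matching segmentation with a
// toy dictionary (an assumption; real segmentation needs a large lexicon
// or a trained model). Falls back to single characters on no match.
fun segment(text: String, dict: Set<String>, maxWordLen: Int = 4): List<String> {
    val words = mutableListOf<String>()
    var i = 0
    while (i < text.length) {
        // Try the longest dictionary match first, fall back to one character.
        var len = minOf(maxWordLen, text.length - i)
        while (len > 1 && text.substring(i, i + len) !in dict) len--
        words += text.substring(i, i + len)
        i += len
    }
    return words
}

fun main() {
    val dict = setOf("我们", "喜欢", "学习", "中文")
    println(segment("我们喜欢学习中文", dict)) // [我们, 喜欢, 学习, 中文]
}
```

The obvious weakness is that greedy matching picks wrong boundaries on ambiguous spans, which is exactly where a model-based segmenter should do better.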