Hi, a few months ago I was working on a prototype Flutter application that uses a custom BERT-based model for summarizing long-form text. Over the course of building that prototype I ran into a ton of issues and managed to find workarounds for a few of them (described below).
This summer I was thinking about working on these issues more generally for the community, and I'd like to find out whether they are big/viable enough to be a GSoC project; if not, I will still work on them as an extension so people can use it if they want. Obviously, as a GSoC project I would be able to learn more and gain experience too. This post is just a discussion starter with people in the community who are more experienced than me. I also don't know whether posting this on the forum is a good strategy, but I would rather have an open discussion about the issues.
Also, it would be nice if there were a GSoC tag in the tags section, if that's possible.
Note: While researching this, I found out that the current am15h/tflite_flutter_plugin package actually started out as a 2020 GSoC project, which was cool.
Preprocessing: The first thing to do with any NLP model is tokenization. There are many tokenization methods, but as far as my research goes, BERT models and variants like ELECTRA, ALBERT, DistilBERT, etc. use one of two major approaches: SentencePiece, or a preprocessing layer such as bert_en_uncased_preprocess, which can easily be loaded and added as a layer in front of the embedding model. The problem is that there is no generalized built-in way to do this preprocessing in TFLite or its support packages, nor in native Android or Flutter, as far as I can tell. (To be clear, I do know that preprocessing for certain implementations exists (tflite_flutter_helper/lib/src/task/text at master · am15h/tflite_flutter_helper), which just calls the native helper library.) And no, the preprocessing layer cannot be converted to TFLite without perhaps adding Flex delegate support for it.
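For context on what this preprocessing actually involves: the core of BERT-style tokenization is a greedy longest-match-first WordPiece lookup. A minimal sketch in Python (the vocab here is a toy stand-in for the real ~30k-entry one, and the function name is mine, not from any library):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", max_chars=200):
    """Greedy longest-match-first WordPiece, as in BERT's reference tokenizer.

    Continuation pieces are prefixed with "##"; a word with no valid
    segmentation maps to the unknown token.
    """
    if len(word) > max_chars:
        return [unk]
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, shrinking from the right.
        end = len(word)
        cur = None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # mark as a continuation piece
            if sub in vocab:
                cur = sub
                break
            end -= 1
        if cur is None:
            return [unk]  # no piece matched: whole word is unknown
        tokens.append(cur)
        start = end
    return tokens


vocab = {"un", "##aff", "##able"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
```

This is the part that is easy to port to Dart; the hard parts are the Unicode normalization before it and shipping the vocab efficiently.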
Possible Solutions: 1: I hit this issue myself while trying out different embedding models. For SentencePiece-based models like ALBERT, I compiled the SentencePiece library for Android and wrote a wrapper around it using dart:ffi as a Flutter plugin (obviously a minimum viable product; I just needed the basics working).
2: For the BERT preprocessing layer, I mimicked, in Dart, the basic code of TF Text, which already has a BERT tokenizer whose major parts start at text/bert_tokenizer.py at v2.8.0 · tensorflow/text. Essentially it removes characters from certain defined Unicode ranges and normalizes the text with NFD, i.e. canonical decomposition. My version currently skips normalization, because the package I was using three months ago, unorm_dart, turned out to be painfully slow, increasing processing time by roughly 10x-100x when converting sentences into 128-token arrays (this might have changed since), and my version obviously only handles English-ish text (it does support Japanese and Chinese characters).
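For reference, the cleanup and normalization steps that TF Text's BERT tokenizer performs can be expressed compactly with Python's stdlib unicodedata; this is a sketch of the same logic (dropping control characters, NFD-decomposing, and stripping combining marks), not the actual TF Text or unorm_dart code:

```python
import unicodedata


def clean_text(text):
    """Drop control characters and normalize whitespace to single spaces."""
    out = []
    for ch in text:
        if ch in ("\t", "\n", "\r"):
            out.append(" ")
        elif unicodedata.category(ch).startswith("C"):
            continue  # control/format characters are removed entirely
        elif unicodedata.category(ch) == "Zs":
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


def strip_accents(text):
    """NFD-decompose, then drop combining marks (Unicode category Mn)."""
    text = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


print(strip_accents("café"))        # cafe
print(clean_text("a\x00b\tc"))      # ab c
```

A Dart port of this is essentially a table of Unicode categories plus the NFD decomposition data, which is exactly where unorm_dart's performance became the bottleneck.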
3: Another possibility would be building support for TF Text's tokenizer into the Flex delegates or the TFLite built-in ops. I believe the blocker used to be tf.nn.embedding_lookup_sparse, but that might not be the whole reason.
3.5: The helper libraries already support QA and NL-classifier tasks. I haven't researched this enough, it's just a thought, but why not simply expose the function that does the preprocessing there, if that's possible?
For a start, the first two options could either live in their own repo or be added to the TFLite helper libraries.
TFHub BERT-based models: This is something I absolutely cannot achieve on my own fast enough and it would require more research, but it leads to the next problem, which could easily be avoided if this one were improved even a little. Currently there is only ONE fully TFLite-compatible BERT model (by which I mean one that doesn't require Flex delegates). This is a problem because it forces downstream models that use BERT-based models as embedding layers to also use Flex delegates, even when their own code is entirely TFLite-compatible.
Possible Solution: Looking at its source code, ALBERT makes this possible by replacing einsum with a reshape + matmul combination (please correct me if I am wrong). If I remember correctly, other models I considered, like SmallBERT, use einsum too; maybe retraining these small embedding models (which are the most likely to be used, since at < 60 MB they are easier to run on edge devices) would reduce the space consumed by Flex delegate binaries and also let downstream models take advantage of NNAPI etc., improving performance.
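To illustrate why this replacement is possible at all: an einsum contraction between a 3-D activation and a 2-D weight matrix is numerically identical to flattening the batch dimensions, doing a plain matmul, and reshaping back, and the latter uses only TFLite built-in ops. A small NumPy sketch (the shapes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 3, 4))   # e.g. [batch, seq_len, hidden]
w = rng.random((4, 5))      # e.g. [hidden, out]

# The einsum form, as used inside BERT-style dense layers.
y_einsum = np.einsum("abc,cd->abd", x, w)

# The TFLite-friendly equivalent: flatten -> matmul -> reshape back.
y_matmul = (x.reshape(-1, 4) @ w).reshape(2, 3, 5)

print(np.allclose(y_einsum, y_matmul))  # True
```

So at least for these simple dense contractions, the rewrite changes nothing numerically; the open question is whether the published checkpoints can be re-exported (or retrained) with the rewritten graph.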
I am sorry for the lack of research on this front compared to the last problem, but that is exactly why I wanted an open discussion with people more experienced than me (maybe from the TFHub team) to correct me on this, or to tell me whether it is possible at all.
Flutter TFLite plugin and binaries: There are a few changes I would like to make to the plugin itself, plus documentation and automation of fat binary generation and of a slimmed Flex delegate per model (I couldn't generate a slimmed Flex delegate for the life of me). Currently the most recent binary provided with the package (which supports GPU acceleration and NNAPI) isn't even compatible with TF 2.6. Also, there is no support for passing strings to models at the moment; I have a fork where I experimented with it.
All of this is something I discussed with Mr. Amish Garg, whose GSoC project this package started as. He would like me to have a concrete method for automating fat binary generation with GitHub Actions or an alternative. As for string support, as he said, this package is used by a lot of people, and I would only try to add it if someone from the TFLite team (or another related member) would review the code to a production standard. I would like to think I am decent enough, but I have no baseline when it comes to writing production-level code. So string support basically hinges on this becoming a GSoC project with someone reviewing the Flutter plugin code; otherwise, I will try to contribute it to the package individually.
I appreciate anyone who took the time to read all of this, lol. I really tried to keep it short while still giving enough information for anyone to assess whether even some parts of it could be a GSoC project. In my small world these are big improvements to the current workflow, but obviously that depends on the priorities and goals of the TFHub, TFLite, etc. teams. Any assessment of these problems and improvements is appreciated; my goal at the end of the day is to write code that helps others, and along the way I will get to learn from more experienced people.
Thanks, Sid (on a side note, I wrote this while heading to sleep, without proofreading).