Detecting units of measure in text?

Hemmings · October 4, 2021, 2:39am

Apologies in advance if this has been covered elsewhere in the forum, and for my lack of basic TensorFlow knowledge. I was hoping someone could explain whether TensorFlow can be used to detect and parse basic units of measurement in text such as articles, cooking recipes and directions: specifically, unstructured information on the web, the kind of stuff people write … well, for people.

And by basic, I mean centimetres (cm), litres (L), ounces (oz) and miles per hour (mph), but in all their forms: the full word, the abbreviation, with or without slashes (/) and/or other characters, on their own or in any combination. Does anyone know if such a model exists? If not, could one be trained, and what might that process look like?

I’ve looked at a few examples of code with similar functionality, and so far they read as a spaghetti soup of regex, with countless edge cases to handle one-off exceptions. This led me to think that replicating something similar would be code-intensive and, in my opinion, difficult to scale to a wider range of units. Naturally, as people we find it easy to infer context from the text itself, and can even make inferences from the source. This has led me down the AI / ML route. I’m fairly new to the tech and wanted to get some helpful insight before diving headfirst with reckless abandon down the rabbit hole.
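For comparison with the regex approach described above, here is a minimal rule-based sketch (not a TensorFlow model) showing that the unit variants can live in a data table rather than in one-off patterns. The alias table, `find_quantities` and the regex are hypothetical choices for illustration; real coverage would need a much larger, hand-curated table.

```python
import re

# Hypothetical alias table: each canonical unit maps to surface forms people
# commonly write. Extending coverage means adding entries, not new regexes.
UNIT_ALIASES = {
    "cm": ["cm", "centimetre", "centimetres", "centimeter", "centimeters"],
    "L": ["l", "litre", "litres", "liter", "liters"],
    "oz": ["oz", "ounce", "ounces"],
    "mph": ["mph", "m/h", "mi/h", "miles per hour", "mile per hour"],
}

# Reverse lookup from lowercased alias to canonical unit.
ALIAS_TO_UNIT = {
    alias: unit for unit, aliases in UNIT_ALIASES.items() for alias in aliases
}

# Longest aliases first so "miles per hour" is tried before shorter forms.
_alias_pattern = "|".join(
    re.escape(alias) for alias in sorted(ALIAS_TO_UNIT, key=len, reverse=True)
)

# A number (optionally with a decimal part), optional whitespace, then an
# alias not followed by another letter (so "12 lbs" does not match "l").
QUANTITY_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(" + _alias_pattern + r")(?![a-z])",
    re.IGNORECASE,
)


def find_quantities(text):
    """Return (value, canonical_unit) pairs found in free text."""
    return [
        (float(m.group(1)), ALIAS_TO_UNIT[m.group(2).lower()])
        for m in QUANTITY_RE.finditer(text)
    ]


sentence = ("Bake until 2 cm thick, pour in 1.5 litres of water, add 12oz of "
            "flour and drive at 30 miles per hour, but ignore 12 lbs.")
print(find_quantities(sentence))
# [(2.0, 'cm'), (1.5, 'L'), (12.0, 'oz'), (30.0, 'mph')]
```

A table like this still has to be curated by hand and cannot resolve ambiguity from context (for example, whether a bare "L" means litres), which is where a trained model could help. Matches from such a table could also serve as starting labels when building training data.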

I appreciate I may not be asking the right questions, or asking the right audience, but I’m hoping someone may be able to point me in the direction of a group who can help.

Renu_Patel · February 1, 2024, 6:52am

Hi @Hemmings

Welcome to the TensorFlow Forum!

You may need to preprocess the data using custom functions suited to your requirements, and then train a model on it. Please refer to the Text Preprocessing section of the TensorFlow Text tutorials, which describes several data preprocessing approaches that might be helpful. Thank you.
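As one possible reading of "preprocess, then train": unit detection is often framed as sequence labelling (similar to named-entity recognition), where each token is tagged B-QTY (start of a quantity), I-QTY (inside one) or O (outside). The sketch assumes hand-annotated character spans; the tokeniser regex, the tag names and the helpers `tokenize_with_offsets`, `bio_labels` and `char_span` are hypothetical, written in plain Python rather than with the TensorFlow Text API.

```python
import re

# Hypothetical tokeniser: numbers (with optional decimals), runs of letters,
# and any other single non-space character such as "/" or ".".
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+|[^\sA-Za-z\d]")

# Hypothetical tag set, mapped to integer ids for a model's label layer.
LABEL_IDS = {"O": 0, "B-QTY": 1, "I-QTY": 2}


def tokenize_with_offsets(text):
    """Return (token, start, end) triples with character offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def bio_labels(text, spans):
    """Tag tokens using hand-annotated (start, end) character spans.

    The first token fully inside a span gets B-QTY, later tokens in the
    same span get I-QTY, and every other token gets O.
    """
    tokens = tokenize_with_offsets(text)
    labels = ["O"] * len(tokens)
    for span_start, span_end in spans:
        inside = [
            i for i, (_, start, end) in enumerate(tokens)
            if start >= span_start and end <= span_end
        ]
        for position, i in enumerate(inside):
            labels[i] = "B-QTY" if position == 0 else "I-QTY"
    return [token for token, _, _ in tokens], labels


def char_span(text, phrase):
    """Example helper: (start, end) of the first occurrence of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase))


text = "Drive at 30 mph or 50 km/h."
tokens, labels = bio_labels(
    text, [char_span(text, "30 mph"), char_span(text, "50 km/h")]
)
label_ids = [LABEL_IDS[label] for label in labels]
print(tokens)     # ['Drive', 'at', '30', 'mph', 'or', '50', 'km', '/', 'h', '.']
print(labels)     # ['O', 'O', 'B-QTY', 'I-QTY', 'O', 'B-QTY', 'I-QTY', 'I-QTY', 'I-QTY', 'O']
print(label_ids)  # [0, 0, 1, 2, 0, 1, 2, 2, 2, 0]
```

Token and label-id sequences in this shape could then be vectorised and fed to a sequence-labelling model built with `tf.keras`. That training step is not shown, and the tokenisation here is only one of the options the preprocessing tutorials cover.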