Date Parsing as a Model Layer in Tensorflow-Keras

barmanroys · July 20, 2021, 10:54am

Basically, I am training my model after some feature engineering on the raw data. The feature engineering includes converting an Order_date column (from a pandas DataFrame, with dtype object) into four separate features: day of month, month, year and weekday. See the following screenshot.

Screenshot from 2021-07-20 18-38-56

It is a simple task using pandas functionality, but I recently read in some Keras and TensorFlow documentation that the best models are end to end, i.e. they take the raw data and implement all feature engineering as layers before the neural network. I am trying to adhere to that philosophy, but being fairly new to the TensorFlow API, I am not sure if this is possible here. I also looked at tf.data pipelines and the methods available in tf.feature_column, but they do not seem to fit my use case. What would be the cleanest way to achieve this while sticking with the TensorFlow layer philosophy?

I could potentially write a custom layer, subclassing the base Layer class, that takes a batch of yyyy-mm-dd strings and outputs the four features using pandas Timestamp (which I want to avoid), but I have not actually implemented that yet. There should still be a cleaner way using tf APIs.
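For illustration, here is a minimal sketch of what such a custom layer might look like using only TensorFlow string and integer ops, with no pandas involved. The class name `DateFeatures` is hypothetical, and the sketch assumes every input is a well-formed, zero-padded `yyyy-mm-dd` string (no validation is done). The weekday is computed with Sakamoto's day-of-week method and then shifted to the pandas convention (Monday = 0, Sunday = 6); this is one possible approach, not an official TensorFlow recipe.

```python
import tensorflow as tf


class DateFeatures(tf.keras.layers.Layer):
    """Hypothetical layer: 'yyyy-mm-dd' strings -> [day, month, year, weekday]."""

    def call(self, inputs):
        # Flatten so that both (batch,) and (batch, 1) string inputs work.
        dates = tf.reshape(inputs, [-1])

        # Assumption: fixed-width, zero-padded 'yyyy-mm-dd', so we can slice by position.
        year = tf.strings.to_number(tf.strings.substr(dates, 0, 4), tf.int32)
        month = tf.strings.to_number(tf.strings.substr(dates, 5, 2), tf.int32)
        day = tf.strings.to_number(tf.strings.substr(dates, 8, 2), tf.int32)

        # Sakamoto's method gives the day of week with 0 = Sunday ... 6 = Saturday.
        offsets = tf.constant([0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4], tf.int32)
        y = tf.where(month < 3, year - 1, year)
        dow = (y + y // 4 - y // 100 + y // 400
               + tf.gather(offsets, month - 1) + day) % 7

        # Shift to the pandas convention: 0 = Monday ... 6 = Sunday.
        weekday = (dow + 6) % 7

        # Output shape: (batch, 4), dtype int32.
        return tf.stack([day, month, year, weekday], axis=-1)


layer = DateFeatures()
features = layer(tf.constant(["2021-07-20", "2000-02-29"]))
print(features.numpy())
```

Because the layer uses only TF ops, it should be traceable into the graph and could sit in front of the rest of the model; the integer outputs would still need casting or embedding before being fed to dense layers.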

Bhack · July 20, 2021, 4:15pm

I think you want something like the second bullet point in:

github.com/tensorflow/transform

Transforming date values

opened 08:07AM - 23 Apr 19 UTC

Malonl

stat:awaiting tensorflower type:feature

Hi. I have a use case where I want to use date features as input values for a predictive model. I need to transform the date features to be useful. Examples I would like to be able to do:

- Given two date columns, generate a new column with the difference between them in days. The dates can be days, weeks, or even years apart.
- From a column with dates, create three different columns with extracted data: one for day, one for month, one for year.

If we could use some Python library (datetime, for example), it would be trivial. Without the library, we would need to implement the knowledge about the calendar (number of days in each month, etc.).

I believe we cannot use a conventional Python library because, if we use it, the transformation would not be written to the graph, and thus we would not be able to have it at serving time.

My question is whether it is possible to do this within TF Transform and, if not, whether this is a feature that is coming up. If there is no way to add such operations to the graph, we would need to implement a piece of pipeline transforming the data both before training and before serving, outside of the graph.

Thanks in advance!

PS: For transparency, I have asked this question on [tensorflow/tensorflow](https://github.com/tensorflow/tensorflow/issues/27893#issuecomment-484983991) and [tensorflow/tfx](https://github.com/tensorflow/tfx/issues/41#issuecomment-485289180) as well, but have been redirected to this forum.
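As a rough illustration of the "difference in days" request in the issue, the calendar knowledge can be written with plain TensorFlow integer ops so that it stays in the graph. The sketch below uses the well-known "days from civil" arithmetic for the proleptic Gregorian calendar; the function names `days_since_epoch` and `date_diff_days` are hypothetical, and the input is assumed to be well-formed, zero-padded `yyyy-mm-dd` strings. It is not part of TF Transform.

```python
import tensorflow as tf


def days_since_epoch(dates):
    """'yyyy-mm-dd' string tensor -> int32 days since 1970-01-01."""
    y = tf.strings.to_number(tf.strings.substr(dates, 0, 4), tf.int32)
    m = tf.strings.to_number(tf.strings.substr(dates, 5, 2), tf.int32)
    d = tf.strings.to_number(tf.strings.substr(dates, 8, 2), tf.int32)

    # Treat March as the first month so the leap day falls at the end of the year.
    y = tf.where(m <= 2, y - 1, y)
    era = y // 400                        # 400-year Gregorian cycle index
    yoe = y - era * 400                   # year within the cycle, 0..399
    mp = tf.where(m > 2, m - 3, m + 9)    # month index starting from March, 0..11
    doy = (153 * mp + 2) // 5 + d - 1     # day within the March-based year
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy  # day within the cycle
    return era * 146097 + doe - 719468    # shift so that 1970-01-01 maps to 0


def date_diff_days(start, end):
    """Element-wise number of days from `start` to `end`."""
    return days_since_epoch(end) - days_since_epoch(start)


start = tf.constant(["2021-01-01", "2020-02-28"])
end = tf.constant(["2021-07-20", "2020-03-01"])
print(date_diff_days(start, end).numpy())
```

The same parsing step yields the day, month and year columns from the second bullet directly, since they are just the three sliced integers before any calendar arithmetic.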