Keras Newsletter (April 7, 2023)

keras.utils.FeatureSpace (Keras 2.12 release highlights)

A cleaner way to do feature indexing, preprocessing, and encoding.
(Tutorial Link)

feature_space = FeatureSpace(
    features={
        # Categorical features encoded as integers
        "sex": FeatureSpace.integer_categorical(num_oov_indices=0),
        "cp": FeatureSpace.integer_categorical(num_oov_indices=0),
        "fbs": FeatureSpace.integer_categorical(num_oov_indices=0),
        "restecg": FeatureSpace.integer_categorical(num_oov_indices=0),
        "exang": FeatureSpace.integer_categorical(num_oov_indices=0),
        "ca": FeatureSpace.integer_categorical(num_oov_indices=0),
        # Categorical feature encoded as string
        "thal": FeatureSpace.string_categorical(num_oov_indices=0),
        # Numerical feature to discretize
        "age": FeatureSpace.float_discretized(num_bins=30),
        # Numerical features to normalize
        "trestbps": FeatureSpace.float_normalized(),
        "chol": FeatureSpace.float_normalized(),
        "thalach": FeatureSpace.float_normalized(),
        "oldpeak": FeatureSpace.float_normalized(),
        "slope": FeatureSpace.float_normalized(),
    },
    # Specify feature crosses with a custom crossing dim.
    crosses=[
        FeatureSpace.cross(feature_names=("sex", "age"), crossing_dim=64),
        FeatureSpace.cross(
            feature_names=("thal", "ca"),
            crossing_dim=16,
        ),
    ],
    output_mode="concat",
)
  • Specify feature types and feature crosses
  • Call .adapt(dataset) to index feature values
  • Call it on dict data like a layer (e.g. in a tf.data pipeline)
  • Easy to customize by creating custom feature types
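A minimal sketch tying these steps together, assuming TF/Keras 2.12 and a toy two-feature dict dataset standing in for the heart-disease data above:

```python
import tensorflow as tf
from tensorflow.keras.utils import FeatureSpace

# Toy dict dataset in place of the heart-disease data
raw = {
    "sex": [0, 1, 0, 1],
    "chol": [210.0, 180.0, 250.0, 300.0],
}
dataset = tf.data.Dataset.from_tensor_slices(raw).batch(2)

feature_space = FeatureSpace(
    features={
        "sex": FeatureSpace.integer_categorical(num_oov_indices=0),
        "chol": FeatureSpace.float_normalized(),
    },
    output_mode="concat",
)

# Index categorical values and fit normalization statistics
feature_space.adapt(dataset)

# Apply it like a layer inside a tf.data pipeline
preprocessed = dataset.map(feature_space)
```

With output_mode="concat", each batch comes out as a single dense tensor concatenating the one-hot and normalized features.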

Example: creating a custom feature to encode a text paragraph with BERT embeddings
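A sketch of the custom-feature pattern: FeatureSpace.feature() takes any preprocessing layer. A TextVectorization layer is used below as a lightweight stand-in for the BERT encoder from the demo; a TF Hub or KerasNLP BERT layer would be passed as the preprocessor in the same way.

```python
import tensorflow as tf
from tensorflow.keras.utils import FeatureSpace

# TextVectorization stands in here for a BERT encoder; a TF Hub /
# KerasNLP BERT layer would slot into `preprocessor` the same way.
text_encoder = tf.keras.layers.TextVectorization(output_mode="tf_idf")

feature_space = FeatureSpace(
    features={
        "paragraph": FeatureSpace.feature(
            preprocessor=text_encoder,
            dtype="string",
            output_mode="float",
        ),
    },
    output_mode="concat",
)

dataset = tf.data.Dataset.from_tensor_slices(
    {"paragraph": ["chest pain on exertion", "no symptoms reported"]}
).batch(2)
feature_space.adapt(dataset)  # fits the TextVectorization vocabulary
encoded = feature_space(next(iter(dataset)))
```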


Keras V3 Saving Format (.keras) (Keras 2.12 release highlights)

A new saving format is now released in TF/Keras 2.12, marked by the .keras extension. You can start using it with:

model.save("your_model.keras", save_format="keras_v3")

More robust, config-based saving.

  • Idempotent saving — “what you reload is what you saved”
  • No reliance on loading via bytecode or pickling — a big advancement for secure ML!
  • Easier debugging via a more detailed serialization format for the model’s config file (JSON)

Wider support for exotic states.

  • Non-numerical states, such as vocabulary files and lookup tables
  • Exotic state elements in custom layers, such as FIFOQueue

Files inside the .keras archive (a zip file):

  • Config JSON
  • Metadata
  • Weights files (.weights.h5)

Things to be aware of:

  • Python lambdas are disallowed at loading time.
    • If you trust the source of the model and want to use lambdas, pass safe_mode=False at loading time.
  • Register your custom objects.
    • The new saving format must have access to your custom objects. We recommend using the @keras.utils.register_keras_serializable decorator on the custom object definition.
  • The legacy formats (h5 and SavedModel) will continue to be supported indefinitely.

Note: Starting in TF 2.13, keras_v3 will become the default for all files with the .keras extension.


New feature: model.export(filepath) (Keras 2.12 release highlights)

Documentation link

  • Create a lightweight SavedModel archive for serving (e.g. via TF-Serving)
  • Not idempotent!! Also targets Python-less runtimes
  • Customize serving signatures via keras.export.ExportArchive class

KerasNLP text generation (v0.5 preview)

from keras_nlp.models import GPT2CausalLM

model = GPT2CausalLM.from_preset(
    "gpt2_base_en",
)
model.compile(...)
model.fit(cnn_dailymail_dataset)
model.generate(
    "Snowfall in Buffalo",
    max_length=40,
)
>>> 'Snowfall in Buffalo, New York, was expected to reach '
    '2 feet by the end of the day, according to the National '
    'Weather Service.'
  • More in 0.5…
    • Performant generation with XLA by default.
    • Contrastive search (current SOTA for sampling LLMs).
    • Masked language model training.
  • In the works…
    • Seq2Seq: T5 and BART.
    • Audio transcriber: Whisper.

KerasCV

Preview of unified KerasCV / KerasNLP API

from keras_cv.models import (
    ResNetBackbone, ImageClassifier,
)
backbone = ResNetBackbone.from_preset(
    "resnet50_imagenet",
)
model = ImageClassifier(
    backbone=backbone,
    num_classes=2,
)
model.compile(...)
model.fit(dataset)
from keras_cv.models import (
    ResNetBackbone, RetinaNet,
)
backbone = ResNetBackbone.from_preset(
    "resnet50_imagenet",
)
model = RetinaNet(
    backbone=backbone,
    num_classes=20,
    bounding_box_format="xywh",
)
model.compile(...)
model.fit(dataset)

Announcement: import keras is back

  • import keras is the recommended import style going forward instead of from tensorflow import keras.
  • Already on in keras-nightly
  • Will become standard in 2.13
  • Only exposes the public API
  • If you want to keep accessing private API symbols (please don’t):
    • Use keras.src namespace
      • e.g. keras.utils.tf_utils → keras.src.utils.tf_utils
    • (But seriously, just stick to the public API)

Community project highlights

  • Combine Transformers and RNNs to improve generalization. (link)
  • The Capsa project: A data- and model-agnostic neural network wrapper for risk-aware decision making. (link)

Very interesting feature!
Regarding the KerasCV API, ImageClassifier, do we have access to it right now through a nightly version of keras_cv?
Thanks!


I love this setup! It was really weird having to repeat adapt for each feature.
Is it possible to export the feature space preprocessor to an environment without tensorflow, just with a runtime like a model in .keras or a TF Lite model?

If vocabulary files and lookup tables are going into the .keras models does that mean that it will be possible to include unicode normalization and tokenization in models without having all of TF installed when doing inference?

Thanks!

Yes, it is available now although we are still in the process of converting all our classifier Backbones.
