Creating float tensors from BufferedImage in Java/Kotlin

Continuing a thread started on Gitter:

Hello, I want to run a TensorFlow model I found with a Java app, but I am having difficulty getting the input just right. Below you can see the result of the layer analysis. I found a few examples for one-dimensional input (MNIST) and I got another model working that required integers, but creating a Tensor with dimensions {batch, height, width, channels} is proving difficult, and I would like some help. The input is just a JPG, basically a BufferedImage, as I want to keep my options open.

TF Java users often ask for a snippet showing how this can be done easily, so I'm sharing one here written in Kotlin (warning: I did not test it after modifying it, but the logic should be sound):

    import java.awt.RenderingHints
    import java.awt.image.BufferedImage
    import org.tensorflow.ndarray.Shape
    import org.tensorflow.types.TFloat32

    fun preprocess(sourceImages: List<BufferedImage>, imageHeight: Int, imageWidth: Int, imageChannels: Int): TFloat32 {
        val imageShape = Shape.of(sourceImages.size.toLong(), imageHeight.toLong(), imageWidth.toLong(), imageChannels.toLong())

        return TFloat32.tensorOf(imageShape) { tensor ->

            // Copy all images to the tensor
            sourceImages.forEachIndexed { imageIdx, sourceImage ->

                // Scale the image to the required dimensions if needed
                val image = if (sourceImage.width != imageWidth || sourceImage.height != imageHeight) {
                    val scaledImage = BufferedImage(imageWidth, imageHeight, BufferedImage.TYPE_3BYTE_BGR)
                    scaledImage.createGraphics().apply {
                        setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR)
                        drawImage(sourceImage, 0, 0, imageWidth, imageHeight, null)
                        dispose()
                    }
                    scaledImage
                } else {
                    sourceImage
                }

                // Convert the image to floats and normalize it by subtracting the mean channel values
                val dataBuffer = image.data.dataBuffer // fetch the raster once, not once per pixel
                var i = 0
                for (h in 0L until imageHeight) {
                    for (w in 0L until imageWidth) {
                        // "caffe"-style normalization, pixels in BGR order
                        tensor.setFloat(dataBuffer.getElemFloat(i++) - 103.939f, imageIdx.toLong(), h, w, 0L)
                        tensor.setFloat(dataBuffer.getElemFloat(i++) - 116.779f, imageIdx.toLong(), h, w, 1L)
                        tensor.setFloat(dataBuffer.getElemFloat(i++) - 123.68f, imageIdx.toLong(), h, w, 2L)
                    }
                }
            }
        }
    }

So the idea is simply to resample your image if it is not already the right size, and to normalize its pixel values while feeding the tensor. The "caffe"-style normalization is the one used by default by Keras in Python, so the mean values to subtract were picked directly from the Keras sources.

UPDATED: here's the Java version:

    import java.awt.Graphics2D;
    import java.awt.RenderingHints;
    import java.awt.image.BufferedImage;
    import java.awt.image.DataBuffer;
    import java.util.List;
    import org.tensorflow.ndarray.Shape;
    import org.tensorflow.types.TFloat32;

    TFloat32 preprocess(List<BufferedImage> sourceImages, int imageHeight, int imageWidth, int imageChannels) {
        Shape imageShape = Shape.of(sourceImages.size(), imageHeight, imageWidth, imageChannels);

        return TFloat32.tensorOf(imageShape, tensor -> {
            // Copy all images to the tensor
            int imageIdx = 0;
            for (BufferedImage sourceImage : sourceImages) {
                // Scale the image to the required dimensions if needed
                BufferedImage image;
                if (sourceImage.getWidth() != imageWidth || sourceImage.getHeight() != imageHeight) {
                    image = new BufferedImage(imageWidth, imageHeight, BufferedImage.TYPE_3BYTE_BGR);
                    Graphics2D graphics = image.createGraphics();
                    graphics.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR);
                    graphics.drawImage(sourceImage, 0, 0, imageWidth, imageHeight, null);
                    graphics.dispose();
                } else {
                    image = sourceImage;
                }

                // Convert the image to floats and normalize it by subtracting the mean channel values
                DataBuffer dataBuffer = image.getData().getDataBuffer(); // fetch the raster once, not once per pixel
                int i = 0;
                for (long h = 0; h < imageHeight; ++h) {
                    for (long w = 0; w < imageWidth; ++w) {
                        // "caffe"-style normalization, pixels in BGR order
                        tensor.setFloat(dataBuffer.getElemFloat(i++) - 103.939f, imageIdx, h, w, 0);
                        tensor.setFloat(dataBuffer.getElemFloat(i++) - 116.779f, imageIdx, h, w, 1);
                        tensor.setFloat(dataBuffer.getElemFloat(i++) - 123.68f, imageIdx, h, w, 2);
                    }
                }
                ++imageIdx;
            }
        });
    }
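
To round this out, here is a rough sketch of how the resulting tensor could be fed to a saved model. The model path and the input/output tensor names below are hypothetical placeholders; you'd replace them with the ones from your own model's signature (e.g. as reported by `saved_model_cli show`):

    import java.awt.image.BufferedImage;
    import java.util.List;
    import org.tensorflow.SavedModelBundle;
    import org.tensorflow.types.TFloat32;

    // Sketch only: "path/to/model", "serving_default_input" and
    // "StatefulPartitionedCall" are hypothetical and must match your model.
    static void predict(List<BufferedImage> images) {
        try (SavedModelBundle model = SavedModelBundle.load("path/to/model", "serve");
             TFloat32 input = preprocess(images, 224, 224, 3)) {
            try (TFloat32 output = (TFloat32) model.session().runner()
                    .feed("serving_default_input", input)
                    .fetch("StatefulPartitionedCall")
                    .run()
                    .get(0)) {
                System.out.println("Output shape: " + output.shape());
            }
        }
    }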

Sorry, I can't add links, but there's also some example Java code in the tensorflow/java-models GitHub repository. You need to drill down to the cnn FasterRcnnInception directory.


The example @Keith_Hall is referring to is here: java-models/tensorflow-examples/src/main/java/org/tensorflow/model/examples/cnn/fastrcnn at master · tensorflow/java-models · GitHub


Yes, this other example is also valid but takes a different approach: it uses TensorFlow itself to decode and resize the images. The goal of my previous example was to demonstrate how to do it with the image utilities that come with the JDK.
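
For reference, a minimal sketch of that other approach, assuming TF Java 0.4+. The file name, the 224x224 target size, and the method name are placeholders of mine, and the JPEG bytes could just as well come from memory instead of a file:

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.tensorflow.Graph;
    import org.tensorflow.Operand;
    import org.tensorflow.Session;
    import org.tensorflow.ndarray.NdArrays;
    import org.tensorflow.op.Ops;
    import org.tensorflow.op.core.Placeholder;
    import org.tensorflow.op.image.DecodeJpeg;
    import org.tensorflow.types.TFloat32;
    import org.tensorflow.types.TString;

    // Sketch: decode and resize a JPEG with TensorFlow ops instead of AWT.
    static void decodeAndResize() throws Exception {
        try (Graph g = new Graph()) {
            Ops tf = Ops.create(g);
            // The JPEG bytes are fed through a placeholder, so no file read happens in the graph
            Placeholder<TString> jpegBytes = tf.placeholder(TString.class);
            Operand<TFloat32> resized = tf.image.resizeBilinear(
                tf.expandDims(
                    tf.dtypes.cast(tf.image.decodeJpeg(jpegBytes, DecodeJpeg.channels(3L)), TFloat32.class),
                    tf.constant(0)),           // add the batch dimension
                tf.constant(new int[] {224, 224}));

            byte[] bytes = Files.readAllBytes(Paths.get("image.jpg")); // or any in-memory source
            try (Session s = new Session(g);
                 TString input = TString.tensorOfBytes(NdArrays.scalarOfObject(bytes));
                 TFloat32 image = (TFloat32) s.runner().feed(jpegBytes, input).fetch(resized).run().get(0)) {
                // image now has shape {1, 224, 224, 3}
            }
        }
    }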


I have no experience with Kotlin, but it does look like a step in the right direction. I would like to take you up on your offer, Karl, to try to convert this to Java.


@James2026, please see my initial post above: I've added the same snippet in Java.


Hello,

I am moving to this thread from a GitHub issue, since this topic isn't germane to the reason that issue was opened but is one I am still working on. I have already had some discussion there with Craig, Keith, and Karl about converting BufferedImage to Tensors.

I have a JavaFX app which uses the AWT Robot class to take BufferedImage screenshots of a game, which I want to feed into a TF2 model to make predictions on. I then want to capture the bounding box information, send it back to the JavaFX portion of my app, and draw the boxes onto the image. The goal is to get this as close to real time as possible. I also want to use the bounding box information to feed the coordinates of objects back to another handler that will use the Robot class's key presses to avoid them. Disclaimer: I am working on this for a thesis project, not as a botter.

I was able to get the reading/writing-to-file example to work, but instead of sending the bounding box coordinates back, it wrote a new image with the bounding boxes drawn on it to a file. I'm hesitant to go this route because it seems like it would involve a lot of writing to disk.

I have also tried to add the preprocessing solution @karllessard came up with, and have attempted to use threading to speed it up, but I run into memory access errors. (Context: I am inexperienced with concurrency.)

Is there a solution where I could use something like the DecodePng feature, but instead of having it read from a file, just have it take in a BufferedImage? Or is there a concurrent solution for doing it just within the JDK methodology?

If you need this to be fast, you'll want to avoid BufferedImage as much as possible. If all you need from the Robot class is taking screenshots, that can be achieved a lot more efficiently with FFmpeg and JavaCV. There is some sample code for that here, among other places:
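
A minimal sketch of what screen grabbing with JavaCV could look like; the device string, format, and resolution are assumptions for Linux/x11grab (on Windows you'd use the "gdigrab" format with a "desktop" device, on macOS "avfoundation"):

    import org.bytedeco.javacv.FFmpegFrameGrabber;
    import org.bytedeco.javacv.Frame;

    // Sketch: grab desktop frames on Linux via x11grab; the device string,
    // format and resolution are illustrative and platform-dependent.
    static void grabLoop() throws Exception {
        try (FFmpegFrameGrabber grabber = new FFmpegFrameGrabber(":0.0+0,0")) {
            grabber.setFormat("x11grab");
            grabber.setImageWidth(1920);
            grabber.setImageHeight(1080);
            grabber.start();
            Frame frame;
            while ((frame = grabber.grab()) != null) {
                // frame.image[0] is a ByteBuffer of packed BGR pixels,
                // ready to be copied into a tensor (see the next snippet)
            }
        }
    }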

FFmpegFrameGrabber.grab() returns Frame objects, but what you want from them is Frame.image[0], which is typically just a ByteBuffer in BGR24 format, from which we can easily create a Tensor.
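
For instance, here is a hedged sketch of that conversion, reusing the same "caffe"-style mean subtraction as earlier in the thread; it assumes 8-bit packed BGR data and uses Frame.imageStride to handle row padding:

    import java.nio.ByteBuffer;
    import org.bytedeco.javacv.Frame;
    import org.tensorflow.ndarray.Shape;
    import org.tensorflow.types.TFloat32;

    // Sketch: copy a BGR24 Frame into a {1, height, width, 3} float tensor,
    // applying the same "caffe"-style mean subtraction as above.
    static TFloat32 frameToTensor(Frame frame) {
        int height = frame.imageHeight;
        int width = frame.imageWidth;
        int stride = frame.imageStride; // bytes between consecutive rows
        ByteBuffer pixels = (ByteBuffer) frame.image[0];
        return TFloat32.tensorOf(Shape.of(1, height, width, 3), t -> {
            for (int y = 0; y < height; ++y) {
                for (int x = 0; x < width; ++x) {
                    int i = y * stride + x * 3;
                    t.setFloat((pixels.get(i) & 0xFF) - 103.939f, 0, y, x, 0);
                    t.setFloat((pixels.get(i + 1) & 0xFF) - 116.779f, 0, y, x, 1);
                    t.setFloat((pixels.get(i + 2) & 0xFF) - 123.68f, 0, y, x, 2);
                }
            }
        });
    }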

And while you're at it, you may want to try TF Lite, since it will probably give you lower latency than TF Core:


If you already have your image in memory and can easily access its raw pixels (no matter if it's a BufferedImage or something else), you can certainly feed them directly to your tensor without passing through a file. The technique above shows only one way to do it, using AWT.

That being said, when you allocate a Tensor, you have direct access to its memory via the Java NdArray library. There are many accessors that allow you to transfer your pixel data to your tensor. Depending on your model, you'll want to feed your tensor in BGR or RGB order. Also, pixel data needs to be normalized as floats (e.g. between 0 and 1), while your PNG will probably have integer values.

If performance matters, you can try to apply these transformations first on the raw data of the original image (e.g. normalization + channel reordering), using any Java technique you like, and then transfer that data directly to your tensor buffer like this:

    byte[] normalizedPixels = ....; // pixel data already normalized and reordered

    try (TFloat32 tensor = TFloat32.tensorOf(Shape.of(w, h), t -> t.asRawTensor().data().write(normalizedPixels))) {
         ...
    }

That's the most direct way I can think of right now, but there are other ways to achieve something similar if that doesn't work for you.

About normalization: I gave the "caffe"-style one in my example, as it is the default used by Keras, but there are other valid ways to do it, e.g. float f = x / 127.5f - 1. Pick the best approach for your needs.
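
As an illustration, here are the two styles side by side as small helpers (the naming is mine, not from any library):

    // Sketch: the two normalization styles mentioned above, as helpers.
    // "caffe"-style: subtract the per-channel mean, pixels in BGR order.
    static float normalizeCaffe(int pixelByte, int channel) {
        final float[] bgrMeans = {103.939f, 116.779f, 123.68f};
        return (pixelByte & 0xFF) - bgrMeans[channel];
    }

    // "tf"-style: rescale 0..255 into the [-1, 1] range.
    static float normalizeTf(int pixelByte) {
        return (pixelByte & 0xFF) / 127.5f - 1f;
    }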

it wrote a new image with bounding boxes on it to a file. I’m hesitant to go this route because it seems like it will be a lot of writing to disk.

You definitely don't need to do this. You can read the data directly from your detectionBoxes and other output tensors and pass it to any other handler or tool you have for drawing the bounding boxes efficiently onto your frame. Again, check the various read operations available for float buffers in the NdArray library.
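
A rough sketch of what that direct read could look like, assuming the usual detection-model output shape of {1, numBoxes, 4} with normalized [ymin, xmin, ymax, xmax] coordinates (the method name and the hand-off are hypothetical):

    import org.tensorflow.types.TFloat32;

    // Sketch: iterate over detection boxes straight from the output tensor,
    // assuming shape {1, numBoxes, 4} with normalized box coordinates.
    static void forwardBoxes(TFloat32 detectionBoxes, int frameWidth, int frameHeight) {
        long numBoxes = detectionBoxes.shape().size(1);
        for (long i = 0; i < numBoxes; ++i) {
            float ymin = detectionBoxes.getFloat(0, i, 0) * frameHeight;
            float xmin = detectionBoxes.getFloat(0, i, 1) * frameWidth;
            float ymax = detectionBoxes.getFloat(0, i, 2) * frameHeight;
            float xmax = detectionBoxes.getFloat(0, i, 3) * frameWidth;
            // hand the pixel coordinates off to your JavaFX drawing code here
        }
    }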


Sometimes you might also need to reverse BGR to RGB in a tensor. The Reverse op will do this for you; just make sure the axis you pass is the channel dimension (2 for an HWC tensor, 3 for NHWC).
e.g.

    Reverse reverse = tf.reverse(tf.constant(someImageTensor), tf.constant(new long[]{2L}));

Thanks for this interesting information!