TF Image Processing?

I’m looking into implementing high-quality image-processing operations using TF. For example, I’d like a higher-quality downsampling method, such as Lanczos, as a TF model. I’d appreciate references to any work of this sort you’re aware of.

For example, a basic Gaussian blur can be implemented by passing a custom-width kernel to tf.conv2d() (I’m using TFJS). This works great, but has the expected issues along the image boundary. Production-quality image-processing tools solve this edge problem in one of a few ways, typically by zeroing the kernel weights that fall outside the image and renormalizing the rest. However, I’m not experienced enough with TF to know how to apply different kernels along the image boundaries.

Can anyone provide some tips?

For more context, here’s code that does a simple NxN Gaussian blur without handling the borders. I’d love to figure out how to extend this code to use different kernels along the boundary rows and columns so the edges are handled properly (i.e. not blended with zero).

const lanczos = (x, a) => {
  if (x === 0) return 1
  // Lanczos window: sinc(x) * sinc(x / a) for |x| < a, zero outside
  if (Math.abs(x) < a) {
    return (a * Math.sin(Math.PI * x) * Math.sin(Math.PI * (x / a))) / (Math.PI * Math.PI * x * x)
  }
  return 0
}

const gaussian = (x, sigma = 1 /* effective support is roughly [-3, 3] for sigma = 1 */) => {
  const C = 1 / Math.sqrt(2 * Math.PI * sigma * sigma)
  const k = -(x * x) / (2 * sigma * sigma)
  return C * Math.exp(k)
}

const filters = {
  Lanczos3: x => lanczos(x, 3),
  Lanczos2: x => lanczos(x, 2),
  Gaussian: x => gaussian(x, 1),
  Bilinear: () => 1,
  Nearest: () => 1,
}

const normalizedValues = (size, filter, scale = 1) => {
  let total = 0
  const values = []
  for (let y = -size; y <= size; ++y) {
    const i = y + size
    values[i] = []
    for (let x = -size; x <= size; ++x) {
      const j = x + size
      // Sample the filter in source-pixel units so the kernel widens with
      // the downsampling factor instead of collapsing to a delta.
      const f = filter(x / scale) * filter(y / scale)
      total += f
      // Diagonal [inChannel][outChannel] block: each RGB channel is
      // filtered independently rather than blended across channels.
      values[i][j] = [
        [f, 0, 0],
        [0, f, 0],
        [0, 0, f],
      ]
    }
  }
  // Normalize once so the weights for each output channel sum to 1.
  return tf.div(tf.tensor4d(values), total)
}

const frame = async (tensor, args) => {
  const filter = filters[args.filter]
  const { zoom, kernelWidth } = args
  const strides = Math.max(1, zoom)
  const size = Math.max(3, kernelWidth) * strides
  const kernel = normalizedValues(size, filter, strides)
  // 'same' zero-pads the border, which is exactly the edge problem described
  // above; 'valid' would avoid the padding but shrink the output instead.
  const pad = 'same'
  const dst = tf.conv2d(tensor, kernel, strides, pad)
  return { tensor: dst }
}

Could you be more specific here?

You always have the option to slice the image into (overlapping) pieces: center, edges, corners. Apply the conv to the center portion, apply modified convs to the edges and corners, then stack the results back together in 2D. But there may be a short-cut, depending on how you plan to modify the kernel at the edges.
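For example, a rough (untested) sketch of the top edge only, where `centerKernel` and `edgeKernel` stand in for the full kernel and a boundary-modified one:

// Untested sketch of the slice / conv / restack idea, top edge only.
// `image` is a [height, width, channels] tensor, `halfW` the kernel
// half-width; in practice each border row wants its own kernel.
const blurWithTopEdge = (image, centerKernel, edgeKernel, halfW) => tf.tidy(() => {
  const [height, width] = image.shape
  // Full-image pass; 'same' zero-pads, so its border rows are wrong.
  const full = tf.conv2d(image, centerKernel, 1, 'same')
  // Redo the top band with the boundary-aware kernel.
  const topBand = tf.slice(image, [0, 0, 0], [2 * halfW, width, 3])
  const topFixed = tf.conv2d(topBand, edgeKernel, 1, 'same')
  const topRows = tf.slice(topFixed, [0, 0, 0], [halfW, width, 3])
  // Keep everything below the top band from the full pass, and restack.
  const rest = tf.slice(full, [halfW, 0, 0], [height - halfW, width, 3])
  return tf.concat([topRows, rest], 0)
})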

Right - slicing the image up into “center” and then a set of rows and columns is one option. The other is to change the kernel for those “pixels” and keep the image in a single buffer. Specifically, we need to adjust the kernel by renormalizing it after excluding values “outside” the source buffer, rather than by assuming those values are zero and using them with a constant kernel as in tf.conv2d. Wikipedia calls this approach “kernel crop”. Note that “extending” the edges is also acceptable in most situations with image processing, and is often easier to implement on the CPU. The “wrap” and “mirror” options are more specific to texture rendering in 3D.

Assume a 5x5 kernel for simplicity (it should always be an odd width). Ignoring the corners for now, the first row would want a 3x5 kernel, renormalized after removing the two rows that are outside the source buffer. The second row would require a renormalized 4x5 kernel.
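Something like this could build those cropped, renormalized kernels (untested; it reuses the `filter` functions and the per-channel diagonal layout from the code above):

// Untested: build a kernel with `crop` rows removed from the top,
// renormalized so the remaining weights still sum to 1.
const croppedKernel = (halfW, crop, filter) => {
  let total = 0
  const values = []
  for (let y = -halfW + crop; y <= halfW; ++y) { // skip the cropped rows
    const row = []
    for (let x = -halfW; x <= halfW; ++x) {
      const f = filter(x) * filter(y)
      total += f
      row.push([[f, 0, 0], [0, f, 0], [0, 0, f]]) // per-channel diagonal
    }
    values.push(row)
  }
  return tf.div(tf.tensor4d(values), total) // renormalize after the crop
}

// For a 5x5 kernel (halfW = 2): the first image row uses a 3x5 kernel
// (crop = 2), the second row a 4x5 kernel (crop = 1).
const k3x5 = croppedKernel(2, 2, filters.Gaussian)
const k4x5 = croppedKernel(2, 1, filters.Gaussian)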

Note that the GPU hardware supports the “extend”, “mirror” and “wrap” modes in the core texture-sampling hardware. I don’t know how this maps to the various TF backends, but I’m curious to learn. The hardware also supports non-integer bilerp sampling, which would also help, but that’s another question. So, in addition to the slicing you propose, we could also pad the input by extending the edge rows and columns outward by half the kernel width.
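Looking at the TFJS docs, tf.mirrorPad seems to expose the mirror behavior directly (‘symmetric’ repeats the edge pixel, ‘reflect’ mirrors about it), so an untested sketch of the pad-then-conv idea:

// Untested: mirror-extend the border by the kernel half-width, then
// convolve with 'valid' so the filter never samples zeros outside the image.
const blurMirrored = (image, kernel, halfW) => tf.tidy(() => {
  const padded = tf.mirrorPad(
    image,
    [[halfW, halfW], [halfW, halfW], [0, 0]], // [top,bottom], [left,right], channels
    'symmetric'
  )
  return tf.conv2d(padded, kernel, 1, 'valid') // output matches the input size
})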

I’m happy to dive in and learn these items and was mostly looking for tips as to where to start as this is my first spelunking into the building of models from scratch in TF.

One simple but inefficient pattern I’ve used is to make an image of 1s.

Run the weight kernel over the image of 1s, with zero padding. The result is a weight image where the value of each pixel is the total kernel weight that fell inside the image at that location.

Then run the actual kernel over the actual image, with zero padding, to get the conv image.

Then divide the conv image by the weight image.

I’m pretty sure that’s equivalent.
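In TFJS terms, roughly (untested, assuming a [height, width, channels] tensor and a per-channel kernel):

// Untested: divide out the weight lost to zero padding at the borders.
const blurNormalized = (image, kernel) => tf.tidy(() => {
  const conv = tf.conv2d(image, kernel, 1, 'same')  // zero-padded blur
  const ones = tf.ones(image.shape)
  const weight = tf.conv2d(ones, kernel, 1, 'same') // per-pixel valid weight
  return tf.div(conv, weight)                       // renormalize each pixel
})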

My hope is to find a near-optimal, single-pass algorithm that avoids data copies. Otherwise, I think it makes more sense to get the Tensor’s texture ID and perform the image processing directly in WebGL shaders.