Java API help/example for Audio (with yamnet)

Paul_Allen · August 3, 2023, 4:04am

Hi, I’m new to TensorFlow and have been trying to follow the Java API examples:

I would like to load the yamnet model and detect a simple sound (like water). Ideally returning a category ‘water’ and probability ‘0.91’…

This is as far as I got…

build.gradle:
    // tensorflow-core-platform
    implementation 'org.tensorflow:tensorflow-core-platform:0.5.0'

-----
TensorFlowTest.java:

	@Test
	public void tfTest() throws Exception {
		final String test = "src/test/resources/files/audio/wav/water.wav";

		String version = TensorFlow.version();
		LOGGER.info("TensorFlow: " + version);
		assertNotNull(version);

		// https://tfhub.dev/google/yamnet/1
		SavedModelBundle model = SavedModelBundle.load("models/yamnet_1/");
		printSignature(model);


		try (Graph g = new Graph(); Session s = new Session(g)) {
			Ops tf = Ops.create(g);
			Constant<TString> fileName = tf.constant(test);
			ReadFile readFile = tf.io.readFile(fileName);

			Session.Runner runner = s.runner();

			//DecodeWav.Options options = DecodeWav.desiredChannels(1L);
			DecodeWav decodeWav = tf.audio.decodeWav(readFile.contents());

			runner.fetch(decodeWav.audio());
			try(Result result = runner.run()) {
				for(String set : result.keySet()) {
					Optional<Tensor> r = result.get(set);
				}
			}

			//...
		}
	}

If I missed some API docs, or if anyone can explain the general flow of detection (Graph, Session, Model, and the calls to “image_tensor”, “detection_scores”, “detection_classes”…) or help with an example, that would be great.

Kind regards,
Paul

Paul_Allen · August 3, 2023, 2:31pm

I tried reshaping the data…

Shape audioShape = runner.fetch(decodeWav.audio()).run().get(0).shape();
Reshape<TFloat32> reshape = tf.reshape(decodeWav.audio(),
   tf.array(1,
      audioShape.asArray()[0]
   )
);

and found there is a “serving_default” function, which takes a Map of names and Tensor data…

try (Result result = runner.fetch(reshape).run()) {
   Map<String, Tensor> feedDict = new HashMap<>();
   feedDict.put("waveform", result.get(0));
   Result outputTensorMap = model.function("serving_default").call(feedDict);

This returned the error…

Input to reshape is a tensor with 21248 values, but the requested shape has 10624
	 [[{{node Reshape}}]]

I realised the audio sample was stereo so added a DecodeWav.Options option…

DecodeWav.Options options = DecodeWav.desiredChannels(1L);
DecodeWav decodeWav = tf.audio.decodeWav(readFile.contents(), options);

Now I get the error…

The first dimension of paddings must be the rank of inputs[1,2] [10624,1]
	 [[{{node yamnet_frames/tf_op_layer_Pad/Pad}}]]

karllessard · August 4, 2023, 12:58pm

Hi Paul,

If your intent is simply to run inference on yamnet, you only need to invoke the call method of your loaded model.

Now, if you also want to use TensorFlow to decode your input wave file, you can build a separate Graph, create a new ConcreteFunction like in this tutorial, or use TF eager execution directly, as I’ll show here:

var tf = Ops.create(); // runs in eager mode by default

try (var model = SavedModelBundle.load("models/yamnet_1/")) {
    var file = tf.io.readFile(tf.constant("/.../"));
    var decodedWave = tf.audio.decodeWav(file.contents());

    try (var audio = decodedWave.audio().asTensor()) {
         var inputs = Map.of("waveform", audio);

         try (var result = model.call(inputs)) {
             var scores = result.get("scores").get();
             ...
         }
    }
}

Something like that. I haven’t tried it myself, so please let us know how it goes, thanks!
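
For the “...” part, here is a minimal sketch of one way to turn the scores into a single label index and probability. It assumes the yamnet “scores” output has shape [frames, classes] and has already been copied into a plain float[][]. The ScoreSummary class and its methods are made up for illustration and are not part of the TensorFlow Java API. The returned index would still need to be looked up in the class map CSV that, as far as I know, ships with the model.

```java
// Hypothetical helper, not part of TensorFlow Java.
final class ScoreSummary {
    private ScoreSummary() {}

    /** Averages per-frame scores into one score per class. */
    static float[] meanOverFrames(float[][] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("no frames");
        }
        float[] mean = new float[scores[0].length];
        for (float[] frame : scores) {
            for (int c = 0; c < mean.length; c++) {
                mean[c] += frame[c];
            }
        }
        for (int c = 0; c < mean.length; c++) {
            mean[c] /= scores.length;
        }
        return mean;
    }

    /** Returns the index of the largest value (first one wins on ties). */
    static int argMax(float[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

With something like this, the “water” and “0.91” in the original question would correspond to the class name at argMax(mean) and the value mean[argMax(mean)].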

Paul_Allen · August 7, 2023, 10:21am

Hi Karl,

Thank you for the example. I made a few minor adjustments…

	@Test
	public void tfExample() throws Exception {
		final String test = "src/test/resources/files/audio/wav/blop1CH.wav";

		Ops tf = Ops.create(); // runs in eager mode by default

		try (SavedModelBundle model = SavedModelBundle.load("models/yamnet_1/")) {
			ReadFile file = tf.io.readFile(tf.constant(test));
			DecodeWav decodedWave = tf.audio.decodeWav(file.contents());

			try (Tensor audio = decodedWave.audio().asTensor()) {
				Map<String, Tensor> inputs = Map.of("waveform", audio);

				try (Result result = model.call(inputs)) {
					Tensor scores = result.get("scores").get();
					assertNotNull(scores);
				}
			}
		}
	}

However, I get the same error I had with my own attempt:

The first dimension of paddings must be the rank of inputs[1,2] [10624,1]
	 [[{{node yamnet_frames/tf_op_layer_Pad/Pad}}]]
org.tensorflow.exceptions.TFInvalidArgumentException: The first dimension of paddings must be the rank of inputs[1,2] [10624,1]
	 [[{{node yamnet_frames/tf_op_layer_Pad/Pad}}]]
	at app//org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:87)
	at app//org.tensorflow.Session.run(Session.java:835)
	at app//org.tensorflow.Session$Runner.runHelper(Session.java:558)
	at app//org.tensorflow.Session$Runner.run(Session.java:485)
	at app//org.tensorflow.SessionFunction.call(SessionFunction.java:115)
	at app//org.tensorflow.SavedModelBundle.call(SavedModelBundle.java:457)
	at app//***.functional.TensorFlowTest.tfExample(TensorFlowTest.java:52)
	...snip...
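
Reading the message, [1,2] looks like the shape of the paddings and [10624,1] the shape of the input. So the model seems to pad a rank-1 waveform, but receives the rank-2 [samples, channels] tensor that decodeWav produces. Below is a minimal sketch of the shape change that appears to be needed, written in plain Java on a float[][] (the Waveform class is hypothetical and not part of TensorFlow). In TensorFlow itself this would presumably mean reshaping the decoded audio to a single dimension before passing it as “waveform”; I have not verified that against the model.

```java
// Hypothetical helper, not part of TensorFlow Java.
final class Waveform {
    private Waveform() {}

    /** Flattens [samples][1] mono audio into a rank-1 [samples] array. */
    static float[] flattenMono(float[][] audio) {
        float[] flat = new float[audio.length];
        for (int i = 0; i < audio.length; i++) {
            if (audio[i].length != 1) {
                // Stereo or malformed input: refuse rather than guess a downmix.
                throw new IllegalArgumentException(
                        "expected 1 channel at sample " + i + ", got " + audio[i].length);
            }
            flat[i] = audio[i][0];
        }
        return flat;
    }
}
```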