Model pruner failed

Lokesh_Tanwar_B21EE0 · July 11, 2023, 5:11am

I am trying to train a model. Some time ago it was training properly, but now it does not converge: the loss is not decreasing and a warning is showing up, namely "model_pruner failed":
2023-07-10 17:03:41.026490: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] model_pruner failed: INVALID_ARGUMENT: Graph does not contain terminal node AssignAddVariableOp.
2023-07-10 17:03:41.168811: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] model_pruner failed: INVALID_ARGUMENT: Graph does not contain terminal node AssignAddVariableOp.

chunduriv · July 11, 2023, 9:22am

@Lokesh_Tanwar_B21EE0,

Welcome to the TensorFlow Forum.

Could you please share standalone code that reproduces the issue, along with the TensorFlow versions in which it worked and in which it fails, so that we can debug further?

Thank you!

Lokesh_Tanwar_B21EE0 · July 11, 2023, 4:09pm

This is the code I was using to initialize TensorFlow, which I got from a tutorial.

import tensorflow as tf

PATH_TPU_WORKER = ''

def check_tpu():
    """
    Detect TPU hardware and return the appropriate distribution strategy
    """
    
    try:
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver() 
        print('Running on TPU: {}'.format(tpu.master()))
    except ValueError:
        tpu = None

    if tpu:
        tf.config.experimental_connect_to_cluster(tpu)
        tf.tpu.experimental.initialize_tpu_system(tpu)
        tpu_strategy = tf.distribute.TPUStrategy(tpu)
    else:
        tpu_strategy = tf.distribute.get_strategy()  # default distribution strategy in TensorFlow. Works on CPU and single GPU.

    print("Num. replicas: {}".format(tpu_strategy.num_replicas_in_sync))
    
    return tpu, tpu_strategy
    
tpu, tpu_strategy = check_tpu()
# tpu is None when no TPU is detected, so guard the call to master()
PATH_TPU_WORKER = tpu.master() if tpu else ''
NUM_REPLICAS = tpu_strategy.num_replicas_in_sync

The code for the model I am trying to train:

with tpu_strategy.scope():
    imgA = Input(shape = IMG_SHAPE) 
    imgB = Input(shape = IMG_SHAPE) 
    imgAC = tf.keras.layers.Concatenate()([imgA, imgA, imgA]) 
    imgBC = tf.keras.layers.Concatenate()([imgB, imgB, imgB]) 
    featureExtractor = tf.keras.applications.VGG19(
        include_top=False,
        weights="imagenet",
        input_tensor=None,
        input_shape=None,
        pooling="max",
        classes=1000,
        classifier_activation="softmax",
    )
    featsA = featureExtractor(imgAC)
    featsB = featureExtractor(imgBC)
    distance = Lambda(euclidean_distance)([featsA, featsB])
    outputs = Dense(1, activation="sigmoid")(distance)
    model = Model(inputs=[imgA, imgB], outputs=outputs)
model.summary()

The code below trains the model:

if not evaluate:
    print("[INFO] training model...")
    history = model.fit(
        [img_data[:, 0], img_data[:, 1]], labelTrain[:],
        validation_data=([pairTest[:, 0], pairTest[:, 1]], labelTest[:]),
        batch_size=1024,
        epochs=10,
        callbacks=[shed])

I think there has been no change in TensorFlow, as I was training the model on Kaggle only 2-3 days before and there was no new release in between.

I am also new to using TPUs, so could you please tell me whether I am writing the right code for TPUs?

ali_raza713 · July 12, 2023, 5:05am

First of all, AssignAddVariableOp is the op that adds a value to a variable in place (for example, incrementing a counter). There may be a few other issues at play: a required library might not be installed or imported, or the TensorFlow version might be an older one that does not work well with the model pruner. Check the data structures again to see whether anything is missing, and inspect the code carefully. Best wishes.

Shai_Ronen · July 17, 2023, 4:52am

I get the same error as @Lokesh_Tanwar_B21EE0: model_pruner failed: INVALID_ARGUMENT: Graph does not contain terminal node AssignAddVariableOp.
I’m also running it on a Kaggle TPU.
The same code, with no changes, runs for me on a local GPU and on a Kaggle cloud GPU.
The TF version is 12.0. TensorFlow and some other machine learning packages are pre-installed by Kaggle.

Shai_Ronen · July 17, 2023, 4:52am

Follow up: code is able to run despite this error, although I’ve not checked convergence.
I am able to reproduce the error with a Kaggle TPU demo kernel here:

In the demo kernel, the training is able to proceed despite the error message.

Lokesh_Tanwar_B21EE0 · July 17, 2023, 4:13pm

I have fine-tuned the model and it is now converging, but the error is still there. I would like to know what this error means and what the role of the model pruner is.
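
On the role of the model pruner: as far as I understand, it is one of the Grappler graph-optimization passes that TensorFlow runs before executing a graph, and it tries to drop nodes that are not needed to compute the requested outputs. The log line suggests that the pass gave up on that graph because a node it was asked to preserve (a "terminal node") was not found, which would be consistent with training still proceeding. Below is a toy sketch of that idea in plain Python. It is not Grappler's actual implementation, and the names `prune_graph` and `toy_graph` are made up for illustration.

```python
from collections import deque


def prune_graph(graph, terminal_nodes):
    """Keep only the nodes needed to compute the terminal nodes.

    graph: dict mapping a node name to the list of its input node names.
    terminal_nodes: names of the nodes whose results must be preserved.
    """
    # A pruning pass cannot start from a node that is not in the graph.
    for name in terminal_nodes:
        if name not in graph:
            raise ValueError(f"Graph does not contain terminal node {name}.")

    # Walk backwards from the terminal nodes through their inputs.
    keep = set()
    queue = deque(terminal_nodes)
    while queue:
        node = queue.popleft()
        if node in keep:
            continue
        keep.add(node)
        queue.extend(graph.get(node, []))

    return {name: inputs for name, inputs in graph.items() if name in keep}


# Hypothetical toy graph: "debug_print" is not needed to compute "loss".
toy_graph = {
    "input": [],
    "weights": [],
    "matmul": ["input", "weights"],
    "loss": ["matmul"],
    "debug_print": ["matmul"],
}

pruned = prune_graph(toy_graph, ["loss"])
print(sorted(pruned))  # ['input', 'loss', 'matmul', 'weights']

# Asking to preserve a node that is absent makes the pass fail; the
# caller can then fall back to the unpruned graph and keep going.
try:
    prune_graph(toy_graph, ["AssignAddVariableOp"])
except ValueError as err:
    print("pruning skipped:", err)
```

In this sketch a failed pruning pass only means the graph is left unoptimized, so the computation itself is unaffected. Whether that fully explains the behavior on Kaggle TPUs is an assumption, not something confirmed in this thread.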