Negative values in normalized data

Narsis · February 16, 2024, 4:12pm

I have a dataset in an Excel file that is already normalized, but it contains negative values, which cause some of my predictions to be negative. Is it normal to have negative values in a normalized dataset? As far as I know, the values of a normalized dataset lie between 0 and 1.
I tried to normalize it again, but my MAPE became very large, something like 7568839.743593587, which suggests that something is wrong.

tagoma · February 16, 2024, 9:15pm

Hi @Narsis
If you applied a method/formula to bring your normalized data into the [0, 1] range but still get negative values, there is obviously an issue. Your large MAPE value also seems to suggest that something is wrong.
Please share your data and code, and add any further details.
Hopefully this relates to Tensorflow.

Tim_Wolfe · February 17, 2024, 6:55am

The presence of negative values in a normalized dataset is not unusual, but it depends on the normalization technique used. There may be some confusion here between normalization and scaling methods.

Tim_Wolfe · February 17, 2024, 6:56am

Here are some key points to clarify this:

Normalization vs. Scaling:

Normalization typically refers to the process of adjusting values measured on different scales to a notionally common scale. A common normalization technique is Min-Max scaling, which indeed scales the data to a fixed range (usually 0 to 1). However, not all normalization techniques restrict the data to positive values only.
Scaling can involve various methods, such as Standard Scaling (or Z-score normalization), which centers the data around 0 with a standard deviation of 1. This method can result in negative values, especially for data points that are below the mean.

Negative Values in Normalized Data:

It’s perfectly normal to have negative values in a dataset that has been scaled using methods like Z-score normalization. This is because the process involves subtracting the mean from each data point and then dividing by the standard deviation, which can result in negative values for points that are below the mean.
If you’re using Min-Max scaling and still getting negative values, it’s worth checking the implementation for any errors. Min-Max scaling should indeed result in values between 0 and 1 (or another specified range like -1 to 1 if intentionally designed that way).
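
To make the difference concrete, here is a minimal sketch in plain Python. The input values are made up for illustration, and the helper names `z_score` and `min_max` are hypothetical; scikit-learn's `StandardScaler` and `MinMaxScaler` perform the equivalent transforms.

```python
from statistics import mean, pstdev

# Toy input values, chosen only for illustration.
raw = [10.0, 20.0, 30.0, 40.0]

def z_score(values):
    """Standardize: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max(values, low=0.0, high=1.0):
    """Rescale linearly so the smallest value maps to `low` and the largest to `high`."""
    v_min, v_max = min(values), max(values)
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]

standardized = z_score(raw)
rescaled = min_max(raw)

print(standardized)  # points below the mean come out negative
print(rescaled)      # every value lies in [0, 1]
```

So negative values by themselves only tell you that the data was probably standardized rather than Min-Max scaled.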

Issues with Re-Normalization and MAPE:

Re-normalizing already normalized data can lead to unintended consequences, as it may distort the underlying distribution and relationships within your data. This could be the reason why you’re seeing a large Mean Absolute Percentage Error (MAPE) value. MAPE divides each error by the actual value, so it becomes extremely sensitive and unreliable when actual values are close to zero, which can happen after re-normalization (Min-Max scaling maps the smallest values to zero or very near it).
The extremely large MAPE suggests that your model’s predictions are significantly diverging from the actual values, possibly due to the re-normalization distorting the data’s structure.
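
The sensitivity of MAPE to near-zero targets can be seen in a small sketch. All numbers below are hypothetical, and `mape` is a hand-written helper rather than a library function.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical targets in original units, with predictions off by 50 units each.
actual_raw = [3800.0, 3760.0, 3790.0]
pred_raw = [3850.0, 3810.0, 3840.0]

# Hypothetical targets after rescaling to [0, 1]; one of them sits almost at zero.
actual_scaled = [0.5, 0.000001, 0.9]
pred_scaled = [0.55, 0.05, 0.95]

print(mape(actual_raw, pred_raw))        # a little over 1 percent
print(mape(actual_scaled, pred_scaled))  # over a million percent, dominated by the near-zero target
```

If a target is exactly zero, as the minimum is after Min-Max scaling, this formula divides by zero and MAPE is undefined.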

Solution:

Review Your Normalization Approach: Ensure that you’re using an appropriate normalization or scaling method for your data and your use case. If your model’s output should only be positive, consider using Min-Max scaling to keep all features within the 0 to 1 range. However, remember that this might not be suitable for all types of data and models.
Avoid Re-Normalization: If your data is already normalized, avoid re-normalizing it unless you’re applying a different method for a specific reason. If you must re-scale or re-normalize, make sure to invert the scaling on your predictions so that you evaluate performance in the original units.
Consider the Model and Data: If negative predictions are problematic for your use case (e.g., predicting quantities that can’t be negative), you might need to adjust your model or post-process your predictions to ensure they fall within a valid range.
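
As a rough sketch of the last two points, the snippet below clips hypothetical scaled predictions to the valid range, maps them back to the original units, and only then computes MAPE. The data and helper names are assumptions made for illustration; with scikit-learn, `MinMaxScaler.inverse_transform` plays the role of `unscale`.

```python
def scale(values, v_min, v_max):
    """Min-Max scale to [0, 1] using a previously fitted min and max."""
    return [(v - v_min) / (v_max - v_min) for v in values]

def unscale(values, v_min, v_max):
    """Invert the Min-Max transform to get back to the original units."""
    return [v * (v_max - v_min) + v_min for v in values]

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical raw targets in original units.
actual_raw = [3700.0, 3800.0, 3900.0, 4000.0]
v_min, v_max = min(actual_raw), max(actual_raw)
actual_scaled = scale(actual_raw, v_min, v_max)  # what a model would be trained on

# Hypothetical model outputs on the scaled axis; the first is slightly negative.
pred_scaled = [-0.02, 0.35, 0.70, 0.98]

# Optional post-processing: clip to the valid scaled range before inverting.
pred_clipped = [min(max(p, 0.0), 1.0) for p in pred_scaled]
pred_raw = unscale(pred_clipped, v_min, v_max)

print(pred_raw)
print(mape(actual_raw, pred_raw))  # evaluated in original units, where no target is near zero
```

Evaluating on the original scale keeps the denominators in MAPE far from zero, so the metric stays interpretable.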

It might be helpful to revisit your data preprocessing steps and model setup to ensure they align with your data characteristics and prediction goals.

Narsis · February 17, 2024, 8:53am

Here are three samples of my data from the Excel file, which is network traffic:
3805.664166
3761.303008
3799.392863

Here are the normalized values of these samples:
-0.0327453
-0.0566811
-0.0361291

I tried to normalize them again using this formula:
(X - X_min) / (X_max - X_min)
That solved the problem of negative values, but now I get a large MAPE. Here is my Python code:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
data['traffic'] = scaler.fit_transform(data['traffic'].values.reshape(-1, 1))

tagoma · February 18, 2024, 8:14am

You should probably share your whole code snippet if you want people to be able to help you.

Narsis · February 18, 2024, 9:39am

Thank you, I just solved the issue.