# Compare Soil Scanner Data between Years

Soil scanners are essential tools for precision agriculture: they collect high-resolution data on soil properties such as moisture, organic matter, and nutrient levels. Comparing two soil scanner datasets helps to understand changes over time, validate different scanning methods, or calibrate new devices. This article describes several mathematical approaches to measuring the deviation between two soil scanner datasets and explains when each one is most useful to researchers and agronomists.

### Understanding Deviation in Soil Scanner Data

The deviation between two soil scanner datasets is the difference between values measured at the same locations. It can arise from variations in measurement conditions, sensor calibration, or soil dynamics. Common ways to express deviation include:

* Absolute Differences: direct subtraction of values between datasets.
* Relative Differences: differences expressed in proportion to the magnitude of the measurements.
* Normalized Differences: differences computed after scaling the datasets to a common range.
* Error Metrics: statistical measures such as Mean Absolute Error (MAE).

The examples in this article use two soil scanner datasets of potassium measured in 2024 and 2025.

<figure><img src="https://3272281156-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYICBELdyAXXebKAzfLOR%2Fuploads%2FC2QduX7YR6gTjWUZGqMX%2Fimage.png?alt=media&#x26;token=f1609570-7428-4883-b589-963d4d8e2767" alt=""><figcaption><p>Initial soil scanner datasets</p></figcaption></figure>

### Choosing the Right Deviation Method

| Method                              | Best for                                            |
| ----------------------------------- | --------------------------------------------------- |
| Direct Difference                   | Simple visualization of positive/negative changes   |
| Relative Difference                 | Comparing datasets with different scales            |
| Normalized Difference               | Standardized analysis across different datasets     |
| Relative Deviation                  | Proportional differences, useful for trend analysis |
| Mean Absolute Error (MAE) per Pixel | Identifying areas with large absolute differences   |

### Direct Difference Calculation

The Direct Difference method subtracts one dataset from the other, so changes in soil attributes can be visualized directly.

The usage of `geopard.calculate_difference(dataset_1, dataset_2)`, including an explanation of its parameters, is documented [here](https://docs.geopard.tech/geopard-tutorials/product-tour-web-app/equation-based-analytics/catalog-of-custom-functions#calculate_difference).
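To make the computation concrete, here is a minimal NumPy sketch of a per-pixel direct difference. It is not GeoPard's implementation: the function name `direct_difference`, the sign convention (`dataset_1 - dataset_2`), and the use of NaN for missing pixels are assumptions for illustration, and both rasters are assumed to be already aligned on the same grid. The potassium values are hypothetical.

```python
import numpy as np


def direct_difference(dataset_1, dataset_2) -> np.ndarray:
    """Per-pixel signed difference dataset_1 - dataset_2 (hypothetical helper).

    Assumes both rasters share the same grid; NaN (missing) pixels
    stay NaN in the result.
    """
    a = np.asarray(dataset_1, dtype=float)
    b = np.asarray(dataset_2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("datasets must share the same grid shape")
    return a - b


# Hypothetical potassium values on a tiny 2x2 grid (NaN = no data).
k_2025 = np.array([[120.0, 95.0], [80.0, np.nan]])
k_2024 = np.array([[100.0, 100.0], [80.0, 90.0]])
print(direct_difference(k_2025, k_2024))
```

Positive values mark pixels where the first dataset is higher, negative values where it is lower, which is what makes this method easy to map with a diverging color scale.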

Pros:

* Clearly shows positive and negative changes.
* Easy to interpret and visualize.

Cons:

* The difference values may be hard to compare if datasets have different scales.
* Areas with high variation can dominate the interpretation.

<figure><img src="https://3272281156-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYICBELdyAXXebKAzfLOR%2Fuploads%2FwgA9AsI1dfinjjQwJ8C9%2Fimage.png?alt=media&#x26;token=20e684fd-02fb-4cbb-9c9d-b34aa62ed38f" alt=""><figcaption><p>Direct Difference Calculation</p></figcaption></figure>

### Relative Difference Calculation

The Relative Difference method calculates the percentage change between the datasets relative to the second dataset, offering another perspective on deviation.

The usage of `geopard.calculate_relative_difference(dataset_1, dataset_2)`, including an explanation of its parameters, is documented [here](https://docs.geopard.tech/geopard-tutorials/product-tour-web-app/equation-based-analytics/catalog-of-custom-functions#calculate_relative_difference).
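The following NumPy sketch illustrates a percentage change computed relative to the second dataset, i.e. `(dataset_1 - dataset_2) / dataset_2 * 100`. The exact formula GeoPard uses is not stated here, so treat this as an assumption; the helper name `relative_difference` and the `eps` threshold are hypothetical. Pixels whose denominator is (near) zero are set to NaN, which is one way to handle the instability listed under Cons.

```python
import numpy as np


def relative_difference(dataset_1, dataset_2, eps: float = 1e-9) -> np.ndarray:
    """Percentage change of dataset_1 relative to dataset_2 (hypothetical helper).

    Pixels where |dataset_2| <= eps, or that are NaN, are returned as NaN
    instead of producing huge or infinite values.
    """
    a = np.asarray(dataset_1, dtype=float)
    b = np.asarray(dataset_2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("datasets must share the same grid shape")
    out = np.full(a.shape, np.nan)
    valid = np.abs(b) > eps  # NaN compares as False, so NaN pixels stay NaN
    out[valid] = (a[valid] - b[valid]) / b[valid] * 100.0
    return out


# Hypothetical values; the last 2024 pixel is 0, so its result is NaN.
k_2025 = np.array([[120.0, 95.0], [80.0, 50.0]])
k_2024 = np.array([[100.0, 100.0], [80.0, 0.0]])
print(relative_difference(k_2025, k_2024))
```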

Pros:

* Good for understanding how much one dataset has changed in proportion to another.
* Normalizes differences across varying magnitudes.

Cons:

* Can become unstable when values in the second dataset are close to zero.
* Less intuitive when absolute differences are important.

<figure><img src="https://3272281156-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYICBELdyAXXebKAzfLOR%2Fuploads%2Fdvu0tLmgObpZ3WsvGrKQ%2Fimage.png?alt=media&#x26;token=fbe018a8-a8dc-421f-9fe9-19b78869b18f" alt=""><figcaption><p>Relative Difference Calculation</p></figcaption></figure>

### Normalized Difference Calculation

The Normalized Difference method normalizes the datasets by their global maximum value before computing differences, ensuring that variations are comparable across different scales.

The usage of `geopard.calculate_normalized_difference(dataset_1, dataset_2)`, including an explanation of its parameters, is documented [here](https://docs.geopard.tech/geopard-tutorials/product-tour-web-app/equation-based-analytics/catalog-of-custom-functions#calculate_normalized_difference).
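A minimal NumPy sketch of max-normalization followed by subtraction is shown below. It assumes each dataset is divided by its own whole-field maximum (NaN pixels ignored); whether GeoPard instead uses one shared maximum for both datasets is not stated here, so this is an assumption, and the helper name `normalized_difference` is hypothetical.

```python
import numpy as np


def normalized_difference(dataset_1, dataset_2) -> np.ndarray:
    """Difference after scaling each raster by its own global maximum (hypothetical helper).

    For non-negative data both rasters end up on a 0..1 scale before
    subtraction, so datasets with different dynamic ranges become comparable.
    """
    a = np.asarray(dataset_1, dtype=float)
    b = np.asarray(dataset_2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("datasets must share the same grid shape")
    max_a, max_b = np.nanmax(a), np.nanmax(b)
    if max_a <= 0 or max_b <= 0:
        raise ValueError("global maximum must be positive to normalize")
    return a / max_a - b / max_b


# Hypothetical scans on different scales: 50/100 - 10/40 = 0.25, 100/100 - 40/40 = 0.
scan_a = np.array([[50.0, 100.0]])
scan_b = np.array([[10.0, 40.0]])
print(normalized_difference(scan_a, scan_b))
```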

Pros:

* Effective for datasets with different dynamic ranges.
* Reduces the impact of extreme values.

Cons:

* Small variations may appear exaggerated if not scaled properly.

<figure><img src="https://3272281156-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYICBELdyAXXebKAzfLOR%2Fuploads%2FSfC0I2ieL3Tsb7zP0wlg%2Fimage.png?alt=media&#x26;token=2d0c3bc9-5c7d-4fb7-971f-2ea23d6b8402" alt=""><figcaption><p>Normalized Difference Calculation</p></figcaption></figure>

### Relative Deviation per Pixel

The Relative Deviation method calculates the deviation as a percentage of the first dataset. It shows proportional rather than absolute differences.

The usage of `geopard.calculate_per_pixel_relative_deviation(dataset_1, dataset_2)`, including an explanation of its parameters, is documented [here](https://docs.geopard.tech/geopard-tutorials/product-tour-web-app/equation-based-analytics/catalog-of-custom-functions#calculate_per_pixel_relative_deviation).
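As an illustration, the NumPy sketch below expresses the deviation of the second dataset as a percentage of the first, `(dataset_2 - dataset_1) / dataset_1 * 100`. The sign convention and the choice to keep the sign (rather than take the absolute value) are assumptions, not GeoPard's documented formula; the helper name `per_pixel_relative_deviation` and the `eps` threshold are hypothetical. The example also shows the "very small values" caveat: a change from 4 to 5 reads as +25 %.

```python
import numpy as np


def per_pixel_relative_deviation(dataset_1, dataset_2, eps: float = 1e-9) -> np.ndarray:
    """Signed deviation of dataset_2 from dataset_1, in % of dataset_1 (hypothetical helper).

    Pixels where |dataset_1| <= eps, or that are NaN, become NaN, because a
    percentage of a (near-)zero baseline is not meaningful.
    """
    a = np.asarray(dataset_1, dtype=float)
    b = np.asarray(dataset_2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("datasets must share the same grid shape")
    out = np.full(a.shape, np.nan)
    valid = np.abs(a) > eps
    out[valid] = (b[valid] - a[valid]) / a[valid] * 100.0
    return out


# Hypothetical values: the 4 -> 5 pixel is only 1 unit apart but shows +25 %.
base = np.array([[100.0, 200.0], [4.0, 0.0]])
new = np.array([[110.0, 150.0], [5.0, 3.0]])
print(per_pixel_relative_deviation(base, new))
```

If only the magnitude of the deviation matters, `np.abs` can be applied to the result.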

Pros:

* Useful when comparing datasets with different scales.
* Expresses deviation in an interpretable percentage format.

Cons:

* Can be misleading if the original values are very small.

<figure><img src="https://3272281156-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYICBELdyAXXebKAzfLOR%2Fuploads%2FdAxMNaTGC9JR857B6IdN%2Fimage.png?alt=media&#x26;token=1d81e310-a29a-4963-81ad-24b7d24c6e87" alt=""><figcaption><p>Relative Deviation per Pixel</p></figcaption></figure>

### Mean Absolute Error (MAE) per Pixel

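The Mean Absolute Error (MAE) method measures the absolute differences between corresponding values in two datasets. At the level of a single pixel this is the absolute value of the difference; averaging it over all pixels gives the field-level MAE. The per-pixel map shows clearly where the largest discrepancies occur.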

The usage of `geopard.calculate_per_pixel_mae(dataset_1, dataset_2)`, including an explanation of its parameters, is documented [here](https://docs.geopard.tech/geopard-tutorials/product-tour-web-app/equation-based-analytics/catalog-of-custom-functions#calculate_per_pixel_mae).
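The NumPy sketch below computes the per-pixel absolute difference and, from it, a single field-level MAE. Both helper names (`per_pixel_mae`, `field_mae`) are hypothetical and not part of the GeoPard API; NaN is assumed to mark missing pixels, which `np.nanmean` skips when averaging.

```python
import numpy as np


def per_pixel_mae(dataset_1, dataset_2) -> np.ndarray:
    """Absolute difference |dataset_1 - dataset_2| for every pixel (hypothetical helper)."""
    a = np.asarray(dataset_1, dtype=float)
    b = np.asarray(dataset_2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("datasets must share the same grid shape")
    return np.abs(a - b)


def field_mae(dataset_1, dataset_2) -> float:
    """Mean of the per-pixel absolute differences, ignoring NaN pixels (hypothetical helper)."""
    return float(np.nanmean(per_pixel_mae(dataset_1, dataset_2)))


# Hypothetical potassium values on a 2x2 grid (NaN = no data).
k_2025 = np.array([[120.0, 95.0], [80.0, np.nan]])
k_2024 = np.array([[100.0, 100.0], [80.0, 90.0]])
print(per_pixel_mae(k_2025, k_2024))
print(field_mae(k_2025, k_2024))
```

Because the absolute value drops the sign, a +20 and a -20 change look identical here, which is the direction limitation listed under Cons.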

Pros:

* Simple and intuitive.
* Highlights large differences clearly.
* Works well for datasets with similar scales.

Cons:

* Doesn't show the direction of the difference (i.e., positive or negative change).
* Sensitive to outliers.

<figure><img src="https://3272281156-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYICBELdyAXXebKAzfLOR%2Fuploads%2F2vU1J5nHGE6WVdGdyoOh%2Fimage.png?alt=media&#x26;token=79fcb18b-0a60-4dbe-8f9a-900d28ccb0e8" alt=""><figcaption><p>Mean Absolute Error (MAE) per Pixel</p></figcaption></figure>

### Conclusion

Comparing soil scanner datasets requires a variety of mathematical approaches to extract meaningful differences. Whether using absolute metrics like MAE, relative deviations, or normalized comparisons, selecting the right method depends on the use case. By leveraging these techniques, agronomists and researchers can improve soil analysis, detect field variations, and enhance precision agriculture workflows.
