# 84. Mutation: Calibrate and Clean YieldDataset

Calibrating and cleaning a "YieldDataset" corrects the distribution of its attribute values and removes erroneous records, improving the overall integrity of the data. The result supports better decision-making and makes the dataset suitable for further in-depth analysis.

Common use cases for this functionality include:

* Synchronizing data when multiple harvesters have worked either simultaneously or over several days, ensuring consistency.
* Making the dataset more homogeneous and accurate by smoothing out variations.
* Removing data noise and extraneous information that can cloud insights.
* Eliminating turnarounds or abnormal geometries, which may distort the actual patterns and trends in the field.
* Adjusting yield data to correspond with established averages or totals for each attribute.

For a more detailed exploration and examples, please refer to our [Yield Calibration & Cleaning](https://docs.geopard.tech/geopard-tutorials/agronomy/yield-calibration-and-cleaning) use case.

### Original YieldDataset with WetMass attribute

Five harvesters worked in parallel on the 30 ha Field shown below. The calibration of one of the harvesters was not synchronized with the others, producing the <mark style="color:orange;">orange</mark> spots, which indicate that an additional `CALIBRATE` operation is required. There are also numerous <mark style="color:red;">red</mark> turnaround spots near the "Field" edges that need to be eliminated.

<figure><img src="https://3272281156-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYICBELdyAXXebKAzfLOR%2Fuploads%2Fj40y3lH0F0ZfJNH2WhTf%2Fimage.png?alt=media&#x26;token=6bf7baaa-3cda-4863-98e1-e0a92ff2b765" alt=""><figcaption><p>Original YieldDataset with WetMass attribute</p></figcaption></figure>

### Calibrated and Cleaned YieldDataset with WetMass attribute

The result below shows the dataset after applying automatic `CALIBRATE` and `CLEAN` operations using default parameters. The resulting "YieldDataset" has become homogeneous, without outliers or abrupt changes between neighboring geometries.

<figure><img src="https://3272281156-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYICBELdyAXXebKAzfLOR%2Fuploads%2Fu3zjlXg03WoWrjOOT9v7%2Fimage.png?alt=media&#x26;token=d298f632-b0fe-462f-a395-64a64c02ea55" alt=""><figcaption><p>Calibrated and Cleaned YieldDataset with WetMass attribute</p></figcaption></figure>

### Pathwise vs Average/Total vs Conditional options

**Pathwise calibration** follows the machine's tracks: each machine track is processed as an individual region for calibration. The GeoPard team recommends this method as the standard approach.

**Average/Total calibration** redistributes attribute values so that they match a known average or total. It is useful when the geospatial patterns are accurate but the absolute values deviate from the actual figures. For optimal results, GeoPard advises combining it with Pathwise calibration: apply Pathwise first, then adjust to the known Average/Total values.
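A minimal sketch of this combination, assuming the total harvested WetMass of the field is known: Pathwise calibration is configured together with `conditionAvgTotalCalibration`, which then aligns WetMass with that total. Several details are assumptions rather than confirmed behavior: the `total` value is hypothetical, `calibrationBasis: "Machine"` assumes the dataset has a `Machine` attribute (as in the advanced sample in *Documentation: Mutation details*), the service is assumed to apply Pathwise before Average/Total when both are configured, and `cleanAction` is assumed to be optional when `actions` contains only `CALIBRATE`.

```graphql
mutation PathwiseThenTotalCalibration {
  calibrateYieldDataset(input: {
    fieldUuid: "<placeholder_of_field_uuid>"
    yieldDatasetUuid: "<placeholder_of_dataset_uuid>"
    actions: {
      calibrateAction: {
        calibrationAttributes: ["WetMass"]
        # Pathwise calibration: each machine track is calibrated as its own region
        conditionPathwiseCalibration: {
          calibrationBasis: "Machine"
          maxHomogeneityRegion: true
          syntheticMachinePath: false
        }
        # Average/Total calibration: align the sum of WetMass with a known total.
        # The value is hypothetical; use either `total` or `average`, not both.
        conditionAvgTotalCalibration: [{
          calibrationAttribute: "WetMass"
          total: 150000.0
        }]
      }
      # Assumption: cleanAction may be omitted when only CALIBRATE is requested
      actions: [CALIBRATE]
    }
  }) {
    uuid
    fieldUuid
    originalUuid
    statusCode
    statusMessage
  }
}
```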

**Conditional calibration** adjusts attribute values to the provided min and max thresholds. It is especially useful when the geospatial patterns are accurate but the distribution of values needs adjusting and the actual min and max values are known. For the best results, GeoPard recommends pairing it with Pathwise calibration: apply Pathwise first, then adjust the values to the known min and max.
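A minimal sketch of Pathwise combined with Conditional calibration, using only the `conditionMinMaxCalibration` parameters listed in *Documentation: Mutation details*. The min/max thresholds are hypothetical, `calibrationBasis: "Machine"` assumes a `Machine` attribute exists, and omitting `cleanAction` is an assumption. `conditionMinMaxCalibration` is written as a single object; if the schema actually expects a list, GraphQL input coercion treats a single value as a list of one element.

```graphql
mutation PathwiseThenMinMaxCalibration {
  calibrateYieldDataset(input: {
    fieldUuid: "<placeholder_of_field_uuid>"
    yieldDatasetUuid: "<placeholder_of_dataset_uuid>"
    actions: {
      calibrateAction: {
        calibrationAttributes: ["WetMass"]
        # Pathwise calibration first, as recommended above
        conditionPathwiseCalibration: {
          calibrationBasis: "Machine"
          maxHomogeneityRegion: true
          syntheticMachinePath: false
        }
        # Conditional calibration to a known range (hypothetical values)
        conditionMinMaxCalibration: {
          calibrationAttribute: "WetMass"
          min: 2.0
          minIncluded: true
          max: 14.0
          maxIncluded: true
        }
      }
      # Assumption: cleanAction may be omitted when only CALIBRATE is requested
      actions: [CALIBRATE]
    }
  }) {
    uuid
    fieldUuid
    originalUuid
    statusCode
    statusMessage
  }
}
```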

### Hints

{% hint style="warning" %}
**Hint for Data Anomalies**

If the data contains anomalies, such as values at or near zero or unusually large values (for instance, an average of 10 with a maximum of 8000), the **Clean & Calibration** workflow is advised. It is configured with `actions: [CLEAN, CALIBRATE]`.

Cleaning before calibrating removes errors, missing values, and inconsistencies first, so the calibration works on higher-quality data.
{% endhint %}
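As a sketch of the Clean & Calibration workflow, the default sample from *Documentation: Mutation details* can be reused with the order in `actions` reversed. Only fields that appear in that sample are used; whether other defaults should change for anomalous data is not covered here.

```graphql
mutation CleanThenCalibrateYieldDataset {
  calibrateYieldDataset(input: {
    fieldUuid: "<placeholder_of_field_uuid>"
    yieldDatasetUuid: "<placeholder_of_dataset_uuid>"
    actions: {
      calibrateAction: {
        calibrationAttributes: ["WetMass", "VRYIELDVOL"]
        conditionPathwiseCalibration: {
          calibrationBasis: "WetMass"
          maxHomogeneityRegion: true
          syntheticMachinePath: true
        }
      }
      cleanAction: {
        condtionAutoClean: {
          targetAttribute: "WetMass"
        }
      }
      # CLEAN runs first to remove anomalies, then CALIBRATE runs on the cleaned data
      actions: [CLEAN, CALIBRATE]
    }
  }) {
    uuid
    fieldUuid
    originalUuid
    statusCode
    statusMessage
  }
}
```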

{% hint style="warning" %}
**Hint for Data without Initial Errors**

For datasets that are initially free of errors, missing values, and inconsistencies, and where multiple harvesters are known to have been involved, consider the **Calibration & Clean** workflow. It is configured with `actions: [CALIBRATE, CLEAN]`.

Cleaning after calibration further refines the dataset and can remove artifacts introduced by the calibration step.
{% endhint %}

### Documentation: Mutation details

The default configuration below automatically calibrates and cleans the "YieldDataset".

{% code overflow="wrap" lineNumbers="true" %}

```graphql
mutation AutoCalibrateYieldDataset {
  calibrateYieldDataset(input: {
    fieldUuid: "<placeholder_of_field_uuid>"
    yieldDatasetUuid: "<placeholder_of_dataset_uuid>"
    actions: {
      # Configuration of the CALIBRATE operation
      calibrateAction: {
        calibrationAttributes: ["WetMass", "VRYIELDVOL"]
        # Pathwise calibration: each machine track is calibrated as its own region
        conditionPathwiseCalibration: {
          calibrationBasis: "WetMass"
          maxHomogeneityRegion: true
          syntheticMachinePath: true
        }
      }
      # Configuration of the CLEAN operation (automatic cleaning)
      cleanAction: {
        condtionAutoClean: {
          targetAttribute: "WetMass"
        }
      }
      # Order of execution: calibrate first, then clean
      actions: [CALIBRATE, CLEAN]
    }
  }) {
    uuid
    fieldUuid
    originalUuid
    statusCode
    statusMessage
  }
}
```

{% endcode %}

A more advanced sample adds manual control over the smoothing window, the target average, and min/max cleaning ranges, and incorporates additional attributes.

{% code overflow="wrap" lineNumbers="true" %}

```graphql
mutation CustomCalibrateYieldDataset {
  calibrateYieldDataset(input: {
    fieldUuid: "<placeholder_of_field_uuid>"
    yieldDatasetUuid: "<placeholder_of_dataset_uuid>"
    actions: {
      calibrateAction: {
        calibrationAttributes: ["WetMass", "VRYIELDVOL"]
        # Odd window size used to smooth the calibrated values
        smoothWindowSize: 11
        # Pathwise calibration based on the "Machine" attribute
        conditionPathwiseCalibration: {
          calibrationBasis: "Machine"
          maxHomogeneityRegion: true
          syntheticMachinePath: false
        }
        # Average/Total calibration: align the WetMass average with 8.0
        conditionAvgTotalCalibration: [{
          calibrationAttribute: "WetMass"
          average: 8.0
        }]
      }
      cleanAction: {
        condtionAutoClean: {
          targetAttribute: "WetMass"
        }
        # Manual min/max cleaning rule for the Distance attribute
        conditionMinMaxClean: [{
          cleanAttribute: "Distance"
          min: 1.5
          max: 10.0
        }]
      }
      actions: [CALIBRATE, CLEAN]
    }
  }) {
    uuid
    fieldUuid
    originalUuid
    statusCode
    statusMessage
  }
}
```

{% endcode %}

{% hint style="info" %}
To follow the USDA protocol for the `CLEAN` operation, either list ALL columns in `cleanAction` -> `conditionMinMaxClean`, or list some of them in `cleanAction` -> `conditionMinMaxClean` and the remaining ones in `condtionAutoClean` -> `excludedAttributes`.
{% endhint %}
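A minimal sketch of the second option: some columns get explicit `conditionMinMaxClean` rules and every remaining column is listed in `excludedAttributes`. The thresholds and the `<other_column_...>` names are hypothetical placeholders for your dataset's actual columns, and omitting `calibrateAction` when `actions` contains only `CLEAN` is an assumption.

```graphql
mutation UsdaProtocolClean {
  calibrateYieldDataset(input: {
    fieldUuid: "<placeholder_of_field_uuid>"
    yieldDatasetUuid: "<placeholder_of_dataset_uuid>"
    actions: {
      cleanAction: {
        condtionAutoClean: {
          targetAttribute: "WetMass"
          # Every column without an explicit rule below (hypothetical names)
          excludedAttributes: ["<other_column_1>", "<other_column_2>"]
        }
        # Explicit min/max rules for a portion of the columns (hypothetical thresholds)
        conditionMinMaxClean: [
          { cleanAttribute: "WetMass", min: 0.5, max: 20.0 },
          { cleanAttribute: "Distance", min: 1.5, max: 10.0 }
        ]
      }
      # Assumption: calibrateAction may be omitted when only CLEAN is requested
      actions: [CLEAN]
    }
  }) {
    uuid
    fieldUuid
    originalUuid
    statusCode
    statusMessage
  }
}
```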

Input parameters:

* `actions` as an array, allowing you to choose *the correcting actions and their sequence* of application; supported values include `CLEAN` and `CALIBRATE`.
* `calibrateAction` as an object containing configuration details related to the `CALIBRATE` operation.
  * `calibrationAttributes` as an array of attributes requiring calibration, typically linked to the Yield column.
  * `smoothWindowSize` as an odd integer defining the smoothing window applied to the resulting values, which reduces abrupt jumps in the values.
  * `conditionPathwiseCalibration` as an object configuring **Pathwise** calibration, which follows the machine's tracks: each machine track is processed as an individual region for calibration.
    * `calibrationBasis` as a string representing the attribute used as the basis for calibration.
    * `maxHomogeneityRegion` as a boolean that indicates whether the maximum homogeneity region is used as the referenced region for calibration.
    * `syntheticMachinePath` as a boolean that enables simulation of machine routes; this is useful when a precise machine path attribute is missing and the path has to be reconstructed from timestamps or a similar attribute.
  * `conditionAvgTotalCalibration` as an array of objects configuring **Average/Total** calibration, which redistributes attribute values. This method is useful when the geospatial patterns are accurate but the absolute values deviate from the actual figures.
    * `calibrationAttribute` as a string naming the attribute to calibrate.
    * `average` as a number representing the target average of the attribute; the calibrated values are adjusted to match this average. Use either `average` or `total`, not both.
    * `total` as a number representing the target sum of the attribute values; the calibrated values are adjusted so that their sum matches this total. Use either `average` or `total`, not both.
  * `conditionMinMaxCalibration` as an object configuring **Conditional** calibration, which adjusts attribute values to the provided min and max thresholds.
    * `calibrationAttribute` as a string naming the attribute to calibrate.
    * `min` as a number representing the minimum value of the attribute, serving as the lower bound of the calibration range.
    * `minIncluded` as a boolean indicating whether the minimum value itself is included in the range.
    * `max` as a number representing the maximum value of the attribute, serving as the upper bound of the calibration range.
    * `maxIncluded` as a boolean indicating whether the maximum value itself is included in the range.
* `cleanAction` as an object that includes the configuration specifics tied to the `CLEAN` operation.
  * `condtionAutoClean` as an object that includes the configuration specific to the auto-clean algorithm.
    * `targetAttribute` as a string naming the attribute that holds the target Yield values.
    * `excludedAttributes` as an array of strings listing attributes that do not influence the cleaning operation.
  * `conditionMinMaxClean` as an array of objects, each defining a cleaning rule with the following parameters:
    * `cleanAttribute` as a string specifying the column name for the rule.
    * `min` as a number indicating the minimum value.
    * `max` as a number indicating the maximum value.

{% hint style="info" %}
To view the inputs and the latest available enumeration values (such as those accepted by `actions`), we recommend using [Altair](https://docs.geopard.tech/geopard-tutorials/api-docs/getting-started).
{% endhint %}

### Documentation: YieldDataset query

As a GeoPard API consumer, you can retrieve the corrections applied to a YieldDataset through the `appliedCorrections` and `appliedCorrectionsVersion` attributes. `appliedCorrections` lists the corrections that were made (e.g., `CALIBRATE` and `CLEAN`) in their order of execution, while `appliedCorrectionsVersion` indicates the version of the algorithm used.

{% code overflow="wrap" lineNumbers="true" %}

```graphql
query DetailsAboutAppliedCalibrations {
  getFields (filter: {fieldUuid: "<placeholder_of_field_uuid>"}){
    fields {
      uuid
      yieldDatasets  {
        uuid
        appliedCorrections
        appliedCorrectionsVersion
      }
    }
  }
}
```

{% endcode %}
