84. Mutation: Calibrate and Clean YieldDataset
API calls to clean and calibrate Yield datasets
Calibrating a "YieldDataset" corrects the distribution of its values according to mathematical principles, improving the overall integrity of the data. A calibrated dataset supports better decision-making and is more valuable for further in-depth analysis.
Common use cases for this functionality include:
Synchronizing data when multiple harvesters have worked either simultaneously or over several days, ensuring consistency.
Making the dataset more homogeneous and accurate by smoothing out variations.
Removing data noise and extraneous information that can cloud insights.
Eliminating turnarounds or abnormal geometries, which may distort the actual patterns and trends in the field.
Adjusting yield data to correspond with established averages or totals for each attribute.
For a more detailed exploration and examples, please refer to our Yield Calibration & Cleaning use case.
Five harvesters worked in parallel on the 30ha Field shown below. The calibration of one harvester was not synchronized with the others, producing orange spots that indicate additional calibration is required. In addition, numerous red turnaround spots near the "Field" edges need to be eliminated.
The result below shows the dataset after applying the automatic CALIBRATE and CLEAN operations with default parameters. The resulting "YieldDataset" is homogeneous, without outliers or abrupt changes between neighboring geometries.
Pathwise calibration follows the machine's tracks: each machine track is processed as a separate region for calibration. The GeoPard team recommends this method as the standard approach.
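As a rough illustration of treating each track as its own region, the Python sketch below computes one scaling factor per track so that every track's median matches the field median. This is a deliberately naive stand-in chosen for illustration, not GeoPard's Pathwise algorithm. The function name `pathwise_normalize` and the `(track_id, value)` input format are hypothetical. Because it forces every track to the same median, it would also flatten genuine yield differences between tracks, which is one reason a production method has to be more careful.

```python
from collections import defaultdict
from statistics import median


def pathwise_normalize(points):
    """Naive per-track calibration sketch (hypothetical, not GeoPard's algorithm).

    points: list of (track_id, value) pairs, one per yield geometry.
    Returns the same pairs with each value scaled by its track's factor.
    """
    field_median = median(value for _, value in points)

    # Group values by machine track so each track is handled as its own region.
    by_track = defaultdict(list)
    for track_id, value in points:
        by_track[track_id].append(value)

    # One multiplicative factor per track aligns the track median with the field median.
    factors = {}
    for track_id, values in by_track.items():
        track_median = median(values)
        factors[track_id] = field_median / track_median if track_median else 1.0

    return [(track_id, value * factors[track_id]) for track_id, value in points]


# Hypothetical field: harvester B reads twice as high as harvester A.
points = [("A", 10.0), ("A", 10.0), ("B", 20.0), ("B", 20.0)]
print(pathwise_normalize(points))  # → [('A', 15.0), ('A', 15.0), ('B', 15.0), ('B', 15.0)]
```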
Average/Total calibration redistributes attribute values so that they match a known average or total. It is useful when the geospatial patterns are accurate but the absolute values deviate from the actual figures. For optimal results, GeoPard advises combining it with Pathwise calibration: apply Pathwise first, then adjust to the known Average/Total values.
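To show what aligning values to a known average or total can look like, the sketch below uses plain proportional scaling. Every value is multiplied by the same factor, so the mean (or sum) hits the target while the relative spatial pattern is preserved. Proportional scaling is an assumption chosen for illustration. GeoPard's redistribution method is not described here and may differ. The function name `scale_to_target` is hypothetical.

```python
def scale_to_target(values, average=None, total=None):
    """Proportionally rescale values to a target average OR total (illustrative only)."""
    # Mirror the API rule: only one of average/total is used at a time.
    if (average is None) == (total is None):
        raise ValueError("provide exactly one of average or total")
    current = sum(values) / len(values) if average is not None else sum(values)
    if current == 0:
        raise ValueError("cannot rescale values whose mean/sum is zero")
    target = average if average is not None else total
    factor = target / current
    # A single common factor keeps the ratios between geometries intact.
    return [v * factor for v in values]


yields = [8.0, 10.0, 12.0]  # hypothetical yield values
print(scale_to_target(yields, total=60.0))  # → [16.0, 20.0, 24.0]
```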
Conditional calibration adjusts attribute values based on provided min and max thresholds. It is especially valuable when the geospatial patterns are precise but the distribution of values needs adjustment, particularly when the actual min and max values are known. For the best outcomes, GeoPard recommends pairing it with Pathwise calibration: start with Pathwise, then adjust the values to align with the known min and max.
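One simple way to picture calibrating against known min and max thresholds is linear min-max rescaling. The observed range is mapped onto the target range, and the ordering and relative spacing of values are kept. This is an illustrative assumption, not GeoPard's documented method. The minIncluded/maxIncluded flags are not modeled, and the function name `rescale_to_range` is hypothetical.

```python
def rescale_to_range(values, new_min, new_max):
    """Linearly map values onto [new_min, new_max] (illustrative min-max rescaling)."""
    if new_min > new_max:
        raise ValueError("new_min must not exceed new_max")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: place them at the midpoint of the target range.
        mid = (new_min + new_max) / 2
        return [mid for _ in values]
    span = (new_max - new_min) / (hi - lo)
    # Preserve the ordering and relative spacing of values within the field.
    return [new_min + (v - lo) * span for v in values]


yields = [2.0, 6.0, 10.0]  # hypothetical raw yield values
print(rescale_to_range(yields, 4.0, 12.0))  # → [4.0, 8.0, 12.0]
```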
Hint for Data Anomalies
If a user encounters anomalies in the data, such as values at or near zero or unusually large values (for instance, an average of 10 with a maximum of 8000), the Clean & Calibrate workflow is advised. It is configured with the parameter actions: [CLEAN, CALIBRATE].
Cleaning the data before calibrating it removes errors, missing values, and inconsistencies first, which improves data quality and the accuracy of the calibration.
Hint for Data without Initial Errors
For datasets that are free from errors, missing values, or inconsistencies from the outset, and where multiple harvesters are known to have been involved, consider the Calibrate & Clean workflow. It is configured with the parameter actions: [CALIBRATE, CLEAN].
Cleaning the data after calibration refines the dataset further and can eliminate artifacts introduced during calibration.
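The two hints above can be condensed into a small client-side helper that suggests an actions order from the raw values. The anomaly thresholds (`zero_eps`, `spike_ratio`) and the function name `choose_actions` are hypothetical choices for illustration, not GeoPard defaults. When neither hint applies, the helper returns None rather than guessing.

```python
def choose_actions(values, multiple_harvesters, zero_eps=1e-6, spike_ratio=50.0):
    """Suggest an `actions` order based on the two hints (thresholds are hypothetical)."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    # Anomaly 1: values at or near zero.
    near_zero = any(abs(v) <= zero_eps for v in values)
    # Anomaly 2: a maximum far above the average (e.g. average 10, maximum 8000).
    spikes = mean > 0 and max(values) > spike_ratio * mean
    if near_zero or spikes:
        # Clean first so calibration works on data without errors.
        return ["CLEAN", "CALIBRATE"]
    if multiple_harvesters:
        # Data is clean but comes from several machines: calibrate, then clean artifacts.
        return ["CALIBRATE", "CLEAN"]
    # Neither hint applies; leave the decision to the user.
    return None


print(choose_actions([9.5, 10.0, 10.5], multiple_harvesters=True))  # → ['CALIBRATE', 'CLEAN']
```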
The default standard configuration automatically calibrates and cleans the "YieldDataset".
A more advanced sample provides manual control of min/max ranges and incorporates additional attributes.
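The sketch below shows how a default-style and an advanced-style input might be assembled as Python dictionaries (for example, to send as GraphQL variables). It uses the input parameters described in this section. The attribute names, helper function names, and chosen values are hypothetical. The mutation name and the way this object is wrapped in the request are not shown, so check the current schema in Altair before relying on it. Combining Pathwise and Conditional calibration in a single calibrateAction is also an assumption.

```python
def build_default_input(yield_attribute, basis_attribute):
    """Auto calibration followed by auto clean (hypothetical sketch of the input object)."""
    return {
        "actions": ["CALIBRATE", "CLEAN"],
        "calibrateAction": {
            "calibrationAttributes": [yield_attribute],
            "conditionPathwiseCalibration": {
                "calibrationBasis": basis_attribute,
                "maxHomogeneityRegion": True,
                "syntheticMachinePath": False,
            },
        },
        "cleanAction": {
            "conditionAutoClean": {"targetAttribute": yield_attribute},
        },
    }


def build_advanced_input(yield_attribute, basis_attribute, yield_min, yield_max, extra_rules=()):
    """Adds manual min/max ranges and extra per-attribute cleaning rules (assumed layout)."""
    payload = build_default_input(yield_attribute, basis_attribute)
    payload["calibrateAction"]["smoothWindowSize"] = 5  # must be odd; value is illustrative
    payload["calibrateAction"]["conditionMinMaxCalibration"] = {
        "calibrationAttribute": yield_attribute,
        "min": yield_min,
        "minIncluded": True,
        "max": yield_max,
        "maxIncluded": True,
    }
    payload["cleanAction"]["conditionMinMaxClean"] = [
        {"cleanAttribute": yield_attribute, "min": yield_min, "max": yield_max},
        *extra_rules,
    ]
    return payload


# Hypothetical column names: "yield_vol" (yield), "basis_attr", and "moisture".
advanced = build_advanced_input(
    "yield_vol", "basis_attr", 2.0, 18.0,
    extra_rules=[{"cleanAttribute": "moisture", "min": 10.0, "max": 30.0}],
)
```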
To follow the USDA protocol for the CLEAN operation, you must either list ALL columns in cleanAction -> conditionMinMaxClean, or specify some of them in cleanAction -> conditionMinMaxClean and the remaining ones in cleanAction -> conditionAutoClean -> excludedAttributes.
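A quick client-side check for this rule can be sketched as follows. Every dataset column must appear either as a cleanAttribute in conditionMinMaxClean or in conditionAutoClean -> excludedAttributes. The function name and the column names in the example are hypothetical. The check only mirrors the rule as stated here and does not validate the rule ranges themselves.

```python
def uncovered_columns(all_columns, clean_action):
    """Return columns not covered as the USDA rule requires (empty set means compliant)."""
    # Columns with an explicit min/max cleaning rule.
    covered = {rule["cleanAttribute"] for rule in clean_action.get("conditionMinMaxClean", [])}
    # Columns handed to the auto-clean step as excluded attributes.
    auto_clean = clean_action.get("conditionAutoClean", {})
    covered |= set(auto_clean.get("excludedAttributes", []))
    return set(all_columns) - covered


columns = ["yield_vol", "moisture", "speed"]  # hypothetical dataset columns
clean_action = {
    "conditionMinMaxClean": [{"cleanAttribute": "yield_vol", "min": 2.0, "max": 18.0}],
    "conditionAutoClean": {"targetAttribute": "yield_vol", "excludedAttributes": ["moisture"]},
}
print(uncovered_columns(columns, clean_action))  # → {'speed'}
```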
Input parameters:
actions
as an array that selects the correcting actions and the order in which they are applied; supported values include CLEAN and CALIBRATE.
calibrateAction
as an object containing the configuration details for the CALIBRATE operation.
calibrationAttributes
as an array of attributes requiring calibration, typically linked to the Yield column.
smoothWindowSize
as an odd integer defining the window used to smooth the resulting values, reducing abrupt jumps between them.
conditionPathwiseCalibration
as an object configuring Pathwise calibration, which follows the machine's tracks. Each machine track is processed as a separate region for calibration purposes.
calibrationBasis
as a string representing the attribute used as the basis for calibration.
maxHomogeneityRegion
as a boolean that indicates whether the maximum homogeneity region is used as the reference region for calibration.
syntheticMachinePath
as a boolean that enables simulation of machine routes. This is beneficial when a precise machine path attribute is absent and the path must be simulated from timestamps or a similar attribute.
conditionAvgTotalCalibration
as an object configuring Average/Total calibration, which redistributes attribute values. This method is beneficial when the geospatial patterns are accurate but the absolute figures deviate from the actual values.
calibrationAttribute
as a string naming the attribute to be calibrated.
average
as a number representing the target average of the attribute; the attribute values are aligned with this average. Use only one option, either average or total, at a time.
total
as a number representing the target total (sum) of the attribute values; the aggregate of these values should match this total. Use only one option, either average or total, at a time.
conditionMinMaxCalibration
as an object configuring Conditional calibration, which adjusts attribute values based on the provided min and max thresholds.
calibrationAttribute
as a string naming the attribute to be calibrated.
min
as a number representing the minimum value of the attribute, serving as the lower bound of the calibration range.
minIncluded
as a boolean indicating whether the minimum value itself is included in the range.
max
as a number representing the maximum value of the attribute, serving as the upper bound of the calibration range.
maxIncluded
as a boolean indicating whether the maximum value itself is included in the range.
cleanAction
as an object containing the configuration details for the CLEAN operation.
conditionAutoClean
as an object containing the configuration specific to the auto-clean algorithm.
targetAttribute
as a string naming the attribute that holds the target Yield values.
excludedAttributes
as an array of strings listing attributes that do not influence the cleaning operation.
conditionMinMaxClean
as an array of objects, each describing one cleaning rule with the following parameters:
cleanAttribute
as a string specifying the column name the rule applies to.
min
as a number indicating the minimum value.
max
as a number indicating the maximum value.
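Several of the constraints above can be checked before a request is sent: supported action values, an odd smoothWindowSize, only one of average or total, and min not above max. The sketch below is a hypothetical client-side helper, not part of the GeoPard API. It treats CLEAN and CALIBRATE as the only valid actions and requires exactly one of average or total. Both are assumptions based on the descriptions above, and the server remains the authority on what it accepts.

```python
def validate_input(payload):
    """Collect violations of the parameter rules described above (client-side sketch)."""
    errors = []
    # Assumption: CLEAN and CALIBRATE are the only supported actions.
    unknown = [a for a in payload.get("actions", []) if a not in {"CLEAN", "CALIBRATE"}]
    if unknown:
        errors.append(f"unsupported actions: {unknown}")

    calibrate = payload.get("calibrateAction", {})
    window = calibrate.get("smoothWindowSize")
    # bool is a subclass of int in Python, so reject it explicitly.
    if window is not None and (not isinstance(window, int) or isinstance(window, bool) or window % 2 == 0):
        errors.append("smoothWindowSize must be an odd integer")

    avg_total = calibrate.get("conditionAvgTotalCalibration")
    if avg_total is not None and ("average" in avg_total) == ("total" in avg_total):
        errors.append("conditionAvgTotalCalibration needs exactly one of average or total")

    min_max = calibrate.get("conditionMinMaxCalibration") or {}
    if "min" in min_max and "max" in min_max and min_max["min"] > min_max["max"]:
        errors.append("conditionMinMaxCalibration: min exceeds max")

    for rule in payload.get("cleanAction", {}).get("conditionMinMaxClean", []):
        if "min" in rule and "max" in rule and rule["min"] > rule["max"]:
            errors.append(f"conditionMinMaxClean rule for {rule.get('cleanAttribute')}: min exceeds max")
    return errors


print(validate_input({"actions": ["CALIBRATE", "CLEAN"], "calibrateAction": {"smoothWindowSize": 4}}))
# → ['smoothWindowSize must be an odd integer']
```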
To view the inputs and the latest available values of enumerations (such as operations), it is recommended to use Altair.
As a GeoPard API consumer, you can retrieve details about the corrections applied to YieldDatasets through the attributes appliedCorrections and appliedCorrectionsVersion. The former lists the corrections made (e.g., CALIBRATE and CLEAN), with the order of execution given by their sequence in the array. The latter, appliedCorrectionsVersion, indicates the version of the algorithm employed.
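To make the ordering concrete, the Python sketch below formats these two attributes from a YieldDataset object that has already been fetched and parsed into a dictionary. The function name and the sample values are hypothetical, as is the assumption that the version is a plain scalar. Where the object sits in the GraphQL response depends on your query.

```python
def describe_corrections(dataset):
    """Summarize appliedCorrections / appliedCorrectionsVersion from a YieldDataset dict."""
    corrections = dataset.get("appliedCorrections") or []
    version = dataset.get("appliedCorrectionsVersion")
    if not corrections:
        return "no corrections applied"
    # Array order is execution order, so join with arrows to show the sequence.
    steps = " -> ".join(corrections)
    return f"{steps} (algorithm version {version})"


sample = {"appliedCorrections": ["CLEAN", "CALIBRATE"], "appliedCorrectionsVersion": 1}
print(describe_corrections(sample))  # → CLEAN -> CALIBRATE (algorithm version 1)
```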