Anomaly Location of a system response deemed to warrant further investigation by the demonstrator for consideration as an emplaced munitions item.
Detection An anomaly location that is within Rhalo of an emplaced munitions item.
Military Munitions (MM) Specific categories of military munitions that may pose unique explosive safety risks, including UXO as defined in 10 USC 101(e)(5), DMM as defined in 10 USC 2710(e)(2), and/or munitions constituents (e.g., TNT, RDX) as defined in 10 USC 2710(e)(3) that are present in high enough concentrations to pose an explosive hazard.
Emplaced Munitions A munitions item buried by the government at a specified location in the test site.
Emplaced Clutter A clutter item (i.e., non-munitions item) buried by the government at a specified location in the test site.
Rhalo A pre-determined radius about an emplaced item (clutter or munitions) within which an anomaly identified by the demonstrator as being of interest is considered a detection of that item. For the purpose of this program, a circular halo 0.5 meters in radius is placed around the center of the object for all clutter and munitions items less than 0.6 meters in length. When munitions items are longer than 0.6 meters, the halo becomes an ellipse where the minor axis is 1 meter and the major axis is equal to the length of the munitions plus 1 meter.
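The halo rule above can be sketched in code. This is a minimal illustration, not the program's scoring software: the function name in_halo, the coordinate convention, and the item_azimuth_rad parameter (the orientation of the item's long axis, which the definition does not specify) are assumptions. An item exactly 0.6 meters long is treated as elliptical here, and points on the boundary count as inside; the definition leaves both cases open.

```python
import math

def in_halo(anomaly_xy, item_xy, item_length_m, item_azimuth_rad=0.0):
    """Return True if an anomaly location falls inside an emplaced item's halo.

    Hypothetical helper: positions are (x, y) in meters, and the azimuth is
    the assumed angle of the item's long axis measured from the x axis.
    """
    dx = anomaly_xy[0] - item_xy[0]
    dy = anomaly_xy[1] - item_xy[1]
    if item_length_m < 0.6:
        # Circular halo, 0.5 m in radius, about the center of the item.
        return math.hypot(dx, dy) <= 0.5
    # Elliptical halo: minor axis 1 m, major axis = item length + 1 m.
    semi_major = (item_length_m + 1.0) / 2.0
    semi_minor = 0.5
    # Rotate the offset into the item's frame (u along the long axis).
    cos_a, sin_a = math.cos(item_azimuth_rad), math.sin(item_azimuth_rad)
    u = dx * cos_a + dy * sin_a
    v = -dx * sin_a + dy * cos_a
    return (u / semi_major) ** 2 + (v / semi_minor) ** 2 <= 1.0
```

For a 1-meter item lying along the x axis, an anomaly 0.9 meters away along the axis falls inside the halo, while one 0.9 meters away across the axis does not.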
Small Munitions Caliber of munitions less than or equal to 40mm (includes 20mm projectile, 40mm projectile, submunitions BLU-26, BLU-63, and M42).
Medium Munitions Caliber of munitions greater than 40mm and less than or equal to 81mm (includes 57mm projectile, 60mm mortar, 2.75-inch rocket, and 81mm mortar).
Large Munitions Caliber of munitions greater than 81mm (includes 105mm HEAT, 105mm projectile, and 155mm projectile).
Shallow Items buried less than 0.3 meters below ground surface.
Medium Items buried greater than or equal to 0.3 meters and less than 1 meter below ground surface.
Deep Items buried greater than or equal to 1 meter below ground surface.
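The size and depth classes above amount to simple threshold checks. The sketch below restates them; the function names size_class and depth_class are illustrative and do not come from the program.

```python
def size_class(caliber_mm):
    """Classify a munitions item by caliber in millimeters."""
    if caliber_mm <= 40:
        return "small"
    if caliber_mm <= 81:
        return "medium"
    return "large"

def depth_class(depth_m):
    """Classify an item by burial depth in meters below ground surface."""
    if depth_m < 0.3:
        return "shallow"
    if depth_m < 1.0:
        return "medium"
    return "deep"
```

Note that the boundaries are asymmetric: a 40mm item is small and an 81mm item is medium, whereas an item at exactly 0.3 meters is medium depth and one at exactly 1 meter is deep.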
Response Stage Noise Level The signal level below which anomalies are not considered detectable. Demonstrators are required to provide the recommended noise level for the Blind Grid Test Area.
Discrimination Stage Threshold The demonstrator-selected threshold level that is expected to provide optimum performance of the system by retaining all detectable munitions and rejecting the maximum amount of clutter. This level defines the subset of anomalies the demonstrator would recommend digging based on discrimination.
Binomially Distributed Random Variable A random variable that counts the number of successes x observed in n independent trials, where each trial has only two possible outcomes (say, success and failure) and the probability p of success and the probability 1-p of failure are the same for each trial. The observed proportion x/n is an estimate of p.
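As a small illustration of this definition, the sketch below computes the probability of observing x successes in n trials and the corresponding estimate of p. The function names are illustrative, and the numbers in the usage note are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def estimate_p(x, n):
    """Estimate the per-trial success probability from x successes in n trials."""
    return x / n
```

For example, a demonstrator that detects 8 of 10 emplaced items has an estimated detection probability of estimate_p(8, 10) = 0.8.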
Response and Discrimination Stage Data
The scoring of the demonstrator's performance is conducted in two stages. These two stages are termed the RESPONSE STAGE and DISCRIMINATION STAGE. For both stages, the probability of detection (Pd) and the false alarms are reported as receiver operating characteristic (ROC) curves. False alarms are divided into those anomalies that correspond to emplaced clutter items, measuring the probability of clutter detection (Pcd) or probability of false positive (Pfp) and those that do not correspond to any known item, termed background alarms.

The RESPONSE STAGE is a measure of whether the sensor can detect an object of interest. For a single-channel instrument, this value should be closely related to the amplitude of the signal. The demonstrator must report the response level (threshold) below which target responses are deemed insufficient to warrant further investigation. At this stage, minimal processing can be performed. This includes filtering long and short scale variations, bias removal, and scaling. This processing should be detailed in the data submission.

For a multi-channel instrument, the demonstrator must construct a quantity analogous to amplitude. The demonstrator should consider what combination of channels provides the best test for detecting any object that the sensor can detect. The average amplitude across a set of channels is an example of an acceptable Response Stage quantity. Other methods may be more appropriate for a given sensor. Again, minimal processing can be performed and the demonstrator should explain how this quantity was constructed in their data submission.

The DISCRIMINATION STAGE evaluates the demonstrator's ability to correctly identify munitions as such and to reject clutter. For the same locations as in the RESPONSE STAGE anomaly list, the DISCRIMINATION STAGE list contains the output of the algorithms applied in the discrimination-stage processing. This list is prioritized based on the demonstrator's determination that an anomaly location is likely to contain munitions. Thus, higher output values are indicative of higher confidence that a munitions item is present at the specified location. For electronic signal processing, priority ranking is based on algorithm output. For other systems, priority ranking is based on human judgment. The demonstrator also selects the threshold that the demonstrator believes will provide "optimum" system performance (i.e., that retains all the detected munitions and rejects the maximum amount of clutter).

Note: The two lists provided by the demonstrator contain identical numbers of potential target locations. They differ only in the priority ranking of the declarations.

Group Scoring Factors
Based on the configuration of the GT at the standardized sites and the defined scoring methodology, there exist munitions groups defined as having overlapping halos. In these cases, the following scoring logic is implemented (see figs. A-1 through A-9):
  1. Overall site scores (i.e., Pd) will consider only isolated munitions and clutter items.
  2. GT items that have overlapping halos (both munitions and clutter) will form a group and groups may form chains.
  3. Groups will have a complex halo composed of the composite halos of all of their GT items.
  4. Groups will have three scoring factors: Groups Found; Groups Identified; and Group Coverage. Scores will be based on 1:1 matches of anomalies and GT.
    1. Groups Found (Found): the number of Groups that have one or more GT items matched divided by the total number of Groups. Demonstrators will be credited with detecting a group if any item within the group is matched to an anomaly in their list.
    2. Groups Identified (ID): the number of Groups that have two or more GT items matched divided by the total number of Groups. Demonstrators will be credited with identifying that a group is present if multiple items within the composite halo are matched to anomalies in their list.
    3. Group Coverage (Coverage): the number of GT items matched within Groups divided by the total number of GT items within Groups. This metric measures the demonstrator's accuracy in determining the number of anomalies within a group. If five items are present and only two anomalies are matched, the demonstrator will score 0.4. If all five are matched, the demonstrator will score 1.0.
  5. Location error will NOT be reported for Groups.
  6. Demonstrators will not be asked to call out groups in their scoring submissions. If multiple anomalies are indicated in a small area, the demonstrator will report all individual anomalies.
  7. Excess alarms within a halo will be disregarded.
    A-1. Example of detected item.
    A-2. Example of group found (found).
    A-3. Example of group identified (ID).
    A-4. Example of excess alarms disregarded.
    A-5. Example of a group.
    A-6. Example of group (1/4=0.25).
    A-7. Example of group (2/4=0.5).
    A-8. Example of group (3/4=0.75).
    A-9. Example of group (4/4=1.0).
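The three group scoring factors can be sketched as follows. This is an illustration only: it assumes the 1:1 matching of anomalies to GT items has already been done, and it represents each group as a list of booleans (True where a GT item was matched). The function name and data layout are assumptions, not part of the scoring methodology.

```python
def group_scores(groups):
    """Return (Found, ID, Coverage) for a list of groups.

    Each group is a list of booleans, one per GT item in the group,
    set to True if that item was matched to an anomaly (assumed input).
    """
    matched_per_group = [sum(group) for group in groups]
    n_groups = len(groups)
    n_items = sum(len(group) for group in groups)
    # Found: groups with one or more matched items.
    found = sum(m >= 1 for m in matched_per_group) / n_groups
    # ID: groups with two or more matched items.
    identified = sum(m >= 2 for m in matched_per_group) / n_groups
    # Coverage: matched items within groups over all items within groups.
    coverage = sum(matched_per_group) / n_items
    return found, identified, coverage

# One group of five items with two matched, as in the Coverage example.
print(group_scores([[True, True, False, False, False]]))  # (1.0, 1.0, 0.4)
```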
Response Stage Definitions
Response Stage Probability of Detection (Pdres) Pdres = (No. of response-stage detections)/(No. of emplaced munitions in the test site).
Response Stage Clutter Detection (cdres) An anomaly location that is within Rhalo of an emplaced clutter item.
Response Stage Probability of Clutter Detection (Pcdres) Pcdres = (No. of response-stage clutter detections)/(No. of emplaced clutter items).
Response Stage Background Alarm (bares) An anomaly in a blind grid cell that contains neither emplaced munitions nor an emplaced clutter item. An anomaly location in the open field or scenarios that is outside Rhalo of any emplaced munitions or emplaced clutter item.
Response Stage Probability of Background Alarm (Pbares) Blind Grid only: Pbares = (No. of response-stage background alarms)/(No. of empty grid locations).
Response Stage Background Alarm Rate (BARres) Open Field, and any Challenge Areas (including the direct and indirect firing sub areas) only: BARres = (No. of response-stage background alarms)/(arbitrary constant).
Note that the quantities Pdres, Pcdres, Pbares, and BARres are functions of tres, the threshold applied to the response-stage signal strength. These quantities can therefore be written as Pdres(tres), Pcdres(tres), Pbares(tres), and BARres(tres).
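To make the dependence on the threshold concrete, the sketch below evaluates the Blind Grid quantities at a given tres. It is a simplified illustration: each anomaly is assumed to have already been scored as a munitions detection, a clutter detection, or a background alarm, and an anomaly is retained when its signal is at or above the threshold. The function name, the record layout, and the label strings are assumptions.

```python
def response_stage_rates(anomalies, t_res, n_munitions, n_clutter, n_empty_cells):
    """Return Pd, Pcd, and Pba at response-stage threshold t_res (Blind Grid).

    anomalies: list of (signal, label) pairs, where label is one of
    "munition", "clutter", or "background" (hypothetical encoding).
    """
    kept = [label for signal, label in anomalies if signal >= t_res]
    return {
        "Pd": kept.count("munition") / n_munitions,
        "Pcd": kept.count("clutter") / n_clutter,
        "Pba": kept.count("background") / n_empty_cells,
    }
```

Lowering t_res can only add anomalies to the retained list, so all three quantities are non-increasing functions of the threshold.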
Discrimination Stage Definitions
Discrimination The application of a signal processing algorithm or human judgment to sensor data that discriminates munitions from clutter. Discrimination should identify anomalies that the demonstrator has high confidence correspond to munitions, as well as those that the demonstrator has high confidence correspond to non-munitions or background returns. The former should be ranked with highest priority and the latter with lowest.
Discrimination Stage Probability of Detection (Pddisc) Pddisc = (No. of discrimination-stage detections)/(No. of emplaced munitions in the test site).
Discrimination Stage False Positive (fpdisc) An anomaly location that is within Rhalo of an emplaced clutter item.
Discrimination Stage Probability of False Positive (Pfpdisc) Pfpdisc = (No. of discrimination-stage false positives)/(No. of emplaced clutter items).
Discrimination Stage Background Alarm (badisc) An anomaly in a blind grid cell that contains neither emplaced munitions nor an emplaced clutter item. An anomaly location in the open field or scenarios that is outside Rhalo of any emplaced munitions or emplaced clutter item.
Discrimination Stage Probability of Background Alarm (Pbadisc) Pbadisc = (No. of discrimination-stage background alarms)/(No. of empty grid locations).
Discrimination Stage Background Alarm Rate (BARdisc) BARdisc = (No. of discrimination-stage background alarms)/(arbitrary constant).
Note that the quantities Pddisc, Pfpdisc, Pbadisc, and BARdisc are functions of tdisc, the threshold applied to the discrimination-stage signal strength. These quantities can therefore be written as Pddisc(tdisc), Pfpdisc(tdisc), Pbadisc(tdisc), and BARdisc(tdisc).
Receiver-Operating Characteristic (ROC) Curves
ROC curves at both the response and discrimination stages can be constructed based on the above definitions. The ROC curves plot Pd against Pcd or Pfp, and Pd against BAR or Pba, as the threshold applied to the signal strength is varied from its minimum (tmin) to its maximum (tmax) value (see footnote 1). Figure A-1 shows how Pd vs. Pcd and Pd vs. BAR are combined into ROC curves. Note that the "res" and "disc" superscripts have been suppressed from all the variables for clarity.

Figure A-1. ROC curves for open-field testing. Each curve applies to both the response and discrimination stages.

1. Strictly speaking, ROC curves plot the Pd vs. Pba over a pre-determined and fixed number of detection opportunities (some of the opportunities are located over munitions and others are located over clutter or blank spots). In an open field scenario, each system suppresses its signal strength reports until some bare-minimum signal response is received by the system. Consequently, the open field ROC curves do not have information from low signal-output locations, and, furthermore, different contractors report their signals over a different set of locations on the ground. These ROC curves are thus not true to the strict definition of ROC curves as defined in textbooks on detection theory. Note, however, that the ROC curves obtained in the Blind Grid test sites are true ROC curves.
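A threshold sweep of the kind described in this section can be sketched as below. The sketch traces the Pd vs. Pcd curve only, assumes each anomaly has already been scored against the emplaced items, and uses each distinct signal value as a candidate threshold. The function name and the (signal, label) record layout are assumptions made for illustration.

```python
def roc_points(anomalies, n_munitions, n_clutter):
    """Return (threshold, Pcd, Pd) tuples, from highest threshold to lowest.

    anomalies: list of (signal, label) pairs with label "munition",
    "clutter", or "background" (hypothetical encoding).
    """
    thresholds = sorted({signal for signal, _ in anomalies}, reverse=True)
    points = []
    for t in thresholds:
        # Anomalies at or above the threshold are retained.
        kept = [label for signal, label in anomalies if signal >= t]
        pcd = kept.count("clutter") / n_clutter
        pd = kept.count("munition") / n_munitions
        points.append((t, pcd, pd))
    return points
```

Plotting Pd against Pcd for the returned points gives one of the two curves; replacing the clutter count with the background alarm count (divided by the appropriate denominator) gives the other.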
Metrics to Characterize the Discrimination Stage
The demonstrator is also scored on efficiency and rejection ratio, which measure the effectiveness of the discrimination stage processing. The goal of discrimination is to retain the greatest number of munitions detections from the anomaly list, while rejecting the maximum number of anomalies arising from non-munitions items. The efficiency measures the fraction of detected munitions retained by the discrimination, while the rejection ratio measures the fraction of false alarms rejected. Both measures are defined relative to the entire response list, i.e., the maximum munitions detectable by the sensor and its accompanying clutter detection/false positive rate or background alarm rate.
Efficiency (E) E = Pddisc(tdisc)/Pdres(tminres). Measures, at a threshold of interest, the degree to which the maximum theoretical detection performance of the sensor system (as determined by the response stage tmin) is preserved after application of discrimination techniques. Efficiency is a number between 0 and 1. An efficiency of 1 implies that all of the munitions initially detected in the response stage were retained at the specified threshold in the discrimination stage, tdisc.
False Positive Rejection Rate (Rfp) Rfp = 1 - [Pfpdisc(tdisc)/Pcdres(tminres)]. Measures, at a threshold of interest, the degree to which the sensor system's false positive performance is improved over the maximum false positive performance (as determined by the response stage tmin). The rejection rate is a number between 0 and 1. A rejection rate of 1 implies that all emplaced clutter items initially detected in the response stage were correctly rejected at the specified threshold in the discrimination stage.
Background Alarm Rejection Rate (Rba)
BLIND GRID: Rba = 1 - [Pbadisc(tdisc)/Pbares(tminres)]
OPEN FIELD: Rba = 1 - [BARdisc(tdisc)/BARres(tminres)]
Measures the degree to which the discrimination stage correctly rejects background alarms initially detected in the response stage. The rejection rate is a number between 0 and 1. A rejection rate of 1 implies that all background alarms initially detected in the response stage were rejected at the specified threshold in the discrimination stage.
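The efficiency and rejection rate definitions reduce to two short formulas, sketched below. The function names are illustrative, and the numbers in the usage note are hypothetical values chosen only to show the arithmetic.

```python
def efficiency(pd_disc, pd_res_tmin):
    """E = Pddisc(tdisc) / Pdres(tminres)."""
    return pd_disc / pd_res_tmin

def rejection_rate(rate_disc, rate_res_tmin):
    """R = 1 - [discrimination-stage rate / response-stage rate at tmin].

    Works for Rfp (Pfpdisc over Pcdres) and for Rba (Pbadisc over Pbares
    in the Blind Grid, or BARdisc over BARres in the Open Field).
    """
    return 1.0 - rate_disc / rate_res_tmin
```

For example, if the response stage at tmin gives Pd = 0.80 and Pcd = 0.60, and the discrimination stage at tdisc gives Pd = 0.72 and Pfp = 0.15, then E is about 0.9 and Rfp is 0.75. Both functions divide by the response-stage value, so they are undefined when nothing was detected in the response stage.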
Chi-square Comparison

The Chi-square test for differences in probabilities (or 2 x 2 contingency table) is used to analyze two samples drawn from two different populations to see if both populations have the same or different proportions of elements in a certain category. More specifically, two random samples are drawn, one from each population, to test the null hypothesis that the probability of event A (some specified event) is the same for both populations (ref 3).

The test statistic of the 2 x 2 contingency table follows the Chi-square distribution with one degree of freedom. When an association between a more challenging terrain feature and relatively degraded performance is sought, a one-sided test is performed. A two-sided 2 x 2 contingency table is used in the Standardized UXO Technology Demonstration Site Program to compare performance between any two areas or sub-areas when the direction of degradation cannot be predetermined.

For a one-sided test, a significance level of 0.05 is used to set the critical decision limit. It is a critical decision limit because if the test statistic calculated from the data exceeds this value, the lower proportion tested will be considered significantly less than the greater one (degraded). If the test statistic calculated from the data is less than this value, then no degradation can be said to exist due to the terrain feature introduced.

For a two-sided test, a significance level of 0.10 is used to allow .05 on either side of the decision. It is a critical decision limit because if the test statistic calculated from the data exceeds this value, the two proportions tested will be considered significantly different. If the test statistic calculated from the data is less than this value, the two proportions tested will be considered not significantly different.
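The comparison can be sketched using one common formulation of the 2 x 2 test, in which a signed statistic T is compared against a standard normal quantile and T squared is the Chi-square statistic with one degree of freedom (no continuity correction). Whether the program uses exactly this form is an assumption, and the function name and the counts in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_by_two_statistic(success_1, n_1, success_2, n_2):
    """Signed statistic T for comparing two proportions.

    T is positive when sample 1 has the higher success proportion.
    Undefined when the pooled data contain no successes or no failures.
    """
    a, b = success_1, n_1 - success_1
    c, d = success_2, n_2 - success_2
    n = n_1 + n_2
    return sqrt(n) * (a * d - b * c) / sqrt(n_1 * n_2 * (a + c) * (b + d))

# Critical value shared by the one-sided 0.05 and two-sided 0.10 tests.
critical = NormalDist().inv_cdf(0.95)

# Hypothetical counts: 90 of 100 detected in one area, 70 of 100 in another.
t_stat = two_by_two_statistic(90, 100, 70, 100)
one_sided_significant = t_stat > critical
two_sided_significant = abs(t_stat) > critical
```

With these hypothetical counts T squared equals 12.5, well above the squared critical value of about 2.71, so the difference would be declared significant under either test.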

An exception must be applied when either a 0 or 100 percent success rate occurs in the sample data. The Chi-square test cannot be used in these instances. Instead, Fisher's test is used and the critical decision limit for one-sided tests is the chosen significance level, which in this case is 0.05. With Fisher's test, the test statistic is a probability: if it is less than the critical value, the proportions are considered to be significantly different.

An example follows that illustrates Standardized UXO Technology Demonstration Site blind grid results compared to those from the open field legacy. It should be noted that a significant result does not prove a cause and effect relationship exists between the two populations of interest; however, it does serve as a tool to indicate that one data set has experienced a degradation or change in system performance larger than can be accounted for merely by chance or random variation. Note also that a result that is not significant indicates that there is not enough evidence to declare that anything more than chance or random variation within the same population is at work between the two data sets being compared.

Example: Demonstrator X achieves the following overall results after surveying the blind grid and open field (legacy) using the same system (results indicate the number of munitions detected divided by the number of munitions emplaced):
        Blind grid       Open field
Pdres   100/100 = 1.0    8/10 = 0.80

Pdres: BLIND GRID versus OPEN FIELD (legacy). Using the example data above to compare probabilities of detection in the response stage, all 100 munitions out of 100 emplaced munitions items were detected in the blind grid while 8 munitions out of 10 emplaced were detected in the open field. Fisher's test must be used since a 100 percent success rate occurs in the data. Fisher's test uses the four input values to calculate a test statistic of 0.0075 that is compared against the critical value of 0.05. Since the test statistic is less than the critical value, the smaller response stage detection rate (0.80) is considered to be significantly less at the 0.05 level of significance. While a significant result does not prove a cause and effect relationship exists between the change in survey area and degradation in performance, it does indicate that the detection ability of demonstrator X's system seems to have been degraded in the open field relative to results from the blind grid using the same system. This is an example of a one-sided test in which Fisher's test is used in place of the Chi-square test.
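The 0.0075 figure in the example can be reproduced with a short calculation. The sketch below computes the one-sided Fisher probability from the hypergeometric distribution using only the standard library; the function name is illustrative, and this is not necessarily the software used by the program.

```python
from math import comb

def fisher_one_sided(success_1, n_1, success_2, n_2):
    """Probability that sample 2 shows success_2 or fewer successes,
    given the observed row and column totals (one-sided Fisher's test)."""
    total = n_1 + n_2
    total_success = success_1 + success_2
    # Sample 1 can hold at most n_1 successes, which bounds sample 2 from below.
    lowest = max(0, total_success - n_1)
    favorable = sum(
        comb(n_2, k) * comb(n_1, total_success - k)
        for k in range(lowest, success_2 + 1)
    )
    return favorable / comb(total, total_success)

# Blind grid 100/100 versus open field 8/10, as in the example.
p_value = fisher_one_sided(100, 100, 8, 10)
print(round(p_value, 4))  # 0.0075
```

Here the only table at least as extreme as the observed one is the observed table itself, so the probability is 45/5995, which is below the 0.05 critical value.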

Standardized UXO Technology Demonstration Site

For more information, please contact the
Army Environmental Hotline
E-mail: Environmental Hotline
Phone: 800-USA-3845 (800-327-3845)
