SEMIF Database

Semi-Automated Field Database - Optimized for machine learning training with precise bounding boxes and segmentation masks.

Table of contents

  1. Overview
  2. Key Features
  3. Schema Documentation
    1. 1. Temporal & Batch Information
    2. 2. Image Metadata
    3. 3. File Paths & Storage
    4. 4. Annotation & Detection
    5. 5. Weed Classification
    6. 6. Spatial Information
    7. 7. Taxonomic Classification
    8. 8. Plant Characteristics
    9. 9. Reference & Visualization
    10. 10. Cutout Image Characteristics
  4. External Integrations
  5. Data Quality Indicators
  6. Accessing the Data
  7. Version History
  8. Next Steps

Overview

Database Name: SEMIF (Semi-Automated Field Database)

Purpose: Stores processed image cutouts/crops of individual plants with detailed bounding box annotations, taxonomy, and image characteristics. Optimized for machine learning training and object detection applications.

62 Attributes
50K+ Cutouts
150+ Species
98% Quality


Key Features

What Makes SEMIF Special: Comprehensive bounding box management, quality metrics, dual path storage, version control, and multiple area estimation methods.

  • Bounding Box Management: Comprehensive coordinate tracking and overlap detection
  • Quality Metrics: Blur detection, component counting, and RGB statistics
  • Dual Path Storage: Both cropout_path and cutout_path for redundancy
  • Version Control: Tracked through bbot_version, version, and batch_id fields
  • Area Estimation: Multiple area metrics (pixel, measured cmΒ², estimated cmΒ²) with binning

Schema Documentation

1. Temporal & Batch Information

Track when and how data was processed.

Field Type Description
season String Growing season identifier
datetime String Timestamp of image capture
bbot_version String Version of the bounding box annotation tool used
batch_id String Identifier for the processing batch
version Float Dataset or annotation version number

2. Image Metadata

Original source image information and camera settings.

Field Type Description
image_id String Unique identifier for the source image
fullres_height Integer Height of the full-resolution source image in pixels
fullres_width Integer Width of the full-resolution source image in pixels
exif_meta String EXIF metadata from the original image
camera_info String Camera model and settings information
lens_model String Camera lens model used for capture

3. File Paths & Storage

Locations of images, masks, and metadata files.

Field Type Description
ncsu_nfs String NCSU network file system location
image_path String Path to the full source image
mask_path String Path to the segmentation mask file
json_path String Path to associated JSON metadata
cutout_ncsu_nfs String NCSU NFS location for cutout images
cropout_path String Path to the cropped/cutout image
cutout_path String Alternate path to cutout image
cutout_mask_path String Path to the cutout’s segmentation mask
cutout_json_path String Path to cutout’s JSON metadata

Storage Redundancy: Both cropout_path and cutout_path are provided for system reliability.


4. Annotation & Detection

Bounding boxes, masks, and detection metadata.

Field Type Description
has_masks Integer Boolean flag indicating presence of segmentation masks (0/1)
is_primary Integer Boolean flag marking primary annotation in overlapping cases
cutout_exists Float Boolean flag indicating if cutout file exists
cutout_id String Unique identifier for this cutout
bbox_xywh String Bounding box coordinates in [x, y, width, height] format
category_class_id Integer Numeric class identifier for the category
overlapping_cutout_ids String IDs of other cutouts that overlap with this one

Example bbox_xywh format:

[245, 180, 120, 95] # [x_top_left, y_top_left, width, height]

5. Weed Classification

Non-target weed detection and confidence scoring.

Field Type Description
non_target_weed Float Boolean flag indicating if this is a non-target weed species
non_target_weed_pred_conf Float Confidence score for non-target weed prediction (0-1)

Quality Filtering: Use non_target_weed_pred_conf to filter predictions by confidence threshold.


6. Spatial Information

Geographic location and area measurements.

Field Type Description
local_coordinates String Coordinates within the image frame
global_coordinates String GPS or field-level coordinates
pixel_area Float Area in pixels
bbox_area_cm2 Float Measured bounding box area in square centimeters
estimated_bbox_area_cm2 Float Estimated bounding box area in square centimeters
estimated_area_bin String Categorical size bin for the estimated area
state String US state where image was captured

Area bins typically include: small, medium, large, extra_large


7. Taxonomic Classification

Complete taxonomic hierarchy from kingdom to species.

Field Type Description
category_usda_symbol String USDA PLANTS database symbol code
category_eppo_code String European and Mediterranean Plant Protection Organization code
category_group String High-level taxonomic or functional group
category_class String Taxonomic class
category_subclass String Taxonomic subclass
category_order String Taxonomic order
category_family String Taxonomic family
category_genus String Taxonomic genus
category_species String Taxonomic species name
category_common_name String Common name of the plant
category_authority String Taxonomic authority citation
category_multispecies String Flag or notes for multi-species annotations

Example taxonomy:

Common Name: Barley
USDA Symbol: HOVU
Family: Poaceae
Genus: Hordeum
Species: vulgare

8. Plant Characteristics

Growth and life cycle information.

Field Type Description
category_growth_habit String Growth habit (e.g., forb, grass, shrub, vine)
category_duration String Life cycle duration (annual, biennial, perennial)

Growth habits:

  • forb - Herbaceous flowering plant
  • grass - Graminoid
  • shrub - Woody plant
  • vine - Climbing/trailing plant

9. Reference & Visualization

Links to external databases and display properties.

Field Type Description
category_usda_link String URL to USDA PLANTS database entry
category_taxonomic_notes String Additional taxonomic notes or clarifications
category_hex String Hexadecimal color code for visualization
category_rgb String RGB color values for visualization
category_alias String Alternative or simplified category name

Example color values:

Hex: #4CAF50
RGB: (76, 175, 80)

10. Cutout Image Characteristics

Technical properties of the cropped image.

Field Type Description
cutout_height Float Height of the cutout image in pixels
cutout_width Float Width of the cutout image in pixels
blur_effect Float Quantitative measure of image blur
num_components Float Number of connected components in the segmentation
cropout_rgb_mean String Mean RGB values of the cutout image
cropout_rgb_std String Standard deviation of RGB values
extends_border Float Boolean flag indicating if cutout extends to image border

Quality Filtering: Use blur_effect < 50 for high-quality images. Filter by num_components to find clean single-plant images.

Example quality thresholds:

# High quality images
blur_effect < 50

# Single plant detection
num_components <= 2

# Not extending to border
extends_border == 0

External Integrations

USDA PLANTS Database
Access via category_usda_symbol and category_usda_link

Visit USDA PLANTS β†’

EPPO Database
Access via category_eppo_code

Visit EPPO β†’

NCSU Network File System
Primary storage infrastructure

Fields: ncsu_nfs, cutout_ncsu_nfs


Data Quality Indicators

Use these fields to filter for high-quality data:

Indicator Field Recommended Value
Mask Availability has_masks 1 (masks exist)
File Existence cutout_exists 1 (file exists)
Primary Annotation is_primary 1 (primary in overlaps)
Weed Confidence non_target_weed_pred_conf > 0.8 (high confidence)
Image Quality blur_effect < 50 (sharp images)

Accessing the Data

Query this database using the AgIR-CVToolkit. The toolkit provides powerful filtering, sampling, and export capabilities.

Learn How to Query SEMIF β†’ View Complete Query Guide β†’


Version History

The SEMIF database includes version tracking:

  • bbot_version: Annotation tool version
  • version: Dataset version number
  • batch_id: Processing batch identifier

This enables reproducibility and tracking of data processing changes over time.


Next Steps

πŸ“Š View Statistics
Explore species distribution and data characteristics

Statistics β†’

πŸ–ΌοΈ See Examples
Browse sample images and annotations

Gallery β†’

πŸ”§ Access the Data
Learn how to query with AgIR-CVToolkit

Query Documentation β†’

πŸ“š Compare with FIELD
See the Field observation database

FIELD Database β†’