System Architecture

High-Level Overview

[Diagram: System Architecture]

Data Flow:

  1. Collect: Read sensors every 5 seconds
  2. Batch: Accumulate 180 readings (15 minutes)
  3. Write: Save as Hive-partitioned Parquet
  4. Sync: Upload to S3 automatically
  5. Analyze: Query with DuckDB from anywhere
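The Collect and Batch steps above can be sketched as a small accumulator. This is a minimal illustration, not the project's actual collector: `BatchCollector`, `read_sensor`, and `write_batch` are hypothetical names, and the write/sync steps are represented only by a callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Reading = Dict[str, float]


@dataclass
class BatchCollector:
    """Accumulates readings and hands each full batch to a writer callback."""

    read_sensor: Callable[[], Reading]            # hypothetical: returns one reading
    write_batch: Callable[[List[Reading]], None]  # hypothetical: e.g. write Parquet, then sync
    batch_size: int = 180                         # 180 readings = 15 minutes at 5 s each
    _buffer: List[Reading] = field(default_factory=list, init=False)

    def tick(self) -> bool:
        """Take one reading; flush when the batch is full. Returns True on flush."""
        self._buffer.append(self.read_sensor())
        if len(self._buffer) < self.batch_size:
            return False
        # Swap in a fresh buffer before writing so a slow writer never sees new readings.
        batch, self._buffer = self._buffer, []
        self.write_batch(batch)
        return True
```

In a real loop, `tick()` would be called once per read interval (for example, followed by `time.sleep(5)`); injecting the reader and writer as callbacks keeps the batching logic testable without hardware.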

Component Architecture

[Diagram: Component Architecture]

Components:

  • CLI: User interface (setup, start, sync, status)
  • Collector: Reads sensors, manages batches
  • Polars: Fast columnar data processing
  • ObStore: Efficient S3 sync (Rust-based)
  • Hive Partitioning: Station- and time-based directory layout (station/year/month/day)

Data Flow

Sensor to Cloud Pipeline

[Diagram: Sensor to Cloud Pipeline]

Timing:

  • Read Interval: 5 seconds
  • Batch Duration: 900 seconds (15 minutes)
  • Batch Size: ~180 readings (900 s ÷ 5 s)
  • Sync Interval: 15 minutes (configurable)
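The relationship between these values is simple arithmetic, shown here as a sketch. The `Timing` class and its field names are assumptions for illustration and do not reflect the project's actual configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    """Hypothetical timing settings; defaults mirror the values listed above."""

    read_interval_s: int = 5      # one sensor read every 5 seconds
    batch_duration_s: int = 900   # one Parquet file per 15 minutes
    sync_interval_s: int = 900    # upload cadence (configurable)

    @property
    def readings_per_batch(self) -> int:
        # 900 s / 5 s = 180 readings per batch
        return self.batch_duration_s // self.read_interval_s

    @property
    def batches_per_day(self) -> int:
        # 86,400 s per day / 900 s = 96 files per station per day
        return 86_400 // self.batch_duration_s
```

With the defaults, each station produces 96 files per day of 180 readings each; changing the read interval changes the batch size but not the file count.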

Batch Processing Flow

[Diagram: Batch Processing Flow]

Storage Structure

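Your data is organized using Hive partitioning: partition values are encoded in the directory names as key=value pairs. A query that filters on station, year, month, or day only needs to read the matching directories, which keeps queries fast and reduces the amount of data fetched from S3.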

output/
└── station={UUID}/
    └── year={YYYY}/
        └── month={MM}/
            └── day={DD}/
                ├── data_0900.parquet  (15-min batch)
                ├── data_0915.parquet
                └── ...
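The layout above can be reproduced with a small path helper, sketched below. The function names are hypothetical, and two details are assumptions not stated in the source: that the `HHMM` in the file name is the batch start time floored to 15 minutes, and that month and day are zero-padded. The query string uses DuckDB's `read_parquet` with `hive_partitioning = true`, which exposes `station`, `year`, `month`, and `day` as columns.

```python
from datetime import date, datetime

BATCH_MINUTES = 15  # assumption: file names carry the batch start time


def batch_path(root: str, station: str, ts: datetime) -> str:
    """Build the Hive-partitioned path for the batch containing `ts`."""
    minute = ts.minute - ts.minute % BATCH_MINUTES  # floor to the batch start
    return (
        f"{root.rstrip('/')}/station={station}"
        f"/year={ts:%Y}/month={ts:%m}/day={ts:%d}"
        f"/data_{ts.hour:02d}{minute:02d}.parquet"
    )


def day_query(root: str, station: str, day: date) -> str:
    """Build a DuckDB query that reads only one station-day of files."""
    glob = (
        f"{root.rstrip('/')}/station={station}"
        f"/year={day:%Y}/month={day:%m}/day={day:%d}/*.parquet"
    )
    return f"SELECT * FROM read_parquet('{glob}', hive_partitioning = true)"
```

Because the glob already names a single day's directory, the query touches at most that day's files. The resulting string can be passed to DuckDB (for example via `duckdb.sql(...)`), with `root` pointing at either the local `output/` directory or an S3 prefix.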