Buses and Multitasking: UART, I²C, SPI and the RTOS

Why chips need to talk — and why two wires beat fifty

Open a smartwatch and you will not find one chip but a small town of them: a microcontroller in the centre, a heart-rate sensor here, a tiny pressure sensor for altitude there, a flash memory holding your music, a screen driver, a Bluetooth radio. They must gossip constantly — "new heartbeat", "draw this pixel", "battery at 41 %" — yet the board is the size of a postage stamp. You cannot run sixteen parallel wires between every pair of chips; there is no room, and every extra pin costs money and solder. The answer, almost universally, is to send the bits one after another down a single wire: serial communication.

The catch with serial is timing. If I send you the byte 0110 1001 as a stream of highs and lows, how do you know where one bit ends and the next begins? Two strategies exist, and they split the entire world of buses in half. Synchronous buses ship a separate clock wire alongside the data: every tick of the clock says "read the data line *now*", so sender and receiver march in perfect lockstep. Asynchronous buses send no clock at all; the two sides agree in advance on a speed and trust their own internal timers to stay aligned for the length of one short message. That single choice — extra clock wire, or no clock wire — is the deepest fork in the road, and the three buses below sit on different sides of it.

UART — two wires, a handshake of trust, no clock

The UART is the oldest and simplest link on the board, and the one you will use to print debug messages for the rest of your career. It needs just two signal wires that cross over: my TX (transmit) goes to your RX (receive), and vice versa — plus a shared ground. There is no clock wire. Instead, both sides agree beforehand on a baud rate — the number of bit-slots per second, classically 9600, 115200, and so on. When the line is idle it sits high. To send a byte, the transmitter pulls it low for one bit-time (the start bit), which is the receiver's cue to begin counting; it then clocks out the 8 data bits at the agreed pace and finishes with a high stop bit.

UART frame, 8N1, sending byte 0x69 = 0110 1001  (LSB first)

        St D0 D1 D2 D3 D4 D5 D6 D7 Sp     (St = start, Sp = stop)
idle ___   ___      ___   ______   ______ idle     <- high
        ___   ______   ___      ___                <- low
         0  1  0  0  1  0  1  1  0  1
        ^
        receiver sees the falling edge here,
        then samples each bit in the MIDDLE of its slot,
        timed by its OWN internal clock at the agreed baud.

  No shared clock wire — only a promise about speed.

One UART character: a start bit triggers the receiver, which then samples each data bit at the centre of its time-slot using its own timer.
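To make the framing concrete, here is a minimal sketch in C that packs a byte into the ten line levels of an 8N1 character. The function names and the choice to store the frame in a 16-bit word (bit 0 first on the wire) are illustrative assumptions, not part of any particular UART driver.

```c
#include <stdint.h>

/* Pack one 8N1 character into a 10-bit word.
 * Bit 0 is the first level on the wire (start), bit 9 the last (stop).
 * Hypothetical helper for illustration only. */
uint16_t uart_frame_8n1(uint8_t byte)
{
    uint16_t frame = 0;              /* bit 0 = start bit, always low   */
    frame |= (uint16_t)byte << 1;    /* bits 1..8 = D0..D7, LSB first   */
    frame |= 1u << 9;                /* bit 9 = stop bit, always high   */
    return frame;
}

/* Line level (0 or 1) during slot 0..9 of the frame. */
int uart_frame_level(uint16_t frame, int slot)
{
    return (frame >> slot) & 1;
}
```

Calling `uart_frame_8n1(0x69)` and reading slots 0 to 9 with `uart_frame_level` gives the sequence 0 1 0 0 1 0 1 1 0 1, the same levels as in the diagram.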

Because each side keeps time on its own oscillator, the two clocks must not drift more than about half a bit apart across the whole frame — roughly 10 bits — which means they must agree to within a few percent. That is why UART tolerates only short, framed bursts and a fixed pre-agreed speed; set one side to 9600 and the other to 115200 and you get fluent garbage. The payoff is beautiful simplicity and the fact that it is full-duplex: TX and RX are separate wires, so both sides can talk at the same time.
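The "few percent" figure falls out of simple arithmetic: the stop bit is sampled about 9.5 bit-times after the start edge, and the sample must land within half a bit of where it should, so the total mismatch has to stay under roughly 0.5 / 9.5, or about 5 %. The sketch below simulates that on a desktop. It is a toy model under stated assumptions: the frame is a 10-bit word with bit 0 first on the wire, `drift` is the fractional error of the receiver's clock relative to the sender's, and the function names are made up for this example.

```c
#include <stdint.h>

#define UART_SLOTS 10   /* start + 8 data + stop */

/* Level on the wire during a given transmitter slot; idle (high) outside the frame. */
static int line_level(uint16_t frame, int slot)
{
    if (slot < 0 || slot >= UART_SLOTS) return 1;
    return (frame >> slot) & 1;
}

/* Sample a frame with a receiver whose bit-time is (1 + drift) times the
 * sender's. Returns the decoded byte, or -1 on a framing error
 * (start bit not low, or stop bit not high). */
int uart_sample_frame(uint16_t frame, double drift)
{
    int byte = 0;
    for (int i = 0; i < UART_SLOTS; i++) {
        /* Receiver aims for the middle of slot i, measured on its own clock. */
        double t = (i + 0.5) * (1.0 + drift);
        int level = line_level(frame, (int)t);
        if (i == 0) {
            if (level != 0) return -1;
        } else if (i == UART_SLOTS - 1) {
            if (level != 1) return -1;
        } else {
            byte |= level << (i - 1);   /* data arrives LSB first */
        }
    }
    return byte;
}
```

Fed the frame for 0x69 (the word 0x2D2), a 2 % mismatch in either direction still decodes 0x69. At 8 % the late samples slide into the neighbouring slots: a fast receiver reads the wrong bits and returns a different byte, while a slow one samples D7 where it expects the stop bit and reports a framing error.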

I²C — one party line, addresses instead of new wires

Suppose your watch has a dozen slow sensors and you refuse to spend two dozen pins on them. I²C, invented by Philips in 1982, solves this with the elegance of an old rural party-line telephone: every household shares one pair of wires, and you reach a specific home by announcing its number first. I²C uses just two wires for the entire bus, SDA (serial data) and SCL (serial clock), no matter how many chips hang off it. A single controller drives the clock; every peripheral chip answers to a 7-bit address, usually fixed at the factory and sometimes adjustable with a pin or two, and that address must be unique on the bus. Seven bits give 128 addresses, of which 16 are reserved by the specification, leaving room for up to 112 devices.

A transaction reads like a polite phone call. The controller issues a START (it yanks SDA low while SCL is still high — a deliberately illegal pattern during normal data, so everyone recognises it as "attention"). It then clocks out the 7-bit address plus one read/write bit. The one chip whose address matches replies by holding SDA low for a single clock — an ACK, "I'm here" — and all the other chips, hearing a number that isn't theirs, fall silent. Data bytes flow, each acknowledged, until a STOP condition hangs up the line. Because many chips share one SDA line, the wires are open-drain: any device can only pull the line *low*, and resistors pull it back high, so two chips talking at once collide safely instead of shorting.
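The first byte of that phone call is easy to get wrong by one bit, so a small sketch may help. It shows how the 7-bit address and the read/write bit share one byte, which addresses the I²C specification sets aside (0x00 to 0x07 and 0x78 to 0x7F), and why an open-drain line reads low as soon as anyone pulls it low. The function names are hypothetical, chosen for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for the 16 addresses the I2C specification reserves
 * (0000xxx and 1111xxx); ordinary devices use 0x08..0x77. */
bool i2c_addr_is_reserved(uint8_t addr7)
{
    return addr7 < 0x08 || addr7 > 0x77;
}

/* First byte after START: the 7-bit address in the upper bits,
 * then the R/W bit (1 = read, 0 = write) in bit 0. */
uint8_t i2c_address_byte(uint8_t addr7, bool read)
{
    return (uint8_t)((addr7 << 1) | (read ? 1u : 0u));
}

/* Open-drain bus: each device either releases the line (true) or pulls it
 * low (false). The pull-up wins only if everybody releases. */
bool open_drain_level(bool a_releases, bool b_releases)
{
    return a_releases && b_releases;
}
```

For the temperature sensor at 0x48, a write starts with the byte 0x90 and a read with 0x91. This is also why some datasheets quote "8-bit addresses" that are twice the 7-bit value.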

I²C: ONE bus, MANY devices, sorted by address

      +3V3
       |   |       (pull-up resistors, ~4.7k)
      [R] [R]
       |   |
  SDA  o---+---+-------+-------+-----  (data)
       |   |   |       |       |
  SCL  o---|---+---+---+---+---+-----  (clock)
       |   |   |   |   |   |   |
     [MCU] [Temp] [Accel] [RTC] ...
           0x48   0x68     0x51   <- unique addresses

  Add a 4th sensor? Tap the SAME two wires.
  Give it an address nobody else has. Done.

I²C hangs every device on the same SDA/SCL pair; adding a chip costs no new pins, only a free address.

SPI — four wires, a clock, and raw speed

When you must move *fast* (refresh a colour display, stream from an SD card, slurp samples off a high-speed ADC), neither UART's clockless guessing nor I²C's polite addressing keeps up. Enter SPI, the speed demon. SPI abandons addresses, keeps a real clock wire, and adds a brilliant idea: a shift register on each side, joined into one big loop. Imagine two people each holding a column of beads on a string; every clock tick, each pushes one bead into the other's hand. After eight ticks they have completely swapped their bytes. That is SPI: data goes out on MOSI (controller→peripheral) and comes back on MISO (peripheral→controller) *on the very same clock edges*, so it is naturally full-duplex.
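The bead-swapping picture translates almost line for line into code. The following is a desktop sketch of the two shift registers, not a driver for any real SPI peripheral. It assumes MSB-first transfer, and the function name and the packed 16-bit return value are choices made for this example.

```c
#include <stdint.h>

/* Clock the controller/peripheral shift-register loop `ticks` times.
 * On each tick both sides shift out their top bit and shift in the
 * other side's bit. Returns (controller << 8) | peripheral afterwards. */
uint16_t spi_clock(uint8_t controller, uint8_t peripheral, int ticks)
{
    for (int i = 0; i < ticks; i++) {
        uint8_t mosi = (controller >> 7) & 1u;   /* bit leaving the controller */
        uint8_t miso = (peripheral >> 7) & 1u;   /* bit leaving the peripheral */
        controller = (uint8_t)((controller << 1) | miso);
        peripheral = (uint8_t)((peripheral << 1) | mosi);
    }
    return (uint16_t)((controller << 8) | peripheral);
}
```

Starting with 0xA5 in the controller and 0x3C in the peripheral, four ticks leave the registers half exchanged (0x53 and 0xCA), and eight ticks leave them fully swapped (0x3C and 0xA5).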

How does SPI address many chips without addresses? With brute-force wiring. Every peripheral shares the same three lines — SCLK (clock), MOSI, MISO — but each gets its own private chip-select line (CS or SS), pulled low only when it is that chip's turn. CS low means "this message is for *you*; everyone else, hold still and ignore the clock". The controller drives the clock itself and can crank it brutally fast — tens of megahertz routinely, 50–100 MHz on good boards — because there's a real clock wire so no one has to guess timing.

SPI: shared clock + data, ONE chip-select per device

            SCLK  >----+-------+-------+----  clock
            MOSI  >----+-------+-------+----  out
            MISO  <----+-------+-------+----  in
  [ MCU ]   CS0   >---->|       |       |     (display)
            CS1   >------------>|       |     (flash)
            CS2   >-------------------->|     (ADC)

  Pull ONE CS low -> that chip listens, others ignore clock.
  Cost: every extra device burns one more CS pin.

  One 8-bit exchange (full-duplex):
  SCLK  _|‾|_|‾|_|‾|_|‾|_|‾|_|‾|_|‾|_|‾|_
  MOSI   D7  D6  D5  D4  D3  D2  D1  D0    (we send)
  MISO   d7  d6  d5  d4  d3  d2  d1  d0    (we receive, same edges)

SPI shares clock and data lines but gives each peripheral its own chip-select; one byte out and one byte in ride the same clock edges.
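Chip-select logic is simple enough to sketch in a few lines. The helper below is illustrative only: it assumes active-low chip-selects, at most eight devices, and a bitmask in which bit n holds the level on CSn. Both function names are invented for this example.

```c
#include <stdint.h>

/* Levels to drive on the CS lines: every line idles high, and only the
 * selected device's line goes low. Pass selected = -1 to deselect all. */
uint8_t spi_cs_levels(int num_devices, int selected)
{
    uint8_t levels = (uint8_t)((1u << num_devices) - 1u);  /* all high */
    if (selected >= 0 && selected < num_devices) {
        levels &= (uint8_t)~(1u << selected);              /* one low  */
    }
    return levels;
}

/* Controller pins used by n SPI peripherals: SCLK, MOSI, MISO plus one CS each. */
int spi_pins_needed(int num_devices)
{
    return 3 + num_devices;
}
```

With the three devices in the diagram, selecting the flash on CS1 gives the pattern 101 (CS2 high, CS1 low, CS0 high), and the bus costs six controller pins where I²C would need two.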

From the bare-metal loop to many tasks at once

In rung 4 you built firmware the bare-metal way: a single `while(1)` super-loop that does everything in turn, with interrupts firing to handle urgent events. For one or two jobs this is perfect — small, predictable, no overhead. But watch it buckle. Imagine a loop that must (a) read a sensor over I²C every 10 ms, (b) refresh a display over SPI every 30 ms, (c) blink a status LED, and (d) parse commands arriving on UART. The moment the display refresh takes 12 ms, your 10 ms sensor read is already late. You start sprinkling timers and flags and half-finished state machines, and soon the loop is a snarl of `if (timeToDoX)` clauses that nobody, including you, can reason about.
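The buckling can be reproduced on a desktop with a toy timeline. The sketch below is not firmware. It assumes each pass of the loop costs 1 ms, the sensor read itself is instantaneous, and the display refresh blocks for `display_ms`; those numbers and the function name are hypothetical.

```c
/* Simulate a super-loop that wants a sensor read every 10 ms and a display
 * refresh every 30 ms. Returns the longest gap, in ms, between two
 * consecutive sensor reads over `sim_ms` of simulated time. */
int superloop_worst_sensor_gap_ms(int display_ms, int sim_ms)
{
    int now = 0, last_sensor = 0, last_display = 0, worst = 0;

    while (now < sim_ms) {
        if (now - last_sensor >= 10) {          /* timeToReadSensor */
            int gap = now - last_sensor;
            if (gap > worst) worst = gap;
            last_sensor = now;
        }
        if (now - last_display >= 30) {         /* timeToRefreshDisplay */
            last_display = now;
            now += display_ms;                  /* CPU is stuck in the refresh */
        }
        now += 1;                               /* LED, UART polling, etc. */
    }
    return worst;
}
```

With an instantaneous refresh the worst gap stays at the intended 10 ms. With a 12 ms refresh the loop cannot get back to the sensor in time, and the worst gap grows well past 10 ms.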

A real-time operating system, or RTOS, is the cure. The key idea is the task: you write each job as its own little self-contained program, an independent `while(1)` loop that *thinks it owns the whole CPU*. The sensor task loops forever reading the sensor; the display task loops forever drawing; the UART task loops forever parsing. None of them knows the others exist. A piece of the RTOS called the scheduler then plays a sleight of hand: it freezes one task mid-stride, saving its registers and stack pointer, and thaws another, switching between them so fast (often every 1 ms) that on a single core they all *appear* to run at once. This trick is the context switch, and it is the heart of multitasking.

Priorities, deadlines, and the meaning of "real-time"

What makes it *real-time* rather than just *multitasking* is priority plus the promise of meeting deadlines. On a phone running Linux, "multitasking" means everything gets a fair slice and eventually finishes — fine for a web browser, fatal for an airbag. An RTOS instead lets you stamp each task with a priority number, and its scheduler is ruthlessly pre-emptive: at every instant it runs the *highest-priority task that is ready*, and if a more important task suddenly becomes ready, it pre-empts — yanks the CPU away from whatever lower task was running, mid-line, right now. The motor-safety task outranks the logging task, always, so a late log can never delay a motor shutdown.
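The scheduler's core decision fits in a few lines. The sketch below assumes one task per priority level, a bitmask in which bit n set means "the task at priority n is ready", and the convention that a larger number means higher priority (as FreeRTOS does). Real kernels keep richer ready lists, and the function names here are made up for illustration.

```c
#include <stdint.h>

/* Priority of the most important ready task, or -1 if nothing is ready. */
int highest_ready_priority(uint32_t ready_mask)
{
    for (int prio = 31; prio >= 0; prio--) {
        if (ready_mask & (1u << prio)) return prio;
    }
    return -1;
}

/* Pre-emption rule: switch away from the running task as soon as
 * something strictly more important is ready. */
int must_preempt(int running_prio, uint32_t ready_mask)
{
    return highest_ready_priority(ready_mask) > running_prio;
}
```

If the logging task runs at priority 1 and the motor-safety task at priority 3 becomes ready, `must_preempt(1, ...)` is true and the logger loses the CPU at once; the reverse never happens.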

Define tasks. Carve the firmware into independent jobs — `SensorTask`, `DisplayTask`, `CommsTask` — each an infinite loop, and hand each a priority and a stack.
Block, don't busy-wait. A task with nothing to do until later calls something like `vTaskDelay()` to sleep for a set time, and a task waiting for data blocks on a queue or semaphore; either way the scheduler runs *other* tasks instead of spinning uselessly on the CPU.
Let the scheduler pick. On every tick — and on every interrupt — it picks the highest-priority ready task and switches to it, pre-empting anything lower without asking.
Communicate safely. Tasks pass data through RTOS-provided queues, and guard any shared resource with a mutex, so two tasks never corrupt the same variable mid-update.
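The first three steps can be watched in miniature with a host-side simulation. This is not RTOS code; it is a sketch that assumes a 1 ms tick, a sensor job needing 1 ms of CPU every 10 ms, and a display job needing `display_cost_ms` every 30 ms. Every name in it is hypothetical.

```c
typedef struct {
    int period_ms;        /* how often the job is released              */
    int cost_ms;          /* CPU time each release needs                */
    int remaining_ms;     /* work left; 0 means the task is blocked     */
    int next_release_ms;  /* when the task next becomes ready           */
    int released_at_ms;   /* when the current job was supposed to start */
} SimTask;

enum { SENSOR = 0, DISPLAY = 1, NUM_TASKS = 2 };

/* Worst delay, in ms, between the sensor job becoming due and actually running. */
int sim_worst_sensor_latency_ms(int display_cost_ms, int sensor_has_priority,
                                int sim_ms)
{
    SimTask t[NUM_TASKS] = {
        [SENSOR]  = { 10, 1, 0, 0, 0 },
        [DISPLAY] = { 30, display_cost_ms, 0, 0, 0 },
    };
    int order[NUM_TASKS];                 /* order[0] = higher priority */
    order[0] = sensor_has_priority ? SENSOR : DISPLAY;
    order[1] = sensor_has_priority ? DISPLAY : SENSOR;
    int worst = 0;

    for (int now = 0; now < sim_ms; now++) {          /* one pass per tick */
        /* Wake any blocked task whose delay has expired. */
        for (int i = 0; i < NUM_TASKS; i++) {
            if (t[i].remaining_ms == 0 && now >= t[i].next_release_ms) {
                t[i].remaining_ms = t[i].cost_ms;
                t[i].released_at_ms = t[i].next_release_ms;
                t[i].next_release_ms += t[i].period_ms;
            }
        }
        /* Run the highest-priority ready task for this tick. */
        for (int k = 0; k < NUM_TASKS; k++) {
            SimTask *run = &t[order[k]];
            if (run->remaining_ms > 0) {
                if (order[k] == SENSOR) {
                    int late = now - run->released_at_ms;
                    if (late > worst) worst = late;
                }
                run->remaining_ms--;
                break;
            }
        }
    }
    return worst;
}
```

With the sensor given the higher priority, a 12 ms display refresh is simply interrupted every time a reading falls due, and the sensor's worst latency is 0 ms. Swap the priorities and the same workload makes the sensor wait out the whole refresh, 12 ms late. That difference is the practical meaning of choosing priorities.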