Last semester I took the Computer and Network Security course in my degree. The final project was this group thing where we had to build a secure ad-hoc network over Bluetooth, in a tree topology, that routes messages between nodes through a central sink, with proper crypto all the way down. Ended up being a ton of fun to work on, mostly because it touched so many different things at once: wireless transport, ad-hoc routing, PKI, key exchange, message integrity, end-to-end encryption, and a GUI to watch it all happen. Writing about it because there are more than enough interesting pieces in there to fill a post.

The rough idea

The setup is basically a tree. There’s one Sink at the root and a bunch of Nodes hanging off it in layers. Each node forwards messages upward to the Sink, the Sink routes them down to wherever they need to go, and everything is encrypted end-to-end, so intermediate nodes can’t actually read the payloads they’re carrying. The whole thing runs over Bluetooth Low Energy, using the Linux BlueZ stack underneath, and the nodes form the tree automatically by picking the best parent they can find.

The interesting bit for a security course is the layered protection: everything between two nodes is mutually authenticated with certificates, every hop is HMAC-protected against tampering, and end-to-end traffic gets its own AES-256-GCM layer so even intermediate nodes on the path only see routing headers, never content. Kind of like DTLS, but cobbled together over BLE GATT characteristics.

Why BLE and why a tree

BLE mostly because the spec asked for it, but also because it’s actually pretty convenient: discovery and pairing primitives are built in, you get small broadcasts for advertising, and GATT gives you a clean request/response channel once connected. Tree topology because it’s about the simplest routing structure that still lets arbitrary nodes talk through a central authority (the Sink), and because it limits how much state any one node has to carry.

Nodes pick a parent by scanning around, reading the hop count each candidate advertises, and going with the lowest one they can find. So if a node hears a neighbor that’s 2 hops from the Sink, it’ll prefer that one over a neighbor that’s 4 hops away, and its own hop count becomes 3. Enough to get a working tree without any central coordinator.

[Figure: Tree topology with the Sink at hop=0 and nodes fanning out by hop count] The tree forms on its own as nodes pick the lowest-hop parent they can see.
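A minimal sketch of that parent-selection rule, assuming each scan result has already been decoded into a NID plus the hop count the neighbour advertises (how the hop count is packed into the advertisement isn't covered here). The Candidate type and the use of RSSI as a tie-breaker are illustrative assumptions, not necessarily what the project does; treating -1 as "not attached" follows the heartbeat section further down.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """One neighbour seen during a scan (hypothetical structure)."""
    nid: str
    hop: int          # hop count the neighbour advertises; -1 = not attached to the Sink
    rssi: int = -100  # assumption: signal strength, used only to break ties


def choose_parent(candidates: Iterable[Candidate]) -> Optional[Tuple[Candidate, int]]:
    """Pick the lowest-hop neighbour and return it with our own resulting hop count."""
    usable = [c for c in candidates if c.hop >= 0]
    if not usable:
        return None  # nobody attached to the Sink is in range yet: keep scanning
    # Lowest hop count wins; among equal hops, prefer the stronger signal
    best = min(usable, key=lambda c: (c.hop, -c.rssi))
    return best, best.hop + 1


if __name__ == "__main__":
    seen = [Candidate("A", 4), Candidate("B", 2), Candidate("C", -1)]
    parent, my_hop = choose_parent(seen)
    print(parent.nid, my_hop)  # B 3
```

This mirrors the example in the text: a neighbour at 2 hops beats one at 4, and the node's own hop count becomes 3.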

Dual-role BLE (aka “a node is both a client and a server”)

Each node has to accept connections from children below it while also maintaining its own uplink to its parent, which means it needs to be a GATT server and a GATT client at the same time. Python doesn’t really have one library that does both cleanly, so we mixed two:

  • bleak for the client side (uplink towards the parent)
  • dbus-python + GLib for the server side (downlink from children)

The GUI runs on top of that in Tkinter, so at runtime each node has three concurrent contexts going: GLib for D-Bus/BlueZ callbacks, asyncio for the Bleak client, and Tkinter for the UI. Keeping those three cooperating takes a bit of care, especially around anything touching shared state.

[Figure: Dual-role node: asyncio/bleak uplink, GLib/dbus-python downlink, Tkinter GUI, all coordinating through shared state] One process, three concurrent contexts, one block of shared state holding them together.
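The post doesn't show how the three contexts are wired together, so here is one common pattern, sketched under assumptions. The asyncio loop (where the Bleak client would live) runs in its own thread, and other threads hand it work with asyncio.run_coroutine_threadsafe. Shared fields sit behind a threading.Lock. Because Tkinter isn't thread-safe, other threads only push UI events onto a queue.Queue that the Tk thread drains, typically from a root.after(...) poll. SharedState, uplink_send and drain_ui_events are hypothetical names, and the coroutine is a stand-in for a real Bleak write. GLib and Tk aren't started here, so the sketch runs anywhere.

```python
import asyncio
import queue
import threading
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class SharedState:
    """Hypothetical node state touched by the GLib, asyncio and Tk contexts."""
    hop: int = -1
    children: Set[str] = field(default_factory=set)
    lock: threading.Lock = field(default_factory=threading.Lock)
    ui_events: "queue.Queue[Tuple[str, object]]" = field(default_factory=queue.Queue)

    def set_hop(self, hop: int) -> None:
        with self.lock:
            self.hop = hop
        # Never touch Tk widgets from here; just tell the GUI thread what changed
        self.ui_events.put(("hop", hop))

    def add_child(self, nid: str) -> None:
        with self.lock:
            self.children.add(nid)
        self.ui_events.put(("child", nid))


def start_asyncio_thread() -> asyncio.AbstractEventLoop:
    """Run an asyncio loop (where a Bleak client would live) in a daemon thread."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


async def uplink_send(state: SharedState, payload: bytes) -> Tuple[int, bytes]:
    """Stand-in for a Bleak GATT write towards the parent."""
    await asyncio.sleep(0)
    with state.lock:  # held only briefly, so the event loop isn't blocked
        return state.hop, payload


def drain_ui_events(state: SharedState) -> List[Tuple[str, object]]:
    """Meant to run on the Tk thread (e.g. every 100 ms via root.after)."""
    events = []
    while True:
        try:
            events.append(state.ui_events.get_nowait())
        except queue.Empty:
            return events


if __name__ == "__main__":
    state = SharedState()
    loop = start_asyncio_thread()
    state.add_child("child-1")  # e.g. from a GLib/D-Bus callback
    state.set_hop(2)
    fut = asyncio.run_coroutine_threadsafe(uplink_send(state, b"hello"), loop)
    print(fut.result(timeout=1))   # (2, b'hello')
    print(drain_ui_events(state))  # [('child', 'child-1'), ('hop', 2)]
    loop.call_soon_threadsafe(loop.stop)
```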

The security side

This is the part the course was actually about.

Provisioning and PKI

Before anything runs, every device gets provisioned with its own credentials:

  • A P-521 ECC keypair
  • An X.509 certificate signed by a shared CA
  • A copy of the CA cert so it can verify others

The device identity (a 128-bit NID) is embedded into the Subject Alternative Name of its certificate, so you can’t just present any valid cert: the NID in the cert has to match the one the device is advertising over BLE.
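A rough sketch of that provisioning step with the cryptography library, which the post mentions later. The post doesn't say how the NID is encoded in the SAN, so representing it as a uuid.UUID and storing it as a urn:uuid: URI is an assumption. The same goes for the subject names, the one-year validity and signing with SHA-256. make_ca, provision_device and nid_from_cert are hypothetical helpers.

```python
import datetime
import uuid

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.x509.oid import NameOID

VALIDITY = datetime.timedelta(days=365)  # assumption


def _name(common_name: str) -> x509.Name:
    return x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, common_name)])


def make_ca(common_name: str = "Demo CA"):
    """Self-signed CA: a P-521 key plus a certificate marked ca=True."""
    key = ec.generate_private_key(ec.SECP521R1())
    now = datetime.datetime.now(datetime.timezone.utc)
    cert = (
        x509.CertificateBuilder()
        .subject_name(_name(common_name))
        .issuer_name(_name(common_name))
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + VALIDITY)
        .add_extension(x509.BasicConstraints(ca=True, path_length=None), critical=True)
        .sign(key, hashes.SHA256())
    )
    return key, cert


def provision_device(ca_key, ca_cert: x509.Certificate, nid: uuid.UUID):
    """Fresh P-521 keypair plus a CA-signed cert carrying the NID in its SAN."""
    key = ec.generate_private_key(ec.SECP521R1())
    now = datetime.datetime.now(datetime.timezone.utc)
    san = x509.SubjectAlternativeName([x509.UniformResourceIdentifier(nid.urn)])  # assumed encoding
    cert = (
        x509.CertificateBuilder()
        .subject_name(_name(f"node-{nid.hex[:8]}"))
        .issuer_name(ca_cert.subject)
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + VALIDITY)
        .add_extension(san, critical=False)
        .sign(ca_key, hashes.SHA256())
    )
    return key, cert


def nid_from_cert(cert: x509.Certificate) -> uuid.UUID:
    """Pull the NID back out of the SAN (raises ValueError if it isn't there)."""
    san = cert.extensions.get_extension_for_class(x509.SubjectAlternativeName).value
    for uri in san.get_values_for_type(x509.UniformResourceIdentifier):
        if uri.startswith("urn:uuid:"):
            return uuid.UUID(uri)
    raise ValueError("certificate carries no NID")


if __name__ == "__main__":
    ca_key, ca_cert = make_ca()
    nid = uuid.uuid4()
    dev_key, dev_cert = provision_device(ca_key, ca_cert, nid)
    assert nid_from_cert(dev_cert) == nid
```

Each device would then be shipped its own key and cert plus a copy of ca_cert, matching the three bullets above.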

Mutual auth on pairing

When two nodes pair, both read each other’s certificate off a known GATT characteristic, verify the signature against the CA public key, and check that the NID in the cert matches the one being advertised. If either check fails, the pairing just doesn’t happen. Pretty standard mutual auth, but wiring it into GATT read/write flows in both directions is its own little adventure.

Session keys via ephemeral ECDH

Once both sides trust each other, they do an ephemeral ECDH handshake. Each generates a fresh P-521 keypair, they exchange public halves, compute the shared secret, and run it through HKDF-SHA256 to derive a session key. Because the keypairs are ephemeral, every connection has its own session key, so past traffic stays safe even if a long-term key leaks later. That’s basically Perfect Forward Secrecy, at least for the link layer.

[Figure: Sequence diagram of mutual authentication followed by the ephemeral ECDH handshake] Certs first, then ephemeral key exchange. The session key is derived locally on both sides and never actually travels on the wire.
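A sketch of the per-connection handshake under a few assumptions: public keys travel as uncompressed X9.62 points (133 bytes for P-521), HKDF runs with no salt and a made-up info label, and the derived key is 32 bytes. EphemeralECDH is a hypothetical wrapper around real cryptography calls. The GATT plumbing that actually exchanges the two public halves is left out.

```python
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

CURVE = ec.SECP521R1()
SESSION_INFO = b"sic-link-session-v1"  # hypothetical HKDF context label


class EphemeralECDH:
    """One side of a per-connection ECDH exchange (hypothetical wrapper)."""

    def __init__(self) -> None:
        self._private = ec.generate_private_key(CURVE)  # fresh for every connection
        self.public_bytes = self._private.public_key().public_bytes(
            serialization.Encoding.X962, serialization.PublicFormat.UncompressedPoint
        )

    def derive_session_key(self, peer_public_bytes: bytes) -> bytes:
        peer = ec.EllipticCurvePublicKey.from_encoded_point(CURVE, peer_public_bytes)
        shared_secret = self._private.exchange(ec.ECDH(), peer)
        self._private = None  # drop our reference to the ephemeral key after one use
        # Raw ECDH output isn't uniformly random, so run it through HKDF first
        return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                    info=SESSION_INFO).derive(shared_secret)


if __name__ == "__main__":
    node, parent = EphemeralECDH(), EphemeralECDH()
    # Only the public halves cross the GATT link
    k_node = node.derive_session_key(parent.public_bytes)
    k_parent = parent.derive_session_key(node.public_bytes)
    print(k_node == k_parent, len(node.public_bytes))  # True 133
```

At 133 bytes, each public key is already bigger than the 20-byte payload of a single write at BLE's default 23-byte ATT MTU, which is one place the MTU quirks mentioned near the end start to bite.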

Hop-by-hop integrity

Every message on the wire starts with a sequence counter and an HMAC-SHA256 computed with the session key, followed by the routing headers and then the payload. The receiver verifies the HMAC, checks that the sequence counter is strictly greater than the last one it saw from that sender (that’s the replay protection), and only then processes the message. This layer is always on, including for end-to-end encrypted traffic.
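A minimal sketch of that framing using Python's standard hmac module. The post doesn't give field sizes or say exactly which bytes the tag covers, so an 8-byte big-endian counter and a tag over counter + headers + payload are assumptions. seal and HopVerifier are hypothetical names. The order of checks follows the text: the tag is verified before the sequence number is trusted, and the counter only advances for frames that pass both checks.

```python
import hashlib
import hmac
import struct
from typing import Dict, Hashable

SEQ = struct.Struct(">Q")  # assumption: 8-byte big-endian sequence counter
TAG_LEN = 32               # HMAC-SHA256 output size in bytes


def seal(session_key: bytes, seq: int, headers: bytes, payload: bytes) -> bytes:
    """Build seq || HMAC || headers || payload for one hop."""
    seq_bytes = SEQ.pack(seq)
    body = headers + payload
    tag = hmac.new(session_key, seq_bytes + body, hashlib.sha256).digest()
    return seq_bytes + tag + body


class HopVerifier:
    """Receiver side of one link: integrity check plus strictly increasing counters."""

    def __init__(self, session_key: bytes) -> None:
        self.session_key = session_key
        self.last_seq: Dict[Hashable, int] = {}

    def open(self, sender: Hashable, frame: bytes) -> bytes:
        if len(frame) < SEQ.size + TAG_LEN:
            raise ValueError("frame too short")
        seq_bytes = frame[:SEQ.size]
        tag = frame[SEQ.size:SEQ.size + TAG_LEN]
        body = frame[SEQ.size + TAG_LEN:]
        expected = hmac.new(self.session_key, seq_bytes + body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):  # constant-time comparison
            raise ValueError("HMAC mismatch: frame modified or wrong key")
        (seq,) = SEQ.unpack(seq_bytes)
        if seq <= self.last_seq.get(sender, -1):
            raise ValueError(f"replay: seq {seq} is not newer than the last one seen")
        self.last_seq[sender] = seq
        return body  # headers + payload, handed on to the routing layer
```

Since each hop has its own session key, a forwarding node would call open with its downlink key and then seal again with its uplink key, which is why the outer layer gets re-HMACed at every hop.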

End-to-end, DTLS-style

Routing headers have to be readable by intermediate nodes (otherwise the Sink can’t route anything), but the payload shouldn’t be. So we added a second layer: the Node and Sink do their own mini-handshake (ClientHello / ServerHello with nonces), derive a separate end-to-end key via HKDF, and encrypt the payload with AES-256-GCM before handing it off. Intermediate hops see the headers, forward the encrypted blob, and that’s the extent of what they can read.

[Figure: Message layout: seq + HMAC (hop-by-hop), routing headers (plain), AES-GCM payload (E2E)] Outer layer is re-HMACed at every hop; the inner AES-GCM payload is only readable by the Node and the Sink.
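The end-to-end layer could look roughly like this. The post doesn't spell out where the HKDF input keying material comes from, so the sketch takes a shared secret as a parameter (for example from a Node-Sink ECDH) and salts HKDF with the two hello nonces. The hello-nonce size, the info label, and prefixing each ciphertext with its 12-byte GCM nonce are also assumptions. derive_e2e_key, e2e_encrypt and e2e_decrypt are hypothetical names over the real AESGCM and HKDF classes.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

HELLO_NONCE_LEN = 32  # assumption: size of the ClientHello/ServerHello nonces
GCM_NONCE_LEN = 12    # standard 96-bit GCM nonce


def derive_e2e_key(shared_secret: bytes, client_nonce: bytes, server_nonce: bytes) -> bytes:
    """Separate 256-bit key for Node<->Sink traffic, bound to both hello nonces."""
    return HKDF(
        algorithm=hashes.SHA256(),
        length=32,
        salt=client_nonce + server_nonce,
        info=b"sic-e2e-v1",  # hypothetical context label
    ).derive(shared_secret)


def e2e_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(GCM_NONCE_LEN)  # must never repeat under the same key
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)


def e2e_decrypt(key: bytes, blob: bytes) -> bytes:
    """Raises cryptography.exceptions.InvalidTag if the blob was tampered with."""
    nonce, ciphertext = blob[:GCM_NONCE_LEN], blob[GCM_NONCE_LEN:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)


if __name__ == "__main__":
    client_nonce = os.urandom(HELLO_NONCE_LEN)  # carried in ClientHello
    server_nonce = os.urandom(HELLO_NONCE_LEN)  # carried in ServerHello
    shared = os.urandom(66)  # stand-in for a real ECDH shared secret
    node_key = derive_e2e_key(shared, client_nonce, server_nonce)
    sink_key = derive_e2e_key(shared, client_nonce, server_nonce)
    blob = e2e_encrypt(node_key, b"hello sink")
    print(e2e_decrypt(sink_key, blob))  # b'hello sink'
```

Random 96-bit nonces are a simple choice for modest message counts. The rule that matters is never reusing a nonce under the same key, which is exactly the "wrong nonce into GCM" class of bug mentioned later.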

Heartbeats and recovery

The Sink signs a heartbeat every 5 seconds with an incrementing counter. Nodes verify the signature, update their state, and forward it down to their children. If a node misses 3 heartbeats in a row, it assumes its uplink is dead, sets its own hop count to -1, and broadcasts that to its children. They in turn disconnect their own subtrees recursively and go back into scanning mode to find a new parent. That cascading disconnect was fiddly to get right, but it’s probably the most satisfying piece to watch in the GUI.

[Figure: Cascading disconnect: 3 missed heartbeats trigger a hop=-1 broadcast that propagates down the subtree] A single broken uplink takes the whole subtree back to scanning mode, layer by layer.
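Here's a sketch of the node-side heartbeat logic. Several details are assumptions: the Sink signs an 8-byte counter with its P-521 key using ECDSA/SHA-256, and the clock is injectable so the logic can be tested without waiting 15 seconds. sign_heartbeat and HeartbeatMonitor are hypothetical names, and notify_children stands in for whatever actually pushes frames down to the children.

```python
import struct
import time
from typing import Callable

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

HEARTBEAT_INTERVAL = 5.0   # seconds, as in the text
MAX_MISSED = 3             # missed heartbeats before the uplink is declared dead
CTR = struct.Struct(">Q")  # assumption: 8-byte big-endian counter


def sign_heartbeat(sink_key: ec.EllipticCurvePrivateKey, counter: int) -> bytes:
    """Sink side: counter || ECDSA signature over the counter."""
    msg = CTR.pack(counter)
    return msg + sink_key.sign(msg, ec.ECDSA(hashes.SHA256()))


class HeartbeatMonitor:
    def __init__(self, sink_public_key: ec.EllipticCurvePublicKey, hop: int,
                 notify_children: Callable[[tuple], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sink_public_key = sink_public_key
        self.hop = hop
        self.notify_children = notify_children
        self.clock = clock
        self.last_counter = -1
        self.last_seen = clock()

    def on_heartbeat(self, frame: bytes) -> bool:
        """Verify, check freshness, then forward downstream. Returns True if accepted."""
        if len(frame) <= CTR.size:
            return False
        msg, sig = frame[:CTR.size], frame[CTR.size:]
        try:
            self.sink_public_key.verify(sig, msg, ec.ECDSA(hashes.SHA256()))
        except InvalidSignature:
            return False
        (counter,) = CTR.unpack(msg)
        if counter <= self.last_counter:  # replayed or stale heartbeat
            return False
        self.last_counter = counter
        self.last_seen = self.clock()
        self.notify_children(("heartbeat", frame))
        return True

    def tick(self) -> bool:
        """Call periodically; returns True at the moment the uplink is declared dead."""
        silent_for = self.clock() - self.last_seen
        if self.hop != -1 and silent_for > MAX_MISSED * HEARTBEAT_INTERVAL:
            self.hop = -1
            self.notify_children(("hop", -1))  # children tear down their own subtrees
            return True
        return False
```

Each child would run the same logic: on receiving ("hop", -1) it drops its own children the same way and goes back to scanning, which is what produces the cascade in the figure.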

What I took from it

A few things. First, writing crypto glue code (as opposed to crypto primitives, which the cryptography library handles for you) is where the subtle bugs hide: off-by-one errors on the sequence counter, forgetting to HKDF the shared secret, passing the wrong nonce into GCM, stuff like that. Second, designing a protocol that works across unreliable wireless links forces you to take state recovery seriously in a way purely wired setups don’t.

And third, honestly the single biggest challenge of the whole project was just working with the Bluetooth stack on Linux, because BLE and BlueZ have their fair share of quirks that you only really find out about by running into them. The D-Bus API surface changes subtly between BlueZ versions. GATT MTU negotiation doesn’t always give you the size you asked for, so big writes get silently fragmented or rejected (don’t even ask about the headache this was). Pairing state sometimes lingers across reboots in ways that make you question reality, and switching roles between central and peripheral on the same adapter can leave hci0 in weird states that only hciconfig hci0 reset fixes. The error messages are also pretty cryptic, so you end up spending a lot of time reading dbus-monitor output and cross-referencing source comments in BlueZ itself to figure out what’s actually going on.

If you want to poke at the code, it’s on GitHub at tfdmendes/SIC-final-project. Happy to answer questions if anyone else is running into the same BLE-on-Linux potholes 😵‍💫

Anyway, turns out teaching a bunch of BLE nodes to trust each other is mostly an exercise in not getting state wrong :)