Calculating wireless transmission rates and file sizes of voltage signals
How much bandwidth and storage space do you need when receiving and recording signals to file?
When designing software that receives and records a live data signal, file size is an important criterion for choosing the most appropriate method for transferring the files downstream. The following is a supplement to my article “A scalable design for recording biological signals with iOS and transferring data securely to the research team using Firebase and Node.js” and explains, for beginners, how to calculate file sizes. You can calculate the file size based on the size of the data package, sampling rate, and duration of recording.
Consider a data signal transmitted wirelessly and serially (i.e. one long string of data), in which case the number of bytes transferred differs from the number of characters in a text file. You need to know the format of this signal in order to parse the data. I will discuss transmission of the signal in the form of data packets.
Bits, Bytes and Binary: when 11 = 3
What is a byte? All data, including numbers like 347 or 993, are stored as bits. One bit can hold a 0 or a 1, and two bits can each hold a 0 or a 1 to allow four possible values: 0,0 0,1 1,0 and 1,1. To calculate the decimal value of these switches, you have to think in 2’s place rather than 10s place. Take the rightmost value as is, multiply the next value to the left by 2, and add the two together, so the actual numbers stored are:
00 = 0,
01 = 1,
10 = 2, and
11 = 3.
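The place-value rule above extends to any number of bits. Here is a minimal sketch in Python (my choice of language for illustration; the function name bits_to_decimal is hypothetical) that doubles a running total for each place and adds the next bit:

```python
def bits_to_decimal(bits: str) -> int:
    """Convert a string of 0s and 1s to its decimal value, place by place."""
    total = 0
    for bit in bits:
        # Moving one place to the left doubles everything seen so far,
        # then the new bit is added in the ones place.
        total = total * 2 + int(bit)
    return total


for pattern in ["00", "01", "10", "11"]:
    print(pattern, "=", bits_to_decimal(pattern))
```

Python's built-in int("11", 2) does the same conversion; the loop is only there to make the doubling visible.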
I am describing binary numbers, where the total number of possible values given n bits is 2^n. 8 bits gives you 256 possible values (0–255). 1 byte equals 8 bits. If the sensor value has a range of 0–1023, then the number requires 10 bits (2^10 = 1024 possible values) to be expressed. The Arduino sends data in byte units via Bluetooth, so the number of bytes allocated to each 10-bit sensor value is rounded up to two bytes, equivalent to 16 bits, which allow for 2^16 possible values.
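The rounding from bits to whole bytes can be checked with a short sketch. The helper name bytes_needed is hypothetical, and it assumes unsigned values that start at 0:

```python
import math


def bytes_needed(max_value: int) -> int:
    """Whole bytes needed to hold any unsigned value from 0 to max_value."""
    # int.bit_length() gives the number of bits in the binary form of the value.
    bits = max(1, max_value.bit_length())
    # Data is sent in byte units, so round up to a whole number of bytes.
    return math.ceil(bits / 8)


print((1023).bit_length())  # bits needed for the largest sensor value
print(bytes_needed(1023))   # bytes allocated per sensor value
print(2 ** 8, 2 ** 16)      # possible values in one byte and in two bytes
```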
I will use the term data package to mean a snapshot of data for a given point in time. I am avoiding the term packet, which usually refers specifically to the transmission unit that Bluetooth LE uses when transmitting data (size is by default 20 bytes).
Rather than have the sender generate a timestamp for each snapshot and include that in the package, I prefer to generate the timestamp on the receiving end so there is less data to transmit wirelessly.
Suppose you have a system with three hardware sensors reporting data. An incoming data package could be designed with the general format:
headerByte sensorA sensorB sensorC trailingByte
Each component here represents one or more bytes; the specific number of bytes depends on the possible range of values each sensor may report. I will call the data between the headerByte and the trailingByte the payload:
sensorA sensorB sensorC
For each sensor with values 0–1023, two bytes will be sent. If one byte fails to be received somehow, the bytes received thereafter will be offset by one place, creating a frameshift just like a DNA deletion mutation. See how a frameshift is created with 3 data values:
// 250 300 340 translates to the following bytes:
// 00000000 11111010 00000001 00101100 00000001 01010100
// Suppose the third byte is lost in transmission:
// 00000000 11111010 00101100 00000001 01010100
// Now parsing every 2 bytes (expecting a perfect signal)
// produces the wrong decimal values:
// 250 11265 84
// (or the app might crash at the end of the series because it expects one more byte)
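The frameshift above can be reproduced with a small sketch. It assumes each value is an unsigned 16-bit integer sent most significant byte first, which matches the byte listing; the helper parse_pairs is hypothetical and returns any leftover byte instead of crashing:

```python
import struct

values = [250, 300, 340]
# ">3H" packs three unsigned 16-bit integers, most significant byte first.
sent = struct.pack(">3H", *values)

# Drop the third byte (index 2) to simulate the loss in transmission.
received = sent[:2] + sent[3:]


def parse_pairs(data: bytes):
    """Parse consecutive 2-byte values; return (values, leftover_bytes)."""
    usable = len(data) - len(data) % 2
    parsed = [struct.unpack(">H", data[i:i + 2])[0] for i in range(0, usable, 2)]
    return parsed, data[usable:]


print(parse_pairs(sent))
print(parse_pairs(received))
```

With the intact signal the three values come back unchanged. With the damaged signal the second value becomes 11265 and a single byte with the value 84 is left over.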
To fix this, one can pass a header byte for each package, such as *, which has the byte value 42, and a trailing byte such as \n, which has the byte value 10. \n is a one-byte character that represents a new line (see below). Continuing with the biology analogy, these are like start and stop codons of messenger RNA. I picked these values arbitrarily, but your Arduino sketch would need to change any sensor bytes that are actually 42 or 10 to something negligibly close (an adjacent value, for example). Now anytime the parser sees a * byte, it knows that a new package is definitely starting. When the parser encounters the \n byte, it knows the package is finished. If the package is not the expected length, the parser can discard the incomplete package and carry on. It is a good idea to keep track of the package loss rate, because you may need to improve your hardware or experimental setup.
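Here is a minimal sketch of such a parser, not the code from my app. It assumes the three-sensor format above, two bytes per sensor with the most significant byte first, and a sender that never lets a payload byte equal 42 or 10. The names build_package and parse_stream are hypothetical:

```python
HEADER = 42          # the "*" byte
TRAILER = 10         # the "\n" byte
PACKAGE_LENGTH = 8   # 1 header + 3 sensors * 2 bytes + 1 trailer


def build_package(a: int, b: int, c: int) -> bytes:
    """Frame three sensor values the way the sender is assumed to."""
    payload = b"".join(v.to_bytes(2, "big") for v in (a, b, c))
    return bytes([HEADER]) + payload + bytes([TRAILER])


def parse_stream(stream: bytes):
    """Return (sensor tuples, count of packages that started but were discarded)."""
    packages, discarded = [], 0
    current = None  # bytes of the package being assembled, or None between packages
    for byte in stream:
        if byte == HEADER:
            if current is not None:
                discarded += 1  # the previous package never reached its trailer
            current = bytearray([byte])
        elif current is not None:
            current.append(byte)
            if byte == TRAILER:
                if len(current) == PACKAGE_LENGTH:
                    p = current[1:-1]
                    packages.append(
                        tuple(int.from_bytes(p[i:i + 2], "big") for i in (0, 2, 4))
                    )
                else:
                    discarded += 1  # wrong length: a byte was lost or added
                current = None
    return packages, discarded


good = build_package(250, 300, 340)
damaged = build_package(512, 700, 900)
damaged = damaged[:3] + damaged[4:]  # lose one payload byte
stream = good + damaged + build_package(251, 301, 341)
print(parse_stream(stream))
```

The damaged package is dropped and the parser resynchronizes on the next header, so the packages on either side of it survive. Dividing the discarded count by the total gives the loss rate mentioned above; note that a package whose header byte is lost is skipped without being counted in this sketch.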
Calculate the wireless throughput
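To calculate the amount of data transferred (wirelessly in our example), multiply the package size by the sample frequency. The three-sensor package above is 8 bytes (1 header byte + 3 sensors * 2 bytes + 1 trailing byte), and I will assume a sample frequency of 10 packages per second: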
8 bytes/package * 10 packages/second = 80 bytes per second
Bytes per second (Bps) is a more relevant unit to me, but bits per second (bps) is the common unit used to describe transfer rate. The conversion is easy; multiply by 8 bits:
80 bytes/s * 8 bits/byte = 640 bps
The wireless component sending the data must be set to transmit data at a baud or bits per second rate that is faster than the data coming through to prevent a backup.
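The same arithmetic as a small sketch, so that other package sizes and sample rates can be tried. The function name throughput is hypothetical:

```python
def throughput(package_bytes: int, packages_per_second: int):
    """Return (bytes per second, bits per second) for a stream of packages."""
    bytes_per_second = package_bytes * packages_per_second
    bits_per_second = bytes_per_second * 8  # 8 bits per byte
    return bytes_per_second, bits_per_second


# 1 header byte + 3 sensors * 2 bytes + 1 trailing byte
package_bytes = 1 + 3 * 2 + 1
print(throughput(package_bytes, 10))
```

For the example system this gives 80 bytes per second and 640 bits per second, the number to compare against the transmit rate of the wireless component.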
Aside: Internet speeds
If I may digress: the subtle unit difference of bps vs Bps has persisted for decades as an Internet service provider (ISP) advertising trick. People are generally used to megabyte and gigabyte units, but the ISPs advertise in megabits and gigabits per second. In recent radio and television ads, one company only says the unit’s prefix: “100 Meg Internet!” In this case, like others, “100 Meg” does not mean 100 megabytes per second, but rather 100 megabits per second. Divide by 8 to calculate the rate in terms of the more familiar megabytes:
100 Mbps / 8 bits per byte = only 12.5 megabytes per second!
Outgoing! (File Storage)
Once each package is received and parsed, I timestamp it. The Unix timestamp is a commonly used format; it is the number of seconds since the “Unix Epoch,” Jan 1, 1970. When enough of the data has been parsed, it can be written to a file, specifically in a comma separated value format. Note that the whole signal does not have to be received and stored in memory before the file is created; you could write to file every 50 lines or so, for example.
Here is what the comma separated value text file would look like (with headers):
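As a sketch of how such a file could be produced, the following builds the text with the header row used in this article. The timestamps and sensor values are made up purely for illustration; they are not from a real recording:

```python
import csv
import io

HEADER_ROW = ["timestamp", "sensorA", "sensorB", "sensorC"]

# Made-up rows for illustration only: (Unix timestamp, sensorA, sensorB, sensorC)
rows = [
    (1500000000, 250, 300, 340),
    (1500000000, 251, 301, 341),
    (1500000001, 252, 302, 342),
]

buffer = io.StringIO()
# lineterminator="\n" ends every row with the single new line byte.
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(HEADER_ROW)
writer.writerows(rows)

text = buffer.getvalue()
print(text)
```

In a real recorder the rows would be appended to a file in batches as packages arrive rather than collected in memory first.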
The classic standard for encoding text into bytes was the American Standard Code for Information Interchange (ASCII). ASCII supports 128 characters including letters, punctuation, and control characters, and each character requires one byte of space. The modern standard encoding is called UTF-8 and expands beyond the ASCII set to support letters in multiple languages as well as emojis. UTF-8 character size varies from one to four bytes, but the ASCII characters are all one byte each.
Calculating text file size
Calculating the number of bytes in a UTF-8 encoded text file is different than calculating the bytes in serial transmission. Assuming the file is composed of ASCII characters, each character requires 1 byte. So while the number
765 is sent via Bluetooth as a two-byte integer, it is a series of three bytes when encoded as UTF-8 text:
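The difference can be seen with a short sketch that encodes 765 both ways. The two-byte integer is assumed to be unsigned with the most significant byte first:

```python
value = 765

# As sent over Bluetooth: one unsigned integer spread over two bytes.
as_integer = value.to_bytes(2, "big")

# As stored in the text file: one byte per character "7", "6", "5".
as_text = str(value).encode("utf-8")

print(len(as_integer), [format(b, "08b") for b in as_integer])
print(len(as_text), [format(b, "08b") for b in as_text])
```

The text form holds the character codes 55, 54 and 53 for the digits, not the number itself, which is why it takes three bytes instead of two.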
In the example above, each row (excluding the header) is 23 bytes (add up the digits, commas, and the invisible new line character \n). So a 10 minute recording at 10 Hz would be 138,000 bytes (23 bytes * 10 rows per second * 600 seconds) plus 34 bytes for the header timestamp,sensorA,sensorB,sensorC (including its new line character) for a total of 138,034 bytes.
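The calculation can be wrapped in a small helper so other sample rates and durations are easy to try. This is a sketch that assumes every row has the same number of bytes, which holds only while the timestamp and sensor values keep the same number of digits; the function name csv_file_size is hypothetical:

```python
def csv_file_size(row_bytes: int, sample_rate_hz: int,
                  duration_seconds: int, header_bytes: int) -> int:
    """Estimate a CSV file size in bytes, assuming fixed-size rows."""
    return row_bytes * sample_rate_hz * duration_seconds + header_bytes


header = "timestamp,sensorA,sensorB,sensorC\n"
header_bytes = len(header.encode("utf-8"))  # ASCII characters are 1 byte each

ten_minutes = csv_file_size(23, 10, 10 * 60, header_bytes)
print(header_bytes, ten_minutes)
```

If sensor values vary between one and four digits, treat the result as an estimate and use the longest expected row for an upper bound.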
For a more thorough under-the-hood explanation of bits and bytes and creating text files, see this short article “Plain text formats.”
Transferring the file elsewhere
By knowing the file size of a recording, one can decide what transmission methods are feasible. The ten minute recording example above is 138 KB, which can be transmitted easily via most methods (e.g. email, iMessage, or FTP), but the total file size can change drastically with the amount of data per package, the sample rate, and the recording time. If the same three-sensor system were recorded for 8 hours as part of an overnight sleep study, for example, the file size would increase to approximately 6.6 megabytes.
23 bytes/package * 10 packages/sec * 60 sec/min * 60 min/hour * 8 hours + 34 bytes/header = 6,624,034 bytes
In such a case, email may not be the best method for transferring this larger file.
In this article, I have reviewed how information is stored as bytes, transmitted and encoded into text for transfer elsewhere. Please comment if there is something unclear!
George Marzloff, MD is a physician specializing in Rehabilitation Medicine with a subspecialty in Spinal Cord Injury Medicine. His interests include assistive technology and rehabilitation engineering.