Parsing Bitcoin Transactions in Rust Programming Language Standard Library


A series of articles on bitcoin transactions — Part 1

Bitcoin transactions are used to transfer value or inscribe data onchain in an immutable way. In this series of articles we will discuss how to encode and decode a raw hex transaction, and how to construct Bitcoin transactions and scripts.

Part 1. Hex Encoded Bitcoin Transactions

Bitcoin transactions can be encoded in raw hexadecimal format. Hexadecimal is base16, represented by the characters 0–9 and A–F, which form the hex alphabet 0123456789ABCDEF (lowercase a–f is equally valid and is what the transaction below uses). Each character represents four bits, so two characters represent one byte of data.
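To make the relationship between hex characters and bytes concrete, here is a minimal sketch that uses only the standard library. The helper name `hex_pair_to_byte` is our own and is chosen purely for illustration.

```rust
/// Converts one pair of hex characters into the byte it encodes.
/// `hex_pair_to_byte` is an illustrative helper, not part of the article's crate.
fn hex_pair_to_byte(pair: &str) -> Option<u8> {
    // A byte is exactly two hex characters (two 4-bit halves)
    if pair.len() != 2 {
        return None;
    }
    let mut chars = pair.chars();
    // `to_digit(16)` accepts 0-9, a-f and A-F and returns the 4-bit value
    let high = chars.next()?.to_digit(16)?;
    let low = chars.next()?.to_digit(16)?;
    // The first character fills the upper four bits, the second the lower four
    Some(((high << 4) | low) as u8)
}
```

For example, `hex_pair_to_byte("01")` gives `Some(1)` and `hex_pair_to_byte("ff")` gives `Some(255)`, while a non-hex pair such as `"zz"` gives `None`.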

Let’s look at how to decode the following Bitcoin transaction:
010000000269adb42422fb021f38da0ebe12a8d2a14c0fe484bcb0b7cb365841871f2d5e24000000006a4730440220199a6aa56306cebcdacd1eba26b55eaf6f92eb46eb90d1b7e7724bacbe1d19140220101c0d46e033361c60536b6989efdd6fa692265fcda164676e2f49885871038a0121039ac8bac8f6d916b8a85b458e087e0cd07e6a76a6bfdde9bb766b17086d9a5c8affffffff69adb42422fb021f38da0ebe12a8d2a14c0fe484bcb0b7cb365841871f2d5e24010000006b48304502210084ec4323ed07da4af6462091b4676250c377527330191a3ff3f559a88beae2e2022077251392ec2f52327cb7296be89cc001516e4039badd2ad7bbc950c4c1b6d7cc012103b9b554e25022c2ae549b0c30c18df0a8e0495223f627ae38df0992efb4779475ffffffff0118730100000000001976a9140ce17649c1306c291ca9e587f8793b5b06563cea88ac00000000

We will be using the Rust standard library for the parser itself. The only external crate, added at the very end, converts the hex string into bytes.

First create a cargo project.

cargo new Bitcoin-Tx-Hex --name btc-tx-hex

This creates a crate called btc-tx-hex in the directory called Bitcoin-Tx-Hex.

In the main.rs file create a variable called raw_tx to represent our hex encoded transaction.

fn main() {
let mut raw_tx = "010000000269adb42422fb021f38da0ebe12a8d2a14c0fe484bcb0b7cb365841871f2d5e24000000006a4730440220199a6aa56306cebcdacd1eba26b55eaf6f92eb46eb90d1b7e7724bacbe1d19140220101c0d46e033361c60536b6989efdd6fa692265fcda164676e2f49885871038a0121039ac8bac8f6d916b8a85b458e087e0cd07e6a76a6bfdde9bb766b17086d9a5c8affffffff69adb42422fb021f38da0ebe12a8d2a14c0fe484bcb0b7cb365841871f2d5e24010000006b48304502210084ec4323ed07da4af6462091b4676250c377527330191a3ff3f559a88beae2e2022077251392ec2f52327cb7296be89cc001516e4039badd2ad7bbc950c4c1b6d7cc012103b9b554e25022c2ae549b0c30c18df0a8e0495223f627ae38df0992efb4779475ffffffff0118730100000000001976a9140ce17649c1306c291ca9e587f8793b5b06563cea88ac00000000";
}

A legacy (non-SegWit) hex-encoded Bitcoin transaction, like the one above, has four concatenated parts:

version | inputs | outputs | locktime
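As a rough illustration of this layout, the sketch below slices the two fixed-width ends off a raw hex string. The helper `split_fixed_fields` is hypothetical and assumes the legacy layout shown above. The middle part cannot be split by position alone because inputs and outputs have variable lengths, which is why the rest of the article needs a proper parser.

```rust
/// Splits a raw transaction hex string into its fixed-width ends and the
/// variable-length middle. Returns `(version, inputs_and_outputs, locktime)`.
/// `split_fixed_fields` is a hypothetical helper used only for illustration.
fn split_fixed_fields(raw_tx: &str) -> Option<(&str, &str, &str)> {
    // 4 bytes = 8 hex characters, for the version and again for the locktime
    const FIELD_HEX_LEN: usize = 8;
    // Hex is ASCII, so byte offsets and character offsets are the same
    if !raw_tx.is_ascii() || raw_tx.len() < 2 * FIELD_HEX_LEN {
        return None;
    }
    let (version, rest) = raw_tx.split_at(FIELD_HEX_LEN);
    let (middle, locktime) = rest.split_at(rest.len() - FIELD_HEX_LEN);
    Some((version, middle, locktime))
}
```

Applied to the transaction above, the first element would be `01000000` and the last would be `00000000`.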

Version

The transaction version is represented in four bytes, which maps to a u32 in Rust. Create a new module file called version.rs and declare it in the main.rs file.

mod version;
pub use version::*;

fn main() {
    //... previous code
}

In our version.rs file let’s create an enum to represent our version

/// Bitcoin transaction versions one and two are supported
/// by Bitcoin Core. A node must be pre-configured to accept a
/// transaction version higher than 2, and such a transaction is
/// not guaranteed to be propagated by all Bitcoin Core nodes.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Default)]
pub enum TxVersion {
    /// This will be treated as the default version
    /// when calling TxVersion::default()
    #[default]
    One,
    /// The Bitcoin transaction version two which allows
    /// using the OPCODE `OP_CHECKSEQUENCEVERIFY` which allows
    /// setting relative locktime for spending outputs.
    Two,
    /// Custom transaction version which is considered non-standard,
    /// must be set by the Bitcoin node operator and is not guaranteed
    /// to be accepted by other nodes running Bitcoin core software
    Custom(u32),
}

Next we implement our encoder and decoder, which convert the TxVersion to and from bytes. The version is in little-endian format, which stores the least significant byte first and is also called reverse byte order.

impl TxVersion {
    /// This converts our version to bytes.
    /// Since version number is four bytes little-endian we use `u32::to_le_bytes()`
    pub fn to_bytes(&self) -> [u8; 4] {
        match self {
            Self::One => 1u32.to_le_bytes(),
            Self::Two => 2u32.to_le_bytes(),
            Self::Custom(version) => version.to_le_bytes(),
        }
    }

    /// This converts from bytes to `Self`
    pub fn from_bytes(bytes: [u8; 4]) -> Self {
        let parsed = u32::from_le_bytes(bytes);

        match parsed {
            1u32 => Self::One,
            2u32 => Self::Two,
            _ => Self::Custom(parsed),
        }
    }
}
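To see why the byte order matters, the short sketch below interprets the same four bytes both ways. The function name `both_byte_orders` is our own and is only for illustration.

```rust
/// Interprets the same four bytes in both byte orders.
/// Returns `(little_endian, big_endian)`.
fn both_byte_orders(bytes: [u8; 4]) -> (u32, u32) {
    (u32::from_le_bytes(bytes), u32::from_be_bytes(bytes))
}
```

The version bytes `01 00 00 00` from the start of our transaction read as 1 in little-endian but as 16,777,216 in big-endian, so picking the wrong conversion would misclassify every transaction as a custom version.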

Next we write a simple test to check the correctness of our parser.

#[cfg(test)]
mod tx_sanity_checks {
    use crate::TxVersion;

    #[test]
    fn tx_version() {
        assert_eq!([1u8, 0, 0, 0], TxVersion::One.to_bytes());
        assert_eq!([2u8, 0, 0, 0], TxVersion::Two.to_bytes());
        assert_eq!([30u8, 0, 0, 0], TxVersion::Custom(30).to_bytes());

        assert_eq!(TxVersion::One, TxVersion::from_bytes([1u8, 0, 0, 0]));
        assert_eq!(TxVersion::Two, TxVersion::from_bytes([2u8, 0, 0, 0]));
        assert_eq!(
            TxVersion::Custom(30),
            TxVersion::from_bytes([30u8, 0, 0, 0])
        );
    }
}

Let’s run our test with the command

cargo test --verbose --all-features

Our test passes with the following output

# ... Other log data here
# Below is the part we are interested in
running 1 test
test version::tx_sanity_checks::tx_version ... ok

After parsing the version, we need to get the number of inputs in the input section. To do this, we parse the next byte as a Bitcoin VarInt.

VarInt

A VarInt (short for “Variable Integer”) is a crucial format used in Bitcoin to indicate the lengths of fields within transactions, blocks, and peer-to-peer network data. Learn more about VarInt at https://web.archive.org/web/20230331170203/https://learnmeabitcoin.com/technical/varint

NOTE: The 8-byte form is only needed for values above 4,294,967,295. A count or length that large would exceed the maximum size of a Bitcoin block, so this form never appears in a valid Bitcoin transaction.

In our Rust VarInt parser, we will use a std::io::Cursor. The first byte of a VarInt is a prefix that tells us how many additional bytes (0, 2, 4 or 8) to read from the current position in order to get the value.

Reading:

  • 0 additional bytes means the prefix byte itself is the value, treated as a u8

  • 2 additional bytes will be treated as a u16

  • 4 additional bytes will be treated as a u32

  • 8 additional bytes will be treated as a u64

Where:

  • 0 additional bytes is indicated by a prefix of <= 252

  • 2 additional bytes is indicated by a prefix of 253

  • 4 additional bytes is indicated by a prefix of 254

  • 8 additional bytes is indicated by a prefix of 255
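The rules listed above can also be read in the opposite direction. The sketch below encodes an integer into its VarInt bytes, which may help when checking the decoder by hand. `encode_varint` is a hypothetical helper and is not part of the code used in the rest of this article.

```rust
/// Encodes an integer using the prefix rules listed above.
/// `encode_varint` is an illustrative helper, not part of the article's code.
fn encode_varint(value: u64) -> Vec<u8> {
    match value {
        // Small values fit in the prefix byte itself
        0..=252 => vec![value as u8],
        // Prefix 253, then the value as a little-endian u16
        253..=0xFFFF => {
            let mut encoded = vec![253u8];
            encoded.extend_from_slice(&(value as u16).to_le_bytes());
            encoded
        }
        // Prefix 254, then the value as a little-endian u32
        0x1_0000..=0xFFFF_FFFF => {
            let mut encoded = vec![254u8];
            encoded.extend_from_slice(&(value as u32).to_le_bytes());
            encoded
        }
        // Prefix 255, then the value as a little-endian u64
        _ => {
            let mut encoded = vec![255u8];
            encoded.extend_from_slice(&value.to_le_bytes());
            encoded
        }
    }
}
```

For instance, 252 encodes to the single byte `[252]`, while 253 needs three bytes: `[253, 253, 0]`.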

Create a new module in our src directory called varint.rs and then import our module in the src/main.rs file.

// ... Previous imports here
mod varint;
pub use varint::*;

fn main() {
    // ... Previous code here
}

In our src/varint.rs file.

// We use a Cursor because it keeps track of the
// position up to which the bytes have been read.
// This is more convenient than maintaining a
// manual counter, which can be cumbersome.
use std::io::{self, Cursor, Read};

/// We create a `VarInt` struct to hold methods for calculating
/// the number of bytes in the `VarInt`
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
pub struct VarInt;

impl VarInt {
    /// This converts our VarInt byte into the number of bytes that we need to parse
    pub const fn parse(byte: u8) -> usize {
        match byte {
            // 0 to 252 is treated as a Rust u8 which is 1 byte long
            0..=252 => 1,
            // 253 is treated as a Rust u16 which is 2 bytes long
            253 => 2,
            // 254 is treated as a Rust u32 which is 4 bytes long
            254 => 4,
            // 255 is treated as a Rust u64 which is 8 bytes long
            255 => 8,
        }
    }

    /// Given a Cursor of bytes, we read the current or next number of bytes
    /// then convert them into an integer
    pub fn integer(byte_len: usize, bytes: &mut Cursor<&[u8]>) -> io::Result<usize> {
        let outcome = match byte_len {
            1 => {
                // NOTE - Since we are reading one value and the Cursor always advances
                // by the number of bytes read, we reset the cursor to the last position
                // in order to parse that one byte. First we get the current cursor
                // position using `bytes.position()` and then subtract 1
                bytes.set_position(bytes.position() - 1);

                // A u8 has array length of 1
                let mut buffer = [0u8; 1];
                // Read exactly one byte
                bytes.read_exact(&mut buffer)?;

                buffer[0] as usize
            }
            2 => {
                // A u16 has array length of 2
                let mut buffer = [0u8; 2];
                // Read exactly two bytes
                bytes.read_exact(&mut buffer)?;

                u16::from_le_bytes(buffer) as usize
            }
            4 => {
                // A u32 has array length of 4
                let mut buffer = [0u8; 4];
                // Read exactly four bytes
                bytes.read_exact(&mut buffer)?;

                u32::from_le_bytes(buffer) as usize
            }
            8 => {
                // A u64 has array length of 8
                let mut buffer = [0u8; 8];
                // Read exactly eight bytes
                bytes.read_exact(&mut buffer)?;

                u64::from_le_bytes(buffer) as usize
            }
            _ => {
                // All other values are not supported and we return an error to
                // indicate this
                return Err(std::io::Error::new(
                    std::io::ErrorKind::NotFound,
                    "The byte length specified is not supported",
                ));
            }
        };

        Ok(outcome)
    }
}
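The two-step design above reads the prefix first and then rewinds the cursor for single-byte values. As an alternative sketch, the prefix and value can be read in one function, which avoids the rewind. `read_varint` is a hypothetical helper and is not used by the rest of this article. It returns a `u64` so that the 8-byte form is not truncated on 32-bit targets.

```rust
use std::io::{self, Cursor, Read};

/// Reads a complete VarInt (prefix byte plus any following bytes) and
/// returns its value. `read_varint` is a hypothetical alternative to calling
/// `VarInt::parse` and `VarInt::integer` separately.
fn read_varint(bytes: &mut Cursor<&[u8]>) -> io::Result<u64> {
    let mut prefix = [0u8; 1];
    bytes.read_exact(&mut prefix)?;

    let value = match prefix[0] {
        // The prefix is the value, so nothing more is read and no rewind is needed
        small @ 0..=252 => u64::from(small),
        253 => {
            let mut buffer = [0u8; 2];
            bytes.read_exact(&mut buffer)?;
            u64::from(u16::from_le_bytes(buffer))
        }
        254 => {
            let mut buffer = [0u8; 4];
            bytes.read_exact(&mut buffer)?;
            u64::from(u32::from_le_bytes(buffer))
        }
        255 => {
            let mut buffer = [0u8; 8];
            bytes.read_exact(&mut buffer)?;
            u64::from_le_bytes(buffer)
        }
    };

    Ok(value)
}
```

If the input ends before the promised number of bytes, `read_exact` returns an error and the `?` operator passes it to the caller.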

In the same file, we write tests to check that each VarInt form is parsed correctly and does not resolve to an error.

#[cfg(test)]
mod varint_sanity_checks {
    use crate::VarInt;
    use std::io::{Cursor, Read};

    #[test]
    fn varint_zero_to_252() {
        let bytes = [0u8, 0, 0, 0, 1];
        let mut bytes = Cursor::new(bytes.as_slice());

        // Simulate version bytes by skipping 4 bytes
        bytes.set_position(4);

        let mut varint_byte = [0u8; 1];
        bytes.read_exact(&mut varint_byte).unwrap();
        let varint_byte_len = VarInt::parse(varint_byte[0]);
        let varint_len = VarInt::integer(varint_byte_len, &mut bytes);
        assert!(varint_len.is_ok());
        assert_eq!(1usize, varint_len.unwrap());
    }

    #[test]
    fn varint_253() {
        let mut bytes = vec![0u8, 0, 0, 0, 253];
        let placeholder_bytes = [1u8; 257];
        bytes.extend_from_slice(&placeholder_bytes);
        let mut bytes = Cursor::new(bytes.as_slice());

        // Simulate version bytes by skipping 4 bytes
        bytes.set_position(4);

        let mut varint_byte = [0u8; 1];
        bytes.read_exact(&mut varint_byte).unwrap();
        let varint_byte_len = VarInt::parse(varint_byte[0]);
        let varint_len = VarInt::integer(varint_byte_len, &mut bytes);
        assert!(varint_len.is_ok());
        assert_eq!(257usize, varint_len.unwrap());    
    }

    #[test]
    fn varint_254() {
        let mut bytes = vec![0u8, 0, 0, 0, 254];
        let placeholder_bytes = [1u8; 40];
        bytes.extend_from_slice(&placeholder_bytes);
        let mut bytes = Cursor::new(bytes.as_slice());

        // Simulate version bytes by skipping 4 bytes
        bytes.set_position(4);

        let mut varint_byte = [0u8; 1];
        bytes.read_exact(&mut varint_byte).unwrap();
        let varint_byte_len = VarInt::parse(varint_byte[0]);
        let varint_len = VarInt::integer(varint_byte_len, &mut bytes);
        assert!(varint_len.is_ok());
        assert_eq!(16843009usize, varint_len.unwrap());
    }    

    #[test]
    fn varint_255() {
        let mut bytes = vec![0u8, 0, 0, 0, 255];
        let placeholder_bytes = [1u8; 40];
        bytes.extend_from_slice(&placeholder_bytes);
        let mut bytes = Cursor::new(bytes.as_slice());

        // Simulate version bytes by skipping 4 bytes
        bytes.set_position(4);

        let mut varint_byte = [0u8; 1];
        bytes.read_exact(&mut varint_byte).unwrap();
        let varint_byte_len = VarInt::parse(varint_byte[0]);
        let varint_len = VarInt::integer(varint_byte_len, &mut bytes);
        assert!(varint_len.is_ok());
        assert_eq!(72340172838076673usize, varint_len.unwrap());
    }
}

Next we create our transaction parser by creating a module called tx.rs in the src directory and then registering our module in the src/main.rs file.

// ... Previous imports here

mod tx;
pub use tx::*;

fn main() {
    //... Previous code here
}

In our src/tx.rs file, we start with the building blocks. Bitcoin transactions have inputs and outputs, so we create structs to represent them.

/// Our transaction inputs
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone)]
pub struct TxInput {
    // The 32-byte ID (a double SHA256 hash) of the previous
    // transaction that holds the output being spent
    previous_tx_id: [u8; 32],
    // Index of the output being spent within the previous transaction
    previous_output_index: u32,
    // The scriptSig
    signature_script: Vec<u8>,
    // The sequence number
    sequence_number: u32,
}

/// Transaction outputs
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone)]
pub struct TxOutput {
    // Amount in satoshis
    amount: u64,
    // The locking script which gives conditions for spending the bitcoins
    locking_script: Vec<u8>,
}

Now we combine the TxInput, TxOutput, VarInt and TxVersion as part of the transaction struct

// Import our modules
use crate::{TxVersion, VarInt};
use std::io::{Cursor, Read, self};

/// The structure of the Bitcoin transaction
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Default)]
pub struct BtcTx {
    // The version of the Bitcoin transaction
    version: TxVersion,
    // A transaction can have multiple inputs
    inputs: Vec<TxInput>,
    // A transaction can have multiple outputs
    outputs: Vec<TxOutput>,
    // The locktime for the transaction parsed
    // from 4 bytes into a u32
    locktime: u32,
}

Next we implement methods to parse our version, inputs, outputs and locktime into our BtcTx

impl BtcTx {
    /// Convert hex bytes into a Transaction struct. This calls all other
    /// methods to parse the version, inputs, outputs and locktime.
    pub fn from_hex_bytes(bytes: impl AsRef<[u8]>) -> io::Result<Self> {
        // Instantiate a new cursor to hold the bytes.
        // The cursor's position advances whenever we read
        // bytes allowing us to simplify the logic
        // instead of using a counter to keep track of bytes read
        let mut bytes = Cursor::new(bytes.as_ref());

        // The version number is always a 4 byte array
        let mut version_bytes = [0u8; 4];
        // Read exactly 4 bytes and advance the cursor to the 4th byte
        bytes.read_exact(&mut version_bytes)?;
        // Get the transaction version from the bytes
        let version = TxVersion::from_bytes(version_bytes);        

        // Get a vector of inputs by calling the `Self::get_inputs()` method
        let inputs = BtcTx::get_inputs(&mut bytes)?;
        // Get a vector of outputs by calling the `Self::get_outputs()` method
        let outputs = BtcTx::get_outputs(&mut bytes)?;
        // Get the locktime by calling the `Self::locktime()` method
        let locktime = BtcTx::locktime(&mut bytes)?;

        Ok(BtcTx {
            version,
            inputs,
            outputs,
            locktime,
        })
    }

    /// Get all inputs from the current position of the `Cursor`.
    /// This method decodes the number of inputs by first decoding the
    /// `varint` and then looping number of inputs calling
    /// `Self::input_decoder()` on each iteration.
    fn get_inputs(bytes: &mut Cursor<&[u8]>) -> io::Result<Vec<TxInput>> {
        let mut varint_len = [0u8];
        bytes.read_exact(&mut varint_len)?;

        let varint_byte_len = VarInt::parse(varint_len[0]);
        let no_of_inputs = VarInt::integer(varint_byte_len, bytes)?;

        let mut inputs = Vec::<TxInput>::new();

        // A `for` loop lets us propagate decoding errors with `?`
        // instead of panicking inside a closure
        for _ in 0..no_of_inputs {
            inputs.push(BtcTx::input_decoder(bytes)?);
        }

        Ok(inputs)
    }    

    // Decodes an input from current `Cursor` position.
    fn input_decoder(bytes: &mut Cursor<&[u8]>) -> io::Result<TxInput> {
        // The previous transaction ID is always a SHA256 hash converted to a 32 byte array
        let mut previous_tx_id = [0u8; 32];
        // Read exactly 32 bytes and advance the cursor to the end of the 32 byte array
        bytes.read_exact(&mut previous_tx_id)?;
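        // The raw transaction stores the ID in internal byte order, which is the
        // reverse of the order used when a transaction ID is displayed as hex,
        // so we reverse the bytes to get the display order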
        previous_tx_id.reverse();

        //Previous transaction index is 4 bytes long which is a Rust u32
        let mut previous_tx_index_bytes = [0u8; 4];
        bytes.read_exact(&mut previous_tx_index_bytes)?;
        // Convert the read 4 bytes to a u32
        let previous_output_index = u32::from_le_bytes(previous_tx_index_bytes);

        // Get the length of the scriptSig
        let mut signature_script_size = [0u8];
        bytes.read_exact(&mut signature_script_size)?;
        // Parse the length VarInt
        let varint_byte_len = VarInt::parse(signature_script_size[0]);
        // Get the length by converting VarInt into an integer by calling `integer`
        let integer_from_varint = VarInt::integer(varint_byte_len, bytes)?;

        // Buffer to hold the signature script
        let mut signature_script = Vec::<u8>::new();
        let mut sig_buf = [0u8; 1];
        // Read the script one byte at a time; each read advances the cursor.
        // A `for` loop lets us propagate read errors with `?` instead of panicking
        for _ in 0..integer_from_varint {
            bytes.read_exact(&mut sig_buf)?;
            signature_script.push(sig_buf[0]);
        }

        // The sequence number is a u32 (4 bytes long)
        let mut sequence_num_bytes = [0u8; 4];
        bytes.read_exact(&mut sequence_num_bytes)?;
        // Convert the sequence number to a integer
        let sequence_number = u32::from_le_bytes(sequence_num_bytes);    

        Ok(TxInput {
            previous_tx_id,
            previous_output_index,
            signature_script,
            sequence_number,
        })
    }    

    /// Get the outputs after all inputs have been parsed.
    fn get_outputs(bytes: &mut Cursor<&[u8]>) -> io::Result<Vec<TxOutput>> {
        // Get the number of outputs by reading our VarInt
        let mut num_of_output_bytes = [0u8; 1];
        bytes.read_exact(&mut num_of_output_bytes)?;
        let var_int_byte_length = VarInt::parse(num_of_output_bytes[0]);
        // Convert our VarInt to an integer
        let num_of_outputs = VarInt::integer(var_int_byte_length, bytes)?;

        let mut outputs = Vec::<TxOutput>::new();

        // Iterate over the number of outputs. A `for` loop lets us
        // propagate read errors with `?` instead of panicking
        for _ in 0..num_of_outputs {
            // The first value of the output is the amount in satoshis
            // which is 8 bytes long (Rust u64)
            let mut satoshis_as_bytes = [0u8; 8];
            bytes.read_exact(&mut satoshis_as_bytes)?;
            // Convert the little-endian bytes into the number of satoshis
            let satoshis = u64::from_le_bytes(satoshis_as_bytes);

            // Read the VarInt prefix of the locking script length
            let mut locking_script_len = [0u8; 1];
            bytes.read_exact(&mut locking_script_len)?;
            // Work out how many bytes the VarInt occupies
            let script_byte_len = VarInt::parse(locking_script_len[0]);
            // Convert our VarInt to an integer
            let script_len = VarInt::integer(script_byte_len, bytes)?;
            let mut script = Vec::<u8>::new();

            // For the length of the script, read each byte and advance the cursor in each iteration
            let mut current_byte = [0u8; 1];
            for _ in 0..script_len {
                bytes.read_exact(&mut current_byte)?;
                script.push(current_byte[0]);
            }

            // Construct our Transaction Output struct and then push it to the outputs vec
            outputs.push(TxOutput {
                amount: satoshis,
                locking_script: script,
            });
        }

        Ok(outputs)
    }

    // Lastly, after parsing our version, inputs and outputs we parse the locktime
    fn locktime(bytes: &mut Cursor<&[u8]>) -> io::Result<u32> {
        // The locktime is 4 bytes long
        let mut locktime_bytes = [0u8; 4];
        bytes.read_exact(&mut locktime_bytes)?;

        // Convert the locktime into an integer
        Ok(u32::from_le_bytes(locktime_bytes))
    }
}
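As a quick sanity check of the output parsing, the amount field can be decoded by hand. The sketch below is standalone. The names `amount_in_satoshis`, `format_btc` and `SATOSHIS_PER_BTC` are our own and do not appear in the parser above.

```rust
/// Number of satoshis in one bitcoin
const SATOSHIS_PER_BTC: u64 = 100_000_000;

/// Decodes the 8-byte little-endian amount field of an output.
/// Illustrative helper, not part of the `BtcTx` parser.
fn amount_in_satoshis(amount_bytes: [u8; 8]) -> u64 {
    u64::from_le_bytes(amount_bytes)
}

/// Formats satoshis as a BTC string using integer arithmetic only,
/// which avoids floating-point rounding.
fn format_btc(satoshis: u64) -> String {
    format!(
        "{}.{:08}",
        satoshis / SATOSHIS_PER_BTC,
        satoshis % SATOSHIS_PER_BTC
    )
}
```

The single output of our sample transaction has the amount bytes `18 73 01 00 00 00 00 00`, which decode to 95,000 satoshis, or `0.00095000` BTC.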

We can now utilize our code to decode a hex transaction. First add the hex-conservative crate into the Cargo.toml manifest file since creating a library to parse hex into bytes is beyond the scope of this article.
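If you would rather keep the project free of external crates, hex decoding can be sketched with the standard library alone. The function `decode_hex` below is a hypothetical helper of our own, not part of hex-conservative, and it favours clarity over speed.

```rust
use std::io;

/// Decodes a hex string into bytes using only the standard library.
/// `decode_hex` is a hypothetical helper; it is not part of hex-conservative.
fn decode_hex(hex: &str) -> io::Result<Vec<u8>> {
    let invalid = |msg: &'static str| io::Error::new(io::ErrorKind::InvalidData, msg);

    // Every byte needs exactly two hex characters
    if hex.len() % 2 != 0 {
        return Err(invalid("hex string has an odd number of characters"));
    }

    hex.as_bytes()
        .chunks(2)
        .map(|pair| {
            // `to_digit(16)` returns `None` for anything outside 0-9, a-f, A-F
            let high = (pair[0] as char).to_digit(16);
            let low = (pair[1] as char).to_digit(16);
            match (high, low) {
                // The first character is the upper four bits of the byte
                (Some(high), Some(low)) => Ok(((high << 4) | low) as u8),
                _ => Err(invalid("hex string contains a non-hex character")),
            }
        })
        .collect()
}
```

Because `BtcTx::from_hex_bytes` accepts anything that implements `AsRef<[u8]>`, the returned `Vec<u8>` could be passed to it directly.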

Lastly, we use our parser to parse the transaction hex string we introduced at the beginning of the article.

fn main() {
    let raw_tx = hex!("010000000269adb42422fb021f38da0ebe12a8d2a14c0fe484bcb0b7cb365841871f2d5e24000000006a4730440220199a6aa56306cebcdacd1eba26b55eaf6f92eb46eb90d1b7e7724bacbe1d19140220101c0d46e033361c60536b6989efdd6fa692265fcda164676e2f49885871038a0121039ac8bac8f6d916b8a85b458e087e0cd07e6a76a6bfdde9bb766b17086d9a5c8affffffff69adb42422fb021f38da0ebe12a8d2a14c0fe484bcb0b7cb365841871f2d5e24010000006b48304502210084ec4323ed07da4af6462091b4676250c377527330191a3ff3f559a88beae2e2022077251392ec2f52327cb7296be89cc001516e4039badd2ad7bbc950c4c1b6d7cc012103b9b554e25022c2ae549b0c30c18df0a8e0495223f627ae38df0992efb4779475ffffffff0118730100000000001976a9140ce17649c1306c291ca9e587f8793b5b06563cea88ac00000000");
    let tx_decode = BtcTx::from_hex_bytes(raw_tx);

    dbg!(tx_decode.unwrap());
}

That’s it. We have decoded a raw Bitcoin hex encoded transaction.

In the next article, we will look into converting hash bytes into SHA256 strings and converting our script bytes into a Bitcoin Script.

References

  1. VarInt — https://web.archive.org/web/20230331170203/https://learnmeabitcoin.com/technical/varint

  2. The code for this article — https://github.com/448-OG/BitcoinTransactions

  3. Compact Size — https://learnmeabitcoin.com/technical/general/compact-size/
