Understanding the data storage of Solana Program Derived Addresses

<[Open Source, Rust, Decentralized Infrastructure]>::earth()

Decentralized applications need to store data immutably onchain, and Solana is no different: that is why we have Program Derived Addresses (PDAs). A PDA is created through system_instruction::SystemInstruction::CreateAccountWithSeed, which derives the PDA public key from a base (usually the program's public key) and a seed, which is a Rust UTF-8 string.

How a Solana PDA is initialized

The initialized storage is a byte array sized to the in-memory size of the data structure. Storing a username as a Rust String, for example, would require 24 bytes on a 64-bit target, which is the size of the String value itself (pointer, capacity and length) rather than of its characters. The bigger the data structure, the higher the rent needed to pay for onchain storage. The size of a data structure can be calculated with the Rust function core::mem::size_of::<T>(), where T is the data structure.

core::mem::size_of::<String>(); //24 bytes

When system_instruction::SystemInstruction::CreateAccountWithSeed is called with the size of the String (24 bytes), it allocates a zeroed array of 24 bytes, [u8; 24]:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
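As a quick check, and as a sketch rather than part of the original walkthrough, the snippet below separates the two sizes involved: core::mem::size_of::<String>() measures the fixed String value (pointer, capacity, length), while len() measures the text it currently holds. The helper name string_sizes is hypothetical.

```rust
use core::mem::size_of;

/// Returns `(handle_size, content_len)` for a piece of text.
/// `handle_size` is the fixed in-memory size of a `String` value;
/// `content_len` is the number of UTF-8 bytes in `s`.
pub fn string_sizes(s: &str) -> (usize, usize) {
    (size_of::<String>(), s.len())
}
```

On a 64-bit target, string_sizes("a") and string_sizes("SolanaSeaMonster") both report a handle size of 24, while the content lengths are 1 and 16. The value returned by size_of therefore does not grow with the text being stored.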

For larger data structures, serialization libraries handle the conversion of data structures to and from bytes.


A BIT OF GLASS: Solana onchain programs are written in Rust, where the most popular serialization framework is serde. Unfortunately, serde uses too many CPU cycles to serialize and deserialize, which makes converting data to and from bytes expensive in Solana’s resource-constrained BPF execution environment. The borsh crate is the go-to serialization library for Solana.


Suppose we need to create an end-to-end encrypted messaging dApp that stores a user’s Diffie-Hellman public keys onchain in the user’s PDA. The data structure in Rust would look like this:

use borsh::{BorshSerialize, BorshDeserialize};

#[derive(Debug, BorshSerialize, BorshDeserialize, Default)]
pub struct MessagingAccount {
   username: String,
   dh_keys: Vec<[u8; 32]>,
}

Calling core::mem::size_of::<MessagingAccount>() returns 48 bytes (24 for the String and 24 for the Vec). Creating a PDA on Solana using system_instruction::SystemInstruction::CreateAccountWithSeed with this size initializes a zeroed array of 48 bytes ([u8; 48]), which can be represented as:

[0u8; core::mem::size_of::<MessagingAccount>()]

Using the solana_program::msg! macro to log the PDA would show the initialized account as:

AccountInfo {
       key: pda_address,
       owner: program_id,
       is_signer: false,
       is_writable: true,
       executable: false,
       ..,
       data.len: 48,
       data: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000,
   .. }

The data field of the AccountInfo above is the hex encoding of the 48 zeroed bytes of storage that were initialized when the PDA account was created, that is, [0u8; 48].
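To see why 48 bytes print as 96 characters, here is a small sketch of hex encoding. The helper to_hex is hypothetical and is not the formatter Solana itself uses; it only illustrates that each byte becomes two hex digits.

```rust
/// Hex-encodes a byte slice using two lowercase hex digits per byte.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|byte| format!("{:02x}", byte)).collect()
}
```

Calling to_hex(&[0u8; 48]) yields a string of 96 zero characters, matching the data field in the log above.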

We can deserialize the data structure onchain as follows:

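// -- Code snippet --
use borsh::BorshDeserialize;
use solana_program::{
    account_info::{next_account_info, AccountInfo},
    entrypoint,
    entrypoint::ProgramResult,
    pubkey::Pubkey,
};

entrypoint!(process_instruction);

pub fn process_instruction(
    program_id: &Pubkey,
    accounts: &[AccountInfo],
    instruction_data: &[u8],
) -> ProgramResult {
    let accounts_iter = &mut accounts.iter();

    // The first account passed to the program is expected to be the PDA
    let pda_account = next_account_info(accounts_iter)?;
    // Try to deserialize the raw account bytes directly with borsh
    let pda_data = MessagingAccount::try_from_slice(&pda_account.data.borrow())?;

    Ok(())
}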

Calling the smart contract code will return a borsh error:

Err(
    Custom {
        kind: InvalidData,
        error: "Not all bytes read",
    },
)

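The error occurs because borsh deserializes according to the lengths encoded in the data, as laid out in the Borsh specification (https://borsh.io/). A String and a Vec are each prefixed with a 4-byte length, so the zeroed account decodes as an empty username and an empty dh_keys. Those are the same 8 bytes that MessagingAccount::default() serializes to: [0, 0, 0, 0, 0, 0, 0, 0]. The initialized PDA storage, however, is 48 bytes long, so 40 bytes are left unread, and try_from_slice rejects input that it does not consume completely.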
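The mismatch can be reproduced without the borsh crate. The sketch below is a hand-written stand-in that follows the Borsh layout for this one struct (a 4-byte little-endian length before the String bytes and before the Vec elements). The functions encode, decode and read_len are hypothetical names, and the error strings are written to resemble the ones borsh reports rather than taken from the library.

```rust
/// Stand-in for the article's struct, without the borsh derives.
#[derive(Debug, Default, PartialEq)]
pub struct MessagingAccount {
    pub username: String,
    pub dh_keys: Vec<[u8; 32]>,
}

/// Encodes the struct following the Borsh layout:
/// u32 LE length + UTF-8 bytes, then u32 LE count + 32 bytes per key.
pub fn encode(account: &MessagingAccount) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(account.username.len() as u32).to_le_bytes());
    out.extend_from_slice(account.username.as_bytes());
    out.extend_from_slice(&(account.dh_keys.len() as u32).to_le_bytes());
    for key in &account.dh_keys {
        out.extend_from_slice(key);
    }
    out
}

/// Reads a u32 LE length prefix at `*pos` and advances the cursor past it.
fn read_len(bytes: &[u8], pos: &mut usize) -> Result<usize, String> {
    let slice = bytes.get(*pos..*pos + 4).ok_or("Unexpected length of input")?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(slice);
    *pos += 4;
    Ok(u32::from_le_bytes(raw) as usize)
}

/// Decodes the struct and, like `try_from_slice`, rejects trailing bytes.
pub fn decode(bytes: &[u8]) -> Result<MessagingAccount, String> {
    let mut pos = 0;

    let name_len = read_len(bytes, &mut pos)?;
    let name = bytes.get(pos..pos + name_len).ok_or("Unexpected length of input")?;
    let username = String::from_utf8(name.to_vec()).map_err(|e| e.to_string())?;
    pos += name_len;

    let key_count = read_len(bytes, &mut pos)?;
    let mut dh_keys = Vec::new();
    for _ in 0..key_count {
        let chunk = bytes.get(pos..pos + 32).ok_or("Unexpected length of input")?;
        let mut key = [0u8; 32];
        key.copy_from_slice(chunk);
        dh_keys.push(key);
        pos += 32;
    }

    // Anything left over means the input was longer than the encoded value.
    if pos != bytes.len() {
        return Err("Not all bytes read".to_string());
    }
    Ok(MessagingAccount { username, dh_keys })
}
```

With this sketch, encode(&MessagingAccount::default()) produces eight zero bytes, decode(&[0u8; 8]) succeeds, and decode(&[0u8; 48]) fails because 40 bytes remain after the two length prefixes have been read.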

What To Do?

A simple way to solve this issue is to pack and unpack the bytes using a storage format that carries a marker recording the length of the valid bytes. The marker is of type usize, which occupies 8 bytes on 64-bit targets; this can be checked with core::mem::size_of::<usize>(). The data storage format can be defined as:

MARKER (8 bytes) | DATA (variable byte length) | ZEROS (filled with zeroes)

To serialize the data into the storage format:
  1. Calculate the size of the valid bytes serialized by borsh: messaging_account.try_to_vec()?.len()

  2. Get the length of the PDA account storage: pda_account.data.len()

  3. Add the MARKER length to the serialized data length: 8 + messaging_account.try_to_vec()?.len()

  4. Check whether the MARKER length plus the serialized data length is greater than the length of the pda_account data storage.

  5. If it is greater, return an error informing the user that the data cannot be written to the Solana PDA storage because it exceeds the capacity of the PDA account.

  6. If it is less than or equal to the capacity of the PDA account storage, concatenate the MARKER with the bytes of the serialized data, and if the concatenated data is still shorter than the capacity of the PDA account storage, fill the remaining space with zeroes.

  7. Write the data into the PDA storage.

To deserialize the data storage format:
  1. Get the first 8 bytes and convert them to the MARKER: usize::from_le_bytes(bytes_stored[0..8].try_into().unwrap())

  2. Skip the first 8 bytes, which hold the marker, and then take the next MARKER bytes: let data = bytes_stored.iter().skip(8).take(MARKER).copied().collect::<Vec<u8>>();

  3. Deserialize the data using borsh: MessagingAccount::try_from_slice(&data)?;
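Before moving to the generic implementation, the two procedures can be traced on plain bytes. This is a minimal sketch that leaves borsh out and treats the payload as an opaque byte slice. The names pack_bytes and unpack_bytes are hypothetical, and the marker is written as a u64 on the assumption of a 64-bit target.

```rust
/// Width of the length marker in bytes (a `u64`, matching `usize` on 64-bit targets).
pub const MARKER_SIZE: usize = 8;

/// Writes `MARKER | payload | zeros` into `buffer` and returns the bytes used.
pub fn pack_bytes(payload: &[u8], buffer: &mut [u8]) -> Result<usize, &'static str> {
    let used = MARKER_SIZE + payload.len();
    if used > buffer.len() {
        return Err("payload exceeds account capacity");
    }
    buffer[..MARKER_SIZE].copy_from_slice(&(payload.len() as u64).to_le_bytes());
    buffer[MARKER_SIZE..used].copy_from_slice(payload);
    // Zero the tail so bytes from an earlier, longer write do not linger.
    buffer[used..].fill(0);
    Ok(used)
}

/// Reads the marker and returns only the valid payload bytes.
pub fn unpack_bytes(buffer: &[u8]) -> Result<&[u8], &'static str> {
    if buffer.len() < MARKER_SIZE {
        return Err("buffer too small for marker");
    }
    let mut marker = [0u8; MARKER_SIZE];
    marker.copy_from_slice(&buffer[..MARKER_SIZE]);
    let len = u64::from_le_bytes(marker) as usize;
    // The payload must fit in the space that follows the marker.
    if len > buffer.len() - MARKER_SIZE {
        return Err("marker points past the end of the buffer");
    }
    Ok(&buffer[MARKER_SIZE..MARKER_SIZE + len])
}
```

Packing b"hello" into a 16-byte buffer leaves it as [5, 0, 0, 0, 0, 0, 0, 0, 104, 101, 108, 108, 111, 0, 0, 0]: the marker 5 in little-endian, the five payload bytes, then zero padding. Unpacking the same buffer returns just the five payload bytes.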

Transforming the pseudocode above to code:

The code contains comments that explain each step.

  • Create code to handle the errors
/// The result type that encompasses the `AccountStoreError`
pub type AccountStoreResult<T> = Result<T, AccountStoreError>;

#[derive(Debug)]
pub enum AccountStoreError {
    /// The buffer cannot accommodate the size of the `MARKER` which is `8 bytes`
    BufferTooSmallForMarker = 0,
    /// The buffer cannot accommodate the size of the data and the `MARKER`
    BufferTooSmallForData = 1,
    /// The deserialized bytes do not contain a `MARKER`
    CorruptedMarker = 2,
    /// The data provided does not contain enough data length as specified by the `MARKER`
    CorruptedStorage = 3,
    /// The error provided is invalid
    InvalidError = 4,
}
  • Create a data structure to handle these operations:
      pub const MARKER_SIZE: usize = 8;

      #[derive(Debug)]
      pub struct AccountStore<T> {
          pub data: T,
      }
  • Create a method to calculate the size needed for a generic data structure to be stored onchain with our storage format. The impl block takes a generic parameter T that must implement BorshSerialize, BorshDeserialize and Default.
impl<T> AccountStore<T>
where
    T: BorshDeserialize + BorshSerialize + Default + Sized,
{
    pub fn size_of() -> usize {
        core::mem::size_of::<T>() + MARKER_SIZE
    }
}

The size_of() method calculates the size of the data structure specified as T and adds 8 bytes to accommodate the marker information.

  • Create a method called pack() to convert the data into the storage format and write it to a provided buffer
impl<T> AccountStore<T>
where
    T: BorshDeserialize + BorshSerialize + Default,
{
    // -- Code snippet --
    pub fn pack(&self, buffer: &mut [u8]) -> AccountStoreResult<usize> {
        // Get the length of the PDA account storage
        let buffer_length = buffer.len();

        // Check that the buffer is large enough to hold the `MARKER`
        if buffer_length < MARKER_SIZE {
            // If it is not, return an error indicating this to the user
            return Err(AccountStoreError::BufferTooSmallForMarker);
        }

        // Serialize the user data using `borsh`
        let data = self.data.try_to_vec().unwrap(); // HANDLE THIS BORSH ERROR AS YOU WISH
        // Get the data length
        let data_length = data.len();

        // Check if the sum of `data_length` and `MARKER_SIZE` is
        // greater than the `buffer_length`
        if buffer_length < data_length + MARKER_SIZE {
            return Err(AccountStoreError::BufferTooSmallForData);
        }

        // Copy the `data_length` to the buffer as the `MARKER`
        buffer[..MARKER_SIZE].copy_from_slice(&data_length.to_le_bytes());

        // Copy the data into the buffer.
        // Any bytes after the data are left untouched; `unpack()` ignores
        // them because it only reads as far as the `MARKER` indicates
        buffer[MARKER_SIZE..MARKER_SIZE + data_length].copy_from_slice(&data);

        Ok(data_length + MARKER_SIZE)
    }
}
  • Create a method unpack() to read the storage format
impl<T> AccountStore<T>
where
    T: BorshDeserialize + BorshSerialize + Default,
{
    // -- Code snippet --
    pub fn unpack(buffer: &[u8]) -> AccountStoreResult<AccountStore<T>> {
        // Get the length of the PDA account storage
        let buffer_length = buffer.len();

        // Check that the buffer is large enough to hold the `MARKER`
        if buffer_length < MARKER_SIZE {
            return Err(AccountStoreError::BufferTooSmallForMarker);
        }

        // Convert the `MARKER` bytes to an array of `[u8; 8]`
        // since `usize::from_le_bytes` only accepts `[u8; 8]`
        let marker: [u8; 8] = match buffer[0..MARKER_SIZE].try_into() {
            Ok(value) => value,
            Err(_) => return Err(AccountStoreError::CorruptedMarker),
        };
        // Get the number of valid data bytes that follow the `MARKER`
        let byte_length = usize::from_le_bytes(marker);

        // Check that the valid data fits in the space left after the `MARKER`
        if byte_length > buffer_length - MARKER_SIZE {
            return Err(AccountStoreError::CorruptedStorage);
        }

        // Collect the valid data by skipping the `MARKER_SIZE` of `8 bytes`
        // and taking the next `byte_length` bytes
        let data = buffer
            .iter()
            .skip(MARKER_SIZE)
            .take(byte_length)
            .copied()
            .collect::<Vec<u8>>();

        if byte_length != 0 {
            let data = T::try_from_slice(&data).unwrap(); // Handle error as you see fit
            Ok(AccountStore { data })
        } else {
            // A `byte_length` of zero means that no data has been
            // written to the PDA account yet, so return the `Default`
            // representation of the data structure represented
            // by the generic `T`
            Ok(AccountStore { data: T::default() })
        }
    }
}
  • Lastly, create a method to add data to the store
impl<T> AccountStore<T>
where
    T: BorshDeserialize + BorshSerialize + Default,
{
    // -- Code snippet --
    pub fn add_data(&mut self, data: T) -> &mut Self {
        self.data = data;

        self
    }
}
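The unwrap() calls in the snippets above abort the program on failure. One option, sketched here as an assumption and not as the author's design, is to map each AccountStoreError variant to its numeric discriminant. A Solana program could then wrap that number in ProgramError::Custom, which carries a u32. The enum is repeated so the sketch stands alone, with Clone, Copy and PartialEq added for convenience.

```rust
/// Copy of the article's error enum, with extra derives for convenience.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AccountStoreError {
    BufferTooSmallForMarker = 0,
    BufferTooSmallForData = 1,
    CorruptedMarker = 2,
    CorruptedStorage = 3,
    InvalidError = 4,
}

impl From<AccountStoreError> for u32 {
    fn from(error: AccountStoreError) -> u32 {
        // The explicit discriminants on the enum become the numeric error codes.
        error as u32
    }
}
```

With this in place, a call site could replace unwrap() with something like map_err(|error| ProgramError::Custom(error.into())) and let the ? operator propagate the failure to the client.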
Using our storage library to serialize and deserialize pda_account.data
  • Unpack the Solana PDA data in pda_account.data using the unpack() method, then use add_data() to add some data, e.g. from a Solana instruction
let mut data = AccountStore::<MessagingAccount>::unpack(&pda_account.data.as_ref().borrow()).unwrap();

let user_data = MessagingAccount {
    username: "SolanaSeaMonster".into(),
    dh_keys: vec![[1u8; 32], [2u8; 32]],
};

data.add_data(user_data);
  • Pack the data and write it back to the Solana PDA account using the pack() method.
data.pack(&mut &mut pda_account.data.borrow_mut()[..]).unwrap();

Since the storage of a Solana account is exposed as an Rc<RefCell<&mut [u8]>>, &pda_account.data.as_ref().borrow() is used to borrow the data as the &[u8] required by the unpack() method, and &mut &mut pda_account.data.borrow_mut()[..] is used to obtain a mutable slice for writing.
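One thing worth checking before calling pack() is whether the account is large enough for the serialized data. The sketch below assumes the account was allocated with AccountStore::<MessagingAccount>::size_of() bytes and uses the Borsh layout (a 4-byte length prefix for the String and for the Vec) to estimate the packed length. Both function names are hypothetical.

```rust
/// Width of the length marker, as in the article's `MARKER_SIZE`.
pub const MARKER_SIZE: usize = 8;

/// Bytes needed to pack a `MessagingAccount` with the given username and key count:
/// marker + (4-byte length + UTF-8 bytes) + (4-byte count + 32 bytes per key).
pub fn packed_len(username: &str, key_count: usize) -> usize {
    MARKER_SIZE + 4 + username.len() + 4 + 32 * key_count
}

/// Capacity that `AccountStore::<MessagingAccount>::size_of()` would report:
/// the in-memory size of a `String` plus a `Vec<[u8; 32]>`, plus the marker.
pub fn allocated_len() -> usize {
    core::mem::size_of::<String>() + core::mem::size_of::<Vec<[u8; 32]>>() + MARKER_SIZE
}
```

For the example above, packed_len("SolanaSeaMonster", 2) is 96 bytes, while allocated_len() is 56 bytes on a 64-bit target, so pack() would return BufferTooSmallForData. Under these assumptions, the account should be created with room for the largest serialized payload it is expected to hold, not only with the in-memory size of the struct.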

Now you know how the PDA storage concept works and how you can create your own lightweight storage format to read and write the data.

That's it.

Keep chewing glass.
