Replibyte Transformers

This post is a follow-up to the post I wrote last week, Replibyte in Practice.

One of the most powerful features of Replibyte is the use of transformers to manipulate data as it is exported from the source database to a data dump file.

Replibyte has a few built-in transformers to help obfuscate some string values and credit card info, which you can learn more about in their documentation.

Here, I want to provide an example of creating a custom transformer and using it in your Replibyte configuration.


The Replibyte engine allows you to configure any WebAssembly executable that accepts input from STDIN and writes the transformed value to STDOUT. This simplicity makes it flexible enough to write the transformer in almost any language that compiles to WebAssembly.

In this post, I will use AssemblyScript because I have the most experience with TypeScript.

Setup

If you followed along in the previous post, you should have a directory that looks similar to this:

replibyte/
- .env.replibyte
- conf.yml
- metadata.json
- dumps/

We will place all the transformers within one directory named custom-transformers, but each transformer will have its own WASM executable file. This will help keep the boilerplate and dependencies tidy.

To get set up, we will follow the instructions for the AssemblyScript example.

mkdir custom-transformers
cd custom-transformers
npm init -y
npm install --save @assemblyscript/loader as-wasi
npm install --save-dev assemblyscript
npx asinit .

First Transformer - Account Name

Our first transformer will obfuscate the account name.

In the scaffolding provided by asinit, assembly/index.ts is the main executable file, but for our purposes, we will create a new file named accountName.ts in the assembly directory:

import 'wasi';
import { Console } from 'as-wasi/assembly';

execute();

function execute() {
  Console.log(transform());
}

function transform(): string {
  const randomBank = getRandomBank();
  const randomAccountType = getRandomAccountType();
  const randomNumber = getRandomNumber(100).toString();

  return [randomBank, randomAccountType, `(${randomNumber})`].join(' ');
}

accountName.ts

The first thing to note is that there is no native console.log. It's all contained within the as-wasi library. We will use the imported Console to manage writing to the terminal, and later reading from the terminal.

The execute function is the main entry point for this file. It logs the output of the transform function to STDOUT. In our next transformer, we will demonstrate how to accept the current value as input and transform it into a new value.

The transform function uses helper functions that I created to return a random bank name, a random account type (e.g. "Savings"), and a random number between 0 and 100. The random number is there to reduce the chance of duplicate account names, which would violate a unique constraint in the database. An example output would look like "Capital One Savings (67)".
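The helper functions are not shown in this post, so here is a minimal sketch of what they could look like. It is written in plain TypeScript for illustration; the name lists and the pickRandom helper are assumptions of mine, and an AssemblyScript version would likely need explicit integer types (such as i32) for the array index.

```typescript
// Hypothetical placeholder data, not the actual lists used in the transformer.
const BANKS: string[] = ['Capital One', 'First National', 'Community Credit Union'];
const ACCOUNT_TYPES: string[] = ['Checking', 'Savings', 'Credit Card'];

// Returns a random integer in the range [0, max).
function getRandomNumber(max: number): number {
  return Math.floor(Math.random() * max);
}

// Hypothetical helper: picks a random element from a non-empty list.
function pickRandom(items: string[]): string {
  return items[getRandomNumber(items.length)];
}

function getRandomBank(): string {
  return pickRandom(BANKS);
}

function getRandomAccountType(): string {
  return pickRandom(ACCOUNT_TYPES);
}
```

With only 100 possible suffixes, two rows can still end up with the same name on a large table, so widening the range passed to getRandomNumber lowers the odds of hitting the unique constraint.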

Second Transformer - Transaction Amount

Our second transformer will obfuscate transaction amounts.

Here we will use the value contained in the database table as input. Replibyte provides the database value as input to the transformer through STDIN. It can be of type string or null, so we make sure to validate the input before attempting to transform it.

Let's create another file named amount.ts:

import 'wasi';
import { Console } from 'as-wasi/assembly';

execute();

function execute() {
  const input = Console.readLine();
  Console.log(transform(input));
}

function transform(input: string | null): string {
  if (!input) {
    return '';
  }

  const value = parseFloat(input);

  if (isNaN(value)) {
    return input;
  }

  // Random drift in the range [0, 0.5)
  const randomDrift = Math.random() * 0.5;
  const multiplier = getRandomBooleanValue()
    ? 1 - randomDrift
    : 1 + randomDrift;
  const transformedValue = value * multiplier;
  const formattedValue = formatValue(transformedValue);

  return `${formattedValue}`;
}

Here we have a transform function that parses the input into a number. If the input is null or empty, we return an empty string to satisfy the string type that Console.log requires. If it's not a valid number, we return the original, untransformed input. Lastly, we perform the actual transformation, which moves the number up or down by up to 50%.
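To make the three branches easier to see, here is a small sketch that mirrors only the guard clauses of transform. It is plain TypeScript, and classifyInput is a hypothetical name used just for this illustration. It also surfaces one edge case worth knowing: in JavaScript semantics, parseFloat reads a leading number and ignores trailing text, so a value like "12abc" counts as numeric. I am assuming AssemblyScript's parseFloat behaves the same way.

```typescript
// Hypothetical helper: reports which branch of transform() an input would take.
function classifyInput(input: string | null): string {
  if (!input) {
    return 'empty'; // null or '' -> transform() returns ''
  }
  if (isNaN(parseFloat(input))) {
    return 'passthrough'; // not a number -> transform() returns the input unchanged
  }
  return 'transform'; // numeric -> transform() applies the random drift
}

const fromNull = classifyInput(null);       // 'empty'
const fromText = classifyInput('abc');      // 'passthrough'
const fromAmount = classifyInput('12.50');  // 'transform'
const fromMixed = classifyInput('12abc');   // 'transform' (parsed as 12)
```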

I determined this very unscientifically, but if there were a transaction for $100, this transformer would convert it to a random number between $50 and $150. Transforming the real value gives a more realistic result than generating a random number within some bounds. It also preserves the general shape of the data on a graph without exposing the actual values.
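The arithmetic can be checked with a short sketch. The getRandomBooleanValue and formatValue helpers are not shown in this post, so the versions below are my guesses at what they might do, and applyDrift is a hypothetical function that takes the random parts as arguments so the bounds are easy to verify. This is plain TypeScript; toFixed in particular may not be available in AssemblyScript and might need a manual rounding step there.

```typescript
// Hypothetical helper: a coin flip deciding whether the value drifts down or up.
function getRandomBooleanValue(): boolean {
  return Math.random() < 0.5;
}

// Hypothetical helper: formats the amount with two decimal places.
function formatValue(value: number): string {
  return value.toFixed(2);
}

// Same arithmetic as transform(), with the random inputs passed in explicitly.
function applyDrift(value: number, drift: number, down: boolean): number {
  const multiplier = down ? 1 - drift : 1 + drift;
  return value * multiplier;
}

// With a drift of 0.25, a $100 transaction becomes $75 or $125.
const lower = formatValue(applyDrift(100, 0.25, true));  // "75.00"
const upper = formatValue(applyDrift(100, 0.25, false)); // "125.00"
```

Because the drift is always below 0.5, the multiplier stays between 0.5 and 1.5, which is where the $50 to $150 range for a $100 transaction comes from.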

Build

To build, we will first update the scripts in the package.json file.

Update the scripts section to look like the following:

"scripts": {
  "asbuild:amount": "asc assembly/amount.ts --target release -o build/amount.wasm",
  "asbuild:accountName": "asc assembly/accountName.ts --target release -o build/accountName.wasm",
  "asbuild": "npm run asbuild:amount && npm run asbuild:accountName"
},

This update will allow us to use the release target configured in asconfig.json to compile our separate input files into individual .wasm files for Replibyte to execute.

Now we can run npm run asbuild to build our transformers into the build/ directory.

Configuration

Now we have to configure Replibyte to use these transformers.

Luckily, that is the easy part. We take the config we used in the previous post and update it like so:

encryption_key: $REPLIBYTE_ENCRYPTION
source:
  connection_uri: $SOURCE_DATABASE_URL
  transformers:
    - database: public
      table: accounts
      columns:
        - name: name
          transformer_name: custom-wasm
          transformer_options:
            path: "custom-transformers/build/accountName.wasm"
    - database: public
      table: transactions
      columns:
        - name: amount
          transformer_name: custom-wasm
          transformer_options:
            path: "custom-transformers/build/amount.wasm"
destination:
  connection_uri: $TARGET_DATABASE_URL
datastore:
  local_disk:
    dir: ./dumps

For the table named accounts, we will use our accountName transformer on the name column. And for the table named transactions we will use our amount transformer on the amount column.

The transformer_name has to be custom-wasm because it tells Replibyte that we are using a custom transformer.

And that's all. When you execute the replibyte -c ./conf.yml dump create command, it will run each transformer on the data in its configured table column.

Comments on AssemblyScript

I chose AssemblyScript because I have a lot of experience with TypeScript and assumed it would be the quickest solution. I learned that with AssemblyScript, you don't get the benefit of the plethora of npm modules out there.

Don't expect to npm install a library, import it, and have it work.

Initially, I wanted to create a simple transformer that used the faker library to generate data. After many hours of attempts, I could not get it to work and was provided little to no explanation on why it would not compile.

I learned that using a library in AssemblyScript requires the source TypeScript files to be included, and that source code has to be compatible with AssemblyScript. At first, I just tried to import the code like any other npm module. But most libraries transpile the TypeScript down to JavaScript and don't even include the TypeScript in the npm package. And AssemblyScript has a unique way of working with JavaScript that does not allow including it in the compiled WASM file. Once I figured out how to include the source code with the install (using a GitHub tarball link), I still ran into error after error that did not have simple solutions (that I could find, at least).

I may have had better luck using a different language, like Ruby or Rust, but I decided it was not worth the effort compared to writing helper functions that generate the dummy data from scratch.


I hope you found this post helpful in understanding transformers for Replibyte and can go off and create some of your own.

Let me know if you have any questions.

Find me on Threads or email me at codingmatty@gmail.com