Dump Lua 1D array to a file efficiently

Recently, I was using Lua [1] in one of my projects. I got stuck in a trivial problem. The project which was written in Lua, was generating a massive amount of data. My aim was to save this data in files in approximately real-time. Fortunately, I was able to decompose the data into 1-dimensional arrays (table).

In this post, I am going to explain an efficient way of dumping large Lua 1D arrays into files. We are going to use Userdata provided by Lua [2]. The userdata offers a raw memory area with no predefined operations. Note that once we have access to raw memory, we can do operations quickly. We are going to use Lua C bindings [3] for this purpose.

Let’s code. First, we need to define the following structure to hold the data. In our case, we are dealing with float type data.

typedef struct FloatArray {
  int size;
  float values[1];
} FloatArray;

Let’s define a function to create an array using userdata-

static FloatArray *create_array(lua_State *L, int n) {
  // number of bytes to allocate a block of memory
  size_t nbytes = sizeof(FloatArray) + (n - 1) * sizeof(float);
  FloatArray *a = (FloatArray *)lua_newuserdata(L, nbytes);
  a->size = n;
  return a;
}

The above function expects an int number to define the size of this array. Now we need a way to set each element of this array. Let’s define a function for the same as shown below-

static int set_element(lua_State *L) {
  FloatArray *a = (FloatArray *)lua_touserdata(L, 1);
  int index = luaL_checkint(L, 2);
  float value = luaL_checknumber(L, 3);

  luaL_argcheck(L, a != NULL, 1, "array expected");
  luaL_argcheck(L, 1 <= index && index <= a->size, 2, "index out of range");

  // set the value at given index
  // index starts from 0 in C where as it starts from 1 in Lua
  a->values[index - 1] = value;
  return 1;
}

We should also define other helper functions to get the element and size from this array.

static int get_element(lua_State *L) {
  FloatArray *a = (FloatArray *)lua_touserdata(L, 1);
  int index = luaL_checkint(L, 2);

  luaL_argcheck(L, a != NULL, 1, "array expected");
  luaL_argcheck(L, 1 <= index && index <= a->size, 2, "index out of range");

  lua_pushnumber(L, a->values[index - 1]);
  return 1;
}

static int get_size(lua_State *L) {
  FloatArray *a = (FloatArray *)lua_touserdata(L, 1);
  luaL_argcheck(L, a != NULL, 1, "array expected");
  lua_pushnumber(L, a->size);
  return 1;
}

At this stage, we need a function which can dump this array into a file. In the following function, I am going to write a binary file containing float values. since userdata provides raw memory access, we can use fwrite function [4] provided by stdio.h in C language to write a block of memory into the file. Below is the function-

static void write_file(const char *file_name, lua_State *L, float *values,
                       int size) {
  // initialize binary file
  FILE *f = fopen(file_name, "wb");
  luaL_argcheck(L, f != NULL, 2, "can't open file");

  // write the array into binary file
  fwrite(values, sizeof(float), size, f);

  // make sure the data has been written successfully
  if (fclose(f) != 0)
    luaL_error(L, "can't close file %s", file_name);
}

static int dump_float_array(lua_State *L) {
  FloatArray *a = (FloatArray *)lua_touserdata(L, 1);
  luaL_argcheck(L, a != NULL, 1, "array expected");

  // get the input file name
  const char *file_name = luaL_checkstring(L, 2);

  // write values to a binary file
  write_file(file_name, L, a->values, a->size);
  return 1;
}

We are almost done. Now, we can create an array in Lua and then we can set each element of this array. Finally, we can dump this array to a file as well. BUT wait!!! can we improve it more? Yes, we can! Instead of setting each element of this array in Lua, what if we can provide a 1d array to C function and it sets each element inside C? It sounds great. Isn’t it? Let’s do it.

We are going to provide a 1d array to the function defined as shown below-

static int new_array_from_table(lua_State *L) {
  luaL_argcheck(L, lua_type(L, 1) == LUA_TTABLE, 1, "1D array expected");

  // get the length of input table
  int n = lua_objlen(L, 1);

  // create new userdata
  FloatArray *a = create_array(L, n);

  // fetch table data and use it to fill the userdata
  int index;
  for (index = 1; index <= n; index++) {
    lua_rawgeti(L, 1, index);
    a->values[index - 1] = (float)lua_tonumber(L, -1);
    lua_pop(L, 1);
  }
  return 1;
}

Notice carefully that first, we created userdata, and then we filled each element. This is much quicker than filling the elements in Lua. Once we have the array, we can dump it as shown previously using fwrite.

At this point, you may be wondering that why are we creating an array first and then dumping it in later stages. It clearly gives you more freedom, such as you can easily change the elements of this newly created array, etc. Before finishing our Lua C binding, I would like to add one more function, which doesn’t create an array as shown below-

static int dump_table(lua_State *L) {
  luaL_argcheck(L, lua_type(L, 1) == LUA_TTABLE, 1, "array expected");

  // get the length of input table
  int n = lua_objlen(L, 1);

  // fetch table data and use it to fill the array
  int index;
  float values[n];
  for (index = 1; index <= n; index++) {
    lua_rawgeti(L, 1, index);
    values[index - 1] = (float)lua_tonumber(L, -1);
    lua_pop(L, 1);
  }

  // get the input file name
  const char *file_name = luaL_checkstring(L, 2);

  // write values to a binary file
  write_file(file_name, L, values, n);
  return 1;
}

Can you find out the difference? Instead of creating userdata, we created a float array here. Let’s add few more functions and finish this binding-

static int dump_array(lua_State *L) {
  switch (lua_type(L, 1)) {
  case LUA_TUSERDATA:
    return dump_float_array(L);
  case LUA_TTABLE:
    return dump_table(L);
  default:
    luaL_argcheck(L, 0, 1, "expected 1D array or FloatArray");
  }
  return 0;
}

static int new_array(lua_State *L) {
  switch (lua_type(L, 1)) {
  case LUA_TNUMBER:
    create_array(L, luaL_checkint(L, 1));
    return 1;
  case LUA_TTABLE:
    return new_array_from_table(L);
  default:
    luaL_argcheck(L, 0, 1, "expected number or 1D array");
  }
  return 0;
}

int luaopen_array(lua_State *L) {
  luaL_openlib(L, "array", array_lib, 0);
  return 1;
}

This is how we register our functions to Lua-

static const struct luaL_reg array_lib[] = {{"new", new_array},
                                            {"set", set_element},
                                            {"get", get_element},
                                            {"size", get_size},
                                            {"to_string", to_string},
                                            {NULL, NULL}};

Note carefully, that using switch statements we have encapsulated the complexity associated with array creation and array dumping operations beautifully.

Compilation

Before proceding further make sure to download the code from the repository [6].

  1. Open terminal or press CTRL+ALT+T
  2. Compile C code using the following command-
    gcc -I/usr/include/lua5.1 -o array.so -shared array.c -fPIC
    

    The above command is going to produce shared object.

Usage

We can use the above shared object in Lua code as shown below-

-- add the shared object to our Lua code
-- make sure to keep 'array.so' along with this code
require "array"

-- define number of elements in our array
local n = 1000000

-- create array
local my_array = {}
for i=1,n do
  my_array[i] = i+1
end

-- now we have an array called 'my_array', 
-- which we want to dump into a file. there
-- are two ways to do it

-- case 1
-- create new array first
-- then call dump function
local arr = array.new(my_array)
array.dump(arr, "out.bin")

-- case 2
-- directly call dump function
array.dump(my_array, "out.bin")

-- if we use 'array.new', we can do much more as shown below
local i=1
print(string.format("size of arr:%d, my_array[%d]:%d, arr[%d]:%d", array.size(arr), i, my_array[i], i, array.get_element(arr, i)))

Environmental Details

  1. Ubuntu 14.04 LTS 64 Bit OS
  2. Lua v5.1
  3. GCC v4.8.4

Install Lua v5.1

  1. Open terminal or press CTRL+ALT+T then use following command-
    sudo apt-get install lua5.1
    
  2. Install Lua headers by following command-
    sudo apt-get install liblua5.1-0-dev
    

Benchmarking

In order to benchmark our binding, we should use the standard Lua I/O libraries to write a file. We are going to measure the elapsed time in Lua [5].

local start_time = os.clock()
local file = io.open('out.txt','w')
for i=1,n do
    file:write(my_array[i]..'\n')
end
file:close()
local end_time = os.clock()
print(string.format("elapsed time: %.4f", (end_time - start_time)))

Below are the results of benchmarking-

Case Elapsed time (second)
Direct dump 0.0343
Create array then dump 0.0367
Standard Lua I/O 3.1897

Clearly, we can see a huge difference in elapsed time while using Lua userdata. All the code is available to download from the repository [6].

References

  1. The Programming Language Lua
  2. Userdata in Lua
  3. The Application Program Interface: Lua C Bindings
  4. fwrite
  5. Date and Time
  6. Dump Lua Array
Written on July 10, 2018