Home > programming > Tree serialization and deserialization

Tree serialization and deserialization


http://www.sureinterview.com/shwqst/108006

General idea

The tree serialization is to save its data structure into a flat file, from which the original tree can be recovered (deserialized).

There are a few ways to serialize a tree.

 

 

1 Serialize the nodes

A common method of serialize a tree or graph is to save the node values and their relationship. For example, given the tree on the right. Each node can be represented by 4 elements, (#, value, #left, #right). By listing all notes, the tree can be serialized as:

0, "a", 1, 2
1, "b", 3, ^
2, "c", ^, ^
3, "d", ^, ^

Actually, this is also the general way to serialize a graph. Following code draws this example tree in DOT language. The graph (tree) breaks down into nodes and edges, which can have different properties.

 


  1. digraph{
  2.   node [style=bold, label="\N", shape=record];
  3.   node0 [label="<l> | <v>A | <r>"];
  4.   node1 [label="<l> | <v>B | <r>"];
  5.   node2 [label="<l> | <v>C | <r>"];
  6.   node3 [label="<l> | <v>D | <r>"];
  7.   node0:l -> node1;
  8.   node0:r -> node2;
  9.   node1:l -> node3;
  10. }

 

During the serialization process, we need to assign each node a unique id. To efficiently look up that id, the mapping from object to id can be saved into a hash table. If the tree is stored in a list or array, the index of the node can be used as the id instead. Similar hash table mapping from id to object can be used in deserialization.

2 Tree traversal

The traversal result of a tree also implicitly determine its topology. For example, the pre-order traversal uniquely describe the example tree above as.


  1. a b d ^ ^ ^ c ^ ^

The “^” is for empty node, which is necessary to recover the tree. Because a tree with n nodes has n+1 empty nodes, to save space, multiple empty nodes can be noted as ^#. For example, the above serialized result can be revised to:


  1. a b d ^3 c ^2

 

Check a slide for serializing trees.

A combination of preorder and inorder traversal can also uniquely describe a tree. The example tree has preorder and inorder traversal as follows:


  1. a b d c //preorder
  2. c b a d //inorder
  3. a (b c) [d] //preorder
  4. (c b) a [d] //inorder

 

The first node in preorder list is the root. By searching the root in the inorder list, the traversals are partitioned into left branch and right branch. The branches can be recovered in the same way recursively. This method is not efficient in term of time and space complexity. Saving tree size is better but is still not quite efficient.

3 Newick format

Newick format is widely used in bioinformatics. It is a human readable format but needs more effort to serialize and deserialize.

Following code is to deserialize the tree in format of value(<left branch>,<right branch>). The example above is serialized as0(1(2,),3).

 


  1. private BNode makeTreeRec() throws Exception {
  2.     // 1. setup
  3.     BNode root = new BNode();
  4.     char c = treeStr[pos];
  5.     root.value = 0;
  6.     while (c >= '0' && c <= '9') {
  7.         root.value *= 10;
  8.         root.value += c - '0';
  9.         c = treeStr[++pos];
  10.     }
  11.     // 2 left, root, right
  12.     if (c == '(') {
  13.         c = treeStr[++pos];
  14.         if (c != ',') {
  15.             root.left = makeTreeRec();
  16.         }
  17.         if (treeStr[pos] != ',')
  18.             throw new Exception("',' expected at position " + pos);
  19.         c = treeStr[++pos]; // skip ","
  20.         if (c != ')') {
  21.             root.right = makeTreeRec();
  22.         }
  23.         if (treeStr[pos] != ')')
  24.             throw new Exception("')' expected at position " + pos);
  25.         ++pos; // --> )
  26.     }
  27.     return root;
  28. }

 

Code

Code can be found at: http://code.google.com/p/sureinterview/source/browse/src/lib/tree/TreeUtil.java#172

 

 

Advertisements
Categories: programming
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: