Monday, March 7, 2011

An interesting interview question, How to store an grid of which each dimension has more than 1m elements

One of my friend , he went to an interview, during the interview he was asked, if we have a grid , both x and y dimension have 1 million elements,  what kind of data structure you will use? Given the left top position of a view port , we also know the width and height of the view port, how we can quickly figure out those nodes in the viewport?

I was thinking this these few days, assume each node in the grid, we need to persist it's x axis and y axis,
public struct Node
{
     int x;
     int y;
}
each node need at least 8 bytes in memory, put the memory alignment aside , don't worry about it at this moment, for an 1million * 1million grid, if we store all the nodes in memory, how big it will be,
Total =  1million * 1million * 8 /1024/1024/1024 GB
         =  7450.58GB
so, obviously , it is not possible to store the whole grid in memory, we have to break down the grid to small grids.Assume the size of the view port is 1024 * 768, I will say each small grid, the size is 1024 * 768, a view port.

Based on this , we can break down the big grid to a small one, I will call it page grid, but each cell in the grid, pointing to another grid, which is exactly the size of one view port, I call it a page, sorry, I borrow this idea from database .

x:0-1024
y:0-768

x:1024-2048
y:0-768

x:2048-3072
y:0-768

x:3072 - 4096
y:0-768

......

x:0-1024
y:768-1536

x:1024-2048
y:768-1536

x:2048-3072
y:768-1536

x:3072 - 4096
y:768-1536

......

....

....

....

....

......

The width of this new grid is 1million / 1024 = 977
The height of this grid is 1million / 768 = 1303

The data of each cell on above grid will be store on hard disk, based on the left top position of the view port, we can easily identify data in which cell we need to access.  the formula is quite simple
maximum we will need to load in four cells of data.  think about it.

In this way, the is not too big anymore, so we don't need to bother whether to use linked list or binary tree,  or whatever,  for linked list, it is not design for fast search, it is for easy expandable. while binary tree, it good for search, but maintenance cost is very high. obviously we need to balance

while in this case , I will say , use an array or a hash map to persist the page grid, which is easy to search and maintain.

No comments:

Post a Comment