
Cross Joins

A cross join returns every combination of rows from two tables; the result is sometimes called a Cartesian product. In SQL Server, you can use the CROSS JOIN keywords to define a cross join.
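To make the "every combination" idea concrete, here is a minimal sketch that runs a CROSS JOIN against two tiny tables. It uses SQLite through Python's built-in sqlite3 module rather than SQL Server, purely so it runs without a server; the table names (colors, sizes) and their contents are made up for illustration.

```python
import sqlite3


def cross_join_demo():
    """Return every (color, size) pair from two small hypothetical tables."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE colors (name TEXT)")
        conn.execute("CREATE TABLE sizes (name TEXT)")
        conn.executemany("INSERT INTO colors VALUES (?)", [("red",), ("blue",)])
        conn.executemany("INSERT INTO sizes VALUES (?)", [("S",), ("M",), ("L",)])
        # CROSS JOIN has no ON clause: every row on the left is paired
        # with every row on the right.
        rows = conn.execute(
            "SELECT c.name, s.name "
            "FROM colors AS c CROSS JOIN sizes AS s "
            "ORDER BY c.name, s.name"
        ).fetchall()
    finally:
        conn.close()
    return rows


pairs = cross_join_demo()
print(len(pairs))  # 2 colors x 3 sizes = 6 rows
```

Two rows joined with three rows gives six rows; the output size is always the product of the two input sizes.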

Cross joins can cause out-of-memory (OOM) issues in Big Data engines like Spark when users are allowed to cross join large datasets (e.g. millions of rows), because the output grows with the product of the two row counts while each partition has limited memory.
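A quick back-of-the-envelope calculation shows why this gets out of hand. The sketch below only multiplies row counts; the helper name cross_join_size and the 100 bytes per output row are assumptions chosen for illustration, not measurements from any engine.

```python
def cross_join_size(left_rows, right_rows, bytes_per_row):
    """Estimate output row count and raw size (in bytes) of a cross join."""
    out_rows = left_rows * right_rows  # every left row pairs with every right row
    return out_rows, out_rows * bytes_per_row


# Assumed inputs: two tables of one million rows, 100 bytes per joined row.
rows, size_bytes = cross_join_size(1_000_000, 1_000_000, 100)
print(rows)               # 1000000000000 (one trillion rows)
print(size_bytes / 1e12)  # 100.0 (terabytes, decimal)
```

Even with modest inputs, the full result is far larger than what a single partition can hold at once, so it matters whether rows are kept around or released as they are produced.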

At a fundamental level, a cross join can be thought of as a nested for loop, where the code traverses the outer table and, for each of its rows, the whole inner table. The key issue is how memory is handled. If the code allocates and deallocates memory ideally inside the inner loop, or at least inside the outer loop, OOM errors can be mitigated. However, if we rely on an external system such as a garbage collector (GC) to manage memory, we can still run into OOM issues.

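for (i = 0; i < 100M; i++)        // outer table: 100 million rows
{
    allocate memory               // per outer row
    for (j = 0; j < 100M; j++)    // inner table: 100 million rows
    {
        allocate memory           // per row pair
        <Operations>
        deallocate memory         // released before the next pair
    }
    deallocate memory             // released before the next outer row
}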
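The same nested-loop idea can be sketched in Python with a generator. Python is itself a garbage-collected language, so this does not show manual allocation and deallocation; it only illustrates the related point that producing one pair at a time, and never retaining the full result, keeps memory use flat. The function names cross_join_stream and count_matches are hypothetical, and the inner input is assumed to be re-iterable (a list or range, not a one-shot iterator).

```python
def cross_join_stream(outer, inner):
    """Yield each (left, right) pair lazily instead of building a full list."""
    for left in outer:        # outer loop: one pass over the outer table
        for right in inner:   # inner loop: a full pass per outer row
            yield (left, right)


def count_matches(outer, inner, predicate):
    """Consume the stream pair by pair, keeping only a running count."""
    total = 0
    for pair in cross_join_stream(outer, inner):
        if predicate(pair):
            total += 1
        # `pair` is rebound on the next iteration, so the previous pair
        # is no longer referenced and can be reclaimed.
    return total


# 1,000 x 1,000 = one million pairs are visited, but never stored together.
print(count_matches(range(1_000), range(1_000), lambda p: p[0] == p[1]))  # 1000
```

By contrast, wrapping the stream in list(...) would materialize every pair at once, which is the situation that leads to OOM errors on large inputs.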
