A trait describing join implementations that are based on a sort-merge join.
The type of the left RDD.
The type of the right RDD.
The type of data yielded by the left RDD at the output of the
join. This may not match T, for example when the join is an outer join.
The type of data yielded by the right RDD at the output of the join.
Performs a region join between two RDDs (shuffle join).
This implementation is shuffle-based, so, unlike BroadcastRegionJoin, it does not require
collecting one side of the join into memory. Conceptually, it performs a global sort of each
RDD by genome position and then a sort-merge join, similar to the chromsweep implementation in
bedtools. More specifically, it first defines a set of bins across the genome, then assigns
each object in the RDDs to every bin that it overlaps (replicating the object if necessary),
performs the shuffle, and sorts the objects in each bin. Finally, each bin independently
performs a chromsweep sort-merge join.
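The binning step described above can be illustrated with a minimal plain-Python sketch. It does not use Spark and is not the actual implementation: the names `bins_for` and `assign_to_bins`, the `(contig, start, end)` tuple layout, the half-open coordinates, and the fixed `bin_size` are all assumptions made for illustration. Regions are assumed to be non-empty (end > start).

```python
from collections import defaultdict

# Hypothetical region layout: (contig, start, end), half-open [start, end).

def bins_for(region, bin_size):
    """Return the keys of every fixed-width bin that the region overlaps."""
    contig, start, end = region
    first = start // bin_size
    last = (end - 1) // bin_size  # end is exclusive, so use the last covered base
    return [(contig, b) for b in range(first, last + 1)]

def assign_to_bins(records, bin_size):
    """Group (region, value) records by bin, replicating records that span bins.

    This stands in for the shuffle; each bin's records are then sorted by region.
    """
    bins = defaultdict(list)
    for region, value in records:
        for key in bins_for(region, bin_size):
            bins[key].append((region, value))
    for key in bins:
        bins[key].sort(key=lambda rec: rec[0])
    return dict(bins)
```

For example, with a bin size of 10, a record covering `("chr1", 5, 15)` is replicated into bins 0 and 1 of `chr1`, while a record covering `("chr1", 12, 13)` lands only in bin 1.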
The 'left' side of the join
The 'right' side of the join
An RDD of pairs (x, y), where x is from leftRDD, y is from rightRDD, and the region
corresponding to x overlaps the region corresponding to y.
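To make the overlap condition and the per-bin sweep concrete, here is a small plain-Python sketch of a chromsweep-style merge over two start-sorted lists. It is only an illustration under stated assumptions, not the library's code: `overlaps` and `sweep_join` are hypothetical names, regions are half-open `(start, end)` tuples on a single contig, and the step that would prevent the same pair being emitted from more than one bin is not shown.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def sweep_join(left, right):
    """Join two lists of ((start, end), value) records, each sorted by start.

    Returns (left_value, right_value) pairs whose regions overlap.
    """
    out = []
    cache = []  # right records that may still overlap this or a later left record
    j = 0
    for l_region, l_val in left:
        # Pull in right records that start before this left record ends.
        while j < len(right) and right[j][0][0] < l_region[1]:
            cache.append(right[j])
            j += 1
        # Drop right records that end at or before this left record's start;
        # later left records start no earlier, so these can never match again.
        cache = [r for r in cache if r[0][1] > l_region[0]]
        for r_region, r_val in cache:
            if overlaps(l_region, r_region):
                out.append((l_val, r_val))
    return out

left = [((0, 10), "x1"), ((5, 8), "x2"), ((20, 30), "x3")]
right = [((7, 12), "y1"), ((9, 25), "y2"), ((30, 40), "y3")]
pairs = sweep_join(left, right)
# pairs == [("x1", "y1"), ("x1", "y2"), ("x2", "y1"), ("x3", "y2")]
```

Because both inputs are sorted by start, each right record enters the cache once and is discarded once it can no longer overlap, so no all-against-all comparison is needed.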