Equals column lineage

We already have great functionality for displaying relationships from JOIN and WHERE clauses.

But that's not all.

Remember what we learned in school? If we know that x = y, we can substitute one for the other anywhere. Right? Now, take a look at this query:

SELECT T1.C1 FROM
  T1 JOIN T2 ON T1.C1 = T2.C3

We get T1.C1 as the lineage result, correct?

But T1.C1 = T2.C3, which means this query is equivalent to:

SELECT T2.C3 FROM
  T1 JOIN T2 ON T1.C1 = T2.C3

See what's happening here? The lineage of the upstream column T2.C3 is hidden from your view!

Have you ever encountered a tool that reveals this to you? Sure, you'll see that there's a dependency on table T2 at the object level. But no details. Good luck debugging that!

It gets even worse! If x = y and y = z, then x = z. Right?

SELECT C1 FROM
  T1 
    JOIN T2 ON C1 = C3
    JOIN T3 ON C3 = C5;

You get the point…

All of this becomes even more complicated when you add mathematical operations, function calls, type conversions, unions…

Can we see additional data lineage generated by the equality conditions in JOIN and WHERE?

Sure! Here's what the full data lineage would look like for the examples above:

Last updated