sql server - SQL join to correlated subquery where tables are related by overlapping ranges -
i have following table structure:
item
id | name -------- 1 | apple 2 | pear 3 | banana 4 | plum 5 | tomato
event
itemstart | itemend | eventtype | eventdate -------------------------------------------- 1 | 2 | planted | 2014-01-01 1 | 3 | picked | 2014-01-02 3 | 5 | eaten | 2014-01-05
the 2 tables linked primary key of item , range of itemstart , itemend (inclusive) in event. events refer contiguous sequences of items, not events given item have same range. events never occur on same date given item.
the query i'd produce follows:
list items, , each item show recent event
sample output:
id | name | event | date ---------------------------- 1 | apple | picked | 2014-01-02 (planted picked) 2 | pear | picked | 2014-01-02 (planted picked) 3 | banana | eaten | 2014-01-05 (picked eaten) 4 | plum | eaten | 2014-01-05 (eaten) 5 | tomato | eaten | 2014-01-05 (eaten)
this seems reasonable enough on face of it, , if there traditional foreign-key relationships in place (imagine itemid
instead of itemstart
, itemend
) i'd join correlated subquery this:
select name, eventtype, eventdate item inner join ( select itemid, eventtype, eventdate event e eventdate = (select max(eventdate) event e_max e_max.itemid = e.itemid) ) latest_events on i.id = latest_events.itemid
however, range relationship in place i'm stuck, want more doesn't work:
select name, eventtype, eventdate item inner join ( select itemstart, itemend, eventtype, eventdate event e eventdate = (select max(eventdate) event e_max i.id >= e_max.itemstart , i.id <= e_max.itemend) ) latest_events on i.id >= latest_events.itemstart , i.id <= latest_events.itemend
i error i.id >= e_max.itemstart , i.id <= e_max.itemend
on line 6, because can't reference i
within part of join. wanted (which isn't necessary in simpler example) because when i'm constructing subquery no longer have single id link - overlapping ranges mean there many possible ways of including single item, , want refer directly item, id available in top-level item table.
i hope makes sense.
i'm using sql server 2008 r2. report run overnight, speed not important be, there many items (100s of million); while there multiple events against each item, use of large ranges means there much fewer event records.
things i've thought about:
- somehow expanding out item/event relationship events recorded against individual items. result in significant increase in amount of data under consideration, allow simpler query approach.
- somehow process events constrain or consolidate ranges - if knew given item events had same start , end perhaps simplify things. not thought through.
how can produce query? in advance!
you can utilizing cte
, row_number()
.
;with cte ( select *, row_number() on (partition i.id order e.eventdate desc) rnum item join event e on i.id between e.itemstart , e.itemend ) select id, name, eventtype, eventdate cte rnum = 1
basically cte has joined item , event , added new column rownumber , partitioned on item.id. here's screenshot of looks like. here select rnum = 1 should max event date each item.id.
Comments
Post a Comment