BigQuery: Check for duplications during stream

- April 15, 2015

We have data generated by our devices installed on the clients' side. Duplicated data exists by design, which means we won't be able to eliminate the duplicates in the data-generating phase. We are looking for a way to avoid duplication while streaming into BigQuery (rather than cleaning the data afterwards by doing a table copy and delete). That is to say, for every ready-to-be-streamed record, we check whether it is already in BigQuery first: if it is not, we stream it in; if it exists, we don't.

But here's my concern (quoted from here: https://developers.google.com/bigquery/streaming-data-into-bigquery):

Data availability

The first time a streaming insert occurs, the streamed data is inaccessible for a warm-up period of up to 2 minutes. After the warm-up period, all streamed data added during and after the warm-up period is queryable. After several hours of inactivity, the warm-up period will occur again during the next insert.

Data can take up to 90 minutes to become available for copy and export operations.

Our data will go to different BigQuery tables (the table name is dynamically generated from the data's date_time). What does "the first time a streaming insert occurs" mean? Is it per table?

Does the above doc mean that we cannot rely on a query result to check for duplicates in the process of streaming?

If you provide an insert ID, BigQuery will automatically do the deduplication for you, as long as the duplicates are within the de-duplication window. The official docs don't mention how long the de-duplication window is, but it is generally from 5 minutes to 90 minutes (if you write data to a table very quickly, it will be closer to 5 than 90; if data is trickled in, it will last longer in the deduplication buffers).
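As a rough illustration of the insert ID approach, the sketch below derives a deterministic ID from the fields that identify a record, so that two copies of the same record carry the same `insertId` in a `tabledata.insertAll` request body. The helper names (`make_insert_id`, `build_insert_all_body`) and the example fields `device_id` and `value` are hypothetical; only `date_time` is mentioned in the question. The code only builds the request body and does not call BigQuery.

```python
import hashlib
import json


def make_insert_id(record, key_fields):
    """Derive a stable ID from the fields that identify a record.

    key_fields is an assumption: the columns that, taken together,
    make two records "the same" for deduplication purposes.
    """
    key = {name: record[name] for name in key_fields}
    # Sorted keys and fixed separators give the same string for the same data.
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_insert_all_body(records, key_fields):
    """Build a tabledata.insertAll request body with one insertId per row."""
    return {
        "kind": "bigquery#tableDataInsertAllRequest",
        "rows": [
            {"insertId": make_insert_id(record, key_fields), "json": record}
            for record in records
        ],
    }


if __name__ == "__main__":
    # Hypothetical device readings; the first two are duplicates.
    readings = [
        {"device_id": "d1", "date_time": "2015-04-15T10:00:00Z", "value": 3},
        {"device_id": "d1", "date_time": "2015-04-15T10:00:00Z", "value": 3},
        {"device_id": "d2", "date_time": "2015-04-15T10:00:00Z", "value": 7},
    ]
    body = build_insert_all_body(readings, ["device_id", "date_time"])
    print(json.dumps(body, indent=2))
```

Because the ID depends only on the record's own fields, a duplicate sent again from another device or process still maps to the same `insertId`. This only helps while the earlier copy is still inside the de-duplication window described above.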

Regarding "the first time a streaming insert occurs": this is per table. If you have a new table and start streaming to it, it may take a few minutes for that data to be available for querying. Once you've started streaming, however, new data will be available immediately.
