database - Storing lots of duplicate strings. Looking for good hash function to save storage. -


I am looking for a hash function to save storage space when storing a lot of duplicate strings in a database.

I have a database table that stores rows like this:

date    rate    description
xxx     yyyy    zzzz

The table has hundreds of millions of rows, and description is a string of up to 1k characters. The description values repeat heavily, so there are many duplicates. To avoid the wasted storage, I am thinking of splitting the data into two tables:

table 1: date, rate, desc_id

table 2: desc_id, description

Approach 1: desc_id is the MD5 hash of the description. The database has a primary key on desc_id and the application generates the hash, so database writes should be fast (that is my thinking).
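
To make approach 1 concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database purely for illustration, since the question does not name a database engine. The table names (rates, descriptions) and the helpers desc_key, make_db and store_row are hypothetical and not part of the original schema.

```python
import hashlib
import sqlite3


def desc_key(description: str) -> bytes:
    """Return the raw 16-byte MD5 digest of the description (used as desc_id)."""
    return hashlib.md5(description.encode("utf-8")).digest()


def make_db() -> sqlite3.Connection:
    """Create the two tables from the question in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE descriptions ("
        " desc_id BLOB PRIMARY KEY,"
        " description TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE rates ("
        " date TEXT NOT NULL,"
        " rate REAL NOT NULL,"
        " desc_id BLOB NOT NULL REFERENCES descriptions(desc_id))"
    )
    return conn


def store_row(conn: sqlite3.Connection, date: str, rate: float, description: str) -> bytes:
    """Insert one row; the description text is stored only the first time it is seen."""
    key = desc_key(description)
    # The application computes the key, so no lookup is needed before writing.
    # INSERT OR IGNORE is SQLite syntax; other engines spell this differently.
    conn.execute(
        "INSERT OR IGNORE INTO descriptions (desc_id, description) VALUES (?, ?)",
        (key, description),
    )
    conn.execute(
        "INSERT INTO rates (date, rate, desc_id) VALUES (?, ?, ?)",
        (date, rate, key),
    )
    return key
```

Storing the digest as 16 raw bytes rather than as a 32-character hex string halves the key size, which matters when the key is repeated in every row of table 1.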

Approach 2: desc_id is a database-generated id, with a unique key on description. Database writes might be slower than with the approach above.
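
For comparison, approach 2 could look roughly like the sketch below. Again this assumes SQLite through Python's sqlite3 module only for the sake of a runnable example. The names make_db and get_or_create_desc_id are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create the two tables with a database-generated integer desc_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE descriptions ("
        " desc_id INTEGER PRIMARY KEY,"          # assigned by the database
        " description TEXT NOT NULL UNIQUE)"     # unique key on the full text
    )
    conn.execute(
        "CREATE TABLE rates ("
        " date TEXT NOT NULL,"
        " rate REAL NOT NULL,"
        " desc_id INTEGER NOT NULL REFERENCES descriptions(desc_id))"
    )
    return conn


def get_or_create_desc_id(conn: sqlite3.Connection, description: str) -> int:
    """Return the id for a description, inserting it if it is new."""
    conn.execute(
        "INSERT OR IGNORE INTO descriptions (description) VALUES (?)",
        (description,),
    )
    # The id is only known to the database, so a read is needed before
    # the row in table 1 can be written.
    row = conn.execute(
        "SELECT desc_id FROM descriptions WHERE description = ?",
        (description,),
    ).fetchone()
    return row[0]
```

The trade-off visible here is that every write needs a round trip to learn the id, and the unique index has to cover description values of up to 1k characters. In exchange, the key stored in table 1 is a small integer and there is no possibility of a hash collision.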

Question 1: Should I stick with MD5, or are there better algorithms out there? Is it worthwhile to go with the SHA-x functions, which theoretically offer better collision avoidance, while taking a hit on storage and compute time?
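
One way to put rough numbers on question 1 is the birthday approximation for accidental collisions, p ≈ n(n-1)/2 / 2^bits, where n is the number of distinct descriptions. The sketch below assumes the hash output behaves like a uniformly random value and says nothing about deliberately crafted inputs (MD5 is known to be weak against those). The figure of 10^8 distinct descriptions is an assumption for illustration, not a number from the question, and the helper names are hypothetical.

```python
import hashlib


def digest_bytes(name: str) -> int:
    """Size in bytes of the raw digest for a hashlib algorithm name."""
    return hashlib.new(name).digest_size


def collision_probability(n_distinct: int, bits: int) -> float:
    """Birthday approximation for at least one accidental collision."""
    return n_distinct * (n_distinct - 1) / 2 / 2 ** bits


if __name__ == "__main__":
    n = 10 ** 8  # assumed number of distinct descriptions
    for name in ("md5", "sha1", "sha256"):
        size = digest_bytes(name)
        p = collision_probability(n, size * 8)
        print(f"{name}: {size} bytes per key, collision probability ~ {p:.3g}")
```

Under these assumptions the accidental collision probability for a 128-bit digest is already far below one in a billion, so the longer digests mostly add bytes to every row of table 1 (16 bytes for MD5 versus 20 for SHA-1 and 32 for SHA-256).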

Question 2: Should I consider going with approach 2?

