database - Storing lots of duplicate strings. Looking for a good hash function to save storage.
I am looking for a hash function to save on storage space while storing a lot of duplicate strings in a database.
I have a DB that has to store:
date rate description
xxx yyyy zzzz
There are 100s of millions of rows; description is a string of at most 1k. The description string is highly repetitive, with many duplicates. To avoid wasted storage, I am thinking of doing this:
table 1: date, rate, desc_id
table 2: desc_id, description
approach 1: desc_id == MD5 hash of the description --> the DB has a primary key on desc_id; the app generates the hash; DB writes are fast (that is the thinking).
approach 2: desc_id == DB-generated id; a unique key is placed on description; DB writes might be slower than with the approach above.
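To make the two approaches concrete, here is a minimal sketch, assuming SQLite through Python's sqlite3 module purely for illustration (the actual DB is not specified). The table names desc_by_hash and desc_by_id and the helper names make_db, store_by_hash and store_by_id are hypothetical, not part of any existing schema.

```python
import hashlib
import sqlite3


def make_db():
    """Create an in-memory DB with one description table per approach."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Approach 1: the app-computed MD5 digest (16 bytes) is the key.
        CREATE TABLE desc_by_hash (
            desc_id     BLOB PRIMARY KEY,
            description TEXT NOT NULL
        );
        -- Approach 2: the DB generates the id; a unique key guards the text.
        CREATE TABLE desc_by_id (
            desc_id     INTEGER PRIMARY KEY,
            description TEXT NOT NULL UNIQUE
        );
    """)
    return conn


def store_by_hash(conn, description):
    """Approach 1: the key is known before the write, so no lookup is needed."""
    desc_id = hashlib.md5(description.encode("utf-8")).digest()
    # Caveat: OR IGNORE also silently skips a true collision
    # (different text, same digest), so it assumes none occur.
    conn.execute(
        "INSERT OR IGNORE INTO desc_by_hash (desc_id, description) VALUES (?, ?)",
        (desc_id, description),
    )
    return desc_id


def store_by_id(conn, description):
    """Approach 2: insert if new, then read back the DB-generated id."""
    conn.execute(
        "INSERT OR IGNORE INTO desc_by_id (description) VALUES (?)",
        (description,),
    )
    row = conn.execute(
        "SELECT desc_id FROM desc_by_id WHERE description = ?",
        (description,),
    ).fetchone()
    return row[0]
```

In this sketch, approach 1 needs a single statement per description because the app already knows the key, while approach 2 needs the unique index on the full text plus a read to learn the id. Whether that difference matters at 100s of millions of rows would have to be measured on the real DB.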
Question 1: Should I stick with MD5, or are there better algorithms out there? Is it worthwhile to go to the SHA-x functions for theoretically better collision avoidance, while taking a hit on storage and compute time?
Question 2: Should I consider going with approach 2?
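The storage versus collision trade-off in Question 1 can be roughly quantified with the birthday-bound approximation. The sketch below is only an estimate under stated assumptions: it treats each hash as uniformly random (so it covers accidental collisions only, not deliberately crafted inputs, for which MD5 is known to be broken), and it assumes a deliberately pessimistic 10**9 distinct descriptions. The function name collision_probability is hypothetical.

```python
import hashlib
import math


def collision_probability(n_distinct, digest_bits):
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -n_distinct * (n_distinct - 1) / 2.0 / 2.0 ** digest_bits
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


# Assumed worst case: one billion distinct descriptions.
for name in ("md5", "sha1", "sha256"):
    size = hashlib.new(name).digest_size  # key size in bytes
    p = collision_probability(10**9, size * 8)
    print(f"{name}: {size} bytes per key, p(any collision) ~ {p:.3g}")
```

The digest sizes are 16 bytes for MD5, 20 for SHA-1 and 32 for SHA-256, so moving from MD5 to SHA-256 doubles the key stored in every row of table 1.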