memory management - What is the best document storage strategy in NoSQL databases?
NoSQL databases like Couchbase hold a lot of documents in memory, hence their enormous speed, but this also puts a greater demand on the memory size of the server(s) they run on.
I'm looking for the best strategy among several contrary strategies for storing documents in a NoSQL database. These are:
- Optimise for speed
Putting all the information into one (big) document has the advantage that everything can be retrieved with a single get, from memory or from disk (if it was purged from memory before). With schema-less NoSQL databases this is almost what one wishes for. But the document will become big and eat up a lot of memory, so fewer documents can be kept in memory in total.
- Optimise for memory
Splitting the data into several documents (e.g. using compound keys as described in this question: Designing record keys for document-oriented database - best practice), so that each document holds only the information necessary for a specific read/update operation, would allow more (transient) documents to be held in memory.
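The trade-off between the two strategies comes down to simple arithmetic. The sketch below is only a back-of-the-envelope illustration: the RAM quota and both document sizes are made-up numbers, `resident_docs` is a hypothetical helper, and the calculation ignores per-document metadata and any other overhead the database adds.

```python
# Back-of-the-envelope estimate of how many documents fit in a RAM quota.
# All numbers are hypothetical; per-document metadata overhead is ignored.

def resident_docs(ram_bytes: int, doc_bytes: int) -> int:
    """Number of documents of a given size that fit into the RAM quota."""
    return ram_bytes // doc_bytes

RAM_QUOTA = 8 * 1024**3   # assumed 8 GiB available for documents
BIG_DOC = 4096            # assumed size of one all-in-one document
HOT_DOC = 512             # assumed size of one small, frequently updated document

big_docs_in_ram = resident_docs(RAM_QUOTA, BIG_DOC)
hot_docs_in_ram = resident_docs(RAM_QUOTA, HOT_DOC)
print(big_docs_in_ram, hot_docs_in_ram)
```

With these assumed sizes, eight times as many of the small documents stay resident, which is the effect the "optimise for memory" strategy is after.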
The use case I'm looking at is call detail records (CDRs) from telecommunication providers. These CDRs typically run into hundreds of millions per day. Yet many customers don't generate a single record on a given day (I'm looking at the South-East Asian market, with its prepaid dominance and still lower data saturation). That means there is typically a large number of documents that see a read/update maybe every other day, while only a small percentage have several read/update cycles per day.
One solution suggested to me was to build 2 buckets, with more RAM allocated to the more transient documents and less RAM allocated to the second bucket holding the bigger documents. That would allow faster access to the more transient data and slower access to the bigger documents, which e.g. hold profile/user information that hardly changes at all. I see two downsides to this proposal though: one is that you can't build a view (map/reduce) across two buckets (this is Couchbase; other NoSQL solutions might allow this), and the second is more overhead in closely managing the balance of memory allocation between both buckets as the user base grows.
Has anyone else been challenged by this, and what was your solution to the problem? What would be the best strategy from your POV, and why? Clearly it must be somewhere in the middle of both strategies; having only one document, or having one big document split into hundreds of documents, can't be the ideal solution IMO.
EDIT 2014-9-14: OK, though this comes close to answering my own question, in the absence of any offered solution so far and following up on the comment, here is a bit more background on how I plan to organise my data, trying to achieve a sweet spot between speed and memory consumption:
mobile_no:profile
- This holds the profile information from a table, not directly from a CDR. Less transient data goes in here, like age, gender and name. The key is a compound key consisting of the mobile number (MSISDN) and the word profile, separated by a ":".
mobile_no:revenue
- This holds transient information such as usage counters and variables accumulating the total revenue the customer has spent. The key is again a compound key consisting of the mobile number (MSISDN) and the word revenue, separated by a ":".
mobile_no:optin
- This holds semi-transient information about when a customer opted into a program and when he/she opted out of the program again. This can happen several times and is handled via an array. The key is again a compound key consisting of the mobile number (MSISDN) and the word optin, separated by a ":".
connection_id
- This holds information specific to an A/B connection (sender/receiver) made via voice or video call or SMS/MMS. The key consists of both mobile_no's concatenated.
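The key scheme above can be sketched with a few small helpers. The function names and the sample numbers are hypothetical and not part of any Couchbase SDK; which documents one incoming CDR touches is also an assumption, based on the revenue counters being updated with every CDR.

```python
# Hypothetical key helpers for the document layout described above.
SEPARATOR = ":"
DOC_TYPES = ("profile", "revenue", "optin")

def subscriber_key(msisdn: str, doc_type: str) -> str:
    """Build a compound key such as '6281234567:revenue'."""
    if doc_type not in DOC_TYPES:
        raise ValueError(f"unknown document type: {doc_type}")
    return f"{msisdn}{SEPARATOR}{doc_type}"

def connection_key(a_msisdn: str, b_msisdn: str) -> str:
    """Build the A/B connection key by concatenating both mobile numbers."""
    return f"{a_msisdn}{b_msisdn}"

def keys_for_cdr(a_msisdn: str, b_msisdn: str) -> list:
    """Keys assumed to be read/updated for one incoming CDR.

    Only the small, transient documents are listed; the profile
    document is left alone.
    """
    return [
        subscriber_key(a_msisdn, "revenue"),
        connection_key(a_msisdn, b_msisdn),
    ]

print(keys_for_cdr("6281234567", "6289876543"))
```

Because every per-subscriber key starts with the MSISDN, all documents for one customer can be addressed directly by key without a lookup.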
Before these changes in document structure I was putting all the profile, revenue and optin information into one big document, always keeping the connection_id as a separate document. This new document storing strategy hopefully gives me a better compromise between speed and memory consumption, as I split the main document into several documents so that each of them has only the important information that is read/updated in a single step of the app.
This also takes care of the different rates of change over time, with some data being very transient (like the counters and the accumulative revenue field that gets updated with every CDR coming in) and the profile information being mostly unchanged. I hope this gives a better understanding of what I'm trying to achieve; comments and feedback are more than welcome.
Thank you for updating your original question. You are correct when you talk about finding the right balance between coarse-grained and fine-grained documents.
The final architecture of the documents really depends on the needs of your particular business domain. You have to identify in your use cases the "chunks" of data that are needed as a whole and base the shape of your stored documents on that. Here are some high-level steps you need to perform when you design your document structure:
- Identify all document consumption use cases for your app/service (read, read-write, searchable items).
- Design your documents (most likely you will end up with several smaller documents vs. one big doc that has everything).
- Design your document keys so that different document types can coexist in one bucket (e.g. use a namespace in the key value).
- Do a "dry run" of the resulting model against your use cases to see whether you have optimal (read/write) transactions to NoSQL and all the required document data within each transaction.
- Run performance tests for your use cases (try to simulate at least 2 times the expected load).
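The "dry run" step can be done on paper, or with a small script like the sketch below. The use-case names, the document sizes and the `dry_run` helper are all hypothetical placeholders; the point is only to count how many documents and how many bytes each use case touches under each model before any load test is run.

```python
# Hypothetical document sizes in bytes; replace with measurements from real data.
DOC_SIZES = {"profile": 2000, "revenue": 300, "optin": 500, "all_in_one": 2800}

# Document types each (assumed) use case touches under each model.
USE_CASES = {
    "apply_cdr": {"split": ["revenue"], "single": ["all_in_one"]},
    "show_profile": {"split": ["profile"], "single": ["all_in_one"]},
    "opt_in_out": {"split": ["optin"], "single": ["all_in_one"]},
    "full_customer_view": {
        "split": ["profile", "revenue", "optin"],
        "single": ["all_in_one"],
    },
}

def dry_run(model: str) -> dict:
    """Return {use_case: (documents_touched, bytes_touched)} for one model."""
    return {
        name: (len(docs[model]), sum(DOC_SIZES[d] for d in docs[model]))
        for name, docs in USE_CASES.items()
    }

for model in ("split", "single"):
    print(model, dry_run(model))
```

In this made-up example the split model touches far fewer bytes for the frequent `apply_cdr` case, at the cost of three fetches for the rare full customer view. Whether that trade is worth it depends on how often each use case really occurs.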
Note: when you design different docs it is OK to have some redundancy (remember, this is not RDBMS normalized form); think of it more like object-oriented design.
Note 2: if you have searchable items outside of the keys (e.g. searching customers by last name "starts with", or other dynamic search criteria), consider using the Elasticsearch integration with CB, or you can try the N1QL query language that is coming with CB 3.0.
It seems you are going in the right direction by splitting into several smaller documents, all linked by MSISDN, e.g. msisdn:profile, msisdn:revenue, msisdn:optin. I would pay special attention to the last document type, the "A/B" connection. It sounds like it might generate a large volume and is transient in nature, so you have to find out how long these documents need to live in the Couchbase bucket. You can specify a TTL (time to live) so that old docs are cleaned up automatically.
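To illustrate what a TTL does for the A/B connection documents, here is a minimal in-memory sketch. `TtlStore` is a hypothetical stand-in, not the Couchbase API: in Couchbase the expiry is set through the SDK when the document is written and is enforced by the server, and the exact option name depends on the SDK version. The 60-second TTL and the injected clock are assumptions so the example can run without waiting.

```python
import time

class TtlStore:
    """In-memory stand-in for a bucket that expires documents after a TTL."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable clock, useful for testing
        self._docs = {}       # key -> (value, expires_at or None)

    def upsert(self, key, value, ttl=0):
        """Store a document; ttl is in seconds, 0 means never expire."""
        expires_at = self._clock() + ttl if ttl else None
        self._docs[key] = (value, expires_at)

    def get(self, key):
        """Return the document, or None if it is missing or has expired."""
        entry = self._docs.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._docs[key]   # lazily drop the expired document
            return None
        return value

# Usage with a fake clock.
now = [1000.0]
store = TtlStore(clock=lambda: now[0])
store.upsert("62812345676289876543", {"calls": 1}, ttl=60)  # transient A/B doc
store.upsert("6281234567:profile", {"age": 30})             # no expiry
now[0] += 61                                                # advance past the TTL
print(store.get("62812345676289876543"), store.get("6281234567:profile"))
```

The long-lived profile document stays, while the connection document disappears once its TTL has passed, which keeps the transient documents from accumulating.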