python - Entrez epost + elink returns results out of order with Biopython


I ran into this today and wanted to toss it out there. It appears that, using the Biopython interface to Entrez at NCBI, it's not possible to get results (at least from elink) back in the correct order, i.e. the same order as the input. Please see the code below for an example. I have thousands of GIs for which I need taxonomy information, and querying them individually is painfully slow due to NCBI restrictions.

    from Bio import Entrez

    Entrez.email = "my@email.com"
    ids = ["148908191", "297793721", "48525513", "507118461"]

    # Post all IDs in one request, then link the whole batch to taxonomy
    search_results = Entrez.read(Entrez.epost("protein", id=",".join(ids)))
    webenv = search_results["WebEnv"]
    query_key = search_results["QueryKey"]

    print(Entrez.read(Entrez.elink(webenv=webenv,
                                   query_key=query_key,
                                   dbfrom="protein",
                                   db="taxonomy")))

    print("-------")

    # Post and link each ID individually, for comparison
    for i in ids:
        search_results = Entrez.read(Entrez.epost("protein", id=i))
        webenv = search_results["WebEnv"]
        query_key = search_results["QueryKey"]

        print(Entrez.read(Entrez.elink(webenv=webenv,
                                       query_key=query_key,
                                       dbfrom="protein",
                                       db="taxonomy")))

Results:

    [{u'LinkSetDb': [{u'DbTo': 'taxonomy', u'Link': [{u'Id': '211604'}, {u'Id': '81972'}, {u'Id': '32630'}, {u'Id': '3332'}], u'LinkName': 'protein_taxonomy'}], u'DbFrom': 'protein', u'IdList': ['148908191', '297793721', '48525513', '507118461'], u'LinkSetDbHistory': [], u'ERROR': []}]
    -------
    [{u'LinkSetDb': [{u'DbTo': 'taxonomy', u'Link': [{u'Id': '3332'}], u'LinkName': 'protein_taxonomy'}], u'DbFrom': 'protein', u'IdList': ['148908191'], u'LinkSetDbHistory': [], u'ERROR': []}]
    [{u'LinkSetDb': [{u'DbTo': 'taxonomy', u'Link': [{u'Id': '81972'}], u'LinkName': 'protein_taxonomy'}], u'DbFrom': 'protein', u'IdList': ['297793721'], u'LinkSetDbHistory': [], u'ERROR': []}]
    [{u'LinkSetDb': [{u'DbTo': 'taxonomy', u'Link': [{u'Id': '211604'}], u'LinkName': 'protein_taxonomy'}], u'DbFrom': 'protein', u'IdList': ['48525513'], u'LinkSetDbHistory': [], u'ERROR': []}]
    [{u'LinkSetDb': [{u'DbTo': 'taxonomy', u'Link': [{u'Id': '32630'}], u'LinkName': 'protein_taxonomy'}], u'DbFrom': 'protein', u'IdList': ['507118461'], u'LinkSetDbHistory': [], u'ERROR': []}]
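To make the mismatch easier to see, here is a small offline sketch that compares the two orderings. The data is copied from the output above, trimmed to the keys that matter and written with the key casing Bio.Entrez normally returns; the helper name `linked_ids` is made up for this illustration and is not part of Biopython.

```python
# Parsed elink results copied from the output above, written as plain
# Python literals and trimmed to the keys used here.
batch = [{
    "IdList": ["148908191", "297793721", "48525513", "507118461"],
    "LinkSetDb": [{
        "DbTo": "taxonomy",
        "LinkName": "protein_taxonomy",
        "Link": [{"Id": "211604"}, {"Id": "81972"},
                 {"Id": "32630"}, {"Id": "3332"}],
    }],
}]

per_id = [
    {"IdList": ["148908191"], "LinkSetDb": [{"Link": [{"Id": "3332"}]}]},
    {"IdList": ["297793721"], "LinkSetDb": [{"Link": [{"Id": "81972"}]}]},
    {"IdList": ["48525513"], "LinkSetDb": [{"Link": [{"Id": "211604"}]}]},
    {"IdList": ["507118461"], "LinkSetDb": [{"Link": [{"Id": "32630"}]}]},
]


def linked_ids(linksets):
    """Flatten every linked ID, in the order the linksets report them.

    Hypothetical helper for this sketch, not a Biopython function.
    """
    return [link["Id"]
            for linkset in linksets
            for linksetdb in linkset["LinkSetDb"]
            for link in linksetdb["Link"]]


batch_order = linked_ids(batch)
per_id_order = linked_ids(per_id)

# Same taxonomy IDs in both cases, but in a different order
print(batch_order == per_id_order)                  # False
print(sorted(batch_order) == sorted(per_id_order))  # True

# Only the one-ID-per-request results give a reliable GI -> taxid mapping
gi_to_taxid = {ls["IdList"][0]: linked_ids([ls])[0] for ls in per_id}
print(gi_to_taxid["48525513"])                      # 211604
```

So the batched call returns the right set of taxonomy IDs, just with no way to tell which GI each one belongs to.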

The elink documentation (http://www.ncbi.nlm.nih.gov/books/nbk25499/) at NCBI says this should be possible by passing multiple 'id=' parameters, but that doesn't appear to be possible with the Biopython epost interface. Has anyone else seen this, or am I missing something obvious?
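For reference, the "multiple id=" form of the request can be sketched with the standard library alone. This only builds the URL and does not contact NCBI. The helper name `build_elink_url` is hypothetical, and the endpoint is the public E-utilities elink address. Whether your Biopython version's Entrez.elink turns a list passed as `id` into repeated parameters is something to check against its documentation; nothing in this thread confirms it.

```python
from urllib.parse import urlencode

# Public E-utilities elink endpoint
ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(ids, dbfrom="protein", db="taxonomy"):
    """Build an elink URL with one id= parameter per input ID.

    Hypothetical helper for illustration. Repeating id= (rather than
    sending one comma-joined id) is the form the elink documentation
    describes for getting a separate link set per input ID.
    """
    params = [("dbfrom", dbfrom), ("db", db)]
    params += [("id", str(i)) for i in ids]  # order of ids is preserved
    return ELINK_URL + "?" + urlencode(params)


ids = ["148908191", "297793721", "48525513", "507118461"]
url = build_elink_url(ids)
print(url)
```

Note that this passes the IDs directly to elink instead of going through epost and the history server, so very long ID lists would still need to be split into chunks.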

Thanks!

    from Bio import Entrez

    Entrez.email = "my@email.com"
    ids = ["148908191", "297793721", "48525513", "507118461"]

    # Post all IDs to the history server in one request
    search_results = Entrez.read(Entrez.epost("protein", id=",".join(ids)))

    # Fetch the full GenPept records as XML
    xml = Entrez.efetch("protein",
                        query_key=search_results["QueryKey"],
                        webenv=search_results["WebEnv"],
                        rettype="gp",
                        retmode="xml")

    for record in Entrez.read(xml):
        # GI number(s) of this record, with the "gi|" prefix stripped
        print([x[3:] for x in record["GBSeq_other-seqids"] if x.startswith("gi")])
        # Qualifiers of the first feature (normally the source feature)
        gb_quals = record["GBSeq_feature-table"][0]["GBFeature_quals"]
        for qualifier in gb_quals:
            if qualifier["GBQualifier_name"] == "db_xref":
                print(qualifier["GBQualifier_value"])

        # Or with a list comprehension:
        # print([q["GBQualifier_value"] for q in
        #        record["GBSeq_feature-table"][0]["GBFeature_quals"] if
        #        q["GBQualifier_name"] == "db_xref"])

    xml.close()

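What I do is an efetch query, and then parse the dict-like XML after reading it with Entrez.read(). This is where things turn messy, and you have to dive into the XML dict-list structure. I guess there's a nicer way than mine to extract the "GBFeature_quals" entry whose "GBQualifier_name" is "db_xref"... but it works (for now). The output: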

    ['148908191']
    taxon:3332

    ['297793721']
    taxon:81972

    ['48525513']
    taxon:211604

    ['507118461']
    taxon:32630
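Since each efetch record carries its own GI, the results can be keyed by GI and put back into the input order regardless of how NCBI returns them. Below is a minimal sketch of that step. It runs on hand-made stand-in records that mimic only the keys used above (the real records come from Entrez.read and contain far more), and `gi_to_taxon` is a hypothetical helper name, not a Biopython function.

```python
def gi_to_taxon(records):
    """Return {gi: taxon_id} from parsed GBSeq records.

    Hypothetical helper: scans every feature for the first db_xref
    qualifier of the form "taxon:<id>" instead of assuming it sits
    in the first feature.
    """
    mapping = {}
    for record in records:
        gis = [s[3:] for s in record["GBSeq_other-seqids"]
               if s.startswith("gi|")]
        taxon = None
        for feature in record["GBSeq_feature-table"]:
            for qual in feature.get("GBFeature_quals", []):
                value = qual.get("GBQualifier_value", "")
                if (qual["GBQualifier_name"] == "db_xref"
                        and value.startswith("taxon:")):
                    taxon = value.split(":", 1)[1]
                    break
            if taxon is not None:
                break
        for gi in gis:
            mapping[gi] = taxon
    return mapping


def stub_record(gi, taxon):
    """Build a stand-in record with only the keys this sketch reads."""
    return {
        "GBSeq_other-seqids": ["gi|" + gi],
        "GBSeq_feature-table": [{
            "GBFeature_key": "source",
            "GBFeature_quals": [
                {"GBQualifier_name": "db_xref",
                 "GBQualifier_value": "taxon:" + taxon},
            ],
        }],
    }


# Stand-in records, deliberately NOT in input order
records = [
    stub_record("48525513", "211604"),
    stub_record("507118461", "32630"),
    stub_record("148908191", "3332"),
    stub_record("297793721", "81972"),
]

ids = ["148908191", "297793721", "48525513", "507118461"]
mapping = gi_to_taxon(records)

# Re-emit in the original input order
ordered = [(gi, mapping.get(gi)) for gi in ids]
print(ordered)
```

Using `mapping.get(gi)` leaves a `None` for any GI that came back without a taxon qualifier, so missing records are visible rather than silently shifting the order.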
