BioPerl Tutorial: Reading multiple GenBank Records with Bio::DB BioPerl Module

In the last BioPerl Tutorial we saw how to retrieve a single GenBank sequence using the Bio::DB::GenBank module.

In this one, we will analyze how to do it for multiple GenBank Sequences.

NOTE: To install BioPerl, learn how to code in BioPerl, or use BioPerl modules, see the BioPerl Tutorial series index.

 Creating query:

NOTE: Executing queries requires an active internet connection.

Now, to query the GenBank database, we will use the Bio::DB::Query::GenBank module. We will search Rose (scientific name Rosa) records for entries with Maturase in the title and a sequence length of up to 1000. The code of the query will be:

$gnbnk_qry = "Rosa[ORGN] AND Maturase[TITL] AND 0:1000[SLEN]";

$gnbnk_qry is the variable that stores the query string

[ORGN] -> Organism, a query field identifier

[TITL] -> Title, a query field identifier

[SLEN] -> Sequence Length, a query field identifier

0:1000 is the range of sequence lengths (you can change the range)

For the full list of query field identifiers, see the NCBI site.
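
As a rough illustration of how the pieces of the query fit together, the sketch below assembles an equivalent query string from separate term and field pairs. The helper name build_entrez_query is hypothetical and not part of BioPerl; it only joins term[FIELD] pairs with AND.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): joins term/field pairs
# into an Entrez-style query string using the AND operator.
sub build_entrez_query {
    my @pairs = @_;    # each element is [ term, field ]
    return join ' AND ', map { sprintf '%s[%s]', $_->[0], $_->[1] } @pairs;
}

my $gnbnk_qry = build_entrez_query(
    [ 'Rosa',     'ORGN' ],    # organism
    [ 'Maturase', 'TITL' ],    # word in the record title
    [ '0:1000',   'SLEN' ],    # sequence length range
);

print "$gnbnk_qry\n";
```

Keeping the terms in a list like this makes it easy to change the organism or the length range without retyping the whole string.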

 Creating Query Object

Once we have created the query, we now need to create a query object. I will name it $obj_gbqry. The code for that will be:

$obj_gbqry = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',  -query => $gnbnk_qry );

$obj_gbqry is the query object

Bio::DB::Query::GenBank is the module (class) used to create the query object

new() is the constructor method that creates a new query object

-db and -query are the named arguments passed when creating the query object

'nucleotide' and $gnbnk_qry are the values of those arguments; they tell BioPerl to search the nucleotide database using the query stored in $gnbnk_qry
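
Before downloading anything, it can help to check how many records the query matches. The sketch below relies on the count method that BioPerl query objects provide. The helper summarize_query and the RUN_NETWORK_EXAMPLE environment variable are made up for this example; the variable is only a guard so that the script does not contact NCBI unless you explicitly set it.

```perl
use strict;
use warnings;

# Hypothetical helper: formats a one-line summary for any object
# that offers a count() method, such as a BioPerl query object.
sub summarize_query {
    my ($obj_gbqry) = @_;
    return sprintf "%d matching record(s)", $obj_gbqry->count;
}

# RUN_NETWORK_EXAMPLE is an assumed guard variable, not a BioPerl setting.
# The network part only runs when it is set to a true value.
if ( $ENV{RUN_NETWORK_EXAMPLE} ) {
    require Bio::DB::Query::GenBank;
    my $obj_gbqry = Bio::DB::Query::GenBank->new(
        -db    => 'nucleotide',
        -query => 'Rosa[ORGN] AND Maturase[TITL] AND 0:1000[SLEN]',
    );
    print summarize_query($obj_gbqry), "\n";
}
```

If the count is much larger than expected, it is usually worth tightening the query before retrieving the sequences.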

 Retrieving Sequences

The query object describes which records we want, but the sequences themselves still have to be retrieved. For this purpose we create a GenBank database object, $obj_gbnk. The code will be:

$obj_gbnk = Bio::DB::GenBank->new;

Using Stream Object

To retrieve the sequences one after another, we use the get_Stream_by_query() method, which returns a stream object. In other words, the stream object lets us retrieve more than one sequence object. The code will be:

$stream_obj = $obj_gbnk->get_Stream_by_query($obj_gbqry);

$stream_obj is the stream object

get_Stream_by_query() is a method of the GenBank object $obj_gbnk; it fetches the sequences matching the query object and returns them as a stream, which we store in $stream_obj

 Printing Sequences

Once we have retrieved the sequences, we need to print them. Each item taken from the stream is a sequence object, here called $obj_seq. I will limit the printing to a couple of its fields: the accession number and the sequence length.

The code will be:

print $obj_seq->accession_number, "\t", $obj_seq->length, "\n";

But as we have more than one sequence to print, we will use a while loop. The code will be:

while ($obj_seq = $stream_obj->next_seq) {
    print $obj_seq->accession_number, "\t", $obj_seq->length, "\n";
}

The while loop keeps running as long as the stream object has sequences left; next_seq returns a false value when none remain, which ends the loop.
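
To see why the loop stops on its own, here is a small offline sketch of the same pattern. MockSeq and MockStream are hypothetical stand-ins written for this example, not BioPerl classes, and the accession numbers and lengths are made up. The only behaviour they imitate is that next_seq hands out one object per call and returns undef once nothing is left.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence object; only the two methods
# used in this tutorial are imitated.
package MockSeq;
sub new              { my ($class, %args) = @_; return bless { %args }, $class }
sub accession_number { return $_[0]{acc} }
sub length           { return $_[0]{len} }

# Hypothetical stand-in for the stream object; next_seq removes and
# returns the first queued sequence, or undef when the queue is empty.
package MockStream;
sub new      { my ($class, @seqs) = @_; return bless { queue => [@seqs] }, $class }
sub next_seq { my ($self) = @_; return shift @{ $self->{queue} } }

package main;

# Made-up accession numbers and lengths, for illustration only.
my $stream_obj = MockStream->new(
    MockSeq->new( acc => 'EX000001', len => 512 ),
    MockSeq->new( acc => 'EX000002', len => 734 ),
);

my @rows;
while ( my $obj_seq = $stream_obj->next_seq ) {
    # Same two fields as in the tutorial, joined by a tab
    push @rows, join "\t", $obj_seq->accession_number, $obj_seq->length;
}
print "$_\n" for @rows;
```

The loop body runs once per queued object; the call after the last one returns undef, so the condition is false and the loop ends.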

The complete code will be:

use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Build the query string and the query object
my $gnbnk_qry = "Rosa[ORGN] AND Maturase[TITL] AND 0:1000[SLEN]";
my $obj_gbqry = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',  -query => $gnbnk_qry );
# Create the GenBank database object used for retrieval
my $obj_gbnk = Bio::DB::GenBank->new;
print "Accession No\tLength\n";
print "------------\t------\n";
# Fetch the matching records as a stream and print one line per sequence
my $stream_obj = $obj_gbnk->get_Stream_by_query($obj_gbqry);
while (my $obj_seq = $stream_obj->next_seq) {
    print $obj_seq->accession_number, "\t", $obj_seq->length, "\n";
}

The output will be a tab-separated list of accession numbers and sequence lengths, one line per matching record, printed under the two header lines.

I hope you enjoyed this BioPerl Tutorial too. Feel free to practice and ask questions on BioPerl Programming and Tutorials.

About Brij

Bhrat Brij is an SEO expert, Internet marketer, affiliate marketer and bioinformatician.

