In the last BioPerl Tutorial we saw how to retrieve a single GenBank sequence using the Bio::DB::GenBank module.
In this one, we will see how to do it for multiple GenBank sequences.
NOTE: To install BioPerl, learn how to code in BioPerl, or use BioPerl modules, see the BioPerl Tutorial series index.
Creating Query
NOTE: Executing queries requires an active internet connection.
Now, to query the GenBank database, we will use the Bio::DB::Query::GenBank module. We will look for all GenBank records in Rose (scientific name Rosa) that have Maturase in the title and a sequence length of up to 1000. The code of the query will be:
$gnbnk_qry = "Rosa[ORGN] AND Maturase[TITL] AND 0:1000[SLEN]";
$gnbnk_qry is the name of the variable that stores the query
[ORGN] -> Organism is a query identifier
[TITL] -> Title is a query identifier
[SLEN] -> Sequence Length is a query identifier
0:1000 is the range of sequence lengths (you can change the range)
For the full list of query identifiers, you can visit the NCBI site.
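As a small illustration of how the identifiers above combine, the following sketch assembles a query string like the one above from its parts. The helper name build_query is hypothetical and not part of BioPerl; it only joins strings and does not contact NCBI.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): takes [value, identifier]
# pairs and joins them into one Entrez query string with AND.
sub build_query {
    my @terms = @_;
    return join ' AND ', map { sprintf '%s[%s]', $_->[0], $_->[1] } @terms;
}

my $gnbnk_qry = build_query(
    [ 'Rosa',     'ORGN' ],    # organism
    [ 'Maturase', 'TITL' ],    # word in the record title
    [ '0:1000',   'SLEN' ],    # sequence length range
);

print "$gnbnk_qry\n";    # prints: Rosa[ORGN] AND Maturase[TITL] AND 0:1000[SLEN]
```

Keeping the terms as separate pairs makes it easy to change the organism or the length range without retyping the whole string.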
Creating Query Object
Once we have created the query, we need to create a query object. I will name it $obj_gbqry. The code for that will be:
$obj_gbqry = Bio::DB::Query::GenBank->new(-db => 'nucleotide', -query => $gnbnk_qry );
$obj_gbqry is the query object
Bio::DB::Query::GenBank is the query module used for creating the object
new() is the constructor method used to create a new query object
-db and -query are the arguments passed when creating the query object
'nucleotide' and $gnbnk_qry are the values of those arguments, telling BioPerl to use the nucleotide database and to select sequences based on the query in $gnbnk_qry
Retrieving Sequences
Although we have defined the query, we still have to retrieve the sequences. For this purpose we will create a GenBank object, $obj_gbnk. The code will be:
$obj_gbnk = Bio::DB::GenBank->new;
Using Stream Object
We need to use a stream object and get_Stream_by_query() to retrieve the sequences one after another. In other words, a stream object lets us retrieve more than one sequence object. The code will be:
$stream_obj = $obj_gbnk->get_Stream_by_query($obj_gbqry);
$stream_obj is the stream object
get_Stream_by_query() is the method, called on the GenBank object $obj_gbnk, that fetches the sequences matching the query object and returns them as a stream
Printing Sequences
Once we have retrieved the sequences, we need to print them. For this we use a sequence object, $obj_seq. I will limit the printing to a couple of fields: the accession number and the sequence length.
The code will be:
print $obj_seq->accession_number, "\t", $obj_seq->length, "\n";
But as we have more than one sequence to print, we will use a while loop. The code will be:
while ($obj_seq = $stream_obj->next_seq)
{
print $obj_seq->accession_number, "\t", $obj_seq->length, "\n";
}
The while loop runs as long as sequences remain in the stream object. Once the stream is exhausted, next_seq returns a false value and the loop ends.
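The same next_seq loop can be tried without an internet connection. The sketch below assumes BioPerl is installed and reads two made-up FASTA records (the IDs seq1 and seq2 and their bases are invented for illustration) from an in-memory string through Bio::SeqIO, which is the kind of stream object get_Stream_by_query() is documented to return. Because FASTA records carry no GenBank accession, it prints display_id instead of accession_number.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Two invented FASTA records, used only to demonstrate the loop.
my $fasta = ">seq1\nATGCATGC\n>seq2\nATGCA\n";

# Open the string as a read-only filehandle and wrap it in a stream.
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my $stream_obj = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

# next_seq hands back one sequence object per call and a false value
# once the stream is exhausted, which ends the loop.
my @rows;
while (my $obj_seq = $stream_obj->next_seq) {
    push @rows, join("\t", $obj_seq->display_id, $obj_seq->length);
}

print "$_\n" for @rows;
```

Declaring $obj_seq with my inside the loop condition keeps the variable local to the loop, which is the usual style when use strict is enabled.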
The complete code will be:
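use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;
# Query: Rosa records with Maturase in the title and up to 1000 bases long
my $gnbnk_qry = "Rosa[ORGN] AND Maturase[TITL] AND 0:1000[SLEN]";
# Query object for the nucleotide database
my $obj_gbqry = Bio::DB::Query::GenBank->new(-db => 'nucleotide', -query => $gnbnk_qry );
# GenBank object used to fetch the matching records
my $obj_gbnk = Bio::DB::GenBank->new;
print "Accession No\tLength\n";
print "------------\t------\n";
# Stream of sequence objects matching the query
my $stream_obj = $obj_gbnk->get_Stream_by_query($obj_gbqry);
# Print one line per sequence until the stream is exhausted
while (my $obj_seq = $stream_obj->next_seq)
{
    print $obj_seq->accession_number, "\t", $obj_seq->length, "\n";
}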
I hope you enjoyed this BioPerl Tutorial too. Feel free to practice and ask questions on BioPerl Programming and Tutorials.

Thank-you so much for this amazing set of tutorials! This is the first time I’ve been able to understand the BioPerl module. I cannot thank you enough!!!!