Help:Why are there Culex proteins with Xs

From VectorBase Help System



We are aware that three Culex proteins contain Xs in their sequences:

  • CPIJ008116-PA
  • CPIJ015880-PA
  • CPIJ017233-PA

Their gene sequences also contain a stretch of Ns, because each gene spans a gap between two contigs. In some cases the gene was extended over a gap to reach a STOP codon (e.g. CPIJ008116): finding the nearest stop codon was one of the steps of the automatic gene build. In other cases BLAST hits also span the gap, so a gene prediction covering two contigs appears correct (e.g. CPIJ015880).
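To illustrate why Ns in a gene sequence show up as Xs in the protein, here is a minimal Python sketch. It assumes the standard genetic code and a simple rule that any codon containing an N is translated as X; the actual gene build may resolve ambiguous codons differently. The toy sequence and the translate function are invented for illustration and are not taken from any of the genes above.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence; codons not in the table (e.g. with N) give X."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Hypothetical coding sequence with a 9-base gap (Ns) in the middle.
toy_cds = "ATGGCC" + "N" * 9 + "GGTTAA"
print(translate(toy_cds))  # MAXXXG*
```

Each codon that falls entirely within the gap becomes one X, so the number of Xs in a protein reflects the length of the gap as it is represented in the assembly.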

More details:


CPIJ008116

The last exon spans a gap: http://cpipiens.vectorbase.org/Genome/ContigView/?region=supercont3.184&vc_start=517666&vc_end=530937

The exon contains a long stretch of Ns followed by 15 bases, ending with a stop codon: http://cpipiens.vectorbase.org/Genome/ExonView/?db=core;transcript=CPIJ008116-RA

All the BLAST hits stop at the beginning of the gap, so the gene should stop before the gap; the automatic pipeline probably tried to find the nearest STOP codon. In this case it would have been better to end the gene earlier and tag it as incomplete.
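A pattern like this one (a gap, then a few bases ending in a stop codon) can be checked with a short Python sketch. The function names and the toy exon below are hypothetical; the sequence only mimics the layout described above and is not the real CPIJ008116 exon.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gap_runs(seq, min_len=1):
    """Return (start, end) positions of runs of N at least min_len long."""
    pattern = "N{%d,}" % min_len
    return [(m.start(), m.end()) for m in re.finditer(pattern, seq.upper())]


def tail_after_last_gap(seq):
    """Return the bases following the last run of Ns, or None if there is no gap."""
    runs = gap_runs(seq)
    if not runs:
        return None
    return seq[runs[-1][1]:].upper()


# Toy exon: 9 bases, a 30-base gap, then 15 bases ending in a stop codon.
toy_exon = "ATGGCTGCA" + "N" * 30 + "GCTGCAGCTGCATAA"

tail = tail_after_last_gap(toy_exon)
print(gap_runs(toy_exon))         # [(9, 39)]
print(len(tail))                  # 15
print(tail[-3:] in STOP_CODONS)   # True
```

A short tail after a gap that ends in a stop codon is only a hint that the model may have been extended to reach the stop; the BLAST evidence still has to be inspected to decide where the gene should end.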

It's also likely that a small exon is missing in the middle.


CPIJ015880

There is a gap at the beginning: http://cpipiens.vectorbase.org/Genome/ContigView/?l=supercont3.741:195400-197160

In this case the BLAST hits overlap the gap, so the prediction is likely to be correct. We cannot really create an intron here, and the BLAST hits clearly show that there is continuity.

When viewed alongside Aedes, it could be a pseudogene: http://cpipiens.vectorbase.org/Genome/MulticontigView/?bottom=%7Copt_tblat%3Aon&s1=aa&w=21761&c=supercont3.741%3A196280%3A1&h=aael006972%7Ccpij015880&w1=30171&c1=supercont1.230%3A141114%3A1

But this is no longer obvious when viewed alongside Anopheles or Drosophila: http://cpipiens.vectorbase.org/Genome/MulticontigView/?gene=CPIJ015880;context=10000;s1=Anopheles_gambiae;g1=AGAP008427 http://cpipiens.vectorbase.org/Genome/MulticontigView/?gene=CPIJ015880;context=10000;s1=Drosophila_melanogaster;g1=CG7246


CPIJ017233

Again, the first exon spans a gap: http://cpipiens.vectorbase.org/Genome/ContigView/?region=supercont3.892&vc_start=158969&vc_end=165467

Some of the BLAST hits, but not the majority, also span this gap.

The BLAST hit against Aedes aligns very well at the beginning (so it was worth spanning the gap): http://cpipiens.vectorbase.org/Genome/AlignView/?class=Homology;gene=CPIJ017233;g1=AAEL003701

... but the hits against Anopheles and Drosophila do not align as well: http://cpipiens.vectorbase.org/Genome/AlignView/?class=Homology;gene=CPIJ017233;g1=AGAP009913 http://cpipiens.vectorbase.org/Genome/AlignView/?class=Homology;gene=CPIJ017233;g1=CG9426
