MOTIF SCANNER TUTORIAL
METHOD OF DATA ENTRY
Before you can begin, you must choose how you will provide the Motif Scanner with a sequence to work with.
There are two ways to submit a protein for Scansite to search: you can have the sequence retrieved from an online database by its
accession number, or you can paste in the sequence yourself.
In this tutorial, I will explain the process of submitting a protein that is stored in a database such as Genpept, SWISS-PROT, or
TREMBL.
1. PROTEIN SUBMISSION

This first page requires you to do several things:
First, enter the ACCESSION NUMBER of the protein you wish to investigate in the space provided.
If you do not know the accession number of your protein, click the appropriate link to open a new window where you can search either
the Genpept database or the SWISS-PROT and TREMBL databases for it.
The default database setting is SWISS-PROT. If your accession number is specific to Genpept, you must change the database setting
accordingly. If you do not choose the correct database, you will get an error message stating that the sequence was not found.
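Because choosing the wrong database produces a "not found" error, it can help to check the format of the accession number before submitting. The sketch below is a hypothetical helper (guess_database is not part of Scansite). It assumes the UniProt accession pattern published by UniProt for SWISS-PROT/TREMBL entries, and it assumes GenPept protein accessions are three letters followed by five or seven digits with an optional version suffix. Anything else is reported as unknown.

```python
import re

# UniProt (SWISS-PROT/TREMBL) accession pattern as documented by UniProt.
UNIPROT_ACC = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Assumed GenPept (GenBank protein) format: 3 letters + 5 or 7 digits, optional ".N" version.
GENPEPT_ACC = re.compile(r"^[A-Z]{3}([0-9]{5}|[0-9]{7})(\.[0-9]+)?$")


def guess_database(accession: str) -> str:
    """Hypothetical helper: suggest which database setting matches an accession's format."""
    acc = accession.strip().upper()
    if UNIPROT_ACC.match(acc):
        return "SWISS-PROT"
    if GENPEPT_ACC.match(acc):
        return "Genpept"
    return "unknown"
```

For example, guess_database("P06239") suggests SWISS-PROT, while an identifier in the AAA12345.1 style suggests Genpept. This is only a format check; it cannot confirm that the entry actually exists.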
The Search Options section of the submission page lets you change some of the search parameters.
By default, all motifs are searched at high stringency. To limit the search to a smaller subset of motifs, uncheck the
all-motifs option and select the desired motifs from the list provided.
You can change the sensitivity of the detection algorithm by selecting one of three stringency options: high, medium, or low.
At high stringency, the Motif Scanner shows only candidate sites whose scores fall within the top 0.2% of all scores in the
SWISS-PROT vertebrate database. Medium stringency has a threshold of the top 1%, and low stringency a threshold of the top 5%.
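The motif selection and the three stringency thresholds act together as a filter on candidate sites. The following is a minimal sketch of that logic, not Scansite's own code: the names STRINGENCY_CUTOFF and filter_sites are hypothetical, each site is assumed to be a dict with a "motif" name and a "percentile" given as "top X%" (smaller is better), and the cutoff is assumed to be inclusive.

```python
from typing import Dict, Iterable, List, Optional

# Percentile cutoffs ("top X%") for each stringency level, as described above.
STRINGENCY_CUTOFF = {"high": 0.2, "medium": 1.0, "low": 5.0}


def filter_sites(
    sites: Iterable[Dict],
    selected_motifs: Optional[set] = None,
    stringency: str = "high",
) -> List[Dict]:
    """Keep sites whose motif was selected (None means all motifs) and that pass the cutoff."""
    cutoff = STRINGENCY_CUTOFF[stringency]
    return [
        site for site in sites
        if (selected_motifs is None or site["motif"] in selected_motifs)
        and site["percentile"] <= cutoff
    ]
```

A site in the top 0.7%, for instance, would be reported at medium and low stringency but hidden at high stringency.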
When you are satisfied with the protein and parameters you have chosen, submit the query to the Motif Scanner. A small pop-up
window will ask you to wait while the protein is processed; this takes 1 to 5 minutes, depending on the time of day and the speed
of your internet connection. Once a query is submitted, the Motif Scanner first retrieves the sequence from the appropriate database,
then forwards it to Pfam for domain prediction. When Pfam returns its results, the Motif Scanner scans the protein for the candidate
motifs you selected. Once these three stages are complete, the results are displayed and the small window disappears.
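The three processing stages run strictly in order, each waiting for the previous one. The sketch below only illustrates that ordering; run_query and the three stage functions passed into it are hypothetical stand-ins for the real database retrieval, Pfam request, and motif scan, which are not shown here.

```python
from typing import Callable, Dict, List


def run_query(
    accession: str,
    database: str,
    fetch: Callable[[str, str], str],
    predict_domains: Callable[[str], List[str]],
    scan_motifs: Callable[[str], List[dict]],
) -> Dict[str, object]:
    """Run the three stages in sequence and collect their results."""
    sequence = fetch(accession, database)   # stage 1: retrieve the sequence
    domains = predict_domains(sequence)     # stage 2: Pfam domain prediction
    sites = scan_motifs(sequence)           # stage 3: scan for the selected motifs
    return {"sequence": sequence, "domains": domains, "sites": sites}
```

Passing the stages in as functions keeps the ordering logic separate from how each stage is actually carried out.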
2. GRAPHIC RESULTS

After you submit your protein of interest, the results page shows
a linear diagram of the protein with the domains predicted by Pfam.
Sites found within the protein are drawn above the diagram, labeled with the name,
residue, and position of each site.
A plot of surface accessibility appears below the protein diagram. This is a running
average of hydrophilicity according to Emini et al.
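A running average smooths per-residue values by averaging them over a window that slides along the sequence. The sketch below shows only that general idea: running_average is a hypothetical name, the window size of 6 is an assumption, and the caller supplies the per-residue scale. It is a plain sliding-window mean, not a reimplementation of the formula published by Emini et al.

```python
from typing import Dict, List


def running_average(sequence: str, scale: Dict[str, float], window: int = 6) -> List[float]:
    """Mean of per-residue scale values over each full window along the sequence.

    Residues missing from `scale` count as 0.0; the result has len(sequence) - window + 1 values.
    """
    values = [scale.get(aa, 0.0) for aa in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

With a toy scale of {"K": 1.0, "L": -1.0} and a window of 2, the sequence "KKLL" gives [1.0, 0.0, -1.0], so the curve falls as the window moves into the hydrophobic stretch.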
Clicking on the Domain Info button will forward you to the Pfam website at Washington University in St. Louis.
This site provides detailed information on the domains found in your protein.
Below the Domain Info button is an option to change the stringency of the search and resubmit it.
Clicking on the image itself will take you to the extended results page.
3. EXTENDED RESULTS

The extended results page shows a complete listing of all the sites found in
your search. It also lists the motifs that you selected for your search but that were
not found in this particular protein.
The position, score, percentile ranking, sequence, and exact surface accessibility value are listed
for every putative site found within the protein. Scores closer to 0.00 reflect less divergence from the
optimal motif predicted by peptide library screening for that particular kinase or binding protein.
Thus, a score of 0.00 indicates a perfect match to the optimal motif, while a score of 1.234
indicates a site that matches the optimal motif less closely.
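Because lower scores mean a closer match, ordering a list of sites by ascending score puts the strongest candidates first. In this small sketch, rank_sites is a hypothetical helper, and each site is assumed to be a dict with a numeric "score" field.

```python
from typing import Dict, List


def rank_sites(sites: List[Dict]) -> List[Dict]:
    """Order sites from closest match (lowest score) to weakest match (highest score)."""
    return sorted(sites, key=lambda site: site["score"])
```

A site scoring 0.00 would therefore be listed ahead of one scoring 1.234.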
A link to GeneCards provides more information on the kinase or binding protein whose motif matched the site.
Clicking on the Score opens a new window showing a histogram of the scores for the selected motif across all the
proteins searched.
Clicking on the Sequence opens another window that displays the site within the complete protein sequence and compares
it with the sequences of known targets of that particular kinase or binding protein.
Clicking on the 'View Ratio Composition of Amino Acids' link opens a new window where you can browse the sequence for
other motifs of interest.
4. HISTOGRAM

The histogram window will show you where the site's score falls within the range of scores calculated for the whole SWISS-PROT
vertebrate database.
There are several pieces of information here:
The right-hand box shows the motif searched for this histogram (in this case, Lck), the section of
SWISS-PROT searched (vertebrate), the total number of proteins in that section, and the total number of scored sites in that
section. It also shows the site sequence, the position of the site within the protein, and the corresponding percentile
ranking.
The histogram itself displays the score distribution curve (red) and the cumulative percentile curve (blue). The mean
score is given along with the standard deviation and the z score of the query site. The query sequence's score is marked on
the histogram together with its exact value.
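The statistics in the histogram window relate the query score to the distribution of database scores. The sketch below shows one way to compute them; score_stats is a hypothetical name, the use of the sample standard deviation is an assumption, and because lower scores are better the percentile is taken here as the share of database scores at or below the query score. Scansite's exact conventions may differ.

```python
import statistics
from typing import List, Tuple


def score_stats(query: float, db_scores: List[float]) -> Tuple[float, float, float, float]:
    """Return (mean, standard deviation, z score, percentile) for a query score.

    Needs at least two database scores with some spread (stdev must be non-zero).
    """
    mean = statistics.mean(db_scores)
    sd = statistics.stdev(db_scores)  # sample standard deviation (assumed convention)
    z = (query - mean) / sd
    # Lower scores are better, so rank by the share of scores at or below the query.
    percentile = 100.0 * sum(1 for s in db_scores if s <= query) / len(db_scores)
    return mean, sd, z, percentile
```

A strong site has a negative z score (below the mean) and a small percentile, which is what places it in the "top" fraction used by the stringency settings.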
5. SITE AND TARGETS COMPARISON

This window shows you the complete protein sequence, with the center residue of the query motif sequence
highlighted.
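Highlighting a residue within the full sequence amounts to marking one position in the string. In this sketch, highlight_center is a hypothetical helper; it assumes site positions are 1-based, as residue numbers usually are, and marks the center residue with square brackets instead of the color highlighting used on the web page.

```python
def highlight_center(sequence: str, position: int) -> str:
    """Return the full sequence with the residue at 1-based `position` wrapped in brackets."""
    i = position - 1  # convert 1-based residue number to 0-based string index
    if not 0 <= i < len(sequence):
        raise ValueError(f"position {position} is outside a sequence of length {len(sequence)}")
    return f"{sequence[:i]}[{sequence[i]}]{sequence[i + 1:]}"
```

For example, highlight_center("MGCVQY", 6) marks the final tyrosine as "MGCVQ[Y]".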
There is a link to BLAST, which provides sequence alignments you can use to check for conservation of the
motif in the same protein across different species, or in related protein family members within the same species.
Scrolling further down gives you a list of known targets with their sequence scores and percentile ranks for comparison with your
candidate motif. Clicking on the 'More Info' link will take you to the target's corresponding Genpept or SWISS-PROT data file.
6. RATIO COMPOSITION DISPLAY

The Ratio Composition feature shows you the frequency of amino acids surrounding Ser, Thr, or Tyr residues. It also lets you browse
the sequence for particular motifs of interest, specifically amino acids surrounding Ser, Thr, or Tyr.
Select the center residue by clicking 'Serine', 'Threonine', or 'Tyrosine', then select an amino acid at a position surrounding
the center residue from the grid.
The output appears in the bottom frame, highlighting the motifs you selected if they exist in the protein.
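The grid search amounts to two operations: tallying which residues occur at each position around every Ser, Thr, or Tyr, and finding the centers that have a chosen residue at a chosen position. The sketch below illustrates both. composition_around and find_motif are hypothetical names. The flank width of 3 is an assumption, and the tallies are raw counts; ratios would additionally need a background composition, which is not shown. Offsets are relative to the center residue (for example, -2 means two residues toward the N-terminus), and returned positions are 1-based.

```python
from collections import Counter
from typing import Dict, List


def composition_around(sequence: str, center: str, flank: int = 3) -> Dict[int, Counter]:
    """Count residues at each offset -flank..+flank (excluding 0) around every `center` residue."""
    seq = sequence.upper()
    counts = {off: Counter() for off in range(-flank, flank + 1) if off != 0}
    for i, aa in enumerate(seq):
        if aa != center:
            continue
        for off in counts:
            j = i + off
            if 0 <= j < len(seq):  # skip offsets that run past either end of the sequence
                counts[off][seq[j]] += 1
    return counts


def find_motif(sequence: str, center: str, offset: int, residue: str) -> List[int]:
    """1-based positions of `center` residues that have `residue` at the given relative offset."""
    seq = sequence.upper()
    hits = []
    for i, aa in enumerate(seq):
        j = i + offset
        if aa == center and 0 <= j < len(seq) and seq[j] == residue:
            hits.append(i + 1)
    return hits
```

For example, find_motif("RRASLRRYSP", "S", -2, "R") looks for serines with an arginine two residues before them and returns both serines, at positions 4 and 9.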