Database Search Notes

Scansite database searches can be restricted in several ways to keep the results more targeted and relevant to specific experiments, as described below. All these restriction categories are optional, however; for the most general results, simply leave all the fields blank in this category. You may want to select the organism class at least (Mammals, Vertebrates, etc.), or "All Organisms", which will cover all class categories as well as uncategorized entries.
Organism class
Only the Mammals group here is a "class" in the strict taxonomic sense; the rest are convenient groups used for research purposes. The definitions are as follows:

Mammals: Members of class Mammalia.
Vertebrates: Members of sub-subphylum Vertebrata, including class Mammalia.
Invertebrates: All members of superkingdom Eukaryota excluding Vertebrates, Plants, and Fungi.
Plants: Members of kingdom Viridiplantae.
Fungi: Members of kingdom Fungi.
Archaea/Bacteria: Members of superkingdoms Archaea and Eubacteria.
Viruses: Members of superkingdom Viridae.
All: All organisms, including those not fitting into any other category above (e.g., plasmids, synthetic sequences).


Single species
This text box can be used more broadly than its name suggests, allowing the results to be restricted to one species, a few species with similar names, or one genus. Regular expressions are permitted here. Both "Saccharomyces cerevisiae" and "S.* cerevisiae" will yield the same results, for example; "S.*accharomyces" will yield results for both Saccharomyces cerevisiae and Schizosaccharomyces pombe. "Mus" will return any organism whose name contains those three letters (including Thermus aquaticus), while "^Mus .*" will return everything in genus Mus (which is more likely what you want). The leading "^" matters: without it, "Mus .*" would still match the "mus a" in Thermus aquaticus. See regular expressions for more syntax options. Some common species names are listed below.
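The species patterns above can be tried out locally with a minimal sketch like the following. It uses Python's re module with case-insensitive, unanchored matching as a stand-in for MySQL's REGEXP operator, which is an assumption about how the search behaves; the species list and the filter_species helper are hypothetical and not part of Scansite.

```python
import re

# Hypothetical sample of organism names (not taken from any Scansite database).
SAMPLE_SPECIES = [
    "Saccharomyces cerevisiae",
    "Schizosaccharomyces pombe",
    "Mus musculus",
    "Mus spretus",
    "Thermus aquaticus",
    "Homo sapiens",
]


def filter_species(pattern, names=SAMPLE_SPECIES):
    """Return the names that contain a match for the pattern anywhere.

    Matching is unanchored and case-insensitive, which is assumed here
    to approximate the behavior of the species text box.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if rx.search(name)]


for pattern in ["S.* cerevisiae", "S.*accharomyces", "Mus", "Mus .*", "^Mus "]:
    print(pattern, "->", filter_species(pattern))
```

Under these assumptions, an unanchored "Mus .*" still picks up Thermus aquaticus, while a pattern anchored with "^" is limited to names that begin with the genus Mus.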


Molecular weight range
The search can be restricted by molecular weight range, with values expressed in daltons, e.g., "20000" to "50000" for 20 kDa to 50 kDa. If only the first value is given, it is interpreted as a lower limit; if only the second value is given, it is interpreted as an upper limit. The molecular weights used are calculated from the amino acid sequence assuming a physiological pH (7.0 to 7.4), and do not account for prosthetic groups or posttranslational modifications.
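As a rough illustration of how a sequence-based molecular weight and an open-ended range check fit together, here is a minimal sketch. It sums standard average residue masses and adds one water; whether Scansite uses exactly these constants is an assumption, and the function names molecular_weight and in_range are hypothetical.

```python
# Standard average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water for the free N- and C-termini


def molecular_weight(sequence):
    """Average molecular weight of an unmodified peptide, in daltons."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


def in_range(value, low=None, high=None):
    """True if value lies within [low, high]; a missing bound is open-ended."""
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


mw = molecular_weight("MKRAALP")
print(round(mw, 2), in_range(mw, low=500), in_range(mw, high=500))
```

Passing only low mirrors filling in just the first field (a lower limit), and passing only high mirrors filling in just the second (an upper limit).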

Isoelectric point range
An isoelectric point range can be specified, such as 4.5 to 6.5, which can be used in conjunction with 2D gel electrophoresis to predict a spot location or identify an existing one. The algorithm used is from ExPASy's Compute pI/Mw program, which was kindly provided by Elisabeth Gasteiger (see also Bjellqvist, B., Hughes, G.J., Pasquali, Ch., Paquet, N., Ravier, F., Sanchez, J.-Ch., Frutiger, S. & Hochstrasser, D.F. The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences, Electrophoresis 1993, 14:1023-1031). We have added the option to include phosphorylations, using pKa = 2.12 for the first ionization and pKa = 7.21 for the second.
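The general idea behind a sequence-based isoelectric point can be sketched as follows: sum the charge of every ionizable group at a given pH and search for the pH where the net charge is zero. The phosphate pKa values (2.12 and 7.21) come from the text above; the other pK values are illustrative assumptions for this sketch and should be checked against the cited Bjellqvist et al. paper. The function names are hypothetical, and this is not the Compute pI/Mw code itself.

```python
# Illustrative pK values (assumptions for this sketch, not verified here).
POSITIVE_PK = {"K": 10.0, "R": 12.0, "H": 5.98}
NEGATIVE_PK = {"D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}
N_TERM_PK = 7.5
C_TERM_PK = 3.55
PHOSPHATE_PKS = (2.12, 7.21)  # first and second ionization, from the text


def net_charge(sequence, ph, phospho_sites=0):
    """Net charge of the peptide at the given pH."""
    seq = sequence.upper()
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PK))   # free amino terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PK - ph))  # free carboxyl terminus
    for aa, pk in POSITIVE_PK.items():
        charge += seq.count(aa) / (1.0 + 10 ** (ph - pk))
    for aa, pk in NEGATIVE_PK.items():
        charge -= seq.count(aa) / (1.0 + 10 ** (pk - ph))
    for pk in PHOSPHATE_PKS:
        # Each phosphorylated site contributes two acidic ionizations.
        charge -= phospho_sites / (1.0 + 10 ** (pk - ph))
    return charge


def isoelectric_point(sequence, phospho_sites=0, tolerance=1e-4):
    """Bisect on pH between 0 and 14 until the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid, phospho_sites) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2.0


print(isoelectric_point("MKRAALPSTY"), isoelectric_point("MKRAALPSTY", 2))
```

Bisection works here because the net charge only decreases as pH rises. The sketch simply adds phosphate charges on top of the unmodified residues, which is a simplification; each added phosphate lowers the computed pI.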

Phosphorylated sites
Because of the importance of phosphorylated states in signaling pathways, you can specify that you expect up to three phosphorylated sites in the results, which will influence the molecular weight and isoelectric point values calculated.

Keyword search
A phrase entered here will be searched for in the "description" field of each database, as well as in the additional "keyword" field found in SwissProt and TrEMBL records. Regular expressions are allowed here. Possible examples are "nucleotide-binding", "oxidase", "kinase", "iron-sulfur", "ORF", or "hypothetical protein". This feature is most useful in the well-annotated SwissProt records; for other databases, desired results may be excluded accidentally because of nondescript "description" fields.

Sequence contains
The search can be restricted to proteins containing a desired subsequence or consensus sequence. Regular expressions are allowed here. "P..P" specifies two prolines separated by any two residues, for example, while "P.*P" allows any number of residues between the prolines (even zero).
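To see where such a pattern falls within a sequence, a minimal sketch along these lines can help. It uses Python's re module as a stand-in for the search's regular expression engine (an assumption), and the find_motif helper and test sequences are hypothetical.

```python
import re


def find_motif(pattern, sequence):
    """List (start, end, text) for each non-overlapping match.

    Positions are 1-based and inclusive, as residues are usually numbered.
    """
    return [
        (m.start() + 1, m.end(), m.group(0))
        for m in re.finditer(pattern, sequence, re.IGNORECASE)
    ]


print(find_motif("P..P", "AAPLTPGG"))  # exactly two residues between prolines
print(find_motif("P..P", "APPA"))      # adjacent prolines do not satisfy P..P
print(find_motif("P.*P", "APPA"))      # but P.*P accepts zero residues between
```

Note that ".*" is greedy, so "P.*P" spans from the first proline to the last one in the sequence. For a presence or absence filter like the one described above, that makes no difference.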

Regular expressions
All the text input fields in this search allow regular expressions, which are familiar to users of Perl and Unix shell scripts. For example, you can search for a sequence like MKRAALP in the simplest way by specifying it exactly in the input field, "MKRAALP", but if the two alanines can really be any two residues, then search for "MKR..LP" (the period means any character). If there could be more or fewer than two residues between the MKR and LP parts, use "MKR.+LP" (the plus sign means one or more of the preceding period wildcard). Using "MKR.*LP" will match even zero residues between the MKR and LP parts. If the first alanine really is required, but the second one could alternatively be valine or isoleucine, search for "MKRA[AVI]LP"; any of the residues in brackets can be in that position. A sequence containing one or more repeats of LVM in a row would be "(LVM)+". Regular expression syntax differs somewhat between environments. Scansite uses the syntax of MySQL's REGEXP operator, for which the complete usage rules are as follows:

^         Match the beginning of the string only.
$         Match the end of the string.
.         Match any single character, including a space or return character.
[...]     Match any character appearing between the brackets.
[^...]    Match any character not appearing between the brackets.
e*        Match zero or more instances of pattern element e.
e+        Match one or more instances of pattern element e.
e?        Match zero or one instances of pattern element e.
e|f       Match pattern element e or pattern element f.
e{m}      Match m instances of pattern element e.
e{m,}     Match m or more instances of pattern element e.
e{,n}     Match zero to n instances of pattern element e.
e{m,n}    Match m to n instances of pattern element e.
(...)     Group pattern elements into a single element.
other     All other characters match themselves.
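The MKRAALP examples can be checked against a few test sequences with a short sketch like this one. Python's re module stands in for MySQL's REGEXP here, which is an assumption (the two agree on the constructs listed above); the test sequences and the matching helper are hypothetical. The bracket example is written as "MKRA[AVI]LP", keeping the leucine of the original MKRAALP.

```python
import re

# Hypothetical test sequences with zero, two, and four residues after MKR.
SEQUENCES = ["MKRAALP", "MKRGSLP", "MKRLP", "MKRAVLP", "MKRTTTTLP"]


def matching(pattern, sequences=SEQUENCES):
    """Return the sequences that contain a match for the pattern."""
    return [s for s in sequences if re.search(pattern, s, re.IGNORECASE)]


patterns = [
    "MKRAALP",      # exact subsequence
    "MKR..LP",      # exactly two residues of any kind
    "MKR.+LP",      # one or more residues
    "MKR.*LP",      # zero or more residues
    "MKRA[AVI]LP",  # A, V, or I in the second variable position
    "MKRT{2,4}LP",  # two to four threonines
]
for pattern in patterns:
    print(pattern, "->", matching(pattern))

# Grouping: one or more consecutive repeats of LVM.
print(re.search("(LVM)+", "AALVMLVMAA").group(0))
```

Running the patterns side by side shows how each wildcard or quantifier widens or narrows the set of matching sequences.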

To search for one of the special symbols [, ], $, ^, +, and so on (in the keyword search, for example), put a backslash in front of it: "\[", "\]", "\$", "\^", "\+", etc. This indicates that you want the literal character rather than its syntactic meaning. Longer discussions of regular expressions can usually be found as a chapter in a Perl book, or on the web.
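The effect of backslash escaping can be seen in a minimal sketch like the following, which prepares a literal keyword for use as a pattern. The escape_literal helper and the sample description are hypothetical, and Python's re module is used as an approximation of the search's regular expression engine.

```python
import re

# Characters that carry syntactic meaning in a regular expression.
SPECIAL_CHARACTERS = set("\\^$.|?*+()[]{}")


def escape_literal(text):
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARACTERS else ch for ch in text)


description = "Ca(2+)-binding protein"  # hypothetical description field
keyword = "Ca(2+)"

print(escape_literal(keyword))
# Unescaped, "(2+)" is read as a group of one or more 2s, so nothing matches.
print(re.search(keyword, description))
# Escaped, the parentheses and plus sign are matched literally.
print(re.search(escape_literal(keyword), description).group(0))
```

Typing the backslashes by hand in the search field has the same effect as the helper; the sketch only makes explicit which characters need them.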