AspGD Help: Pattern Matching
PatMatch permits the identification of patterns or motifs within
the collection of all AspGD protein or DNA
sequences. The pattern can be either a simple string or a regular
expression. Standard substitutions are allowed in the string, such as
using "R" for any purine base when performing a nucleotide
search. Pattern matching offers an alternative to sequence alignment
techniques such as BLAST for identifying nucleotide or
peptide sequences with conserved or biologically interesting regions.
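As an illustration of how such substitution codes work, the following minimal Python sketch expands a plain nucleotide pattern written with the standard IUPAC ambiguity codes into a regular expression and reports where it matches. It is not PatMatch's own implementation; the names IUPAC_DNA, iupac_to_regex and find_hits are hypothetical, and the pattern syntax PatMatch actually accepts is the one described on the Pattern Matching page.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
# (Hypothetical helper table; not taken from PatMatch itself.)
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",   # any purine
    "Y": "[CT]",   # any pyrimidine
    "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Expand a plain IUPAC nucleotide string into a compiled regex."""
    return re.compile("".join(IUPAC_DNA[base] for base in pattern.upper()))


def find_hits(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping hit (0-based)."""
    regex = iupac_to_regex(pattern)
    return [(m.start(), m.group()) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # Case is ignored, as with PatMatch patterns.
    print(find_hits("tatawr", "ccTATAAAggTATATGcc"))
```

Here "W" stands for A or T and "R" for A or G, so the single pattern TATAWR matches both TATAAA and TATATG in the example sequence.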
AspGD offers a selection of sequence datasets that can be
searched, depending on the user's requirements.
- Complete sequence of chromosomes
- ORF DNA, Coding Sequences of defined ORFs (DNA)
- ORF DNA, Genomic (Coding and Introns) Sequences of defined ORFs (DNA)
- ORF DNA, 1000 bp Up and Down stream of Genomic (Coding and Introns) Sequences of defined ORFs (DNA)
- Intergenic DNA, Sequence between ORFs and other annotated genomic features (DNA)
- ORF Proteins, Translations of defined ORFs (Protein)
Tips for Pattern Matching:
- The pattern may be lowercase or uppercase. There is no maximum or
minimum pattern size.
- A description of the allowed syntax of the pattern is provided at
the bottom of the Pattern Matching page.
- The Strand option restricts NUCLEOTIDE searches to one strand of the
specified dataset. By default, both strands are searched. If the
"Strand in dataset" option is chosen, only the strand that is actually
present in the dataset is searched. Choosing "Reverse complement of
strand in dataset" restricts the search to the reverse complement of
that strand.
Please note that the displayed sequence always shows the Watson strand,
regardless of which strand option is chosen. If your pattern has a
match on the Crick strand, the reverse complement of the pattern is
highlighted in the Watson sequence.
- The Mismatch, Deletion or Insertion options permit matches to
sequences that contain a defined number of substitutions, deletions or
insertions relative to the input pattern. This number can range from 1
to 3. At this time, these options cannot be used with patterns that
contain regular expressions.
- When searching for patterns near the beginning or end of a sequence,
bear in mind that nucleotide sequences include the start codon (5' ATG)
and the stop codon (TAA, TAG, or TGA). Peptide sequences include the
initiator methionine, whether or not it is removed in vivo.
- Sequences with hits are listed in the results table, ordered by the
number of hits and by sequence name.
- At this time, PatMatch will not find overlapping hits.
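The strand and overlap behavior described in the tips above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not PatMatch's code: the function names and the "both" / "dataset" / "revcomp" option values are hypothetical, and the pattern is assumed to be a plain A/C/G/T string. Hits on the opposite strand are found by searching the dataset strand for the reverse complement of the pattern, so every hit is reported in the coordinates of the displayed strand.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a plain A/C/G/T string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def search_strands(pattern, dataset_seq, strand="both"):
    """Find a plain A/C/G/T pattern on one or both strands.

    strand is a hypothetical option value: "both", "dataset" or "revcomp".
    Returns sorted (start, end, strand_label, text_on_dataset_strand) tuples,
    0-based, always in coordinates of the strand held in the dataset.
    """
    if strand not in ("both", "dataset", "revcomp"):
        raise ValueError("strand must be 'both', 'dataset' or 'revcomp'")
    seq = dataset_seq.upper()
    hits = []
    if strand in ("both", "dataset"):
        # re.finditer resumes after each match, so overlapping hits are skipped.
        for m in re.finditer(re.escape(pattern.upper()), seq):
            hits.append((m.start(), m.end(), "dataset", m.group()))
    if strand in ("both", "revcomp"):
        # A hit on the opposite strand shows up on the dataset strand
        # as the reverse complement of the pattern.
        for m in re.finditer(re.escape(reverse_complement(pattern)), seq):
            hits.append((m.start(), m.end(), "revcomp", m.group()))
    return sorted(hits)


if __name__ == "__main__":
    for hit in search_strands("GATTC", "ttGATTCaaGAATCgg"):
        print(hit)
    # Only one hit is reported for AAA in AAAAA: hits never overlap.
    print(search_strands("AAA", "AAAAA", strand="dataset"))
```

In the example, the second hit is reported with the text GAATC, the reverse complement of the pattern, which mirrors how a Crick-strand match is highlighted in the Watson sequence.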
If a PatMatch search returns few or no matches, the user can try to
increase the number of matches in several ways. Going back to the
PatMatch search page, the user can choose a different dataset, use a
less selective pattern, or increase the number of allowed mismatches,
deletions or insertions.
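To show why allowing mismatches increases the number of matches, here is a small Python sketch that counts substitutions between the pattern and each window of the sequence. It is a simplified illustration, not PatMatch's algorithm: it handles substitutions only (insertions and deletions would need an edit-distance comparison instead), the function name mismatch_hits is hypothetical, and the upper limit of 3 simply follows the range stated above.

```python
def mismatch_hits(pattern, sequence, max_mismatches=0):
    """Find non-overlapping windows that differ from pattern by at most
    max_mismatches substitutions (no insertions or deletions).

    Returns (start, matched_text, mismatch_count) tuples, 0-based.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    if not 0 <= max_mismatches <= 3:
        raise ValueError("max_mismatches must be between 0 and 3")
    pattern, sequence = pattern.upper(), sequence.upper()
    size = len(pattern)
    hits = []
    start = 0
    while start + size <= len(sequence):
        window = sequence[start:start + size]
        # Number of positions where the window differs from the pattern.
        mismatches = sum(a != b for a, b in zip(pattern, window))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
            start += size   # jump past the hit so hits do not overlap
        else:
            start += 1
    return hits


if __name__ == "__main__":
    seq = "ccGATTACAttGACTACAgg"
    print(mismatch_hits("GATTACA", seq))                    # exact matches only
    print(mismatch_hits("GATTACA", seq, max_mismatches=1))  # one substitution allowed
```

With no mismatches allowed, only the exact copy of GATTACA is found; allowing one substitution also picks up GACTACA.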
Aborting a PatMatch Search
To abort a search, the user should click on the button labeled "Click
here to abort the search", which stops the process running on the
AspGD server. This is better than hitting the "Back" button on the
browser, which leaves the AspGD server processing the search request.
PatMatch can be accessed:
- by selecting the "PatMatch" hypertext link on the tool bar at the top of most AspGD pages
- by selecting the "PatMatch" hypertext link on the sidebar
displayed on the left-hand side of the home page and index pages
- by selecting the "Pattern Matching" link under Specialized Gene
and Sequence Searches on the Search Options index page
AspGD Copyright © 2008-2013 The Board of Trustees, Leland
Stanford Junior University, and Broad Institute.
Permission to use the information contained in this database was given by the
researchers/institutes who contributed or published the
information. Users of the database are solely responsible for
compliance with any copyright restrictions, including those applying
to the author abstracts. Documents from this server are provided
"AS-IS" without any warranty, expressed or implied.
To cite AspGD, please use the following: Cerqueira GC, Arnaud MB, Inglis DO, Skrzypek MS, Binkley G, Simison M, Miyasato SR, Binkley J, Orvis J, Shah P, Wymore F, Sherlock G, Wortman JR (2013). The Aspergillus Genome Database (AspGD): multispecies curation and incorporation of RNA-Seq data to improve structural gene annotations. Nucleic Acids Res. 42(Database issue)