Melina2 Help Page

 

1. What can you do with Melina2?

 

1) comparison among several motif elucidation programs

Melina2 is a tool that finds potential DNA motifs through a user-friendly interface. It runs up to 4 of 5 motif prediction programs (Consensus, MEME, Gibbs Sampler, MDscan, Weeder) at once and returns graphical representations of their results.

 

2) search against known motifs or genome sequences

Melina2 constructs a weight matrix from each predicted motif. Users can search for similar known motifs in the database of transcriptional regulation in Bacillus subtilis (DBTBS) or in the JASPAR database. Furthermore, with the weight matrix, users can look for putative sites in human, mouse, B. subtilis, or E. coli promoter sequences.

 

3) update in version 2

- Two prediction programs (MDscan and Weeder) were added, and one (Coresearch) was removed because its functionality was essentially the same as that of Consensus.

- Users can choose up to 4 program settings from the 5 programs, with any parameters, and see all the results at a glance.

- The predicted motifs can be illustrated by SequenceLogo.

- Users can search for similar motifs among the known transcription factor binding site motifs in DBTBS or JASPAR.

- Searching for predicted motifs in several genome sequences is also available.

 

 

2. How to use Melina2

 

1) welcome screen

This screen (Figure 1) appears when you access Melina2.

You can run Melina2 in 3 simple steps. In fact, the second step can be omitted if you run programs with default parameters.

(Step 1) Input query sequences (FASTA format) (Figure 2)

(Step 2) Set programs and parameters (you can omit this step)

(Step 3) Just run, and get results.

 

 

(Step 1) Input query sequences

There are three ways to load data into Melina2.

 

1. use sample data

If you click 'sample', you can try Melina2 with an example. You can use either 200 bases x 10 sequences or create random sequences using our random sequence generator.

2. use input window

You can copy & paste directly into the input window.

3. load your FASTA formatted file
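
As a rough illustration of the sample-data and file-loading routes, the following Python sketch generates 10 random sequences of 200 bases, writes them as FASTA text, and reads FASTA text back. All function names are hypothetical, and this is not Melina2's own random sequence generator or parser; it only shows the shape of input that the form expects.

```python
import random

def random_sequences(n=10, length=200, seed=0):
    """Return {name: sequence} with n random DNA sequences (hypothetical helper)."""
    rng = random.Random(seed)
    return {
        f"seq{i + 1}": "".join(rng.choice("ACGT") for _ in range(length))
        for i in range(n)
    }

def to_fasta(records, width=60):
    """Format {name: sequence} as FASTA text, wrapping sequence lines at width."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

def parse_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first header word."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else f"unnamed{len(records) + 1}"
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}

if __name__ == "__main__":
    sample = random_sequences()
    fasta_text = to_fasta(sample)
    assert parse_fasta(fasta_text) == sample  # round trip keeps every sequence
    print(fasta_text.splitlines()[0])
```

Text produced by to_fasta can be pasted into the input window or saved to a file and loaded.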

 

 

(Step 2) Set programs and parameters

 

1. choose programs (optional)

You can choose the motif prediction programs. By default, Consensus, MEME, Gibbs sampling, and MDscan are selected.

 

2. set parameters (optional)

The default parameters are not always the same as those supplied by the original authors. We chose a new set of defaults so that the search conditions are as similar as possible across all algorithms: in principle, (1) the motif length is around 8 bases; (2) both strands are searched; and (3) multiple occurrences are allowed in each sequence. The details are shown in the table below. Fields shown in red are those whose values were changed.

Program    strand  #motifs     size    background
Consensus  both    0 or more   10      NA
MEME       both    any         6-10    NA
Gibbs      both    0 or more   10      NA
Weeder     both    some, each  6,8,10  H. sapiens
MDscan     both    0 or more   10      Query

 

To set parameters for each program, click 'Param' in the 'Query' tab, or click on the 'Parameter' tab (Figure 3).

 

 

In the 'Parameter' tab, click 'Param' for each program to set its parameters (Figure 4).

Parameters differ for each program, and critical parameters are shown in bold. If you click 'default', you can revert to our default parameters at any time (they are not always the same as the values set by the original authors; see above for more details). Please see the references for more details on each program's parameters.

 

 

 

 

(Step 3) Run and get results.

After clicking 'Submit', you will see the message "Loading XML", which means your job is running. If you would like to access your results later, keep the Job ID from the 'Results' tab (Figure 5(a)).

 

 

After Melina2 finishes motif detection, the results are displayed as shown in Figure 6. Positions of detected motifs are illustrated with colored arrows in the upper window. If you click a motif in the upper window, you can obtain more details in the lower window.

If you would like to see the details of each program's results, click the 'Raw' button next to each program.

 

2) result screen

Detected motifs can be used to search a database of known binding sites or genome sequences.

 

 

1. Finding similar motifs in known transcription factor binding sites.

Weight matrices of detected motifs are automatically constructed to perform a similarity search in JASPAR or DBTBS. The results are shown with SequenceLogo (Figure 7).

 

2. Putative binding sites search in genome sequences.

Users can search for putative motifs in human, mouse, B. subtilis, or E. coli genome sequences (Figure 8).
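
To make the idea of a weight matrix concrete, here is a minimal Python sketch that builds a log-odds weight matrix from aligned motif sites and scans a sequence with it. The function names, the pseudocount, the uniform background, and the score threshold are all assumptions chosen for illustration; Melina2's own matrix construction and search procedure may differ.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix (one dict per column) from aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [s[pos].upper() for s in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at least threshold."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

if __name__ == "__main__":
    sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
    pwm = build_pwm(sites)
    for start, score in scan("GGGTATAATCCC", pwm, threshold=5.0):
        print(start, round(score, 2))
```

This sketch scans only the given strand; to cover both strands, the reverse complement of the sequence would be scanned as well.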

 

           

 

3. Introduction to each program and its main parameters

1) Consensus

Consensus was developed by Stormo et al. [2]. It searches for motifs using weight matrices that are evaluated by their information content.
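
As an illustration of what "information content" means for an alignment of sites, the sketch below computes the per-column information in bits relative to a uniform background. This is only the basic textbook quantity under assumed settings (uniform background, no small-sample correction); it is not the full statistic that Consensus uses to rank matrices, and the function name is hypothetical.

```python
import math
from collections import Counter

def column_information(sites, background=0.25):
    """Information content (bits) of each column of equal-length aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    result = []
    for pos in range(width):
        counts = Counter(site[pos].upper() for site in sites)
        bits = 0.0
        for count in counts.values():
            freq = count / len(sites)
            # Bases that never occur in the column contribute nothing.
            bits += freq * math.log2(freq / background)
        result.append(bits)
    return result

if __name__ == "__main__":
    per_column = column_information(["TATAAT", "TATAAT", "TACAAT", "TATGAT"])
    print([round(b, 2) for b in per_column], round(sum(per_column), 2))
```

A fully conserved column reaches the maximum of 2 bits, and a column with all four bases equally frequent scores 0 bits.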

 

2) MEME

MEME (Multiple Expectation maximization for Motif Elicitation), which was developed by Bailey et al. [3], predicts ungapped motifs in a set of sequences using the expectation-maximization algorithm. Although the original MEME was developed for both amino acid and nucleic acid sequences, Melina2 accepts only DNA sequences. One of three motif models can be selected as a parameter.

 

    OOPS (one occurrence per sequence) model: each sequence has exactly 1 motif (default).

    ZOOPS (zero or one occurrence per sequence) model: each sequence has 0 or 1 motif.

    TCM (two-component mixture) model: each sequence has any number of motifs (0 or more).

 

3) Gibbs Sampling

Gibbs Sampling, which was developed by Lawrence et al. [4], finds statistically significant motifs in the given sequences. For example, it first removes one sequence from the N given DNA sequences and randomly selects a candidate motif site from each of the remaining sequences. These sites are aligned to construct a weight matrix. At the same time, the background nucleotide frequencies are calculated. Gibbs Sampling then calculates, at each position of the previously omitted sequence, the probability of the motif relative to the background model, and sets the motif position in that sequence based on these probabilities.

Then, another sequence is removed and the same steps are repeated. Initially, the selected sites are random and meaningless; however, once a real motif is chosen by chance, the probability of detecting the likely motif in the other sequences increases. The selected sites usually converge to a few motifs step by step.
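
The loop described above can be sketched in Python as follows. This is a simplified site sampler written under several assumptions: exactly one site per sequence, a fixed motif width, input restricted to the letters A, C, G and T, and a background estimated from all bases of the other sequences. The function names are hypothetical, and this is not the Gibbs sampler program that Melina2 runs.

```python
import random

BASES = "ACGT"

def profile_from(sites, width, pseudocount=1.0):
    """Column-wise base frequencies (with pseudocounts) of aligned sites."""
    profile = []
    for pos in range(width):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        profile.append({base: counts[base] / total for base in BASES})
    return profile

def background_from(seqs, pseudocount=1.0):
    """Overall base frequencies of the given sequences."""
    counts = {base: pseudocount for base in BASES}
    for seq in seqs:
        for base in seq:
            counts[base] += 1
    total = sum(counts.values())
    return {base: counts[base] / total for base in BASES}

def gibbs_site_sampler(seqs, width, iterations=100, seed=0):
    """Return one sampled motif start position per sequence."""
    if any(len(s) < width for s in seqs):
        raise ValueError("every sequence must be at least as long as the motif")
    rng = random.Random(seed)
    # Start from one random site in each sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for held_out, seq in enumerate(seqs):
            # Build the weight matrix and background without the held-out sequence.
            others = [s for i, s in enumerate(seqs) if i != held_out]
            sites = [s[st:st + width]
                     for i, (s, st) in enumerate(zip(seqs, starts)) if i != held_out]
            profile = profile_from(sites, width)
            background = background_from(others)
            # Score every position of the held-out sequence: motif vs. background.
            weights = []
            for start in range(len(seq) - width + 1):
                ratio = 1.0
                for pos in range(width):
                    base = seq[start + pos]
                    ratio *= profile[pos][base] / background[base]
                weights.append(ratio)
            # Sample the new site in proportion to these likelihood ratios.
            starts[held_out] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

if __name__ == "__main__":
    seqs = ["ACGTTGACAGCTA", "TTGACAGGGCCAT", "CCATTGACAATGC", "GGCTATTGACACC"]
    print(gibbs_site_sampler(seqs, width=6))
```

Because the procedure is stochastic, different seeds can end at different sites; running it several times and comparing the results is a common precaution.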

 

4) MDscan

MDscan, developed by Liu et al. [5], searches for putative motifs using weight matrices.

5) Weeder

Weeder, developed by Pavesi et al. [6], automatically detects motifs in a set of DNA sequences.

 

 

4. About the 'GenomeDB' function
 

Using a motif detected by Melina2, users can search for occurrences of similar patterns in the upstream (e.g., promoter) regions of all genes in one of 6 species (see below). The search is performed using the 'hmmsearch' program from the HMMER package developed by Sean Eddy (http://hmmer.janelia.org/).

The parameters used are E-value: 10^-5 and SequenceScore: 0.5.  

 

Currently, the following sequences are available.  

   

Species        #sequences  based on  upstream  downstream  total  source
human          30964       TSS       1000      200         1200   DBTSS
mouse          19021       TSS       1000      200         1200   DBTSS
A. thaliana    30480       1st ATG   1000      0           1000   NCBI
S. cerevisiae  5850        1st ATG   800       0           800    NCBI
E. coli        4289        1st ATG   300       0           300    NCBI
B. subtilis    4100        1st ATG   300       0           300    NCBI

 

*** Human & mouse  

The sequences come from DBTSS (http://dbtss.hgc.jp). The sequences range from 1000 bp upstream to 200 bp downstream of representative transcriptional start sites (TSS) based on DBTSS version 5.2 information.

Note: all alternative promoters in DBTSS are also included.  

 

*** Arabidopsis thaliana  

The sequences are constructed based on the entries NC_003070.5, NC_003071.3, NC_003074.4, NC_003075.3, and NC_003076.4 from NCBI. They consist of the 1000 bp upstream of translational start sites (1st ATG).

 

*** Saccharomyces cerevisiae  

The sequences are constructed based on the entries NC_001133.6, NC_001134.7, NC_001135.4, NC_001136.8, NC_001137.2, NC_001138.4, NC_001139.7, NC_001140.5, NC_001141.1, NC_001142.6, NC_001143.7, NC_001144.4, NC_001145.2, NC_001146.5, NC_001147.5 and NC_001148.3. They consist of the 800 bp upstream of translational start sites (1st ATG).

 

*** Escherichia coli & Bacillus subtilis  

The sequences are constructed based on the entries NC_000913.2 and NC_000964.2 from NCBI. They consist of the 300 bp upstream of translational start sites (1st ATG).  
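
For readers who want to prepare comparable regions themselves, the following Python sketch cuts a window around an anchor position (a TSS or the first base of the 1st ATG) on either strand. The coordinate conventions are assumptions made for this example: positions are 0-based, and when downstream is greater than 0 the anchor base counts as the first downstream base, so that 1000 upstream plus 200 downstream gives 1200 bases. The function names are hypothetical, and this is not the procedure actually used to build the GenomeDB sequences.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def promoter_window(chromosome, anchor, strand, upstream, downstream):
    """Extract `upstream` bases before the anchor and `downstream` bases from it on.

    The window is clipped at the chromosome ends, and minus-strand windows
    are returned as the reverse complement so they read 5' to 3'.
    """
    if strand == "+":
        start, end = anchor - upstream, anchor + downstream
    elif strand == "-":
        start, end = anchor - downstream + 1, anchor + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    window = chromosome[max(0, start):min(len(chromosome), end)]
    return window if strand == "+" else reverse_complement(window)

if __name__ == "__main__":
    toy = "A" * 10 + "C" + "G" * 10   # anchor at index 10 (the single C)
    print(promoter_window(toy, 10, "+", upstream=3, downstream=2))
    print(promoter_window(toy, 10, "-", upstream=3, downstream=2))
    print(promoter_window(toy, 10, "+", upstream=3, downstream=0))
```

With upstream=1000 and downstream=200 this corresponds to the human and mouse rows of the table, and with downstream=0 to the ATG-based rows.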

 

 

5. Caution

 

- There is no limit on input file size; however, the performance depends on the user's computer environment.

- We will not take any responsibility for the use of Melina and Melina2, including but not restricted to hardware problems or data loss.

 

 

6. References

 

[1]  Poluliakh, N., Takagi, T. and Nakai, K. (2003) MELINA: motif extraction from promoter regions of potentially co-regulated genes. Bioinformatics, 19(3), pp.423-424

[2]  Stormo, G.D. and Hartzell, G.W. (1989) Identifying protein-binding sites from unaligned DNA fragments. Proc. Natl Acad. Sci. USA, 86, pp.1183-1187

[3]  Bailey, T.L. and Elkan, C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings of 2nd International Conference on Intelligent Systems for Molecular Biology, pp.28-36

[4]  Lawrence, C.E., Altschul, S.F., Boguski, M.S., Neuwald, A.F., Liu, J.S. and Wootton, J.C. (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262, pp.208-214

[5]  Liu, X.S., Brutlag, D.L. and Liu, J.S. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation. Nature Biotech, pp.835-839

[6]  Pavesi, G., Mereghetti, P., Mauri, G. and Pesole, G. (2004) WeederWeb: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Research, 32, pp.W199-W203