Analyzing Gels with DeCyder EDA
DeCyder Extended Data Analysis (EDA) should be used to analyze gels in experiments
with multiple experimental groups or to gather more detailed data about
the proteins and spot maps in other experiments. EDA can also be used
to find biochemical pathways and gene ontology information about proteins
of interest, as well as to identify a small group of biomarkers from
the total set of differentially expressed proteins.
Preparing for EDA Analysis
Creating an EDA Workspace
Creating the Base Set
Filtering the Base Set
Switching between Sets
Calculating Differential Expression
Viewing the Differential Expression Results
Filtering the Data by Differential Expression
Performing Principal Components Analysis (PCA)
Interpreting the Principal Components Analysis
Performing Hierarchical Cluster Analysis
Interpreting Hierarchical Cluster Analysis
Performing Partition Cluster Analysis
Interpreting Partition Cluster Analysis
Entering Protein Names into EDA
Querying Protein Databases for More Information
-Preparing for EDA Analysis
- EDA analysis is performed by combining several BVA workspaces. To
ensure a smooth import of data from BVA into EDA, follow these conventions
in BVA:
- Use the experimental design block in BVA to put each spot map into its
proper experimental group (Standard, Control, Treatment_1, etc.). Keep
these groups consistent across every BVA workspace to be analyzed
by EDA.
- Select
one "master" gel from the entire experiment and use it to
match the gels in each of the BVA workspaces. For the BVA workspace
that already contains this gel, import it a second time and match
all other gels to it, including itself. For the experimental design,
the spot maps for this gel should be put in the "Unassigned"
group in all cases.
- The master gel must remain the same in all BVA workspaces. This means
that if two spots are merged on the master gel in one BVA workspace,
they must also be merged on the master gel in all other BVA workspaces,
whether they match the other gels or not.
-Creating an EDA Workspace
- In
the "Step 1 - Workspace" section, select Create Workspace.
- Select
the proper project in the first column of the menu that pops up.
- Select
the BVA workspaces in the second column to be analyzed in EDA.
- Click
the Add button, then click Create.
-Creating the Base Set
- If all gels in the entire experiment (across all BVA workspaces) used
one common internal standard, select Automatic in the "Step 3
- Base Set" section at the bottom left and then skip to Filtering the Base Set.
- If
all of the gels did not share one common internal standard, select
Manual in the "Step 3 - Base Set" section at the bottom
left to normalize the spot maps.
- Click
the normalization tab at the top of the menu.
- Select
any experimental group common to all the workspaces (e.g., Control)
in the first pull-down menu. Select the workspace where the master
gel is also assigned to an experimental group as the reference workspace
for the second pull-down menu. Click Apply Normalization. It will
take at least a few minutes for the normalization to complete. Close
this window when completed.
- In the Experimental Design section in the right column, select one of
the experimental groups, click Edit Group, and assign it a color
by clicking on the color block. Do this for each experimental group,
assigning each group a different color.
-Filtering the Base Set (optional but recommended)
- Select
Calculations in the workflow diagram at the top of the screen.
- Select
Filter Set in the left column.
- In the Protein Filter section, select the dropdown criterion "% of
spot maps where protein is present", select ">=",
and input a value for the cutoff. For example, if 75 (recommended)
is used, EDA will exclude any protein found in fewer than 75% of the
spot maps.
- In the Spot Map Filter section, select the dropdown criterion "Remove
unassigned spot maps". Click Add.
- Click
Create Set at the bottom of the window, and give the new set an identifying
name, such as "Filtered Set".
Note: Filtering the base set is not mandatory
but will produce more robust and consistent analysis.
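The presence cutoff described above can be sketched in a few lines of Python. This is only an illustration of the rule "keep proteins present in at least 75% of spot maps"; the function name, the data layout, and the protein IDs are hypothetical and are not part of EDA.

```python
def filter_by_presence(presence, min_percent=75.0):
    """Keep proteins present in at least min_percent of the spot maps.

    presence maps a protein ID to a list of booleans, one per spot map
    (True means the protein was detected on that spot map).
    """
    kept = []
    for protein_id, flags in presence.items():
        percent_present = 100.0 * sum(flags) / len(flags)
        if percent_present >= min_percent:
            kept.append(protein_id)
    return kept


# Hypothetical data: three proteins across four spot maps.
presence = {
    "P1": [True, True, True, True],    # present in 100% of spot maps
    "P2": [True, True, True, False],   # present in 75%
    "P3": [True, False, True, False],  # present in 50%
}

kept = filter_by_presence(presence, min_percent=75.0)
print(kept)
```

With a cutoff of 75, P3 is dropped because it appears in only half of the spot maps, while P2 sits exactly on the threshold and is kept because the comparison is ">=".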
-Switching between Sets
- To
switch between sets in any menu, click the dropdown menu under "Select
Set:" in the left column, choose the desired set, and click Select
Set.
-Calculating Differential Expression
- Click
the Calculations button in the workflow chart at the top of the page
and switch to the Filtered Set.
- Select
the Differential Expression Analysis calculation.
- Create
settings for the calculations as follows:
o
If the experiments compare the same patient/animal under different conditions,
select "Paired tests"; otherwise, select "Independent
Tests".
o
If you are interested in comparing two experimental groups (e.g. Control
and Treated), check the Average ratio and Student's t-test boxes. Select
the desired groups (use Ctrl-click to combine multiple groups), name
the calculation, and click Add to List.
o
If you are interested in looking for differential expression among multiple
experimental groups, follow the directions above but also check the
One-way ANOVA box and check the "Calculate multiple comparison
tests" box.
Note: Only one differential expression
calculation is accessible at any time for a given set. Creating a new
calculation will delete the old calculation.
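To make the statistics named above more concrete, the following Python sketch computes an average ratio, an independent (pooled-variance) Student's t statistic, a paired t statistic, and a one-way ANOVA F statistic using textbook formulas. It is not EDA's implementation: the function names and abundance values are hypothetical, EDA may work on log-transformed abundances internally, and the sketch stops at the test statistics rather than converting them to p-values.

```python
import math
from statistics import mean, variance


def average_ratio(reference, other):
    """Mean abundance of `other` divided by mean abundance of `reference`."""
    return mean(other) / mean(reference)


def independent_t(group_a, group_b):
    """Student's t statistic for two independent groups (pooled variance)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))


def paired_t(first, second):
    """Paired t statistic: same subjects measured under two conditions."""
    diffs = [b - a for a, b in zip(first, second)]
    return mean(diffs) / math.sqrt(variance(diffs) / len(diffs))


def one_way_anova_f(groups):
    """F statistic: between-group variance over within-group variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (between / (len(groups) - 1)) / (within / (len(all_values) - len(groups)))


# Hypothetical abundances for one protein.
control = [1.0, 1.5, 0.5, 1.0]
treated = [2.0, 2.5, 1.5, 2.0]
after = [2.0, 2.0, 1.5, 2.5]  # same four subjects as `control`, second condition

print(average_ratio(control, treated))
print(independent_t(control, treated))
print(paired_t(control, after))
print(one_way_anova_f([control, treated]))
```

The paired statistic uses the per-subject differences, which is why it applies only when the same patient or animal is measured under each condition. With exactly two groups the ANOVA F statistic equals the square of the independent t statistic, so ANOVA becomes informative mainly when three or more groups are compared.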
-Viewing the Differential Expression Results
- Click
Results in the workflow diagram and click the Differential Expression
Analysis tab. The lower section of this page, shown below, is a table
with tabs for the proteins and the spot maps. The Protein tab contains
the average ratio, t-test score, and ANOVA score for each protein.
To view the multiple comparison tests (if any), select the protein
in the table. The multiple comparison test results will be shown in
the upper section of this page, along with a plot of the log standard
abundances for each spot map, split up by experimental group.
- The
protein table can be sorted by clicking the header of the column by
which you wish to sort (e.g. click "T-test" in the table
to sort the proteins from highest to lowest t-test score; click it
again to sort from lowest to highest).
- The settings for both the table and the plot can be altered by clicking
the Settings Dialog Box, as shown in the picture below.
-Filtering the Data by Differential Expression
- You
should now filter the proteins to include only those that show differential
expression for further analysis. The proteins can be filtered by either
t-test and/or ANOVA. They can also be filtered by average ratio, but
this should be used only in addition to ANOVA and/or t-test, and not
in place of them.
- While still in the Results/Differential Expression Analysis tab, select
Filter Set in the right column. Under Protein Filter, select a filter
criterion from the dropdown menu and set the desired threshold for
statistically significant differential expression (e.g., "Student's
T-test", "<", "0.05"). Click Add.
- Add
as many filters as desired in the same manner. Use the "AND all"
radio button to only include proteins which satisfy all filter conditions;
use the "OR all" radio button to include proteins which
satisfy any one of the filter conditions.
- To
remove a filter, select the condition and click Remove.
- Click
Create Set and give the set an identifying name and color, such as
"Differential Expression".
-Performing Principal Components Analysis (PCA)
- PCA
uses plots to identify protein outliers, if any, and to confirm that
protein expression is consistent throughout multiple samples from
the same experimental group.
- Click
on Calculations in the workflow diagram.
- Switch to the set containing the differentially expressed proteins.
- Click
on Principal Components Analysis under "Select Calculation"
in the left column.
- Select
the upper left radio button, which plots proteins against spot maps.
Give this calculation an identifying name, such as "Proteins
- Spot Maps" and click Add to List.
- Select
the lower left radio button, which plots spot maps against proteins.
Give this calculation an identifying name, such as "Spot Maps
- Proteins" and click Add to List.
- Click
Calculate in the right column.
-Interpreting the Principal Components Analysis: Proteins - Spot Maps
- The Proteins - Spot Maps calculation will perform statistical checks to
identify whether there are any protein outliers (i.e., spots that have
been mismatched or are not true proteins).
- For
most purposes, only the plot on the left side is important, and is
thus the only plot shown below.
- Each
dot represents a protein. The circle on the plot represents a 95%
confidence interval within which the expression of a true protein
is expected to lie based on the general expression patterns in the
spot maps. Any protein that falls outside this interval, like the
one indicated by the red arrow above, should be checked in BVA to
make sure it is properly matched and a true protein. There is not
necessarily anything wrong with this protein, however; good proteins
that have especially strong differential expression often end up outside
the 95% area.
- To look at a protein in BVA, click on the dot, click on a spot map in
the table, and go to Tools > Open Source in the menu bar.

- If any proteins are actually artifacts or are mismatched, exclude
them from the set by Ctrl-clicking to select them in the plot and
then clicking Create Set in the left column. Select the radio button
for Removing Selection in the Proteins section, and select the radio
button for Including All in the Spot Maps section. Give the new set
an identifying name (such as Differential Expression - Filtered) and
click Create.
-Interpreting the Principal Components Analysis: Spot Maps - Proteins
- The
Spot Maps - Proteins calculation will plot the spot maps on a graph
based on the expression patterns of the selected proteins. It is used
to make sure that multiple trials of the experiment are producing
consistent results.
- Once
again, the important plot is the one on the left hand side of the
results, so this is the only plot discussed here and shown below.
- Each dot on this plot represents a spot map, based on the expression of
the proteins in the set used for the calculation (this should be the
Differential Expression - Filtered set). Each dot will be colored
with the color of the experimental group it belongs to, assuming that
the groups were assigned unique colors in the experimental design
(see "Creating the Base Set").
- If
the experiment is producing consistent results, spot maps from the
same experimental group should be located in the same general area,
i.e., contained mostly in one quadrant or half of the plot.
- If
a spot map is located outside the 95% ellipse, or is located far away
from the other spot maps in its experimental group (an example is
indicated by the red arrow in the picture above), then you may want
to exclude this spot map from further analysis. It is recommended
to look at the gel in your imaging software. If there are widespread
gel abnormalities or streaking, the gel should be excluded in the
same manner as the proteins were in the previous section. Otherwise,
it is acceptable to keep the gel in the analysis.
- This
calculation can also be used to make some rudimentary conclusions
about the experimental groups. If the locations of the spot maps in
two or more experimental groups overlap a great deal (as with the
green and orange spot maps in the image above), then the samples in
these groups show few, if any, differences in protein expression,
and the groups are likely very similar. On the other hand, if the
spot maps in two or more experimental groups are found in mostly distinct
areas of the plot (as with the blue and red spot maps in the image
above), these samples show many consistent differences in protein
expression, and the groups are likely fundamentally different. However,
these conclusions can and should be refined with further analysis.
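The idea that spot maps from the same experimental group land in the same region of the PCA plot can be sketched in plain Python. The example below computes only the first principal component by power iteration on the covariance matrix; this is a generic textbook approach, not a description of how EDA computes its PCA, and it omits the second component and the 95% ellipse. All names and abundance values are hypothetical.

```python
import math


def center(rows):
    """Subtract each column (protein) mean from every row (spot map)."""
    count = len(rows)
    means = [sum(col) / count for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance between columns of already-centered data."""
    count, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (count - 1) for j in range(dims)]
            for i in range(dims)]


def first_component(cov, iterations=200):
    """Power iteration: unit vector along the direction of largest variance."""
    dims = len(cov)
    vec = [float(i + 1) for i in range(dims)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


# Hypothetical abundances of three proteins on four spot maps.
spot_maps = {
    "control_1": [1.0, 2.0, 0.5],
    "control_2": [1.1, 2.1, 0.4],
    "treated_1": [3.0, 0.5, 0.6],
    "treated_2": [3.1, 0.4, 0.5],
}

names = list(spot_maps)
centered = center([spot_maps[name] for name in names])
pc1 = first_component(covariance_matrix(centered))

# Score = position of each spot map along the first principal component.
scores = {name: sum(x * w for x, w in zip(row, pc1))
          for name, row in zip(names, centered)}
for name, score in scores.items():
    print(name, round(score, 3))
```

In this toy data the two control spot maps receive scores of one sign and the two treated spot maps scores of the opposite sign, which corresponds to the groups occupying distinct areas of the plot. The overall sign of a principal component is arbitrary, so only the relative positions are meaningful.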
-Performing Hierarchical Cluster Analysis
- Hierarchical
Cluster Analysis will group proteins together in a hierarchical tree
by analyzing the expression patterns of the proteins in the different
spot maps. Proteins showing similar expression patterns will be located
near each other in the tree, while those with different expression
patterns will be farther away. This can also be done with spot maps.
The result of this analysis is a heat map that will be discussed in
the following section, "Interpreting Hierarchical Cluster Analysis".
- Click
Calculations in the workflow diagram at the top of the page. Make
sure that the filtered set of differentially expressed proteins is
selected as the current set.
- Click
Pattern Analysis in the left column.
- Select
the Hierarchical Clustering Algorithm.
- Select
the upper left radio button to cluster the proteins based on their
expression in the spot maps.
- Give
the calculation an identifying name, such as "Proteins",
and click Add to List.
- Select
the lower left radio button to cluster the spot maps based on the
expression of the proteins in them.
- Give
the calculation an identifying name, such as "Spot Maps",
and click Add to List.
- Click Calculate at the bottom of the right column.
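As a rough illustration of how a hierarchical tree is built, the Python sketch below repeatedly merges the two closest clusters of proteins. The distance measure (Euclidean) and linkage rule (average linkage) are assumptions chosen for simplicity; this guide does not state which options EDA uses, and the protein IDs and expression values are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(profiles[i], profiles[j]) for i in cluster_a for j in cluster_b]
    return sum(dists) / len(dists)


def hierarchical_cluster(profiles):
    """Return the merges as (members_a, members_b, distance), closest first."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical log expression of four proteins across four spot maps.
profiles = {
    "P1": [1.0, 1.1, -1.0, -0.9],
    "P2": [0.9, 1.0, -1.1, -1.0],
    "P3": [-1.0, -0.8, 1.0, 1.2],
    "P4": [-0.9, -1.0, 1.1, 1.0],
}

merges = hierarchical_cluster(profiles)
for members_a, members_b, distance in merges:
    print(members_a, members_b, round(distance, 3))
```

Early merges at small distances correspond to branches that join near the heat map, and the final merge at a large distance corresponds to the split near the root of the tree. Transposing the data so that each profile is a spot map would cluster spot maps instead of proteins.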
-Interpreting the Hierarchical Cluster Analysis
- Click
Results in the workflow diagram at the top of the screen. Then click
the Pattern Analysis button and click the Hierarchical Cluster Analysis
tab.

- The image above shows a screenshot of the Hierarchical Cluster Analysis
results. In the heat map, each horizontal row represents one protein
(with the protein number at the end of the row). Each vertical column
represents one spot map (with the details underneath the column).
As a result, each rectangular box, such as the one highlighted in
pink in the image, represents the expression of that row's protein
in that column's spot map as compared to the internal standard. A
green color means that the expression decreased compared to the standard,
and a red color means that the expression increased compared to the
standard - the brighter the color, the stronger the change, as seen
by the scale to the lower left of the heat map.
- If
the cursor is held over a box, the numerical change in expression
will appear under the heat map scale.
- The
trees to the left of the heat map and above the heat map show the
relationships between the proteins and between the spot maps, respectively.
Proteins that split off into different parts of the tree near the
root of the tree are less related than those that split off near the
heat map. For instance, in the image above, the tree at the left splits
the proteins near the root into two main groups. Looking at the tree,
the protein indicated by the pink box is the last protein in the first
group, while the proteins below it all belong to the second group.
- Spot
maps can be looked at in the same way. Generally, one would expect
spot maps from the same group to be clustered closely together, as
is the case with almost all of the red "control" spot maps
in the image above. As with the PCA, this is a useful tool to ensure
that the experiment is producing consistent results, and it can also
be used to identify possible cases of individual variation.
- The buttons to the upper right of the heat map allow the user to change
some settings. On the top row, the leftmost button shows the view
shown above and the middle button shows only the heat map (and
makes it bigger). The rightmost button allows the user to change the
scale of the heat map, which is visually useful if most changes are
only on the order of 50%. The bottom row of buttons allows the user
to zoom in or out on the heat map.
Note: The image above for the hierarchical
cluster analysis comes from a different experiment than the image for
the PCA. Otherwise, each spot map shown on the PCA image will correspond
to a particular spot map in the hierarchical cluster analysis - these
can be matched using the Spot Map table in the bottom half of the screen
if desired.
-Performing Partition Cluster Analysis
- Partition
Cluster Analysis groups proteins into clusters that show similar expression
patterns across the different experimental conditions. Spot maps or
experimental groups can also be clustered but this is not useful for
most experiments.
- There are three methods for partition cluster analysis. The first, K-means
clustering, generates a fixed number of disjoint clusters based
on the spread of the data. The second, Gene Shaving, generates a large
number of small clusters, more than one of which can contain the same
protein. The third, Self-Organizing Maps, tends to generate a small
number of large clusters, and is not preferred.
K-means
- Select
the Kmeans Algorithm.
- Select
the upper left radio button to cluster proteins according to expression
in spot maps.
- Give
the calculation an identifying name, such as "KMeans", and
click Add to List.
- Click
Calculate at the bottom of the right column.
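For readers unfamiliar with K-means, the following Python sketch shows the basic assign-and-update loop on a handful of hypothetical expression profiles. It is a generic version of the algorithm with a deterministic starting point (the first k profiles), chosen so that the result is reproducible; how EDA initializes or tunes its own K-means calculation is not covered in this guide.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Basic K-means: returns (cluster label per point, list of centroids)."""
    centroids = [list(p) for p in points[:k]]  # assumption: start from first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical log expression of four proteins across three spot maps.
points = [
    [1.0, 1.1, -1.0],
    [-1.0, -0.9, 1.0],
    [0.9, 1.0, -1.1],
    [-1.1, -1.0, 0.9],
]

labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Each protein receives exactly one label, which is what makes the clusters disjoint. This contrasts with Gene Shaving, where the same protein can appear in more than one cluster.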
Gene Shaving
- Select
the Gene Shaving Algorithm.
- Select
the upper left radio button to cluster proteins according to expression
in spot maps.
- Give
the calculation an identifying name, such as "Gene Shaving",
and click Add To List.
- Click
Calculate at the bottom of the right column.
-Interpreting Partition Cluster Analysis
- Click
Results in the workflow diagram at the top of the screen. Then click
the Pattern Analysis button and click the Partition Cluster Analysis
tab. Use the pulldown menu to select the desired calculation (Gene
Shaving, KMeans, etc).
- An
example screen of the results from partition cluster analysis is shown
below. Summaries of each cluster (expression pattern, number of proteins,
and "q" score) are displayed in small boxes in the main
window. A detailed chart of the expression of each protein in the
cluster is also shown in the main window. A table showing each protein
in the selected cluster is located below the main window.
- The "q" score, shown above each of the cluster summaries, is
a measure of the similarity between the expression of all the proteins
in the cluster. The maximum q-score is 100, so the closer the q-score
is to 100, the stronger the pattern between all of those proteins.
The q-score has no meaning as a comparative value between different
clusters, though; it is merely an expression of the precision of an
individual cluster.
- Proteins that are in the same cluster can be significant in several ways.
First, if proteins that are next to each other on the gel are clustered
together, this suggests that they are the same protein, and strengthens
the belief that this protein is differentially expressed. Second, if proteins
show similar expression patterns, it is possible that they either
serve similar functions, are affected by similar substances or events,
or belong to the same biochemical pathway. All of these possibilities
should be researched and analyzed further.

-Entering Protein Names into EDA
- Proteins
that are candidates for further analysis, both in EDA and in future
experiments, should be picked from a gel and identified via mass spectrometry.
Once these proteins are known, their names and UniProt/NCBI accession
numbers should be entered into EDA.
- To
enter the name of a protein, click on it in a table and then click
Select in the box to the left of the table. A new window will open.
- Type
the name of the protein in the text box underneath name.
- Type
the accession ID in the text box underneath the appropriate database.
If the format of the ID is not valid, the text box will be outlined
in red and you will be unable to enter that ID.
-Querying Protein Databases for More Information
- EDA
has several tools that can be used to quickly acquire information
from protein databases and other sources about the proteins you have
identified.
- Click
Interpretation in the workflow diagram at the top of the screen.
- Be
sure that the final set of proteins is selected.
-
Click Create Query and select the desired query.
o
Select Gene Ontology to view the molecular functions, biological processes,
and cellular components associated with the identified proteins based
on the genetic motifs they contain.
o
Select Pathways to find any commonly recognized pathways in which any
of the identified proteins take part.
o
Select UniProt Features to obtain a summary table with general information
taken from UniProt about each of the identified proteins. This
information includes function, pathways, family, and cellular location.
o
Select PubMed to retrieve information and articles about the identified
proteins from PubMed. This is only available with a connection to discoveryHub.