The goal of this practical session is to learn common ways to visualize, filter, analyse and cluster clones on the Vidjil web application. These clones may have been computed by the Vidjil algorithm or by any other algorithm.
0. Connect to the public server (https://app.vidjil.org), either with your account or the demo account (firstname.lastname@example.org / demo), select the Demo LIL-L3 (tutorial) patient. If you don’t see it, search for #Demo in the top-left search box. Then click on the bottom right link, see results: multi-inc-xxx. Do not open the Demo LIL-L3 (analyzed) patient: this one contains the complete analysis. The Vidjil web application opens.
This patient (patient 063 from the Lille study on the feasibility of MRD using HTS), suffering from T-ALL, has one diagnosis sample, with dominant clones both in IGH and TRG, and four follow-up samples, including a relapse.
1. In the settings menu, try the various options for sample key. The five samples can be labeled by their name, their date of sampling or by the number of days after the first sample.
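The third labelling option, the number of days after the first sample, is a simple date difference. Here is a minimal sketch of that computation; the sampling dates below are hypothetical and are not those of the demo patient.

```python
from datetime import date


def days_after_first(sampling_dates):
    """Return, for each sample, the number of days elapsed since the earliest sample."""
    first = min(sampling_dates)
    return [(d - first).days for d in sampling_dates]


# Hypothetical sampling dates, for illustration only.
example_dates = [date(2020, 1, 10), date(2020, 2, 9), date(2020, 4, 9)]
example_days = days_after_first(example_dates)
print(example_days)
```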
In the following sections, we focus on the diagnosis sample. Section 5 will deal with the comparison of several samples.
The Vidjil web application allows you to run several “RepSeq” (immune repertoire analysis) algorithms. Each RepSeq algorithm has its own definition of what a clone is (or, more precisely, a clonotype), how to output its sequence and how to assign a V(D)J designation. The number of analyzed reads will depend on the algorithm used. This sample has been processed using the Vidjil algorithm.
The percentage of analyzed reads can range from 0.01 % (for RNA-Seq or capture data) to 98-99 % (for very high-quality runs, mostly on Illumina).
2. How many reads have been analyzed in the current sample with the embedded algorithm?
In the upper left corner, the information panel shows: analyzed reads 1 967 338 (82.31 %).
Now we will try to assess why some reads were not analyzed in our sample. This might reflect a problem during the sequencing protocol… or it could be normal. To do so, display the information box by clicking on the i in the upper left part.
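As a quick sanity check, the two figures of the information panel give an estimate of the total number of reads in the sample. This sketch assumes that the percentage is the ratio of analyzed reads to all reads of the sample; since the percentage is rounded, the result is only approximate.

```python
# Figures read from the information panel.
analyzed_reads = 1_967_338
analyzed_fraction = 0.8231  # 82.31 %, rounded by the interface

# Assumption: analyzed_fraction = analyzed_reads / total reads of the sample.
estimated_total_reads = round(analyzed_reads / analyzed_fraction)
estimated_unanalyzed_reads = estimated_total_reads - analyzed_reads

print(f"about {estimated_total_reads} reads in total")
print(f"about {estimated_unanalyzed_reads} reads not analyzed")
```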
3. What are the average read lengths on IGH? and on TRG?
See the Analysis log row, under "av.len":
IGH -> 314.5
TRG -> 197.6
The lines starting with UNSEG display the reasons why some reads have not been analyzed.
You can see what those reasons mean in the online documentation of the algorithm:
4. What are the major reasons why reads have not been analyzed? Also have a look at the average read lengths for these reasons. Do you notice something regarding the average read lengths?
1. The algorithm was not able to find a V or a J for most of the unsegmented reads.
2. These reads may be too short to cover enough of the V or J genes to be detected.
Each RepSeq algorithm has its own definition of what a clone is (or, more precisely, a clonotype), of how to output its sequence and of how to assign a V(D)J designation.
In this file, the most abundant clone is IGHV3-9 7/CCCGGA/17 J6*02.
5. Select this clone, either by clicking on the list or on the grid. How many reads does this clone represent? (see again the bottom part to the right)
The bottom panel displays information about the currently selected clones -> 189 991 reads (9.665 %)
There are several options to display the V(D)J designation.
6. In the settings menu, under N regions in clone names select length to show N zones by their length. Revert to the default sequence (when short) setting to show the full N on short sequences.
7. Also try the alleles in clone names option: when always is selected, the clone V gene is displayed as IGHV3-9*01. Revert to the default, when not *01, which gives a more condensed V(D)J designation by leaving the *01 implicit.
By default, Vidjil displays the 50 most abundant clones at each time point. With five time points, we may therefore have from 50 to 250 clones displayed, depending on whether the top 50 are always the same, always different or, more realistically, somewhere in between. This number can be increased to a maximum of 100 clones by going to the filter menu and moving the slider to its right end.
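The effect of these two settings on a clone name can be sketched as follows. This is only an illustration, not the code of the web application: the Designation class, the function names and the SHORT_N_MAX threshold (what counts as a "short" N) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold: N regions longer than this are shown by their length.
SHORT_N_MAX = 6


@dataclass
class Designation:
    v_gene: str
    v_allele: str
    del_v: int      # nucleotides deleted at the end of V
    n_seq: str      # N region sequence
    del_j: int      # nucleotides deleted at the start of J
    j_gene: str
    j_allele: str


def gene_name(gene, allele, allele_mode):
    """Show the allele always, or only when it is not *01."""
    if allele_mode == "always" or allele != "01":
        return f"{gene}*{allele}"
    return gene


def n_region(n_seq, n_mode):
    """Show the N region by its length, or by its sequence when short."""
    if n_mode == "length" or len(n_seq) > SHORT_N_MAX:
        return str(len(n_seq))
    return n_seq


def format_designation(d, n_mode="sequence_when_short", allele_mode="when_not_01"):
    v = gene_name(d.v_gene, d.v_allele, allele_mode)
    j = gene_name(d.j_gene, d.j_allele, allele_mode)
    return f"{v} {d.del_v}/{n_region(d.n_seq, n_mode)}/{d.del_j} {j}"


# The most abundant IGH clone of the tutorial sample.
main_clone = Designation("IGHV3-9", "01", 7, "CCCGGA", 17, "J6", "02")
print(format_designation(main_clone))
print(format_designation(main_clone, n_mode="length"))
print(format_designation(main_clone, allele_mode="always"))
```

Keeping the display options separate from the stored designation mirrors what the settings menu does: the underlying data never change, only the way the name is written.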
8. Notice how the IGH smaller clones percentage (second clone displayed in the list) changes. What was its initial value? What is it now?
filter set to 50 -> IGH smaller clones 10.11 %
filter set to 100 -> IGH smaller clones 8.92 %
The smaller clones correspond to clones that are not displayed because they are never among the most abundant ones.
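The relation between the filter value, the number of displayed clones and the smaller clones percentage can be illustrated with a toy example. This is a simplified sketch under our own assumptions (a single locus, toy read counts, hypothetical function names); the actual computation in Vidjil may differ.

```python
def displayed_clones(samples, top=50):
    """Union of the `top` most abundant clones of each sample.

    `samples` is a list of dicts mapping a clone name to its read count.
    """
    shown = set()
    for counts in samples:
        ranked = sorted(counts, key=counts.get, reverse=True)
        shown.update(ranked[:top])
    return shown


def smaller_clones_fraction(counts, shown):
    """Fraction of the reads of one sample belonging to clones that are not displayed."""
    total = sum(counts.values())
    hidden = sum(n for clone, n in counts.items() if clone not in shown)
    return hidden / total


# Toy data: two samples, four clones.
toy_samples = [
    {"A": 50, "B": 30, "C": 15, "D": 5},
    {"A": 40, "C": 45, "D": 10, "B": 5},
]

shown_top1 = displayed_clones(toy_samples, top=1)
shown_top2 = displayed_clones(toy_samples, top=2)
print(sorted(shown_top1), smaller_clones_fraction(toy_samples[0], shown_top1))
print(sorted(shown_top2), smaller_clones_fraction(toy_samples[0], shown_top2))
```

Raising the filter value displays more clones, so fewer reads remain in the smaller clones, which is the behaviour observed in question 8.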
Consider the most abundant clones in the list: IGHV3-9 7/CCCGGA/17 J6*02 and TRGV10 13//5 JP1. Usually we may want to tag them in order to remember them later on.
9. Click on the star and choose colored tags for these two clones, such as clone 1 or clone 2. Notice how the color applies throughout all the views.
Later you may want to filter clones depending on the tags you have chosen.
10. In the upper left part, click on the little dark gray square (the second coloured square starting from the right). What happens? What if you click again?
This is a way of filtering some clones. This may be useful when we want to focus on some specific clones. Another way of doing so is to filter them by their gene names or by their DNA sequences.
11. In the search box, enter GGAGTCGGGG and validate with Enter. How many sequences are left? Note that the search is performed both on the forward and the reverse strand.
12. Check that by searching for the reverse complement of the sequence: CCCCGACTCC. Do you find the same results as previously?
13. How can you cancel this filter and view again all the clones?
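The reverse complement used in question 12 can be checked with a few lines of code. This is a minimal sketch of a search on both strands; the function names are ours and the clone sequence at the end is a made-up example, not a sequence from the sample.

```python
# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every nucleotide, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_either_strand(query, sequence):
    """True if the query occurs in the sequence on the forward or on the reverse strand."""
    query, sequence = query.upper(), sequence.upper()
    return query in sequence or reverse_complement(query) in sequence


print(reverse_complement("GGAGTCGGGG"))

# Made-up sequence containing the query on the reverse strand only.
toy_sequence = "TTACCCCGACTCCTTA"
print(matches_either_strand("GGAGTCGGGG", toy_sequence))
```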
Another solution to tag a specific clone is to rename it.
14. Double click on the name of a clone (in the list of clones) and choose another name (e.g. interesting clone) and validate using Enter.
After this rename, you can see that the clone is still selected.
15. Click on several clones by holding the Ctrl key to select more. Each time you add a new clone to the selection, its sequence is added in the bottom part.
16. How many clones are selected? How many reads do those clones represent?
17. Notice the star at the right of the screen, near the number of reads. You can also tag clones using this icon. In that way, you will be able to tag all the selected clones at once.
18. When you want to focus on the selected clones, you can click on the focus link on the right, next to the number of selected clones. This feature is useful when you want to analyse some clones more thoroughly without being annoyed by other clones.
19. To remove this focus, click on the cross next to the search box, above the list.
20. To unselect them all, you can click in an empty area on the top or bottom plot.
Sometimes, one wants to hide noisy or unrelated clones.
21. Select a clone or several clones and click on the hide button, near the focus button. Show again these clones by clicking on the cross next to the search box.
It is also possible to filter out samples that do not contain a clone. When you have lots of samples, it helps to keep only the samples of interest. Here the number of samples is quite limited, so the feature may appear less useful.
22. Click on the
By selecting this, the samples where this clone doesn’t appear are hidden. This is useful for instance to assess the contamination among dozens of samples.
The first thing to be done is to see if some clones should be clustered (because of sequencing or PCR errors for instance). This step could be automated but, in any case, the automatic clustering would need to be checked by an expert eye.
By default in the bottom plot (the
The sequences of the clones now appear in the bottom part of the browser (the
Then, the sequences in the sequence panel can be visually compared but you can also align them to see more easily their similarities.
It is now up to the user’s expertise to determine whether the sequences are sufficiently similar, depending on her or his specific question. If some sequences don’t appear to be similar enough, you can remove them from the sequence panel by clicking on the cross in front of the sequence.
Now all the sequences in the sequence panel should be highly similar. All their differences could be due to sequencing or PCR errors. These artifacts (mutations, homopolymers, insertions, deletions) depend on the sequencer and the PCR technique.
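To put a number on "highly similar", one can count how many substitutions, insertions or deletions separate two sequences. The sketch below computes a plain Levenshtein edit distance; it is our own illustration and not the alignment method used by the web application, and the sequences are toy examples.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimal number of substitutions, insertions
    and deletions needed to turn sequence `a` into sequence `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion in a
                current[j - 1] + 1,                    # insertion in a
                previous[j - 1] + (char_a != char_b),  # match or substitution
            ))
        previous = current
    return previous[-1]


# Toy sequences: one substitution, then one deletion in a homopolymer.
print(edit_distance("ACGTTTGA", "ACGTATGA"))
print(edit_distance("ACGTTTGA", "ACGTTGA"))
```

A small distance relative to the sequence length is compatible with sequencing or PCR errors, but the decision to cluster remains with the user.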
All the clustered sequences now appear within a same clone. That can be seen in the list: the
clone which hosts the subclones appears with a
As you may have noticed the subclones appear again in the grid. You can compare their sequences again if you’d like (for example to double check that you were right to cluster them). You can also remove some subclones from the cluster by clicking on the cross at their left in the list.
As a proxy for sequence similarity, we used the V and J genes. However, there are other ways to assess sequence similarity that may be more pertinent. Moreover, you may want to plot other metrics on the lymphocyte population. For instance, we can choose to plot the V genes versus the length of the N insertions.
Then you can continue aligning and clustering clones if necessary.
Note that you can choose any axis to be plotted: just go the
There is still a feature to help you analyse your data that we have not explored yet. You can
change the colors to make it represent some variables of interest with the
Using those different features you should be able to analyse how similar your sequences are, and potentially you could cluster them if you’d like or tag them.
Some clones may be less trustworthy than others… Let’s see how to spot them.
There may be several reasons:
You can view those values for any clone by clicking the
First make sure to come back to the preset
If you want to focus on specific locus, you can click on the locus name in the upper left part.
One click will make the locus disappear, another one will make it appear again. If you hold the
You can also change the current locus by clicking on the locus name in the right part of the grid.
Sometimes you may include spike-ins in your sample to allow a more reliable quantification. Let us assume that the main clone with IGHV3-9 / IGHJ5 is a spike-in whose expected concentration is 1 % (0.01).
Then you can set expected concentrations for other clones, and you are free to switch between those normalizations. It is also possible to set up normalization against external data; contact us if you are interested.
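The idea behind such a normalization can be sketched as a rescaling: every clone is multiplied by the ratio between the expected and the observed concentration of the spike-in. This is a minimal sketch assuming a simple linear rescaling; the function name and the second clone are hypothetical, and the web application may compute its normalization differently.

```python
def normalize(fractions, standard, expected):
    """Rescale all clone fractions so that the `standard` clone reaches `expected`.

    Assumption: linear rescaling by expected / observed.
    """
    factor = expected / fractions[standard]
    return {clone: value * factor for clone, value in fractions.items()}


# Observed fraction of the main clone as displayed earlier (9.665 %);
# the second clone and its value are made up for the example.
observed = {"main clone (spike-in)": 0.09665, "other clone": 0.02000}
normalized = normalize(observed, "main clone (spike-in)", expected=0.01)

for clone, value in normalized.items():
    print(f"{clone}: {value:.4%}")
```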
We will see how to make the best use of the patient and sample database. For this, you need an account with the rights to create new patients, runs and sets, to upload data and, preferably, to run analyses. The demo account is therefore not suitable.
You should now have three files. We will imagine that those three files are the results from a
single sequencing run. More precisely, each one corresponds to a single patient. Thus we now
want to upload those files and assign all of them to a same
Note that usually you should check whether the patient has already been created by searching her/his name in the search box at the upper left corner
Now you should have three lines with Patient 1, Patient 2, Patient 3 and one line with Run 1. If you created too many lines you can remove some by clicking on the cross at the right hand side.
The last field is optional but it is very important (the field called
Here you can enter any information relevant to this set of samples. More specifically you can enter tags (starting with a #) that will allow you to search very easily and quickly all the patients/runs/sets sharing this tag. By default when you enter a # in this field, some tags appear and the suggestions are updated while you enter other characters. Note that a tag cannot contain any space. Also note that you can create other tags just by entering whatever you would like in the field preceded with a #. Thus any tag you enter is saved (and can be suggested later on).
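To make the tag rule concrete (a # followed by characters, without any space), here is a small sketch extracting tags from a free-text field. The regular expression is our own assumption about which characters a tag may contain (letters, digits, underscores and hyphens); the server may accept a different set. The example text is invented.

```python
import re

# Assumption: a tag is '#' followed by letters, digits, underscores or hyphens.
TAG_PATTERN = re.compile(r"#\w[\w-]*")


def extract_tags(text):
    """Return the tags found in an information field, in order of appearance."""
    return TAG_PATTERN.findall(text)


# Invented example of an information field.
example_info = "T-ALL diagnosis #Demo #LIL-L3 follow-up planned"
print(extract_tags(example_info))
```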
Now the three patients and the run have been created but we have not uploaded the sequence files yet.
Similarly to the patient/run creation page, we can add as many samples as we want on this page.
In our case we want to add each sample to a different patient. Thus we don’t need to modify this field.
Now you are back on the page of the run where you should see the three samples that are being uploaded.
You can have a coffee, a tea, or something else, while the process is launched.
Then you can view the results as explained before. Instead we will remain on the server.
When switching the time point, the views update dynamically, which makes it easy to track changes over time. Also note that the number of analyzed reads differs from the previous point. We can again analyse the reasons why some reads were unsegmented.
We will now look at how the V gene distribution evolves over time.
By doing so, you can look at how the V distribution changes over time. Of course, you can also change the data displayed in the grid to look at the evolution of other information.
Recall that by default at most 50 clones are displayed on the time graph. However, the
rest of the application usually displays the 50
If you have many samples, you may wish to reorder the samples.
You may also want to compare two samples, either to check a replicate, to check for possible contaminations, or to compare different research or medical situations.
Another option is to directly plot a log-log curve comparing two samples.
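The coordinates of such a log-log comparison can be sketched as follows: each clone becomes a point whose coordinates are the logarithms of its fractions in the two samples. This is an illustration under our own assumptions; in particular, the floor value used for clones absent from one sample is hypothetical, and the clone names and fractions are toy data.

```python
import math


def log_log_points(sample_a, sample_b, floor=1e-6):
    """Map each clone to (log10 fraction in A, log10 fraction in B).

    Clones absent from one sample are placed at `floor`, a hypothetical
    minimal fraction, because log10(0) is undefined.
    """
    clones = sorted(set(sample_a) | set(sample_b))
    return {
        clone: (
            math.log10(max(sample_a.get(clone, 0.0), floor)),
            math.log10(max(sample_b.get(clone, 0.0), floor)),
        )
        for clone in clones
    }


# Toy fractions for two samples.
toy_a = {"clone 1": 0.1, "clone 2": 0.01}
toy_b = {"clone 1": 0.001, "clone 3": 0.01}
toy_points = log_log_points(toy_a, toy_b)
for clone, (x, y) in toy_points.items():
    print(f"{clone}: x={x:.1f}, y={y:.1f}")
```

Clones lying close to the diagonal have similar abundances in both samples, which is what one expects from a replicate; points far from the diagonal deserve a closer look.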
For some studies, VDJ designations are very important. In the list and in the sequence panel, those designations are written in their short form.
We can double check this designation with other popular software.
Note that data returned by IMGT/V-QUEST is available by clicking on the
It may happen that the software makes a mistake in the VDJ designation. In such a case, you’re very welcome to report the problem to us and we will try to improve the designation algorithm.
Even if you do not use the
Suppose that you would like to change the VDJ designation shown on the web application.
Beware: the modifications you made (name changes, clusters, clone tagging, sample reordering…)