Germline genes⚓︎
Vidjil-algo can use custom germline presets. This developer documentation focuses on updating or adding the default germline files.
The germlines are compiled with germline/split-germlines.py.
They come from various sources:
- IMGT/GENE-DB. See in particular the data updates
- Genomic sequences through the [NCBI E-utilities API]
- A few static files
It is advised to regularly retrieve the new sequences. However doing so may break some tests and requires some time and to fix things by hand.
On a feature-g/ branch⚓︎
We first prepare germlines on a feature-g branch.
First you need to retrieve the new germlines.
From the germline/ directory of Vidjil:
- run
make get-all-data - run
make diff-from-savedto see what changed since the previous release Take inspiration from this diff to write an insightful commit message. - when we add new features/germline pre-processing, we add tests to
germline/tests
It is also advised to work on tests on the algorithm (see below), but, at this stage, this is not enforced.
When a pipeline from a feature-g succeeds, a .tar.gz is uploaded to 2021-01-21.
On a feature-a/ branch⚓︎
- Put the new germline id in
germline/germline_id(and also ingermline/homo_sapiens.g) -
Then
make germlinewill retrieve fromthe new germlines -
From the root directory, run a
make testand possibly update the tests (and possiblymake diff-from-saved)
10-md5-germlines.should⚓︎
You also have to generate the md5 of the germline data. For that purpose:
# To be launched in the germline directory
rm -f ../algo/tests/should-get-tests/10-md5-germline.should
echo > ../algo/tests/should-get-tests/10-md5-germline.should
echo "$ Check md5 in germline/, sequences split and processed from germline and other databases" >> ../algo/tests/should-get-tests/10-md5-germline.should
md5sum */???[VDJ].fa | sed -r 's/^/1:/;s/\s+/ /g;' | sort -k 2 >> ../algo/tests/should-get-tests/10-md5-germline.should
echo >> ../algo/tests/should-get-tests/10-md5-germline.should
echo "$ Check md5 in germline/, other sequences" >> ../algo/tests/should-get-tests/10-md5-germline.should
md5sum */CD*.fa */???[VDJ]+{up,down}.fa */IGK-*.fa */TRDD[23]*.fa */IG*=*.fa | sed -r 's/^/1:/;s/\s+/ /g;' | sed "s/[+]/./" | sort -k 2 >> ../algo/tests/should-get-tests/10-md5-germline.should
# Then check the differences with 'git diff' and/or use 'git commit -p'
On some systems, md5sum should be replaced by md5 -r.