The main objective of the course is to provide an introduction to and
overview of the field of bioinformatics (computational biology). Broadly
speaking, bioinformatics can be regarded as the application of
computational techniques to the discovery of knowledge from biological
data. In a narrower sense, bioinformatics covers the
areas in which such techniques have proved particularly productive. Most
importantly, this applies to the analysis of biochemical sequences
(protein, DNA and RNA) and structures, the analysis and reconstruction of
biochemical networks, and phylogeny reconstruction. Another important
aspect is data management and data mining in biological databases.
The course will generally be oriented towards the computational aspects
of bioinformatics; however, the main databases and software tools will also
be considered. The topics covered will include sequence alignment,
structure alignment, protein structure prediction, phylogeny
reconstruction, reconstruction of biochemical networks, and methods for data
mining and classification. The course will also give a brief introduction to
the main methods used for obtaining biochemical data (e.g. DNA sequence
information), together with the related algorithmic problems.
The course will consist of two parts. In the distance-learning part, one
month before the course, students will receive the pre-reading and a home
task. During the in-person meeting there will be lectures by Baltic and
Nordic teachers, practical training, presentations of the homework, and
group discussions.
Online home readings
1 Basic use of the NCBI BLAST database search programs.
See
http://www.matfys.kvl.dk/bioinformatik/databases2.pdf
2 Pairwise sequence alignment. Global versus local alignment,
linear and affine gap costs, the dynamic programming matrix, traceback.
The optimizations and approximations performed in the BLAST
implementation.
See
http://www.matfys.kvl.dk/bioinformatik/pairwise-1.pdf
and
http://www.matfys.kvl.dk/bioinformatik/pairwise-2.pdf
3 Multiple sequence alignment. The intractability (computational
complexity) of the problem, heuristics such as progressive alignment, the
origin of substitution matrices (BLOSUM, PAM).
4 A taste of phylogeny. Distance-based (UPGMA, neighbour-joining) and
parsimony-based methods.
For some older slides on 3 and 4, see
http://www.dina.kvl.dk/~sestoft/tmp/multiplealignment.pdf
5
Michael S. Waterman. Introduction to Computational Biology: Maps,
Sequences and Genomes. Chapman & Hall/CRC; 1st edition (June 1, 1995).
ISBN: 0412993910. P.5-26.
6
Arthur M. Lesk. Introduction to Bioinformatics. Oxford University Press;
(May 1, 2002). ISBN: 0199251967. P.189-198.
7
Arthur M. Lesk. Introduction to Bioinformatics. Oxford University Press;
(May 1, 2002). ISBN: 0199251967. P.207-225.
8
Pavel A. Pevzner. Computational Molecular Biology: An Algorithmic Approach.
Bradford Books; 1st edition (August 21, 2000). ISBN: 0262161974.
P.153-173.
9
Pavel A. Pevzner. Computational Molecular Biology: An Algorithmic Approach.
Bradford Books; 1st edition (August 21, 2000). ISBN: 0262161974.
P.175-187.
10 Lecture notes of the course "Algorithms for Molecular Biology" given
by Ron Shamir at Tel Aviv University School of Computer Science.
http://www.math.tau.ac.il/~rshamir/algmb/01/algmb01.html
11 Sections 2.1-2.5 of Durbin et al:
Biological Sequence Analysis, Cambridge University Press 1998.
http://www.dina.kvl.dk/~sestoft/bsa/Durbin-chap2.pdf
12 Sections 6.1-6.4 of Durbin et al:
Biological Sequence Analysis, Cambridge University Press 1998.
http://www.dina.kvl.dk/~sestoft/bsa/Durbin-chap6.pdf
13 Pages 1-36 of William R. Pearson: Protein
Sequence Comparison and Protein Evolution, ISMB2000 Tutorial, 53 pages.
http://www.people.virginia.edu/~wrp/papers/ismb2000.pdf
(This file is pretty difficult to print for some printers, though.)
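As a warm-up for reading 2 (pairwise sequence alignment), the dynamic programming matrix and the traceback can be sketched in a few lines of Python. This is only an illustrative sketch of global alignment with a linear gap cost, not code taken from the course material. The function name `global_align` and the scoring values (match 1, mismatch -1, gap -2) are assumptions chosen for the example. Real applications would use a substitution matrix such as BLOSUM or PAM, and possibly affine gap costs.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment (Needleman-Wunsch style) with a linear gap cost.

    The scoring values are illustrative assumptions, not course-specified.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    # Fill the dynamic programming matrix row by row.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Traceback from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, x, y = global_align("ACGT", "AGT")
    print(s)
    print(x)
    print(y)
```

A local alignment differs mainly in that matrix entries are never allowed to drop below zero and the traceback starts from the highest-scoring cell rather than the bottom-right corner. The readings cover this variant and affine gap costs in detail.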