Tuesday, March 18, 2014

Using UCSC Genome Browser Sessions to Organize and View MethBase Tracks

I would like to share a tip that uses the UCSC Genome Browser Sessions to organize and view MethBase tracks.

As of 03/18/2014, MethBase contains 2294 tracks, including tracks for methylation levels, read depths, HMRs, etc. For human alone, there are 1130 tracks. Since each user may be interested in a different set of samples and a different selection of features, and may want to display them with specific settings, no single configuration satisfies everyone. Fortunately, UCSC Genome Browser Sessions let you 1) manually tailor a set of tracks (e.g., brain related), 2) store them for future re-use, and 3) share them with others. With Sessions, each user can have a "personalized" view of MethBase.

The Session feature requires an account with the UCSC Genome Browser and is otherwise quite straightforward to use (http://genome.ucsc.edu/goldenPath/help/hgSessionHelp.html). For example, here is a UCSC browser session that contains high-coverage methylomes of normal cell samples from human (link).
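As a small illustration of sharing a session, the sketch below builds a session link from an account name and a session name. It assumes the hgTracks query parameters described in the Sessions help page (hgS_doOtherUser, hgS_otherUserName, hgS_otherUserSessionName); the account and session names are made up for the example and do not refer to a real MethBase session.

```python
from urllib.parse import urlencode


def session_url(user_name, session_name,
                base="http://genome.ucsc.edu/cgi-bin/hgTracks"):
    """Build a shareable link to a saved UCSC Genome Browser session."""
    params = {
        "hgS_doOtherUser": "submit",
        "hgS_otherUserName": user_name,
        "hgS_otherUserSessionName": session_name,
    }
    return base + "?" + urlencode(params)


# Hypothetical account and session names, for illustration only.
print(session_url("exampleUser", "MethBase_brain"))
```

Anyone who opens such a link gets the tracks and display settings stored in the session, provided the session owner has marked it as shared.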

Hopefully, more interesting and useful sessions, such as brain-development-oriented and cancer-oriented ones, will be created and shared.

Wednesday, January 29, 2014

A Survey of Commercial Providers of Whole Genome Bisulfite Sequencing Services

These days I am working on an NIH proposal that involves whole genome bisulfite sequencing (WGBS). For the purpose of project budgeting, I surveyed several commercial providers of WGBS. The following companies provide WGBS services, including library preparation and high-throughput sequencing with Illumina machines. Most of them also provide additional bioinformatics services. Our project only requires library preparation and sequencing; the quotes, from lowest to highest, are: BGI < SeqWright < NXT-dx < Glocal Biologics < Alpha Biolaboratory < ACGT. Illumina only offers the service if the desired coverage of a single sample is above 30x, and I have not heard back from Zymo Research yet.

Tuesday, December 10, 2013

A Reference Methylome Database and Analysis Pipeline to Facilitate Integrative and Comparative Epigenomics

Original link: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0081148
 
DNA methylation is implicated in a surprising diversity of regulatory, evolutionary processes and diseases in eukaryotes. The introduction of whole-genome bisulfite sequencing has enabled the study of DNA methylation at a single-base resolution, revealing many new aspects of DNA methylation and highlighting the usefulness of methylome data in understanding a variety of genomic phenomena. As the number of publicly available whole-genome bisulfite sequencing studies reaches into the hundreds, reliable and convenient tools for comparing and analyzing methylomes become increasingly important. We present MethPipe, a pipeline for both low and high-level methylome analysis, and MethBase, an accompanying database of annotated methylomes from the public domain. Together these resources enable researchers to extract interesting features from methylomes and compare them with those identified in public methylomes in our database.

Examples of high-level methylation features available in MethBase through the UCSC Genome Browser track hub.

Wednesday, December 04, 2013

Convert PDF files to high quality PNG figures

To display figures on your website, you often need to convert PDF files to image files in PNG format. However, the conversion sometimes results in low-quality figures, especially if there is text in the original PDF files. Below is the procedure I use to convert PDF files to high-quality PNG files. It has two steps:

1. Use Preview to convert PDF files to PNG files
Open your PDF file with Preview on Mac OS. Click File->Export. Select PNG from the Format field. Below the Format selector, there is a text box labeled Resolution, which is the key to preserving high quality. Make sure to enter a fairly high number, say 300 pixels/inch. Click Save. This produces a high-quality PNG file.

2. Use OptiPNG to reduce PNG file size
The PNG file from the above step is usually quite big, which may make your website slow to load. The OptiPNG (http://optipng.sourceforge.net/) program can be used to reduce the file size. With the default settings, it is able to reduce the PNG file size by about half without perceptible loss in image quality.

You can see a PNG figure produced with the above procedure on my MethPipe website (http://smithlab.usc.edu/methpipe/).
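To get a feel for why the Resolution setting matters and why the exported file gets big, here is a rough sketch of the arithmetic. It assumes the page size is given in PDF points (1 point = 1/72 inch) and uses a US Letter page (612 x 792 points) purely as an example; the function name is made up for illustration.

```python
def export_size_pixels(width_pt, height_pt, ppi):
    """Pixel dimensions of a page exported at `ppi` pixels/inch (1 pt = 1/72 inch)."""
    return round(width_pt / 72 * ppi), round(height_pt / 72 * ppi)


# US Letter page in points, used only as an example page size.
letter = (612, 792)
for ppi in (72, 150, 300):
    w, h = export_size_pixels(*letter, ppi)
    print(f"{ppi:>3} ppi: {w} x {h} pixels ({w * h / 1e6:.1f} megapixels)")
```

Going from 72 to 300 pixels/inch multiplies the pixel count by roughly 17, which is why the second step (shrinking the file) is worth doing.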


Tuesday, August 20, 2013

Add trunk/tags/branches directories to an existing SVN repository

In a standard SVN repository, the top-level directory is the project directory, which contains three subdirectories: trunk, tags and branches (SVN Best Practices). Most of the time, you actively work in and update the trunk directory. When you release a new version, you may take a snapshot of the trunk directory by copying it to tags. The branches directory is where you may try out new ideas.

Occasionally, you may have an SVN repository that does not follow the recommended layout, probably because it did not seem worth the effort when you first started that toy-like project. However, as development continues, the repository may accumulate lots of commits, and you may find it much more convenient to have the trunk/tags/branches layout (for example, link). Here I give a step-by-step tutorial.

First, dump the old repository with svnadmin, and then create a new clean repository.
    svnadmin dump /srv/svn/repos/test > test-repo.dump
    mv /srv/svn/repos/test /srv/svn/repos/test-backup
    svnadmin create /srv/svn/repos/test
Next, check out the clean repository, and add the trunk, tags, and branches directories.
    svn checkout PATH-TO-TEST-REPO
    cd test
    svn mkdir trunk tags branches
    svn ci trunk tags branches -m "add trunk tags branches structure"
Finally, load the previous repository dump into the trunk subdirectory. Note that the --parent-dir option is essential: without it, the old history is loaded at the repository root instead of under trunk.
    svnadmin load /srv/svn/repos/test --parent-dir trunk < test-repo.dump
Done!
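If you need to repeat this for several repositories, the steps can be scripted. The following is only a sketch under a few assumptions: the repository path is an absolute path on the local machine, and the three directories are created directly through a file:// URL with svn mkdir instead of through a checked-out working copy as in the steps above. The function names are made up for this example. Try it on a copy of your repository first.

```python
import subprocess


def relayout_commands(repo, backup,
                      message="add trunk tags branches structure"):
    """Commands that replace `repo` with a clean repository containing
    empty trunk, tags and branches directories (the dump is made separately)."""
    url = "file://" + repo  # assumes `repo` is an absolute local path
    return [
        ["mv", repo, backup],
        ["svnadmin", "create", repo],
        ["svn", "mkdir", "-m", message,
         url + "/trunk", url + "/tags", url + "/branches"],
    ]


def relayout(repo, backup, dump_file):
    """Dump the old repository, rebuild it, and load the dump under trunk."""
    # svnadmin dump writes the dump stream to standard output.
    with open(dump_file, "wb") as out:
        subprocess.run(["svnadmin", "dump", repo], stdout=out, check=True)
    for cmd in relayout_commands(repo, backup):
        subprocess.run(cmd, check=True)
    # svnadmin load reads the dump stream from standard input.
    with open(dump_file, "rb") as src:
        subprocess.run(["svnadmin", "load", repo, "--parent-dir", "trunk"],
                       stdin=src, check=True)


# Show the planned commands without running anything.
for cmd in relayout_commands("/srv/svn/repos/test", "/srv/svn/repos/test-backup"):
    print(" ".join(cmd))
```

Calling relayout("/srv/svn/repos/test", "/srv/svn/repos/test-backup", "test-repo.dump") would then perform the whole procedure; the loop at the bottom only prints the commands.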

Friday, August 16, 2013

A Simple Python ConfigParser Class for Parsing Configuration Files

The default ConfigParser in Python is flexible and sophisticated, but it is surprisingly annoying to use with simple configuration files. It requires every option to belong to a section (link). If there is no section header, it aborts with an error. Additionally, it automatically converts keys to lower case, so it is case-insensitive regarding keys (link).
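Both behaviors are easy to reproduce. The short example below uses the Python 3 standard library, where the module is named configparser (it was called ConfigParser in Python 2); the file contents and key names are made up for the demonstration.

```python
import configparser

# 1) A file without any [section] header is rejected.
sectionless = "Name = value\n"
parser = configparser.ConfigParser()
try:
    parser.read_string(sectionless)
    missing_section_error = False
except configparser.MissingSectionHeaderError:
    missing_section_error = True
print("sectionless file rejected:", missing_section_error)

# 2) Keys are converted to lower case when the file is read.
parser = configparser.ConfigParser()
parser.read_string("[main]\nGenomeDir = /data/hg19\n")
print(parser.options("main"))            # the stored key is lower-cased
print(parser.get("main", "GENOMEDIR"))   # lookups ignore case as a result
```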

To deal with these annoyances, I implemented an alternative ConfigParser (https://github.com/songqiang/configparser). It aims to work with simple configuration files that contain a key and its value on each line. The delimiter between a key and its value can be an equal sign (=), a colon (:), spaces, or tabs. Section names are optional. It implements the same interface as the default ConfigParser, excluding the functionality for writing and sophisticated customization. To use my ConfigParser, just download the ConfigParser.py file and put it in the same directory as the calling Python script. Since Python searches the directory containing the calling script first when importing a module, my ConfigParser will override the default one.
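To make the idea concrete, here is a minimal sketch of such a line-based parser. It is not the code from the GitHub repository and does not reproduce the ConfigParser interface; the function name and the sample file are assumptions for illustration. It accepts =, :, spaces or tabs as the delimiter, ignores section headers and comments, and keeps the case of keys.

```python
import re

# key, then one delimiter (=, : or whitespace), then the value
_LINE = re.compile(r"^\s*([^\s=:]+)\s*[=:\s]\s*(.*?)\s*$")


def parse_simple_config(text):
    """Parse 'key <delimiter> value' lines into a dict, preserving key case."""
    options = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", ";")):
            continue  # blank line or comment
        if stripped.startswith("[") and stripped.endswith("]"):
            continue  # section headers are optional and ignored here
        match = _LINE.match(line)
        if match:  # lines without a delimiter and value are skipped
            key, value = match.groups()
            options[key] = value
    return options


sample = "# comment\n[paths]\nGenomeDir = /data/hg19\nthreads: 8\nMode\tfast\n"
print(parse_simple_config(sample))
```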


Monday, February 04, 2013

FASTQ Quality Score Conversion Table


In the FASTQ format, the fourth line of each record encodes the quality scores of the bases in the second line. This scheme was originally used by the Phred base-calling program for traditional Sanger sequencing: each ASCII character encodes the probability that the corresponding base call is wrong. The same format is also used by Illumina/Solexa sequencing; however, the mapping from probability values to characters differs slightly from the Phred score and also varies between versions of the Solexa/Illumina pipeline. The exact formulas are given elsewhere. The following lists the conversion table for each platform and/or version.
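For reference, the standard definitions are Q = -10 log10(p) for the Phred score and Q = -10 log10(p / (1 - p)) for the old Solexa score, where p is the error probability. The sketch below inverts both and decodes a single quality character given the ASCII offset (33 or 64); the function names are made up for this example.

```python
def phred_to_error_prob(q):
    """Invert Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def solexa_to_error_prob(q):
    """Invert Q = -10 * log10(p / (1 - p))."""
    return 1 / (1 + 10 ** (q / 10))


def error_prob_from_char(char, offset, solexa=False):
    """Error probability for one quality character with the given ASCII offset."""
    q = ord(char) - offset
    return solexa_to_error_prob(q) if solexa else phred_to_error_prob(q)


print(f"{error_prob_from_char('I', 33):.10f}")               # Sanger, Phred 40
print(f"{error_prob_from_char(';', 64, solexa=True):.10f}")  # Solexa score -5
```

The two scales nearly coincide for high scores and differ mainly at the low end, where the Solexa score can be negative.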


Range

  SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
  ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX......................
  ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
  .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
  LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL....................................................
  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  |                         |    |        |                              |                     |
 33                        59   64       73                            104                   126

 S - Sanger        Phred+33,  raw reads typically (0, 40)
 X - Solexa        Solexa+64, raw reads typically (-5, 40)
 I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)
 J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40)
    with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator ('B')
 L - Illumina 1.8+ Phred+33,  raw reads typically (0, 41)
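The ranges in the diagram can be used to narrow down which encoding a quality string might use. The sketch below is only a heuristic under the assumption that raw reads stay within the typical ranges listed above; the dictionary and function names are made up for this example. Because the ranges overlap, several candidates are often returned, and a longer sample of reads is needed to tell them apart.

```python
# ASCII ranges (first, last) taken from the diagram and legend above.
ENCODING_RANGES = {
    "Sanger (Phred+33)": (33, 73),
    "Solexa (Solexa+64)": (59, 104),
    "Illumina 1.3+ (Phred+64)": (64, 104),
    "Illumina 1.5+ (Phred+64)": (66, 104),
    "Illumina 1.8+ (Phred+33)": (33, 74),
}


def candidate_encodings(quality):
    """Return the encodings whose typical range covers every character."""
    if not quality:
        return list(ENCODING_RANGES)
    codes = [ord(c) for c in quality]
    lowest, highest = min(codes), max(codes)
    return [name for name, (first, last) in ENCODING_RANGES.items()
            if first <= lowest and highest <= last]


print(candidate_encodings("!!#5I"))  # low characters rule out the +64 encodings
print(candidate_encodings("BCdeh"))  # high characters rule out the +33 encodings
```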


Sanger sequencing score

   |------+-------+-------+--------------|
   | char | value | Phred |  Error-Prob. |
   |------+-------+-------+--------------|
   | !    |    33 |     0 | 1.0000000000 |
   | "    |    34 |     1 | 0.7943282347 |
   | #    |    35 |     2 | 0.6309573445 |
   | $    |    36 |     3 | 0.5011872336 |
   | %    |    37 |     4 | 0.3981071706 |
   | &    |    38 |     5 | 0.3162277660 |
   | '    |    39 |     6 | 0.2511886432 |
   | (    |    40 |     7 | 0.1995262315 |
   | )    |    41 |     8 | 0.1584893192 |
   | *    |    42 |     9 | 0.1258925412 |
   | +    |    43 |    10 | 0.1000000000 |
   | ,    |    44 |    11 | 0.0794328235 |
   | -    |    45 |    12 | 0.0630957344 |
   | .    |    46 |    13 | 0.0501187234 |
   | /    |    47 |    14 | 0.0398107171 |
   | 0    |    48 |    15 | 0.0316227766 |
   | 1    |    49 |    16 | 0.0251188643 |
   | 2    |    50 |    17 | 0.0199526231 |
   | 3    |    51 |    18 | 0.0158489319 |
   | 4    |    52 |    19 | 0.0125892541 |
   | 5    |    53 |    20 | 0.0100000000 |
   | 6    |    54 |    21 | 0.0079432823 |
   | 7    |    55 |    22 | 0.0063095734 |
   | 8    |    56 |    23 | 0.0050118723 |
   | 9    |    57 |    24 | 0.0039810717 |
   | :    |    58 |    25 | 0.0031622777 |
   | ;    |    59 |    26 | 0.0025118864 |
   | <    |    60 |    27 | 0.0019952623 |
   | =    |    61 |    28 | 0.0015848932 |
   | >    |    62 |    29 | 0.0012589254 |
   | ?    |    63 |    30 | 0.0010000000 |
   | @    |    64 |    31 | 0.0007943282 |
   | A    |    65 |    32 | 0.0006309573 |
   | B    |    66 |    33 | 0.0005011872 |
   | C    |    67 |    34 | 0.0003981072 |
   | D    |    68 |    35 | 0.0003162278 |
   | E    |    69 |    36 | 0.0002511886 |
   | F    |    70 |    37 | 0.0001995262 |
   | G    |    71 |    38 | 0.0001584893 |
   | H    |    72 |    39 | 0.0001258925 |
   | I    |    73 |    40 | 0.0001000000 |
   |------+-------+-------+--------------|


Solexa score (prior to 1.3)

   |------+-------+-------+--------------|
   | char | value | Phred |  Error-Prob. |
   |------+-------+-------+--------------|
   | ;    |    59 |    -5 | 0.7597469266 |
   | <    |    60 |    -4 | 0.7152527510 |
   | =    |    61 |    -3 | 0.6661394246 |
   | >    |    62 |    -2 | 0.6131368202 |
   | ?    |    63 |    -1 | 0.5573116338 |
   | @    |    64 |     0 | 0.5000000000 |
   | A    |    65 |     1 | 0.4426883662 |
   | B    |    66 |     2 | 0.3868631798 |
   | C    |    67 |     3 | 0.3338605754 |
   | D    |    68 |     4 | 0.2847472490 |
   | E    |    69 |     5 | 0.2402530734 |
   | F    |    70 |     6 | 0.2007600089 |
   | G    |    71 |     7 | 0.1663375308 |
   | H    |    72 |     8 | 0.1368068886 |
   | I    |    73 |     9 | 0.1118157698 |
   | J    |    74 |    10 | 0.0909090909 |
   | K    |    75 |    11 | 0.0735875561 |
   | L    |    76 |    12 | 0.0593509431 |
   | M    |    77 |    13 | 0.0477267210 |
   | N    |    78 |    14 | 0.0382865039 |
   | O    |    79 |    15 | 0.0306534300 |
   | P    |    80 |    16 | 0.0245033676 |
   | Q    |    81 |    17 | 0.0195623039 |
   | R    |    82 |    18 | 0.0156016622 |
   | S    |    83 |    19 | 0.0124327353 |
   | T    |    84 |    20 | 0.0099009901 |
   | U    |    85 |    21 | 0.0078806839 |
   | V    |    86 |    22 | 0.0062700123 |
   | W    |    87 |    23 | 0.0049868787 |
   | X    |    88 |    24 | 0.0039652856 |
   | Y    |    89 |    25 | 0.0031523092 |
   | Z    |    90 |    26 | 0.0025055927 |
   | [    |    91 |    27 | 0.0019912892 |
   | \    |    92 |    28 | 0.0015823853 |
   | ]    |    93 |    29 | 0.0012573425 |
   | ^    |    94 |    30 | 0.0009990010 |
   | _    |    95 |    31 | 0.0007936978 |
   | `    |    96 |    32 | 0.0006305595 |
   | a    |    97 |    33 | 0.0005009362 |
   | b    |    98 |    34 | 0.0003979487 |
   | c    |    99 |    35 | 0.0003161278 |
   | d    |   100 |    36 | 0.0002511256 |
   | e    |   101 |    37 | 0.0001994864 |
   | f    |   102 |    38 | 0.0001584642 |
   | g    |   103 |    39 | 0.0001258767 |
   | h    |   104 |    40 | 0.0000999900 |
   |------+-------+-------+--------------|


Illumina 1.3+ score

   |------+-------+-------+--------------|
   | char | value | Phred |  Error-Prob. |
   |------+-------+-------+--------------|
   | @    |    64 |     0 | 1.0000000000 |
   | A    |    65 |     1 | 0.7943282347 |
   | B    |    66 |     2 | 0.6309573445 |
   | C    |    67 |     3 | 0.5011872336 |
   | D    |    68 |     4 | 0.3981071706 |
   | E    |    69 |     5 | 0.3162277660 |
   | F    |    70 |     6 | 0.2511886432 |
   | G    |    71 |     7 | 0.1995262315 |
   | H    |    72 |     8 | 0.1584893192 |
   | I    |    73 |     9 | 0.1258925412 |
   | J    |    74 |    10 | 0.1000000000 |
   | K    |    75 |    11 | 0.0794328235 |
   | L    |    76 |    12 | 0.0630957344 |
   | M    |    77 |    13 | 0.0501187234 |
   | N    |    78 |    14 | 0.0398107171 |
   | O    |    79 |    15 | 0.0316227766 |
   | P    |    80 |    16 | 0.0251188643 |
   | Q    |    81 |    17 | 0.0199526231 |
   | R    |    82 |    18 | 0.0158489319 |
   | S    |    83 |    19 | 0.0125892541 |
   | T    |    84 |    20 | 0.0100000000 |
   | U    |    85 |    21 | 0.0079432823 |
   | V    |    86 |    22 | 0.0063095734 |
   | W    |    87 |    23 | 0.0050118723 |
   | X    |    88 |    24 | 0.0039810717 |
   | Y    |    89 |    25 | 0.0031622777 |
   | Z    |    90 |    26 | 0.0025118864 |
   | [    |    91 |    27 | 0.0019952623 |
   | \    |    92 |    28 | 0.0015848932 |
   | ]    |    93 |    29 | 0.0012589254 |
   | ^    |    94 |    30 | 0.0010000000 |
   | _    |    95 |    31 | 0.0007943282 |
   | `    |    96 |    32 | 0.0006309573 |
   | a    |    97 |    33 | 0.0005011872 |
   | b    |    98 |    34 | 0.0003981072 |
   | c    |    99 |    35 | 0.0003162278 |
   | d    |   100 |    36 | 0.0002511886 |
   | e    |   101 |    37 | 0.0001995262 |
   | f    |   102 |    38 | 0.0001584893 |
   | g    |   103 |    39 | 0.0001258925 |
   | h    |   104 |    40 | 0.0001000000 |
   |------+-------+-------+--------------|


Illumina 1.5+ score

   |------+-------+-------+--------------|
   | char | value | Phred |  Error-Prob. |
   |------+-------+-------+--------------|
   | C    |    67 |     3 | 0.5011872336 |
   | D    |    68 |     4 | 0.3981071706 |
   | E    |    69 |     5 | 0.3162277660 |
   | F    |    70 |     6 | 0.2511886432 |
   | G    |    71 |     7 | 0.1995262315 |
   | H    |    72 |     8 | 0.1584893192 |
   | I    |    73 |     9 | 0.1258925412 |
   | J    |    74 |    10 | 0.1000000000 |
   | K    |    75 |    11 | 0.0794328235 |
   | L    |    76 |    12 | 0.0630957344 |
   | M    |    77 |    13 | 0.0501187234 |
   | N    |    78 |    14 | 0.0398107171 |
   | O    |    79 |    15 | 0.0316227766 |
   | P    |    80 |    16 | 0.0251188643 |
   | Q    |    81 |    17 | 0.0199526231 |
   | R    |    82 |    18 | 0.0158489319 |
   | S    |    83 |    19 | 0.0125892541 |
   | T    |    84 |    20 | 0.0100000000 |
   | U    |    85 |    21 | 0.0079432823 |
   | V    |    86 |    22 | 0.0063095734 |
   | W    |    87 |    23 | 0.0050118723 |
   | X    |    88 |    24 | 0.0039810717 |
   | Y    |    89 |    25 | 0.0031622777 |
   | Z    |    90 |    26 | 0.0025118864 |
   | [    |    91 |    27 | 0.0019952623 |
   | \    |    92 |    28 | 0.0015848932 |
   | ]    |    93 |    29 | 0.0012589254 |
   | ^    |    94 |    30 | 0.0010000000 |
   | _    |    95 |    31 | 0.0007943282 |
   | `    |    96 |    32 | 0.0006309573 |
   | a    |    97 |    33 | 0.0005011872 |
   | b    |    98 |    34 | 0.0003981072 |
   | c    |    99 |    35 | 0.0003162278 |
   | d    |   100 |    36 | 0.0002511886 |
   | e    |   101 |    37 | 0.0001995262 |
   | f    |   102 |    38 | 0.0001584893 |
   | g    |   103 |    39 | 0.0001258925 |
   | h    |   104 |    40 | 0.0001000000 |
   |------+-------+-------+--------------|


Illumina 1.8+ score

   |------+-------+-------+--------------|
   | char | value | Phred |  Error-Prob. |
   |------+-------+-------+--------------|
   | !    |    33 |     0 | 1.000000e+00 |
   | "    |    34 |     1 | 7.943282e-01 |
   | #    |    35 |     2 | 6.309573e-01 |
   | $    |    36 |     3 | 5.011872e-01 |
   | %    |    37 |     4 | 3.981072e-01 |
   | &    |    38 |     5 | 3.162278e-01 |
   | '    |    39 |     6 | 2.511886e-01 |
   | (    |    40 |     7 | 1.995262e-01 |
   | )    |    41 |     8 | 1.584893e-01 |
   | *    |    42 |     9 | 1.258925e-01 |
   | +    |    43 |    10 | 1.000000e-01 |
   | ,    |    44 |    11 | 7.943282e-02 |
   | -    |    45 |    12 | 6.309573e-02 |
   | .    |    46 |    13 | 5.011872e-02 |
   | /    |    47 |    14 | 3.981072e-02 |
   | 0    |    48 |    15 | 3.162278e-02 |
   | 1    |    49 |    16 | 2.511886e-02 |
   | 2    |    50 |    17 | 1.995262e-02 |
   | 3    |    51 |    18 | 1.584893e-02 |
   | 4    |    52 |    19 | 1.258925e-02 |
   | 5    |    53 |    20 | 1.000000e-02 |
   | 6    |    54 |    21 | 7.943282e-03 |
   | 7    |    55 |    22 | 6.309573e-03 |
   | 8    |    56 |    23 | 5.011872e-03 |
   | 9    |    57 |    24 | 3.981072e-03 |
   | :    |    58 |    25 | 3.162278e-03 |
   | ;    |    59 |    26 | 2.511886e-03 |
   | <    |    60 |    27 | 1.995262e-03 |
   | =    |    61 |    28 | 1.584893e-03 |
   | >    |    62 |    29 | 1.258925e-03 |
   | ?    |    63 |    30 | 1.000000e-03 |
   | @    |    64 |    31 | 7.943282e-04 |
   | A    |    65 |    32 | 6.309573e-04 |
   | B    |    66 |    33 | 5.011872e-04 |
   | C    |    67 |    34 | 3.981072e-04 |
   | D    |    68 |    35 | 3.162278e-04 |
   | E    |    69 |    36 | 2.511886e-04 |
   | F    |    70 |    37 | 1.995262e-04 |
   | G    |    71 |    38 | 1.584893e-04 |
   | H    |    72 |    39 | 1.258925e-04 |
   | I    |    73 |    40 | 1.000000e-04 |
   | J    |    74 |    41 | 7.943282e-05 |
   |------+-------+-------+--------------|