tag:blogger.com,1999:blog-255289592024-03-19T05:22:57.316-07:00Grand Prismatic Spring LabSong Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.comBlogger62125tag:blogger.com,1999:blog-25528959.post-35291736732622548422022-12-15T22:52:00.002-08:002022-12-15T22:52:43.346-08:00Remove search.4jyj.com malicious extension from Firefox and Chrome<p>4jyj is a malicious extension for the Firefox and Chrome browsers that hijacks the search engine and redirects to yahoo.com when you search in Google. It also changes your default search engine setting to something like DominantPartition. It cannot be removed by simply uninstalling an extension.<br /></p><p>The following script shows how to find and remove its files.<br /></p><p><span style="font-family: verdana;">The numeric string in the paths can differ between systems, so use the output of find to get the actual paths.<br /></span></p><p><span style="font-family: verdana;">cd ~/Library/Application\ Support/<br />sudo find . -name "*DominantPartition*"<br />sudo rm -rf ./.1047632777245170349/Services/com.DominantPartition.service ./.1047632777245170349/Services/com.DominantPartition.service/DominantPartition.service<br /> </span></p><p><span style="font-family: verdana;">cd ~/Library/LaunchAgents/<br />rm -rf com.DominantPartition.service.plist<br />cd /Library/Application\ Support/<br />sudo find . 
-name "*DominantPartition*"<br />sudo rm -rf ./.1047632777245170349/System/com.DominantPartition.system ./.1047632777245170349/System/com.DominantPartition.system/DominantPartition.system</span></p><p><span style="font-family: verdana;"><br />cd /Library/LaunchDaemons/<br />sudo rm -rf com.DominantPartition*</span><br /><br /></p><p><br /></p><p><br /></p>Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-58694843613211082902014-12-02T00:35:00.001-08:002014-12-02T00:35:47.995-08:00Short comments on Nature's new open-access initiativeNature has announced a new open-access initiative that allows subscribing individuals and institutions to share a public-access link to Nature articles (<a href="http://www.nature.com/press_releases/share-nature-content.html">http://www.nature.com/press_releases/share-nature-content.html</a>).<br />
<br />
It is a good, but small, step forward, although Nature's intention is still to maximize its revenue from the subscription business.<br />
<br />
I would also encourage all authors to <a href="https://en.wikipedia.org/wiki/Self-archiving">self-archive</a> their papers and manuscripts on their personal or institutional websites, which makes their work widely available and promotes open access. Nature's six-month limitation applies only to the finalized version, and it is legal to self-archive the original manuscript at any time the authors wish (<a href="http://www.eprints.org/openaccess/self-faq/#self-archiving-legal">http://www.eprints.org/openaccess/self-faq/#self-archiving-legal</a>).Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-39754899816094172202014-03-18T18:14:00.001-07:002014-03-18T18:14:27.134-07:00Using UCSC Genome Browser Sessions to Organize and View MethBase TracksI would like to share a tip that uses the <a href="http://genome.ucsc.edu/goldenPath/help/hgSessionHelp.html">UCSC Genome Browser Sessions</a> to
organize and view <a href="http://smithlab.usc.edu/methbase/">MethBase</a> tracks.<br />
<br />
As of 03/18/2014, MethBase contains 2294 tracks, including tracks for
methylation levels, read depths, HMRs, etc. For human alone, there are
1130 tracks. Because each user may be interested in a different set of
samples and a different selection of features, and may want to display
them with specific settings, no single configuration satisfies
everyone. Fortunately, UCSC Genome Browser Sessions provide a
nice way for you to 1) manually tailor a set of tracks (e.g.,
brain related), 2) store them for future re-use, and 3) share them
with others. With sessions, each user can have a
"personalized" view of MethBase.<br />
<br />
Using the Session feature requires you to create an account with UCSC
Genome Browser and is quite straightforward
(http://genome.ucsc.edu/goldenPath/help/hgSessionHelp.html). For
example, here is a UCSC browser session that contains high coverage
methylomes of normal cell samples from human (<a href="http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=songqiang&hgS_otherUserSessionName=hg19%2Dhuman%2Dmeth">link</a>).<br />
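The shared-session link above follows a simple pattern: the hgTracks page receives the session owner's user name and the session name as query parameters. Here is a minimal Python sketch for building such links. It assumes the parameter names seen in the link above remain valid, and the function name `session_url` is only illustrative.

```python
from urllib.parse import urlencode

HGTRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"

def session_url(user_name, session_name):
    """Build a link to a shared UCSC Genome Browser session."""
    params = {
        "hgS_doOtherUser": "submit",
        "hgS_otherUserName": user_name,
        "hgS_otherUserSessionName": session_name,
    }
    # urlencode percent-escapes any characters that are unsafe in a URL
    return HGTRACKS + "?" + urlencode(params)

print(session_url("songqiang", "hg19-human-meth"))
```

Anyone who creates a brain-oriented or cancer-oriented session can share it the same way by substituting their own user and session names.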
<br />
Hopefully, more interesting and useful sessions, such as brain
development oriented and cancer oriented, will be created and shared.<br />
<br />
Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-43163266703701299312014-01-29T00:41:00.003-08:002014-01-29T00:41:19.564-08:00A Survey of Commercial Providers of Whole Genome Bisulfite Sequencing ServicesThese days I am working on an NIH proposal that involves whole genome bisulfite sequencing (WGBS). For the purpose of project budgeting, I surveyed several commercial providers of WGBS. The companies listed below provide WGBS services, including library preparation and high-throughput sequencing with Illumina machines. Most of them also provide additional bioinformatics services. Our project only requires library preparation and sequencing; the quotes, from lowest to highest, are: BGI < SeqWright < NXT-dx < Global Biologics < Alpha Biolaboratory < ACGT. Illumina only offers the service if the desired coverage of a single sample is above 30x, and I have not heard back from Zymo Research yet. <br />
<ul>
<li>BGI Americas <a href="http://bgiamericas.com/">http://bgiamericas.com </a></li>
<li>Zymo Research <a href="http://www.zymoresearch.com/">http://www.zymoresearch.com/</a></li>
<li>SeqWright <a href="http://www.seqwright.com/">http://www.seqwright.com/</a></li>
<li>NXT-dx <a href="http://www.nxt-dx.com/">http://www.nxt-dx.com/</a></li>
<li>Illumina <a href="http://www.illumina.com/index-c.ilmn">http://www.illumina.com/index-c.ilmn</a></li>
<li>Alpha Biolaboratory <a href="http://alphabiolab.com/">http://alphabiolab.com/</a></li>
<li>Global Biologics <a href="http://www.globalbiologics.us/">http://www.globalbiologics.us/</a></li>
<li>ACGT Inc <a href="http://www.acgtinc.com/">http://www.acgtinc.com/</a></li>
</ul>
<br />Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-6413126102191199402013-12-10T00:22:00.001-08:002013-12-10T00:22:31.756-08:00A Reference Methylome Database and Analysis Pipeline to Facilitate Integrative and Comparative Epigenomics Original link: <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0081148">http://www.plosone.org/article/info:doi/10.1371/journal.pone.0081148</a><br />
<br />
DNA
methylation is implicated in a surprising diversity of regulatory,
evolutionary processes and diseases in eukaryotes. The introduction of
whole-genome bisulfite sequencing has enabled the study of DNA
methylation at a single-base resolution, revealing many new aspects of
DNA methylation and highlighting the usefulness of methylome data in
understanding a variety of genomic phenomena. As the number of publicly
available whole-genome bisulfite sequencing studies reaches into the
hundreds, reliable and convenient tools for comparing and analyzing
methylomes become increasingly important. We present MethPipe, a
pipeline for both low and high-level methylome analysis, and MethBase,
an accompanying database of annotated methylomes from the public domain.
Together these resources enable researchers to extract interesting
features from methylomes and compare them with those identified in
public methylomes in our database.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4kKhz9RdAZGgrTA7haUtHP0MW0YiKS49Q7bjDhyid-CJsWdUJiYyxyVbJB7XM15vXsd6H3WfMQ6AzZLabYBXLIVmNEtOVdt9xe3r2mkJ9HHQ7S2tG2hK3eKJgC1uv72rVPUMG/s1600/journal.pone.0081148.g001.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4kKhz9RdAZGgrTA7haUtHP0MW0YiKS49Q7bjDhyid-CJsWdUJiYyxyVbJB7XM15vXsd6H3WfMQ6AzZLabYBXLIVmNEtOVdt9xe3r2mkJ9HHQ7S2tG2hK3eKJgC1uv72rVPUMG/s1600/journal.pone.0081148.g001.png" height="381" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Examples of high-level methylation features available in MethBase through the UCSC Genome Browser track hub.</td></tr>
</tbody></table>
<br /> Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-36408491766678857282013-12-04T10:45:00.003-08:002013-12-05T10:28:27.327-08:00Convert PDF files to high quality PNG figuresTo display figures on a website, you often need to convert PDF files to image files in PNG format. However, the conversion sometimes results in low-quality figures, especially when the original PDF files contain text. Below is the procedure I use to convert PDF files to high-quality PNG files. It has two steps:<br />
<br />
<b>1. use Preview to convert PDF files to PNG files </b><br />
Open your PDF file with Preview on Mac OS. Click File->Export and select PNG in the Format field. Below the Format selector there is a Resolution text box, which is the key to preserving quality. Enter a fairly high value, say 300 pixels/inch, and click Save. This produces a high-quality PNG file.<br />
<br />
<b>2. use OptiPNG to reduce PNG file size</b><br />
The PNG file from the above step is usually quite large, which may make your website slow to load. The OptiPNG (<a href="http://optipng.sourceforge.net/">http://optipng.sourceforge.net/</a>) program can be used to reduce the file size. With the default settings, it can reduce the PNG file size by about half without perceptible loss in image quality. <br />
<br />
You may see a PNG figure produced with the above procedure in my MethPipe website (<a href="http://smithlab.usc.edu/methpipe/">http://smithlab.usc.edu/methpipe/</a>).<br />
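To get a feel for what the Resolution setting in step 1 means: PDF page sizes are measured in points (72 per inch), so the exported pixel size is the page size in inches times the resolution. Below is a small Python sketch of that arithmetic, using a US Letter page (612 x 792 points) as an assumed example. Preview's own rounding may differ slightly.

```python
POINTS_PER_INCH = 72  # PDF unit: 1 point = 1/72 inch

def exported_pixels(width_pt, height_pt, resolution_ppi):
    """Approximate pixel dimensions of a PDF page exported at a given resolution."""
    width_px = round(width_pt / POINTS_PER_INCH * resolution_ppi)
    height_px = round(height_pt / POINTS_PER_INCH * resolution_ppi)
    return width_px, height_px

print(exported_pixels(612, 792, 300))  # (2550, 3300)
print(exported_pixels(612, 792, 72))   # (612, 792)
```

At 300 pixels/inch the image has roughly 17 times as many pixels as at 72 pixels/inch, which is why the exported file is large and why the OptiPNG step is worthwhile.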
<br />
<br />Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-37773857304788745622013-08-20T18:58:00.003-07:002013-08-20T23:09:42.857-07:00Add trunk/tags/branches directories to an existing SVN repositoryIn a standard SVN repository, the top level directory is the project directory, which contains three subdirectories: trunk, tags and branches (<a href="https://svn.apache.org/repos/asf/subversion/trunk/doc/user/svn-best-practices.html">SVN Best Practices</a>). Most of the time, you actively work in and update the trunk directory. When you release a new version, you may take a snapshot of the trunk directory by copying it to tags. The branches directory is where you may try out new ideas.<br />
<br />
Occasionally, you may have an SVN repository that does not follow the recommended layout, probably because it did not seem worth the effort when you first started that toy-like project. However, as development continues, the repository may accumulate lots of commits, and you find that the trunk/tags/branches layout would be much more convenient (for example, <a href="http://stackoverflow.com/questions/6657337/how-to-add-tags-trunk-branches-to-established-svn-repo">link</a>). Here I give a step-by-step tutorial.<br />
<br />
First, we need to dump the old repository with svnadmin, and then create a new clean repository.<br />
<blockquote class="tr_bq">
svnadmin dump /srv/svn/repos/test > test-repo.dump<br />
mv /srv/svn/repos/test /srv/svn/repos/test-backup<br />
svnadmin create /srv/svn/repos/test</blockquote>
Next, check out the clean repository, and add trunk, tags, and branches directories.<br />
<blockquote class="tr_bq">
svn checkout PATH-TO-TEST-REPO<br />
cd test<br />
svn mkdir trunk tags branches<br />
svn ci trunk tags branches -m "add trunk tags branches structure"</blockquote>
Finally, load the previous repository dump into the trunk subdirectory. Note that the --parent-dir option is essential: it places the loaded history under trunk instead of at the repository root. <br />
<blockquote class="tr_bq">
svnadmin load /srv/svn/repos/test --parent-dir trunk < test-repo.dump</blockquote>
Done!<br />
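The three steps can be collected in one place. Below is a hedged Python sketch that only assembles the command lines used above, so they can be reviewed before anything is run. The function name and the `file://` checkout URL are my own assumptions; the checkout step must match however you normally access the repository.

```python
import shlex

def restructure_commands(repo, dump_file, workdir="test"):
    """Return the shell commands that move an SVN history under trunk/."""
    q = shlex.quote
    return [
        # 1. dump the old repository, keep it as a backup, create a clean one
        f"svnadmin dump {q(repo)} > {q(dump_file)}",
        f"mv {q(repo)} {q(repo + '-backup')}",
        f"svnadmin create {q(repo)}",
        # 2. add the standard layout to the clean repository
        f"svn checkout {q('file://' + repo)} {q(workdir)}",
        f"cd {q(workdir)} && svn mkdir trunk tags branches"
        " && svn ci trunk tags branches -m 'add trunk tags branches structure'",
        # 3. load the old history under trunk/
        f"svnadmin load {q(repo)} --parent-dir trunk < {q(dump_file)}",
    ]

for cmd in restructure_commands("/srv/svn/repos/test", "test-repo.dump"):
    print(cmd)
```

Printing the commands rather than executing them is deliberate: the second command moves the original repository aside, so it is worth checking the paths by eye first.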
<br />Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-8610594959086589092013-08-16T20:36:00.000-07:002013-08-19T14:31:40.169-07:00A Simple Python ConfigParser Class for Parsing Configuration Files The default <a href="http://docs.python.org/2/library/configparser.html">ConfigParser</a> in Python is flexible and sophisticated, but it is surprisingly annoying to use with simple configuration files. It requires every option to belong to a section (<a href="http://stackoverflow.com/questions/2885190/using-pythons-configparser-to-read-a-file-without-section-name">link</a>). If there is no section, it aborts with an error. Additionally, it automatically converts keys to lower case, so keys are effectively case-insensitive (<a href="http://stackoverflow.com/questions/1611799/preserve-case-in-configparser">link</a>). <br />
<br />
To deal with these annoyances, I implemented an alternative ConfigParser (<a href="https://github.com/songqiang/configparser">https://github.com/songqiang/configparser</a>). It is designed for simple configuration files that contain one key and its value per line. The delimiter between a key and its value can be an equals sign (=), a colon (:), spaces or tabs. Section names are optional. It implements the same interface as the default ConfigParser, excluding the functionality for writing and sophisticated customization. To use my ConfigParser, just download the <a href="https://github.com/songqiang/configparser">ConfigParser.py </a> file and put it in the same directory as the calling Python script. Since Python searches the directory containing the calling script first when importing a module, my ConfigParser will override the default one.<br />
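To illustrate the idea, here is a minimal Python sketch of such a parser. It is an illustration only, not the code in the repository above. It accepts `=`, `:` or whitespace as the delimiter, keeps keys case-sensitive, and files options that appear before any section header under an assumed section name `DEFAULT`. Treating lines that start with `#` or `;` as comments is also an assumption.

```python
import re

# key, then a delimiter (= or :, or plain whitespace), then the value
LINE_RE = re.compile(r"^([^\s=:]+)\s*[=:\s]\s*(.*)$")
SECTION_RE = re.compile(r"^\[(.+)\]$")

def parse_simple_config(text, default_section="DEFAULT"):
    """Parse 'key value' lines into {section: {key: value}}; sections are optional."""
    config = {}
    section = default_section
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        header = SECTION_RE.match(line)
        if header:
            section = header.group(1)
            continue
        match = LINE_RE.match(line)
        if match:
            key, value = match.groups()
            config.setdefault(section, {})[key] = value.strip()
    return config

sample = """
Name = test
Threads: 4
OutDir   /tmp/out
[mapping]
Seed\t12
"""
print(parse_simple_config(sample))
```

Values are returned as strings, as with the default ConfigParser, so the caller converts them to numbers where needed.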
<br />
<br />Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-89393316507571442242013-02-04T20:11:00.001-08:002013-02-04T20:13:24.419-08:00FASTQ Quality Score Conversion Table<br />
<h1 style="background-color: white; background-image: none; border-bottom-color: rgb(170, 170, 170); border-bottom-style: solid; border-bottom-width: 1px; color: #222222; font-family: sans-serif; font-size: 24px; font-weight: normal; line-height: 19.046875px; margin: 0px 0px 0.6em; padding-bottom: 0.17em; padding-top: 0.5em;">
FASTQ Quality Score Conversion Table</h1>
<div>
In FASTQ format, the fourth line of each record encodes the quality scores of the bases in the second line. The scheme originated with the Phred base-calling program for traditional Sanger sequencing: each ASCII character encodes the probability that the corresponding base call is wrong. Illumina/Solexa sequencing uses the same format; however, the mapping from probability values to characters differs slightly from the Phred score and also varies between versions of the Solexa/Illumina software. For Phred scores the relation is Q = -10 log10(p), and for Solexa scores it is Q = -10 log10(p / (1 - p)), where p is the error probability. The following lists the conversion table for each platform and/or version. </div>
<div>
<br /></div>
<div>
<br /></div>
<h1 style="background-color: white; background-image: none; border-bottom-color: rgb(170, 170, 170); border-bottom-style: solid; border-bottom-width: 1px; color: #222222; font-family: sans-serif; font-size: 24px; font-weight: normal; line-height: 19.046875px; margin: 0px 0px 0.6em; padding-bottom: 0.17em; padding-top: 0.5em;">
Range</h1>
<pre style="background-color: #f9f9f9; border: 1px dashed rgb(47, 111, 171); color: #222222; line-height: 1.1em; padding: 1em; white-space: pre-wrap;"> <span style="font-family: courier new, monospace;"> SSSSSSSSSSSSSSSSSSSSSSSSSSSSSS<wbr></wbr>SSSSSSSSSSS...................<wbr></wbr>..............................<wbr></wbr>....
..........................<wbr></wbr>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX<wbr></wbr>XXXXXXXXXXXXXXXX..............<wbr></wbr>........
..............................<wbr></wbr>.<wbr></wbr>IIIIIIIIIIIIIIIIIIIIIIIIIIIIII<wbr></wbr>IIIIIIIIIII...................<wbr></wbr>...
..............................<wbr></wbr>...<b>J</b>JJJJJJJJJJJJJJJJJJJJJJJJJJ<wbr></wbr>JJJJJJJJJJJJ..................<wbr></wbr>....
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLL<wbr></wbr>LLLLLLLLLLLL..................<wbr></wbr>..............................<wbr></wbr>....
!"#$%&'()*+,-./0123456789:;<=<wbr></wbr>>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[<wbr></wbr>\]^_`<wbr></wbr>abcdefghijklmnopqrstuvwxyz{|}~
| | | | | |
33 59 64 73 104 126
S - Sanger Phred+33, raw reads typically (0, 40)
X - Solexa Solexa+64, raw reads typically (-5, 40)
I - Illumina 1.3+ Phred+64, raw reads typically (0, 40)
J - Illumina 1.5+ Phred+64, raw reads typically (3, 40)
with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator (bold)
(Note: See discussion above).
L - Illumina 1.8+ Phred+33, raw reads typically (0, 41)</span>
</pre>
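For the Phred-scaled encodings, the tables below can be reproduced directly: the score is the ASCII value of the character minus the offset (33 or 64), and the error probability is 10^(-Q/10). A minimal Python sketch follows; the function names are my own.

```python
def phred_score(char, offset=33):
    """Decode one FASTQ quality character into a Phred score."""
    return ord(char) - offset

def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

quality = "II5+!"  # Sanger / Illumina 1.8+ encoding (offset 33)
for c in quality:
    q = phred_score(c)
    print(c, q, f"{error_probability(q):.10f}")

# Illumina 1.3+ and 1.5+ use offset 64, so 'h' decodes to 40
print(phred_score("h", offset=64))
```

The same character means different things under different offsets ('I' is 40 with offset 33 but 9 with offset 64), so the encoding must be known before decoding.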
<a href="http://draft.blogger.com/blogger.g?blogID=25528959" name="13ca87f573576a93_Sanger_sequencing_score" style="background-color: white; background-image: none; color: #002bb8; font-family: sans-serif; font-size: 13px; line-height: 19.046875px; text-decoration: initial;"></a><br />
<h1 style="background-color: white; background-image: none; border-bottom-color: rgb(170, 170, 170); border-bottom-style: solid; border-bottom-width: 1px; color: #222222; font-family: sans-serif; font-size: 24px; font-weight: normal; line-height: 19.046875px; margin: 0px 0px 0.6em; padding-bottom: 0.17em; padding-top: 0.5em;">
<br />Sanger sequencing score</h1>
<pre style="background-color: #f9f9f9; border: 1px dashed rgb(47, 111, 171); color: #222222; line-height: 1.1em; padding: 1em; white-space: pre-wrap;"><span style="font-family: courier new, monospace;"> |------+-------+-------+------<wbr></wbr>--------|
| char | value | Phred | Error-Prob. |
|------+-------+-------+------<wbr></wbr>--------|
| ! | 33 | 0 | 1.0000000000 |
| " | 34 | 1 | 0.7943282347 |
| # | 35 | 2 | 0.6309573445 |
| $ | 36 | 3 | 0.5011872336 |
| % | 37 | 4 | 0.3981071706 |
| & | 38 | 5 | 0.3162277660 |
| ' | 39 | 6 | 0.2511886432 |
| ( | 40 | 7 | 0.1995262315 |
| ) | 41 | 8 | 0.1584893192 |
| * | 42 | 9 | 0.1258925412 |
| + | 43 | 10 | 0.1000000000 |
| , | 44 | 11 | 0.0794328235 |
| - | 45 | 12 | 0.0630957344 |
| . | 46 | 13 | 0.0501187234 |
| / | 47 | 14 | 0.0398107171 |
| 0 | 48 | 15 | 0.0316227766 |
| 1 | 49 | 16 | 0.0251188643 |
| 2 | 50 | 17 | 0.0199526231 |
| 3 | 51 | 18 | 0.0158489319 |
| 4 | 52 | 19 | 0.0125892541 |
| 5 | 53 | 20 | 0.0100000000 |
| 6 | 54 | 21 | 0.0079432823 |
| 7 | 55 | 22 | 0.0063095734 |
| 8 | 56 | 23 | 0.0050118723 |
| 9 | 57 | 24 | 0.0039810717 |
| : | 58 | 25 | 0.0031622777 |
| ; | 59 | 26 | 0.0025118864 |
| < | 60 | 27 | 0.0019952623 |
| = | 61 | 28 | 0.0015848932 |
| > | 62 | 29 | 0.0012589254 |
| ? | 63 | 30 | 0.0010000000 |
| @ | 64 | 31 | 0.0007943282 |
| A | 65 | 32 | 0.0006309573 |
| B | 66 | 33 | 0.0005011872 |
| C | 67 | 34 | 0.0003981072 |
| D | 68 | 35 | 0.0003162278 |
| E | 69 | 36 | 0.0002511886 |
| F | 70 | 37 | 0.0001995262 |
| G | 71 | 38 | 0.0001584893 |
| H | 72 | 39 | 0.0001258925 |
| I | 73 | 40 | 0.0001000000 |
|------+-------+-------+------<wbr></wbr>--------|
</span></pre>
<span style="background-color: white; color: #222222; font-family: courier new, monospace; font-size: 12.571428298950195px;"><a href="http://draft.blogger.com/blogger.g?blogID=25528959" name="13ca87f573576a93_Solexa_score_.28prior_1.3.29" style="background-image: none; color: #002bb8; line-height: 19.046875px; text-decoration: initial;"></a></span><br />
<h1 style="background-color: white; background-image: none; border-bottom-color: rgb(170, 170, 170); border-bottom-style: solid; border-bottom-width: 1px; color: #222222; font-family: arial, sans-serif; font-size: 24px; font-weight: normal; line-height: 19.046875px; margin: 0px 0px 0.6em; padding-bottom: 0.17em; padding-top: 0.5em;">
<span style="font-family: sans-serif;"><br />Solexa score (prior to 1.3)</span></h1>
<pre style="background-color: #f9f9f9; border: 1px dashed rgb(47, 111, 171); color: #222222; line-height: 1.1em; padding: 1em; white-space: pre-wrap;"><span style="font-family: courier new, monospace;"> |------+-------+-------+------<wbr></wbr>--------|
| char | value | Phred | Error-Prob. |
|------+-------+-------+------<wbr></wbr>--------|
| ; | 59 | -5 | 0.7597469266 |
| < | 60 | -4 | 0.7152527510 |
| = | 61 | -3 | 0.6661394246 |
| > | 62 | -2 | 0.6131368202 |
| ? | 63 | -1 | 0.5573116338 |
| @ | 64 | 0 | 0.5000000000 |
| A | 65 | 1 | 0.4426883662 |
| B | 66 | 2 | 0.3868631798 |
| C | 67 | 3 | 0.3338605754 |
| D | 68 | 4 | 0.2847472490 |
| E | 69 | 5 | 0.2402530734 |
| F | 70 | 6 | 0.2007600089 |
| G | 71 | 7 | 0.1663375308 |
| H | 72 | 8 | 0.1368068886 |
| I | 73 | 9 | 0.1118157698 |
| J | 74 | 10 | 0.0909090909 |
| K | 75 | 11 | 0.0735875561 |
| L | 76 | 12 | 0.0593509431 |
| M | 77 | 13 | 0.0477267210 |
| N | 78 | 14 | 0.0382865039 |
| O | 79 | 15 | 0.0306534300 |
| P | 80 | 16 | 0.0245033676 |
| Q | 81 | 17 | 0.0195623039 |
| R | 82 | 18 | 0.0156016622 |
| S | 83 | 19 | 0.0124327353 |
| T | 84 | 20 | 0.0099009901 |
| U | 85 | 21 | 0.0078806839 |
| V | 86 | 22 | 0.0062700123 |
| W | 87 | 23 | 0.0049868787 |
| X | 88 | 24 | 0.0039652856 |
| Y | 89 | 25 | 0.0031523092 |
| Z | 90 | 26 | 0.0025055927 |
| [ | 91 | 27 | 0.0019912892 |
  | \    |    92 |    28 | 0.0015823853 |
| ] | 93 | 29 | 0.0012573425 |
| ^ | 94 | 30 | 0.0009990010 |
| _ | 95 | 31 | 0.0007936978 |
| ` | 96 | 32 | 0.0006305595 |
| a | 97 | 33 | 0.0005009362 |
| b | 98 | 34 | 0.0003979487 |
| c | 99 | 35 | 0.0003161278 |
| d | 100 | 36 | 0.0002511256 |
| e | 101 | 37 | 0.0001994864 |
| f | 102 | 38 | 0.0001584642 |
| g | 103 | 39 | 0.0001258767 |
| h | 104 | 40 | 0.0000999900 |
|------+-------+-------+------<wbr></wbr>--------|
</span></pre>
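The Solexa table above differs from the Phred tables because Solexa scores are defined on the odds of an error rather than on its probability: Q_solexa = -10 log10(p / (1 - p)). Below is a short Python sketch of the resulting conversions, which reproduces the table values; the function names are my own.

```python
import math

def solexa_error_probability(q_solexa):
    """Error probability for a Solexa score: p = 1 / (1 + 10^(Q/10))."""
    return 1.0 / (1.0 + 10 ** (q_solexa / 10))

def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)

for q in (-5, 0, 10, 40):
    p = solexa_error_probability(q)
    print(q, f"{p:.10f}", f"{solexa_to_phred(q):.2f}")
```

For high-quality bases the two scales nearly coincide (Solexa 40 is about Phred 40.0004), while at the low end they diverge: a Solexa score of 0 means an error probability of 0.5, not 1.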
<span style="background-color: white; color: #222222; font-family: courier new, monospace; font-size: 12.571428298950195px;"><a href="http://draft.blogger.com/blogger.g?blogID=25528959" name="13ca87f573576a93_Solexa_score_1.3.2B" style="background-image: none; color: #002bb8; line-height: 19.046875px; text-decoration: initial;"></a></span><br />
<h1 style="background-color: white; background-image: none; border-bottom-color: rgb(170, 170, 170); border-bottom-style: solid; border-bottom-width: 1px; color: #222222; font-family: arial, sans-serif; font-size: 24px; font-weight: normal; line-height: 19.046875px; margin: 0px 0px 0.6em; padding-bottom: 0.17em; padding-top: 0.5em;">
<span style="font-family: sans-serif;"><br />Solexa score 1.3+</span></h1>
<pre style="background-color: #f9f9f9; border: 1px dashed rgb(47, 111, 171); color: #222222; line-height: 1.1em; padding: 1em; white-space: pre-wrap;"><span style="font-family: courier new, monospace;"> |------+-------+-------+------<wbr></wbr>--------|
| char | value | Phred | Error Prob |
|------+-------+-------+------<wbr></wbr>--------|
| @ | 64 | 0 | 1.0000000000 |
| A | 65 | 1 | 0.7943282347 |
| B | 66 | 2 | 0.6309573445 |
| C | 67 | 3 | 0.5011872336 |
| D | 68 | 4 | 0.3981071706 |
| E | 69 | 5 | 0.3162277660 |
| F | 70 | 6 | 0.2511886432 |
| G | 71 | 7 | 0.1995262315 |
| H | 72 | 8 | 0.1584893192 |
| I | 73 | 9 | 0.1258925412 |
| J | 74 | 10 | 0.1000000000 |
| K | 75 | 11 | 0.0794328235 |
| L | 76 | 12 | 0.0630957344 |
| M | 77 | 13 | 0.0501187234 |
| N | 78 | 14 | 0.0398107171 |
| O | 79 | 15 | 0.0316227766 |
| P | 80 | 16 | 0.0251188643 |
| Q | 81 | 17 | 0.0199526231 |
| R | 82 | 18 | 0.0158489319 |
| S | 83 | 19 | 0.0125892541 |
| T | 84 | 20 | 0.0100000000 |
| U | 85 | 21 | 0.0079432823 |
| V | 86 | 22 | 0.0063095734 |
| W | 87 | 23 | 0.0050118723 |
| X | 88 | 24 | 0.0039810717 |
| Y | 89 | 25 | 0.0031622777 |
| Z | 90 | 26 | 0.0025118864 |
| [ | 91 | 27 | 0.0019952623 |
  | \    |    92 |    28 | 0.0015848932 |
| ] | 93 | 29 | 0.0012589254 |
| ^ | 94 | 30 | 0.0010000000 |
| _ | 95 | 31 | 0.0007943282 |
| ` | 96 | 32 | 0.0006309573 |
| a | 97 | 33 | 0.0005011872 |
| b | 98 | 34 | 0.0003981072 |
| c | 99 | 35 | 0.0003162278 |
| d | 100 | 36 | 0.0002511886 |
| e | 101 | 37 | 0.0001995262 |
| f | 102 | 38 | 0.0001584893 |
| g | 103 | 39 | 0.0001258925 |
| h | 104 | 40 | 0.0001000000 |
|------+-------+-------+------<wbr></wbr>--------|
</span></pre>
<span style="background-color: white; color: #222222; font-family: courier new, monospace; font-size: 12.571428298950195px;"><a href="http://draft.blogger.com/blogger.g?blogID=25528959" name="13ca87f573576a93_Solexa_score_1.5.2B" style="background-image: none; color: #002bb8; line-height: 19.046875px; text-decoration: initial;"></a></span><br />
<h1 style="background-color: white; background-image: none; border-bottom-color: rgb(170, 170, 170); border-bottom-style: solid; border-bottom-width: 1px; color: #222222; font-family: arial, sans-serif; font-size: 24px; font-weight: normal; line-height: 19.046875px; margin: 0px 0px 0.6em; padding-bottom: 0.17em; padding-top: 0.5em;">
<span style="font-family: sans-serif;"><br />Solexa score 1.5+</span></h1>
<pre style="background-color: #f9f9f9; border: 1px dashed rgb(47, 111, 171); color: #222222; line-height: 1.1em; padding: 1em; white-space: pre-wrap;"><span style="font-family: courier new, monospace;"> |------+-------+-------+------<wbr></wbr>--------|
| char | value | Phred | Error Prob |
|------+-------+-------+------<wbr></wbr>--------|
| C | 67 | 3 | 0.5011872336 |
| D | 68 | 4 | 0.3981071706 |
| E | 69 | 5 | 0.3162277660 |
| F | 70 | 6 | 0.2511886432 |
| G | 71 | 7 | 0.1995262315 |
| H | 72 | 8 | 0.1584893192 |
| I | 73 | 9 | 0.1258925412 |
| J | 74 | 10 | 0.1000000000 |
| K | 75 | 11 | 0.0794328235 |
| L | 76 | 12 | 0.0630957344 |
| M | 77 | 13 | 0.0501187234 |
| N | 78 | 14 | 0.0398107171 |
| O | 79 | 15 | 0.0316227766 |
| P | 80 | 16 | 0.0251188643 |
| Q | 81 | 17 | 0.0199526231 |
| R | 82 | 18 | 0.0158489319 |
| S | 83 | 19 | 0.0125892541 |
| T | 84 | 20 | 0.0100000000 |
| U | 85 | 21 | 0.0079432823 |
| V | 86 | 22 | 0.0063095734 |
| W | 87 | 23 | 0.0050118723 |
| X | 88 | 24 | 0.0039810717 |
| Y | 89 | 25 | 0.0031622777 |
| Z | 90 | 26 | 0.0025118864 |
| [ | 91 | 27 | 0.0019952623 |
  | \    |    92 |    28 | 0.0015848932 |
| ] | 93 | 29 | 0.0012589254 |
| ^ | 94 | 30 | 0.0010000000 |
| _ | 95 | 31 | 0.0007943282 |
| ` | 96 | 32 | 0.0006309573 |
| a | 97 | 33 | 0.0005011872 |
| b | 98 | 34 | 0.0003981072 |
| c | 99 | 35 | 0.0003162278 |
| d | 100 | 36 | 0.0002511886 |
| e | 101 | 37 | 0.0001995262 |
| f | 102 | 38 | 0.0001584893 |
| g | 103 | 39 | 0.0001258925 |
| h | 104 | 40 | 0.0001000000 |
|------+-------+-------+------<wbr></wbr>--------|
</span></pre>
<span style="background-color: white; color: #222222; font-family: courier new, monospace; font-size: 12.571428298950195px;"><a href="http://draft.blogger.com/blogger.g?blogID=25528959" name="13ca87f573576a93_Solexa_score_1.8.2B" style="background-image: none; color: #002bb8; line-height: 19.046875px; text-decoration: initial;"></a></span><br />
<h1 style="background-color: white; background-image: none; border-bottom-color: rgb(170, 170, 170); border-bottom-style: solid; border-bottom-width: 1px; color: #222222; font-family: arial, sans-serif; font-size: 24px; font-weight: normal; line-height: 19.046875px; margin: 0px 0px 0.6em; padding-bottom: 0.17em; padding-top: 0.5em;">
<span style="font-family: sans-serif;"><br />Solexa score 1.8+</span></h1>
<pre style="background-color: #f9f9f9; border: 1px dashed rgb(47, 111, 171); color: #222222; line-height: 1.1em; padding: 1em; white-space: pre-wrap;"><span style="font-family: courier new, monospace;"> |------+-------+-------+------<wbr></wbr>--------|
| char | value | Phred | Error-Prob. |
|------+-------+-------+------<wbr></wbr>--------|
| ! | 33 | 0 | 1.000000e+00 |
| " | 34 | 1 | 7.943282e-01 |
| # | 35 | 2 | 6.309573e-01 |
| $ | 36 | 3 | 5.011872e-01 |
| % | 37 | 4 | 3.981072e-01 |
| & | 38 | 5 | 3.162278e-01 |
| ' | 39 | 6 | 2.511886e-01 |
| ( | 40 | 7 | 1.995262e-01 |
| ) | 41 | 8 | 1.584893e-01 |
| * | 42 | 9 | 1.258925e-01 |
| + | 43 | 10 | 1.000000e-01 |
| , | 44 | 11 | 7.943282e-02 |
| - | 45 | 12 | 6.309573e-02 |
| . | 46 | 13 | 5.011872e-02 |
| / | 47 | 14 | 3.981072e-02 |
| 0 | 48 | 15 | 3.162278e-02 |
| 1 | 49 | 16 | 2.511886e-02 |
| 2 | 50 | 17 | 1.995262e-02 |
| 3 | 51 | 18 | 1.584893e-02 |
| 4 | 52 | 19 | 1.258925e-02 |
| 5 | 53 | 20 | 1.000000e-02 |
| 6 | 54 | 21 | 7.943282e-03 |
| 7 | 55 | 22 | 6.309573e-03 |
| 8 | 56 | 23 | 5.011872e-03 |
| 9 | 57 | 24 | 3.981072e-03 |
| : | 58 | 25 | 3.162278e-03 |
| ; | 59 | 26 | 2.511886e-03 |
| < | 60 | 27 | 1.995262e-03 |
| = | 61 | 28 | 1.584893e-03 |
| > | 62 | 29 | 1.258925e-03 |
| ? | 63 | 30 | 1.000000e-03 |
| @ | 64 | 31 | 7.943282e-04 |
| A | 65 | 32 | 6.309573e-04 |
| B | 66 | 33 | 5.011872e-04 |
| C | 67 | 34 | 3.981072e-04 |
| D | 68 | 35 | 3.162278e-04 |
| E | 69 | 36 | 2.511886e-04 |
| F | 70 | 37 | 1.995262e-04 |
| G | 71 | 38 | 1.584893e-04 |
| H | 72 | 39 | 1.258925e-04 |
| I | 73 | 40 | 1.000000e-04 |
| J | 74 | 41 | 7.943282e-05 |
|------+-------+-------+------<wbr></wbr>--------|</span></pre>
Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-80730695356519562612013-01-22T23:05:00.000-08:002013-01-22T23:11:17.868-08:00A Guess on the Encryption Design of MEGAThe newly relaunched MEGA, successor to MegaUpload, has raised a lot of fanfare on the net. A novel feature of the new MEGA site is its encryption. There are two interesting articles about the encryption technique used by the new site. One, from Ars Technica, questions the security and usefulness of MEGA's encryption design (<a href="http://arstechnica.com/business/2013/01/megabad-a-quick-look-at-the-state-of-megas-encryption/">http://arstechnica.com/business/2013/01/megabad-a-quick-look-at-the-state-of-megas-encryption/</a>). The other, posted on the MEGA blog, addresses those concerns (<a href="https://mega.co.nz/#blog_3">https://mega.co.nz/#blog_3</a>).<br />
<br />
In my opinion, the Ars Technica editor misunderstands MEGA's encryption design. Some comments on the Ars article explain the basic idea quite clearly, and MEGA's reply confirms it.<br />
<br />
If my guess is right, the Encryption Design of MEGA is illustrated in the figure below. A pdf version of the figure is at <a href="https://www.box.com/s/uswje6orhhqahyv97ijk">https://www.box.com/s/uswje6orhhqahyv97ijk</a><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPcglzaqR1fGFQYfmcamv-NL-sqAGSAdZT2YqS1ptz2XFDLRaFqfEorDkqNnvkP_00m3RYsRVXMDjUYQiJHX8gqYFs4bTE4fzAu6WLiqKbUh1mL0ZlLgR4q3yQvklW8jLHU0vC/s1600/mega-encryption-design.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="451" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPcglzaqR1fGFQYfmcamv-NL-sqAGSAdZT2YqS1ptz2XFDLRaFqfEorDkqNnvkP_00m3RYsRVXMDjUYQiJHX8gqYFs4bTE4fzAu6WLiqKbUh1mL0ZlLgR4q3yQvklW8jLHU0vC/s640/mega-encryption-design.png" width="640" /></a></div>
<br />Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-5697513878540313882012-12-28T11:36:00.002-08:002013-12-09T12:43:02.962-08:00Notes on Android PhonesSome notes when using my Google Nexus 4 phone.<br />
<h2>
1. backup Android files</h2>
Install Software Data Cale <a href="https://play.google.com/store/apps/details?id=com.lyy.softdatacable&hl=en">https://play.google.com/store/apps/details?id=com.lyy.softdatacable&hl=en</a>. This app start a ftp server on your phone so that you can access your files with WiFi. After you start the app, it shows its IP address in the home screen, something like ftp://198.168.1.27:8888/<br />
<br />
Next, on your desktop machine, use lftp to mirror the files on your phone. The following command copies all newer files under the root directory of the phone's FTP server to the n4 directory on your desktop:<br />
<blockquote class="tr_bq">
lftp ftp://198.168.1.27:8888/ -e "mirror --verbose --only-newer / n4" </blockquote>
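For repeated backups, the command above can be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name backup_phone and the LFTP_CMD override (useful for a dry run) are made up for illustration, and the trailing "quit" is added so that lftp exits when the mirror finishes instead of staying at its prompt.<br />

```shell
# backup_phone HOST PORT DEST: mirror the phone's FTP root into DEST.
# LFTP_CMD is a hypothetical override for dry runs; it defaults to lftp.
backup_phone() {
    host="$1"
    port="$2"
    dest="$3"
    # Make sure the local target directory exists before mirroring
    mkdir -p "$dest"
    # --only-newer skips files whose local copy is already up to date;
    # "quit" makes lftp exit once the mirror is done
    ${LFTP_CMD:-lftp} "ftp://$host:$port/" -e "mirror --verbose --only-newer / $dest; quit"
}
```

Call it with the address and port that the app shows on its home screen, for example backup_phone PHONE_IP 8888 n4. Setting LFTP_CMD=echo prints the command instead of running it.<br />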
<h2>
<b>2. Access Developer Options</b></h2>
Go to Settings -> About Phone and tap Build Number seven times. This enables the Developer options menu, from which you can, for example, adjust the animation scale.<br />
<br />
<br />
<br />
<br />
<br />
<br />Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-46342842345735277592012-04-27T22:24:00.000-07:002012-04-27T22:24:48.504-07:00Notes on Upgrading to Ubuntu 12.04 Precise Pangolin<br /><br />0. In general, the upgrade process was smooth. I have been using it for two days. Since this version is an LTS release, I would recommend that everyone upgrade. <br /><br />1. The full-fledged Unity is so unstable that it is essentially unusable. By default, the Unity plugin is not enabled. When you log in for the first time, there is just a blank desktop: no Dash, no launcher. Others have written about how to fix this (<a href="http://askubuntu.com/questions/17381/unity-doesnt-load-no-launcher-no-dash-appears">http://askubuntu.com/questions/17381/unity-doesnt-load-no-launcher-no-dash-appears</a> and <a href="http://askubuntu.com/questions/121782/blank-desktop-after-updates-today-only-unity2d-works-now">http://askubuntu.com/questions/121782/blank-desktop-after-updates-today-only-unity2d-works-now</a>), but the fixes still do not work well for me. In particular, if you log out and then log in again, the same blank desktop appears :-(. <br /><br />2. Unity 2D works fine. The new HUD (Heads Up Display) is the killer feature. <br /><br />3. Gnome Shell generally works. However, most extensions, such as User Themes, are unavailable for now. <br /><br />4. The default setting for dual monitors in Gnome Shell behaves weirdly. You can only switch workspaces on the primary monitor, while the workspace on the secondary monitor stays the same. To make workspaces span both monitors, run the following command: <br />"gsettings set org.gnome.shell.overrides workspaces-only-on-primary false"<br />as pointed out by <a href="http://gregcor.com/2011/05/07/fix-dual-monitors-in-gnome-3-aka-my-workspaces-are-broken/">http://gregcor.com/2011/05/07/fix-dual-monitors-in-gnome-3-aka-my-workspaces-are-broken/</a><br /><br />5. 
In Gnome Shell, the wallpaper does not show up even though I have already set it from System Settings -> Appearance. This can be fixed as follows:<br /><br />Open gnome-tweak-tool<br />Click on the Desktop tab<br />Turn on "Have file manager handle the desktop"<br />Turn on "Computer icon visible on desktop"Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-24630417062749652962011-09-28T21:39:00.000-07:002011-09-28T21:39:02.973-07:00Multithreaded downloading with wget<br />GNU wget is versatile and robust, but lacks support for multithreaded downloading. When downloading multiple files, it just goes one by one, which is quite inefficient if the bandwidth of each connection is limited.<br /><br />There is a way to achieve nearly the same effect as multithreaded downloading (<a href="http://stackoverflow.com/questions/4745799/multiple-wget-r-a-site-simultaneously">link</a>), and here is how you do it:<br />
<blockquote>
wget -r -np -N [url] &<br />wget -r -np -N [url] &<br />wget -r -np -N [url] &<br />wget -r -np -N [url] &</blockquote>
Repeat the line as many times as the number of download processes you want. The key is the -N option, which tells wget to download a file only when the local copy is missing or older than the one on the server. <br /><br />Alternatively, I wrote a wrapper, pwget (short for parallel wget), that adds multithreading to wget. The program is available from <a href="https://github.com/songqiang/pwget">https://github.com/songqiang/pwget</a>. It has two options, --max-num-threads and --sleep. The first, --max-num-threads, gives the maximum number of simultaneous connections. The appropriate number is usually limited by the settings on the server side; the default is 3. The second, --sleep, specifies how often (in seconds) the master thread checks the status of the downloading threads. When the master thread wakes up, it removes finished threads and adds new downloading threads if necessary. Suppose the list of URLs is in the file url-list.txt; then run <br />
<blockquote>
./pwget.py --max-num-threads 5 --sleep 2 -i url-list.txt</blockquote>
pwget will begin downloading the URLs in url-list.txt with at most 5 connections at once. You can also give wget options on the command line; they are passed on to the worker threads. <br /><br />This tool has several limitations. pwget parallelizes per URL, so you need to list all the URLs in advance. Furthermore, if you have a single large file, pwget does not help. In that case, you may consider using aria2 (<a href="http://aria2.sourceforge.net/">http://aria2.sourceforge.net/</a>). <br /><br />Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-58672376532467360172011-09-20T23:01:00.000-07:002011-09-20T23:01:42.946-07:00Running SSH on a non-standard portThe default port for SSH connections is 22. However, some servers use a different port, for example 22222, for security reasons. Here I list some common commands for dealing with a non-standard SSH port.<br />
<br />
Suppose you have an SSH server ssh.example.edu that listens on port 22222.<br />
<br />
To copy your ssh public key to the server, run:
<br />
<blockquote>
ssh-copy-id '-p 22222 jon@ssh.example.edu'
</blockquote>
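Note that the single quotes are necessary here: older versions of ssh-copy-id have no port option of their own and pass the quoted string on to ssh. Newer OpenSSH releases also accept ssh-copy-id -p 22222 jon@ssh.example.edu.<br />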
<br />
To log in to the SSH server, run<br />
<blockquote>
ssh -p 22222 jon@ssh.example.edu </blockquote>
To copy files between your local machine and the server with scp, run<br />
<blockquote>
scp -P 22222 local-files jon@ssh.example.edu:~
</blockquote>
Note the "-P" option is capitalised. <br />
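To avoid typing the port every time, the port can also be recorded in the SSH client configuration. The sketch below appends a host alias to a config file; the alias name "example" and the helper function add_ssh_alias are my own choices for illustration, while Host, HostName, Port and User are standard ssh_config keywords.<br />

```shell
# add_ssh_alias CONFIG_FILE: append a host alias named "example" that
# remembers the non-standard port. The function name is hypothetical.
add_ssh_alias() {
    config_file="$1"
    cat >> "$config_file" <<'EOF'
Host example
    HostName ssh.example.edu
    Port 22222
    User jon
EOF
    # ssh refuses config files that other users can write to
    chmod 600 "$config_file"
}
```

After running add_ssh_alias ~/.ssh/config, both "ssh example" and "scp local-files example:~" pick up the port and user name automatically.<br />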
<br />
References:<br />
1. <a href="http://www.itworld.com/nls_unixssh0500506">http://www.itworld.com/nls_unixssh0500506</a><br />
2. <a href="http://mikegerwitz.com/2009/10/07/ssh-copy-id-and-sshd-port/">http://mikegerwitz.com/2009/10/07/ssh-copy-id-and-sshd-port/</a><br />
<br />
<br />
Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-74461980175553755882011-07-14T18:19:00.000-07:002011-07-15T13:06:41.208-07:00Setting Up a Hadoop Cluster<b><span style="font-size: large;">This post lists the steps to set up a Hadoop cluster on Ubuntu 11.04. Most of the code can be copied and pasted directly.</span></b><br />
<br />
* Hadoop<br />
** Install Java<br />
<blockquote>#+begin_src shell<br />
sudo apt-get install sun-java6-jdk<br />
sudo update-java-alternatives -s java-6-sun<br />
#+end_src</blockquote><br />
** Add Hadoop User and Group<br />
<blockquote>#+begin_src shell<br />
sudo addgroup hadoop<br />
sudo adduser --ingroup hadoop hadoop<br />
#+end_src</blockquote><br />
** Configuring SSH and Password-less Login<br />
<blockquote>#+begin_src sh<br />
# In the master node<br />
su hadoop<br />
ssh-keygen -t rsa -P ""<br />
<br />
for node in $(cat /home/hadoop/hadoop/conf/slaves); <br />
do<br />
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@$node;<br />
done<br />
#+end_src</blockquote><br />
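Before moving on, it may be worth checking that the password-less login really works on every node. The following is a minimal sketch, not part of the original setup: the function name check_nodes and the SSH_CMD override are hypothetical, and the node list is assumed to be a plain file with one host per line.<br />

```shell
# check_nodes FILE: try a non-interactive login to every host listed in FILE
# (one host per line) and report the ones that still ask for a password.
# SSH_CMD is a hypothetical override for dry runs; it defaults to ssh.
check_nodes() {
    nodes_file="$1"
    failed=0
    while read -r node; do
        # Skip empty lines
        [ -z "$node" ] && continue
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # stdin is redirected so ssh does not swallow the rest of the list
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "hadoop@$node" true < /dev/null; then
            echo "ok: $node"
        else
            echo "FAILED: $node"
            failed=1
        fi
    done < "$nodes_file"
    return "$failed"
}
```

Run it as the hadoop user on the master node with the same node list that the loop above reads. A non-zero exit status means at least one node still needs its key copied.<br />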
** Install Hadoop<br />
*** Install<br />
<blockquote>#+begin_src sh<br />
## download and install<br />
cd /home/hadoop/<br />
tar xzf hadoop-0.21.0.tar.gz<br />
mv hadoop-0.21.0 hadoop<br />
#+end_src</blockquote>*** Update .bashrc<br />
<blockquote>#+begin_src sh<br />
## update .bashrc<br />
# Set Hadoop-related environment variables<br />
export HADOOP_HOME=/home/hadoop/hadoop<br />
export HADOOP_COMMON_HOME="/home/hadoop/hadoop"<br />
export PATH=$PATH:$HADOOP_HOME/bin<br />
export PATH=$PATH:$HADOOP_COMMON_HOME/bin/<br />
#+end_src</blockquote>*** Update conf/hadoop-env.sh<br />
<blockquote>#+begin_src sh<br />
export JAVA_HOME=/usr/lib/jvm/java-6-sun<br />
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true<br />
#+end_src</blockquote>*** Update conf/core-site.xml<br />
<blockquote><?xml version="1.0"?><br />
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><br />
<br />
<configuration><br />
<br />
<!-- In: conf/core-site.xml --><br />
<property><br />
<name>hadoop.tmp.dir</name><br />
<value>/home/hadoop/tmp</value><br />
<description>A base for other temporary directories.</description><br />
</property><br />
<br />
<property><br />
<name>fs.default.name</name><br />
<value>hdfs://128.125.86.89:54310</value><br />
<description>The name of the default file system. A URI whose<br />
scheme and authority determine the FileSystem implementation. The<br />
uri's scheme determines the config property (fs.SCHEME.impl) naming<br />
the FileSystem implementation class. The uri's authority is used to<br />
determine the host, port, etc. for a filesystem.</description><br />
</property><br />
<br />
<br />
</configuration></blockquote>*** Update conf/mapred-site.xml<br />
<blockquote><?xml version="1.0"?><br />
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><br />
<br />
<!-- Put site-specific property overrides in this file. --><br />
<br />
<configuration><br />
<br />
<!-- In: conf/mapred-site.xml --><br />
<property><br />
<name>mapreduce.jobtracker.address</name><br />
<value>128.125.86.89:54311</value><br />
</property><br />
<br />
</configuration></blockquote>*** Update conf/hdfs-site.xml <br />
<blockquote>#+begin_src html<br />
<?xml version="1.0"?><br />
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><br />
<br />
<!-- Put site-specific property overrides in this file. --><br />
<br />
<configuration><br />
<br />
<!-- In: conf/hdfs-site.xml --><br />
<property><br />
<name>dfs.replication</name><br />
<value>3</value><br />
<description>Default block replication.<br />
The actual number of replications can be specified when the file is created.<br />
The default is used if replication is not specified in create time.<br />
</description><br />
</property><br />
<br />
</configuration><br />
#+end_src</blockquote>*** Update conf/masters (master node only) <br />
<blockquote>#+begin_src sh <br />
128.125.86.89 <br />
#+end_src</blockquote>*** Update conf/slaves (master node only) <br />
<blockquote>#+begin_src sh <br />
128.125.86.89 <br />
slave-ip1 <br />
slave-ip2 <br />
...... <br />
#+end_src<br />
</blockquote>*** Copy hadoop installation and configuration files to slave nodes <br />
<blockquote>#+begin_src sh <br />
# In the master node <br />
su hadoop <br />
for node in $(cat $HADOOP_HOME/conf/slaves); <br />
do <br />
scp ~/.bashrc hadoop@$node:~; scp -r ~/hadoop hadoop@$node:~; <br />
done <br />
#+end_src</blockquote>** Run Hadoop<br />
*** Format HDFS <br />
<blockquote>#+begin_src sh <br />
hdfs namenode -format <br />
#+end_src</blockquote>*** Start Hadoop <br />
<blockquote>#+begin_src sh <br />
start-dfs.sh && sleep 300 && start-mapred.sh && echo "GOOD" <br />
#+end_src<br />
</blockquote>*** Run Jobs <br />
<blockquote>#+begin_src sh <br />
hadoop jar hadoop pipes <br />
#+end_src</blockquote>*** Stop Hadoop <br />
<blockquote>#+begin_src sh <br />
stop-mapred.sh && stop-dfs.sh <br />
#+end_src</blockquote>** References:<br />
1. <a href="http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/">http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/</a><br />
2. <a href="http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/">http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/</a><br />
3. <a href="http://fclose.com/b/cloud-computing/290/hadoop-tutorial/">http://fclose.com/b/cloud-computing/290/hadoop-tutorial/</a><br />
4. <a href="http://www.google.com/notebook/public/14957379517904439138/BDQ-QDAoQ9sHt0pIm">Fix for the "could only be replicated to 0 nodes instead of 1" error</a>Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-84790601600732404662011-03-10T01:49:00.000-08:002011-03-10T01:49:53.517-08:00Identifying dispersed epigenomic domains from ChIP-Seq dataPublished in Bioinformatics <a href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full">http://bioinformatics.oxfordjournals.org/content/27/6/870.full</a> <br />
<br />
<h2>1 INTRODUCTION</h2><div id="p-8">Post-translational modifications to histone tails, including methylation and acetylaytion, have been associated with important regulatory roles in cell differentiation and disease development (<a class="xref-bibr" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#ref-2" id="xref-ref-2-1">Kouzarides, 2007</a>). The application of ChIP-Seq to histone modification study has proved very useful for understanding the genomic landscape of histone modifications (<a class="xref-bibr" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#ref-1" id="xref-ref-1-1">Barski <i>et al.</i>, 2007</a>; <a class="xref-bibr" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#ref-3" id="xref-ref-3-1">Mikkelsen <i>et al.</i>, 2007</a>). Certain histone modifications are tightly concentrated, covering a few hundred base pairs. For example, H3K4me3 is usually associated with active promoters, and occurs only at nucleosomes close to transcription start sites (TSSs). On the other hand, many histone modifications are diffuse and occupy large regions, ranging from thousands to several millions of base pairs. A well known example H3K36me3 is associated with active gene expression and often spans the whole gene body (<a class="xref-bibr" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#ref-1" id="xref-ref-1-2">Barski <i>et al.</i>, 2007</a>). Reflected in ChIP-Seq data, the signals of these histone modifications are enriched over large regions, but lack well-defined peaks. It is worth pointing out that the property of being ‘diffuse’ is matter of degrees. Besides the modification frequency, the modification profile over a region is also affected by nucleosome densities and the strength of nucleosome positioning. By visual inspection of read-density profiles, we found that H2BK5me1, H3K79me1, H3K79me2, H3K79me3, H3K9me1, H3K9me3 and H3R2me1 show similar diffuse profiles. 
</div><div id="p-9">There are several general questions about dispersed epigenomic domains that remain unanswered. Many of these questions center around how these domains are established and maintained. One critical step in answering these questions is to accurately locate the boundaries of these domains. However, most of existing methods for ChIP-Seq data analysis were originally designed for identifying transcription factor binding sites. These focus on locating highly concentrated ‘peaks’, and are inappropriate for identifying domains of dispersed histone modification marks (<a class="xref-bibr" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#ref-4" id="xref-ref-4-1">Pepke <i>et al.</i>, 2009</a>). Moreover, the quality of ‘peak’ analysis is measured in terms of sensitivity and specificity of peak calling (accuracy), along with how narrow the peaks are (precision; often determined by the underlying platform). But for diffuse histone modifications, significant ‘peaks’ are usually lacking and often the utility of identifying domains depends on how clearly the boundaries are located. </div><div class="section methods" id="sec-2"><h2>2 METHODS</h2><div id="p-10">Our method for identifying epigenomic domains is based on hidden Markov model (HMM) framework including the Baum–Welch training and posterior decoding (see Rabiner, 1989 for a general description). </div><div id="p-11"><i>Single sample analysis</i>: we first obtain the read density profile by dividing the genome into non-overlapping fixed length bins and counting the number of reads in each bin. 
The bin size can be determined automatically as a function of the total number of reads and the effective genome size (<a href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btr030/DC1">Supplementary Section S1.5</a>). We model the read counts with the negative binomial distribution after correcting for the effect of genomic deadzones. We first exclude unassembled regions of a genome from our analysis. Second, when two locations in the genome have identical sequences of length greater than or equal to the read length, any read derived from one of those locations will necessarily be ambiguous and is discarded. We refer to contiguous sets of locations to which no read can map uniquely as ‘deadzones’. Those bins within large deadzones (referred to as ‘deserts’) are ignored. For those bins outside of deserts, we correct for the deadzone effect by scaling distribution parameters according to the proportion of the bin which is not within a deadzone (<a href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btr030/DC1">Supplementary Section S1.3</a>). </div><div id="p-12">We assume a bin may have one of two states: a foreground state with high histone modification frequency and a background state with low histone modification frequency. We developed a two-state HMM for segmenting the genome into foreground domains and background domains. </div><div id="p-13"><i>Identifying and evaluating domain boundaries</i>: while predicted domains themselves give the locations of boundaries, we characterize the boundaries with the following metrics. We evaluate domain boundaries based on posterior probabilities of transitions between the foreground state and the background state as estimated by the HMM. For each pair of consecutive genomic bins, the posterior probability is calculated for all possible transitions between those bins. 
If a boundary corresponds to the beginning of a domain, the boundary score is the posterior probability of a background to foreground transition and vice versa. </div><div id="p-14">Next an empirical distribution of posterior transition probabilities is constructed by computing posterior transition probabilities from a dataset of randomly permuted bins with the same HMM parameters. Those bins whose posterior transition probabilities have significant empirical <i>P</i>-values are kept and consecutive significant bins are joined as being one boundary. We score each boundary with the posterior probability that a single transition occurs in this boundary. The peak of a boundary is set to the start of the bin with the largest transition probability (see <a href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btr030/DC1">Supplementary Section S3</a> for details). </div><div id="p-15"><i>Incorporating a control sample</i>: ChIP-Seq experiments are influenced by background noises, contamination and other possible sources of error, and researchers have begun to realize the necessity of generating experimental controls in ChIP-Seq experiments. Two common forms of control exist: a non-specific antibody such as IgG to control the immunoprecipitation, and sequencing of whole cell extract to control for contamination and other possible sources of error. With the availability of a control sample, we use a similar two-state HMM with the novel NBDiff distribution to describe the relationship between the read counts in the two samples. Analogous to the Skellam distribution (<a class="xref-bibr" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#ref-7" id="xref-ref-7-1">Skellam, 1946</a>), the NBDiff distribution describes the difference of two independent negative binomial random variables (see <a href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btr030/DC1">Supplementary Section S1.2</a> for details). 
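For readers unfamiliar with difference distributions of this kind, the definition can be written out explicitly. The notation below (X and Y for the read counts in the two samples, Z for their difference, and the means and variances) is shorthand introduced here, not the paper's; the exact parameterization used by the method is the one given in the supplement.

```latex
% Z is the difference of two independent negative binomial counts
Z = X - Y, \qquad X \sim \mathrm{NB}_1,\quad Y \sim \mathrm{NB}_2 \quad \text{(independent)}

% The probability mass function is a discrete convolution over all
% pairs of counts whose difference is k
\Pr(Z = k) = \sum_{j = \max(0,\,-k)}^{\infty} \Pr(X = j + k)\,\Pr(Y = j),
\qquad k \in \mathbb{Z}

% Means subtract and, by independence, variances add
\mathbb{E}[Z] = \mu_X - \mu_Y, \qquad \operatorname{Var}(Z) = \sigma_X^2 + \sigma_Y^2
```

Replacing the two negative binomial variables by Poisson variables in the same construction gives the Skellam distribution mentioned above.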
</div><div id="p-16"><i>Simultaneously segmenting two modifications</i>: the simultaneous analysis of two histone modification marks may reveal more accurate information about the status of genomic regions. It helps to understand the functions of different histone modification marks. It is also of interest to compare samples from different cells types because histone modification patterns are dynamic and subject to change during cell differentiation. We use the NBDiff distribution to model the read count difference between the two samples, and employ three-state HMM: where the basal state means these two signals are similar, the second state represents the signal in test sample A is greater than that in the test sample B and the third state represents the opposite case (details given in <a href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btr030/DC1">Supplementary Section S2.1</a>). </div></div><div class="section-nav"><a class="prev-section-link" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#sec-2" title="2 METHODS">Previous Section</a><a class="next-section-link" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#fn-group-1" title="Footnotes">Next Section</a></div><h2>3 EVALUATION AND APPLICATIONS</h2><div id="p-17">We simulated H3K36me3 ChIP-Seq data and compared RSEG, SICER (<a class="xref-bibr" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#ref-8" id="xref-ref-8-1">Zang <i>et al.</i>, 2009</a>) and HPeak (<a class="xref-bibr" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#ref-5" id="xref-ref-5-1">Qin <i>et al.</i>, 2010</a>). In terms of domain identification, RSEG outperforms SICER and HPeak for single-sample analysis and yields comparable results to SICER for analysis with control samples (<a href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btr030/DC1">Supplementary Section S4.1 and 4.2</a>). 
We applied RSEG to H3K36me3 ChIP-Seq dataset from (<a class="xref-bibr" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#ref-1" id="xref-ref-1-3">Barski <i>et al.</i>, 2007</a>) and found a strong association between H3K36me3 domain boundaries with TSS and transcription termination site (TTS), which supports that RSEG can find high-quality domain boundaries (<a href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btr030/DC1">Supplementary Section S4.3</a>). </div><div id="p-18">We applied RSEG to four histone modification marks (H3K9me3, H3K27me3, H3K36me3 and H3K79me2) from two separate studies (<a class="xref-bibr" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#ref-1" id="xref-ref-1-4">Barski <i>et al.</i>, 2007</a>; <a class="xref-bibr" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#ref-3" id="xref-ref-3-2">Mikkelsen <i>et al.</i>, 2007</a>) (<a href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btr030/DC1">Supplementary Section S5.1</a>). In particular, we discovered an interesting relationship between the two gene-overlapping marks H3K36me3 and H3K79me2 through boundary analysis. H3K79me2 tends to associate with 5<sup>′</sup>-ends of genes, while H3K36me3 associates with 3<sup>′</sup>-ends. About 41% of gene-overlapping K79 domains cover TSS in contrast to 11% of K36 domains. On the other hand, 84% of K36 domains cover TTS in contrast to 23% of K79 domains (<a class="xref-table" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#T1" id="xref-table-wrap-1-1">Table 1</a>). In those genes with both H3K36me3 and H3K79me2 signals, H3K79me2 domains tend to precede H3K36me3 domains, for example the DPF2 gene (<a class="xref-fig" href="http://bioinformatics.oxfordjournals.org/content/27/6/870.full#F1" id="xref-fig-1-1">Fig. 
1</a>) (see <a href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btr030/DC1">Supplementary Section S5.2</a> for more information). This novel discovery demonstrates the usefulness of boundary analysis for dispersed histone modification marks. </div><div class="fig pos-float odd" id="F1"><div class="fig-inline"><a href="http://bioinformatics.oxfordjournals.org/content/27/6/870/F1.expansion.html"><img alt="Fig. 1." src="http://bioinformatics.oxfordjournals.org/content/27/6/870/F1.small.gif" /></a><br />
</div><div class="fig-caption"><span class="fig-label">Fig. 1.</span> <br />
<div class="first-child" id="p-19">The H3K36me3 and H3K79me2 domains and their boundaries at DPF2 (chr11:64,854,646–64,880,304).</div></div></div><div class="table pos-float" id="T1"><div class="table-inline"><div class="callout">View this table:<br />
<ul class="callout-links"><li><a href="http://bioinformatics.oxfordjournals.org/content/27/6/870/T1.expansion.html">In this window</a></li>
<li><a class="in-nw-vis" href="http://bioinformatics.oxfordjournals.org/content/27/6/870/T1.expansion.html" target="_blank">In a new window</a></li>
</ul></div></div><div class="table-caption"><span class="table-label">Table 1.</span> <br />
<div class="first-child" id="p-20">Location of H3K36me3 and H3K79me2 domain boundaries relative to genes</div></div></div><div id="p-21">Finally we applied our three-state HMM to simultaneously analyze H3K36me3 and H3K79me2 (<a href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btr030/DC1">Supplementary Section S5.4</a>). The result agrees with the above observations. The application of our three-state HMM to finding regions of differential histone modification is given in <a href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btr030/DC1">Supplementary Section S5.3</a>. </div>Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-39645542859891639792011-02-26T00:15:00.000-08:002011-02-26T00:38:48.883-08:00Sustainable Scientific Data Archiving ModelAs many researchers may have noticed, NCBI plans to discontinue the Short Read Archive (SRA) service due to budget constraints. This news surprises me and, I believe, concerns the biomedical research community in general. As biomedical research enters the -omics era and becomes more and more data-driven, the sudden closing of SRA raises the question of how scientific data can be archived under a sustainable model. I discuss two strategies for preserving scientific data in a sustainable manner. The first proposes a central data repository that charges a data deposition fee. The second proposes that the data be stored in a P2P manner while a central gateway gathers metadata and tracks links to the P2P seeds. <br />
<br />
A sustainable data archiving model has the following aspects. First, the data should come with the necessary, accurate metadata. Second, the data should be stored securely and remain authentic and correct for a long time. Third, the data should be accompanied by the essential software and scripts needed to analyze it. Finally, the data should be easy to search and access for the broader research community, now and for a considerable period in the future, so that researchers can use a dataset from different perspectives and re-analyze it when new hypotheses and analytic methods emerge. <br />
<br />
However, as has been noted elsewhere, there is a disconnect between the effort to produce data and the effort to preserve it. Simply put, funding agencies provide the money to produce the data but not the money to maintain it. Grants for producing data run on a rather short time scale, usually two to five years. Once the grant is over, the project is done and the original researchers have moved on to other projects, the data is in danger of being lost. Fortunately, the biomedical research community has a pretty good record of depositing biological datasets for public research, as exemplified by GenBank and GEO. The Short Read Archive was designed to meet the requirements of massively parallel sequencing read data. However, the discontinuation of this service demonstrates the uncertainty of the current data sharing model, which lacks dedicated funding. Therefore I am considering the following two strategies for sustaining scientific data. <br />
<br />
In the first strategy, we still rely on a central data repository like SRA that curates, stores and distributes biological datasets. To meet its financial needs, the repository charges a fee for the data it hosts. It works as follows: when the original data producers finish their research and submit a paper to a journal, the journal requires that their data be deposited in a designated repository and charges a data deposition fee. The journal then passes the major part of that fee on to the central data repository. The fee is charged only once and can therefore be covered by the initial grant of the original data producer. With the ever-decreasing cost of data storage, the continual influx of one-time deposition fees should keep the central data repository working.<br />
<br />
The second strategy was first suggested to me by my friend Li Xia and was further inspired by Morgan Langille, the creator of BioTorrents. In this strategy, each dataset is stored by multiple hosts that have the resources and interest to keep it. A central gateway keeps track of the BitTorrent seeds for the raw data and also stores the metadata associated with each dataset, such as the contact details of the data producer, the experimental protocols and a description of the raw data. In particular, the central gateway stores the version of each raw dataset and its MD5 or SHA checksum, so that data users can make sure they are obtaining an up-to-date and authentic dataset from the essentially unreliable and untrusted hosts of a P2P network. Since the central gateway only needs to track this metadata, its running cost is significantly lower than that of a central data repository, and it could therefore run simply as a new section of the NCBI infrastructure.<br />
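The checksum step can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the function name verify_dataset is hypothetical, the expected checksum is assumed to have been obtained from the central gateway by some other means, and sha256sum from GNU coreutils is assumed to be installed.<br />

```shell
# verify_dataset FILE EXPECTED_SHA256: succeed only if FILE hashes to the
# checksum published by the (hypothetical) central gateway.
verify_dataset() {
    file="$1"
    expected="$2"
    # sha256sum --check reads "CHECKSUM  FILENAME" lines (two spaces) from
    # standard input; --status suppresses output and reports the result
    # through the exit status only
    printf '%s  %s\n' "$expected" "$file" | sha256sum --check --status
}
```

A download from an untrusted P2P host would be accepted only when verify_dataset returns success, and fetched again from another host otherwise.<br />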
<br />
I hope this discussion publicize the urgency for sustainable scientific data archiving so that the biomedical research community will work out a way after SRA ends.Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-74032746394035776352009-10-12T15:57:00.000-07:002009-10-15T16:16:59.721-07:00Next Generation Genome Browser<div style="text-align: center;"><div style="text-align: left;"><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhS68AMKhZUjaJqM8wurqOF_7AUpdxYVvklD20XAxeR1S48Al4CTQf_nF4Wm9Pp9Iy38oVzGXJ-B84X4ppASi0t9KK82BAtRAfqhi2b3-Qp1dJmw3HVMDsJZ9F0BHUywpVD_Ke0/s1600-h/ucsc_gb_smn1_human.png"><img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 251px; height: 194px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1yyc-_FFo2xROG0VlObxEe36bKK_3XpMP56oFnf3VToz2gBDSrg5aEK2zlKgUm0Ethg7ZbZc21_Mmpt7w85fGARiwBC_kb7OYOIby1SuE8O-EUxZHWC7Bt6b8S7hYZfZ2hv69/s320/google-map-la.jpg" alt="" id="BLOGGER_PHOTO_ID_5391867737878984146" border="0" /><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 279px; height: 189px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhS68AMKhZUjaJqM8wurqOF_7AUpdxYVvklD20XAxeR1S48Al4CTQf_nF4Wm9Pp9Iy38oVzGXJ-B84X4ppASi0t9KK82BAtRAfqhi2b3-Qp1dJmw3HVMDsJZ9F0BHUywpVD_Ke0/s320/ucsc_gb_smn1_human.png" alt="" id="BLOGGER_PHOTO_ID_5391867623089287570" border="0" /></a><br /></div><br /></div><br /><br /><br /><br /><br /><br /><br /><br /><br /><br />When David, Fang Fang and I talk about <a href="http://genome.ucsc.edu/">UCSC Genome Browser</a> today, I said "I would like a genome browser like Google Map". 
Later I found myself getting more excited about this idea: a next-generation genome browser that provides a more user-friendly and powerful platform to organize and display genomic information.<br /><br />What would the next genome browser ("genome map") look like?<br /><br />First, smoother zooming in and out. Genomes are organized in a hierarchical structure. Sometimes we need a bird's-eye view of the whole genome, and sometimes we are interested in subtle local structures. It would be of great value if we could change the resolution while examining the genome. So we need dynamic, smooth zooming, just like the little slider in Google Maps. (Update: I came across JBrowse and the AnnoJ browser, which seem to have this function. See the references.)<br /><br />Second, advanced search functions. Current genome browsers can only search by genomic location; as a result, the vast amount of annotation information cannot be searched in the browser. It would be cool to have a search box: the user enters a keyword, such as a gene name, and the genome map displays the regions that match the query.<br /><br />Third, what kind of web technology would we need? Ajax? A database back end? XML? Maybe Google Maps is a good starting point.<br /><br />It seems quite possible that such a genome map will appear. What other features are you looking for in the next-generation genome browser?<br /><br />Ref:<br />1. Skinner ME, Uzilov AV, Stein LD, Mungall CJ, Holmes IH. <a href="http://www.ncbi.nlm.nih.gov/pubmed/19570905">JBrowse: A next-generation genome browser.</a> <i>Genome Res.</i> <b>(2009) </b><a href="http://jbrowse.org/">http://jbrowse.org/</a><br />2. 
AnnoJ Browser <a href="http://www.annoj.org/index.shtml">http://www.annoj.org/index.shtml</a>Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-50177016660646436642009-03-06T13:49:00.000-08:002009-03-06T13:51:22.215-08:00Exon/Intron Statistics in Human GenomeData from: <a href="http://www.bioinfo.de/isb/2004/04/0032/main.html#tab-1">http://www.bioinfo.de/isb/2004/04/0032/main.html#tab-1</a><br /><br /><table cellpadding="5"><tbody><tr><td><b>Table 1:</b> </td> <td> Exon - intron distributions for human genome </td> </tr> </tbody></table> <table border="1" cellspacing="0" width="700"> <colgroup> <col style="background: rgb(255, 255, 206) none repeat scroll 0% 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> <col> <col style="background: rgb(255, 255, 206) none repeat scroll 0% 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> <col> <col style="background: rgb(255, 255, 206) none repeat scroll 0% 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> <col> <col style="background: rgb(255, 255, 206) none repeat scroll 0% 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> <col> <col style="background: rgb(255, 255, 206) none repeat scroll 0% 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> <col> <col style="background: rgb(255, 255, 206) none repeat scroll 0% 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> <col> <col style="background: rgb(255, 255, 206) none repeat scroll 0% 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; 
-moz-background-inline-policy: -moz-initial;"> <col> <col style="background: rgb(255, 255, 206) none repeat scroll 0% 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> <col> <col style="background: rgb(255, 255, 206) none repeat scroll 0% 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> <col> <col style="background: rgb(255, 255, 206) none repeat scroll 0% 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> <col> </colgroup> <tbody><tr bgcolor="#e4e4e4"> <td class="small" rowspan="2" valign="top"><b>Chr #</b></td> <td class="small" rowspan="2" valign="top"><b>Total # genes</b></td> <td class="small" rowspan="2" valign="top"><b>Total # exons</b></td> <td class="small" rowspan="2" valign="top"><b>Total # introns</b></td> <td class="small" rowspan="2" valign="top"><b>Max # exons/gene</b></td> <td class="small" rowspan="2" valign="top"><b>Chromosome size (determined)</b></td> <td class="small" rowspan="2" valign="top"><b>Avg # of exons/gene</b></td> <td class="small" colspan="2" valign="top"><b>Avg length (bp)</b></td> <td class="small" colspan="2" valign="top"><b>Std dev.</b></td> <td class="small" colspan="2" valign="top"><b>Total length (bp)</b></td> <td class="small" colspan="3" valign="top"><b>Shortest (bp)</b></td> <td class="small" colspan="3" valign="top"><b>Longest (bp)</b></td> </tr> <tr bgcolor="#e4e4e4"> <td class="small">exon</td> <td class="small">intron</td> <td class="small">exon</td> <td class="small">intron</td> <td class="small">exon</td> <td class="small">intron</td> <td class="small">exon</td> <td class="small">intron</td> <td class="small">gene</td> <td class="small">exon</td> <td class="small">intron</td> <td class="small">gene</td> </tr> <tr> <td class="small">1</td> <td class="small">2514</td> <td class="small">22345</td> 
<td class="small">19831</td> <td class="small">107</td> <td class="small">226828929</td> <td class="small">8.89</td> <td class="small">167.01</td> <td class="small">4736.52</td> <td class="small">229.37</td> <td class="small">14268.19</td> <td class="small">3731870</td> <td class="small">93929919</td> <td class="small">2</td> <td class="small">1</td> <td class="small">78</td> <td class="small">8449</td> <td class="small">476158</td> <td class="small">980961</td> </tr> <tr> <td class="small">2</td> <td class="small">1354</td> <td class="small">12506</td> <td class="small">11152</td> <td class="small">148</td> <td class="small">238349289</td> <td class="small">9.24</td> <td class="small">163.98</td> <td class="small">5883.23</td> <td class="small">226.88</td> <td class="small">17012.24</td> <td class="small">2050855</td> <td class="small">65609873</td> <td class="small">2</td> <td class="small">1</td> <td class="small">90</td> <td class="small">7572</td> <td class="small">483412</td> <td class="small">1897544</td> </tr> <tr> <td class="small">3</td> <td class="small">1394</td> <td class="small">13517</td> <td class="small">12123</td> <td class="small">118</td> <td class="small">195073306</td> <td class="small">9.70</td> <td class="small">164.06</td> <td class="small">6375.63</td> <td class="small">224.21</td> <td class="small">21019.22</td> <td class="small">2217700</td> <td class="small">77291760</td> <td class="small">2</td> <td class="small">1</td> <td class="small">150</td> <td class="small">6654</td> <td class="small">497816</td> <td class="small">990999</td> </tr> <tr> <td class="small">4</td> <td class="small">926</td> <td class="small">8299</td> <td class="small">7373</td> <td class="small">85</td> <td class="small">187239983</td> <td class="small">8.96</td> <td class="small">174.78</td> <td class="small">7168.94</td> <td class="small">266.64</td> <td class="small">19497.08</td> <td class="small">1450541</td> <td class="small">52856617</td> <td 
class="small">2</td> <td class="small">53</td> <td class="small">132</td> <td class="small">6255</td> <td class="small">494708</td> <td class="small">1467842</td> </tr> <tr> <td class="small">5</td> <td class="small">1186</td> <td class="small">9946</td> <td class="small">8760</td> <td class="small">90</td> <td class="small">177696509</td> <td class="small">8.39</td> <td class="small">189.50</td> <td class="small">7277.28</td> <td class="small">332.86</td> <td class="small">21277.20</td> <td class="small">1884777</td> <td class="small">63748970</td> <td class="small">2</td> <td class="small">1</td> <td class="small">150</td> <td class="small">6574</td> <td class="small">370360</td> <td class="small">930401</td> </tr> <tr> <td class="small">6</td> <td class="small">1306</td> <td class="small">11406</td> <td class="small">10100</td> <td class="small">145</td> <td class="small">169212327</td> <td class="small">8.73</td> <td class="small">173.62</td> <td class="small">5961.61</td> <td class="small">253.56</td> <td class="small">18967.75</td> <td class="small">1980397</td> <td class="small">60212251</td> <td class="small">2</td> <td class="small">31</td> <td class="small">159</td> <td class="small">7152</td> <td class="small">469892</td> <td class="small">1377570</td> </tr> <tr> <td class="small">7</td> <td class="small">2508</td> <td class="small">23045</td> <td class="small">20537</td> <td class="small">82</td> <td class="small">310210944</td> <td class="small">9.19</td> <td class="small">167.87</td> <td class="small">6703.87</td> <td class="small">271.88</td> <td class="small">20177.41</td> <td class="small">3868769</td> <td class="small">137677396</td> <td class="small">2</td> <td class="small">1</td> <td class="small">14</td> <td class="small">11923</td> <td class="small">458139</td> <td class="small">1641567</td> </tr> <tr> <td class="small">8</td> <td class="small">908</td> <td class="small">7823</td> <td class="small">6915</td> <td class="small">86</td> <td 
class="small">143297300</td> <td class="small">8.62</td> <td class="small">171.16</td> <td class="small">7354.15</td> <td class="small">258.43</td> <td class="small">21384.09</td> <td class="small">1339052</td> <td class="small">50853964</td> <td class="small">2</td> <td class="small">54</td> <td class="small">84</td> <td class="small">7308</td> <td class="small">453268</td> <td class="small">2055833</td> </tr> <tr> <td class="small">9</td> <td class="small">1033</td> <td class="small">8941</td> <td class="small">7908</td> <td class="small">72</td> <td class="small">117790386</td> <td class="small">8.66</td> <td class="small">170.66</td> <td class="small">5351.68</td> <td class="small">253.19</td> <td class="small">14121.26</td> <td class="small">1525926</td> <td class="small">42321074</td> <td class="small">2</td> <td class="small">33</td> <td class="small">105</td> <td class="small">6598</td> <td class="small">276306</td> <td class="small">865661</td> </tr> <tr> <td class="small">10</td> <td class="small">1017</td> <td class="small">10273</td> <td class="small">9256</td> <td class="small">69</td> <td class="small">132016990</td> <td class="small">10.10</td> <td class="small">153.79</td> <td class="small">6412.91</td> <td class="small">219.97</td> <td class="small">20271.48</td> <td class="small">1579898</td> <td class="small">59357955</td> <td class="small">2</td> <td class="small">52</td> <td class="small">105</td> <td class="small">7812</td> <td class="small">482575</td> <td class="small">1727184</td> </tr> <tr> <td class="small">11</td> <td class="small">1567</td> <td class="small">12459</td> <td class="small">10892</td> <td class="small">87</td> <td class="small">130908954</td> <td class="small">7.95</td> <td class="small">177.66</td> <td class="small">4341.42</td> <td class="small">237.03</td> <td class="small">15362.46</td> <td class="small">2213526</td> <td class="small">47286795</td> <td class="small">3</td> <td class="small">1</td> <td 
class="small">87</td> <td class="small">6183</td> <td class="small">437543</td> <td class="small">1463302</td> </tr> <tr> <td class="small">12</td> <td class="small">1299</td> <td class="small">12399</td> <td class="small">11100</td> <td class="small">89</td> <td class="small">129826379</td> <td class="small">9.55</td> <td class="small">158.07</td> <td class="small">4570.21</td> <td class="small">192.23</td> <td class="small">12979.23</td> <td class="small">1959945</td> <td class="small">50729293</td> <td class="small">2</td> <td class="small">30</td> <td class="small">81</td> <td class="small">6324</td> <td class="small">328545</td> <td class="small">1248678</td> </tr> <tr> <td class="small">13</td> <td class="small">426</td> <td class="small">3784</td> <td class="small">3358</td> <td class="small">83</td> <td class="small">95749578</td> <td class="small">8.88</td> <td class="small">183.47</td> <td class="small">7351.75</td> <td class="small">396.79</td> <td class="small">19082.4</td> <td class="small">694268</td> <td class="small">24687182</td> <td class="small">2</td> <td class="small">37</td> <td class="small">279</td> <td class="small">11555</td> <td class="small">317646</td> <td class="small">1175762</td> </tr> <tr> <td class="small">14</td> <td class="small">854</td> <td class="small">6837</td> <td class="small">6106</td> <td class="small">114</td> <td class="small">87191216</td> <td class="small">8.01</td> <td class="small">176.24</td> <td class="small">5653.70</td> <td class="small">276.66</td> <td class="small">19076.38</td> <td class="small">1204982</td> <td class="small">33826109</td> <td class="small">2</td> <td class="small">51</td> <td class="small">51</td> <td class="small">11304</td> <td class="small">479079</td> <td class="small">1210740</td> </tr> <tr> <td class="small">15</td> <td class="small">843</td> <td class="small">8106</td> <td class="small">7263</td> <td class="small">104</td> <td class="small">81992482</td> <td class="small">9.62</td> 
<td class="small">169.79</td> <td class="small">4660.70</td> <td class="small">271.38</td> <td class="small">11542.05</td> <td class="small">1376321</td> <td class="small">33850721</td> <td class="small">2</td> <td class="small">1</td> <td class="small">168</td> <td class="small">9527</td> <td class="small">207178</td> <td class="small">620362</td> </tr> <tr> <td class="small">16</td> <td class="small">1093</td> <td class="small">9986</td> <td class="small">8893</td> <td class="small">62</td> <td class="small">79932432</td> <td class="small">9.14</td> <td class="small">166.96</td> <td class="small">3661.25</td> <td class="small">242.60</td> <td class="small">13092.99</td> <td class="small">1667340</td> <td class="small">32559472</td> <td class="small">2</td> <td class="small">1</td> <td class="small">75</td> <td class="small">8607</td> <td class="small">466049</td> <td class="small">1167938</td> </tr> <tr> <td class="small">17</td> <td class="small">1459</td> <td class="small">13179</td> <td class="small">11720</td> <td class="small">74</td> <td class="small">79376966</td> <td class="small">9.03</td> <td class="small">165.08</td> <td class="small">3193.16</td> <td class="small">215.89</td> <td class="small">9875.72</td> <td class="small">2175698</td> <td class="small">37423835</td> <td class="small">2</td> <td class="small">30</td> <td class="small">63</td> <td class="small">4786</td> <td class="small">283762</td> <td class="small">712668</td> </tr> <tr> <td class="small">18</td> <td class="small">367</td> <td class="small">3333</td> <td class="small">2966</td> <td class="small">75</td> <td class="small">74658403</td> <td class="small">9.08</td> <td class="small">174.9</td> <td class="small">7905.40</td> <td class="small">256.53</td> <td class="small">19377.24</td> <td class="small">583054</td> <td class="small">23447419</td> <td class="small">3</td> <td class="small">67</td> <td class="small">225</td> <td class="small">4721</td> <td class="small">411175</td> <td 
class="small">1189866</td> </tr> <tr> <td class="small">19</td> <td class="small">1609</td> <td class="small">12169</td> <td class="small">10560</td> <td class="small">106</td> <td class="small">55878340</td> <td class="small">7.56</td> <td class="small">187.31</td> <td class="small">2032.87</td> <td class="small">279.92</td> <td class="small">4741.54</td> <td class="small">2279436</td> <td class="small">21467122</td> <td class="small">2</td> <td class="small">1</td> <td class="small">81</td> <td class="small">5059</td> <td class="small">170796</td> <td class="small">298909</td> </tr> <tr> <td class="small">20</td> <td class="small">775</td> <td class="small">6492</td> <td class="small">5717</td> <td class="small">80</td> <td class="small">59424990</td> <td class="small">8.38</td> <td class="small">160.34</td> <td class="small">4403.10</td> <td class="small">215.29</td> <td class="small">13613.39</td> <td class="small">1040952</td> <td class="small">25172558</td> <td class="small">3</td> <td class="small">54</td> <td class="small">135</td> <td class="small">3738</td> <td class="small">303713</td> <td class="small">1108855</td> </tr> <tr> <td class="small">21</td> <td class="small">309</td> <td class="small">2539</td> <td class="small">2230</td> <td class="small">47</td> <td class="small">33924367</td> <td class="small">8.22</td> <td class="small">168.59</td> <td class="small">5086.89</td> <td class="small">306.51</td> <td class="small">16098.67</td> <td class="small">428056</td> <td class="small">11343761</td> <td class="small">3</td> <td class="small">74</td> <td class="small">102</td> <td class="small">5916</td> <td class="small">323563</td> <td class="small">833627</td> </tr> <tr> <td class="small">22</td> <td class="small">671</td> <td class="small">5173</td> <td class="small">4502</td> <td class="small">54</td> <td class="small">34352072</td> <td class="small">7.71</td> <td class="small">171.14</td> <td class="small">3924.83</td> <td class="small">281.85</td> 
<td class="small">12999.39</td> <td class="small">885356</td> <td class="small">17669584</td> <td class="small">3</td> <td class="small">42</td> <td class="small">38</td> <td class="small">6762</td> <td class="small">447252</td> <td class="small">492969</td> </tr> <tr> <td class="small">X</td> <td class="small">1048</td> <td class="small">8568</td> <td class="small">7520</td> <td class="small">79</td> <td class="small">152118949</td> <td class="small">8.18</td> <td class="small">185.33</td> <td class="small">7627.85</td> <td class="small">299.66</td> <td class="small">23527.35</td> <td class="small">1587926</td> <td class="small">57361443</td> <td class="small">2</td> <td class="small">54</td> <td class="small">129</td> <td class="small">6102</td> <td class="small">493512</td> <td class="small">2217347</td> </tr> <tr> <td class="small">Y</td> <td class="small">98</td> <td class="small">660</td> <td class="small">562</td> <td class="small">44</td> <td class="small">24649555</td> <td class="small">6.73</td> <td class="small">173.74</td> <td class="small">5288.54</td> <td class="small">255.05</td> <td class="small">19676.46</td> <td class="small">114670</td> <td class="small">2972162</td> <td class="small">3</td> <td class="small">67</td> <td class="small">228</td> <td class="small">2493</td> <td class="small">400349</td> <td class="small">681119</td> </tr> </tbody></table>Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-46669321328655390672009-02-18T16:28:00.000-08:002013-07-01T17:42:11.851-07:00Emacs Note<b>1. Packages for using emacs as an IDE</b><br />
<br />
http://xtalk.msk.su/~ott/common/emacs/rc/emacs-rc-cedet.el.html<br />
http://cedet.sourceforge.net/<br />
http://cscope.sourceforge.net/<br />
http://ecb.sourceforge.net/<br />
<br />
<b>2. How can I use Emacs without the GUI when working on a remote machine over a slow connection?</b><br />
<blockquote>
emacs -nw</blockquote>
<br />
<b>3. In Emacs shell mode, what setting do I need to change so that the shell prompt (PS1) displays correctly, e.g. with colors as in a terminal?</b><br />
<br />
Add the following code to your .emacs:<br />
<blockquote>
(autoload 'ansi-color-for-comint-mode-on "ansi-color" nil t)<br />
(add-hook 'shell-mode-hook 'ansi-color-for-comint-mode-on)</blockquote>
<br />
<b>4. After updating to Emacs 23, invoking flyspell-mode gives the error "Enabling flyspell-mode gave an error</b><b>". </b><br />
<b><br />
</b><br />
This is caused by a conflict between the site dictionaries and the dictionaries shipped with Emacs 23. It can be fixed as follows:<br />
<blockquote>
cd /usr/share/emacs23/site-lisp/dictionaries-common<br />
sudo rm *.el *.elc</blockquote>
<b>5. How do </b><b>I enable double spacing in a LaTeX document edited in Emacs? </b><br />
<b> </b><br />
This feature is provided by the LaTeX package setspace. Add the following commands to the preamble of your .tex file.<br />
<blockquote>
\usepackage{setspace}<br />
\doublespacing</blockquote>
<b>6. Which font looks pretty in emacs?</b><br />
My personal favourite is Nimbus Mono L regular. <br />
<br />
<b>7. In org-mode, how can I change the default browser?</b><br />
Add the following two lines in your .emacs file:<br />
<blockquote class="tr_bq">
(setq browse-url-browser-function (quote browse-url-generic))<br />
(setq browse-url-generic-program "google-chrome")</blockquote>
<span style="background-color: white; font-family: Verdana, Georgia, serif; font-size: 14px;">Similarly, if you want to use Chromium, the open-source version of the Chrome browser, instead of the Google-branded version, replace "google-chrome" with "chromium-browser"; if you want to use Firefox, replace </span><span style="background-color: white; font-family: Verdana, Georgia, serif; font-size: 14px;">"google-chrome" with "firefox". </span><br />
<span style="background-color: white; font-family: Verdana, Georgia, serif; font-size: 14px;"><br /></span>
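<span style="background-color: white; font-family: Verdana, Georgia, serif; font-size: 14px;"><b>8. After copying some text in Emacs, how can I paste it into another application?</b></span><br />
Add the following line to your .emacs file (in Emacs 25.1 and later the variable is named select-enable-clipboard, and the old name is kept as an obsolete alias):<br />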
<blockquote class="tr_bq">
(setq x-select-enable-clipboard t)</blockquote>
Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-11460603305183467712009-02-16T12:11:00.001-08:002013-12-30T14:44:22.972-08:00Latex Notes<h3>
Tips: </h3>
1. How to type mathematical symbols<br />
<div class="post-body entry-content">
<a href="http://www.artofproblemsolving.com/LaTeX/AoPS_L_GuideSym.php">http://www.artofproblemsolving.com/LaTeX/AoPS_L_GuideSym.php</a><br />
<br /></div>
<span class="post-author vcard">2. How to display the text under a max or sup operator?<br />
<br />
</span>3. Use the following packages to make your documents prettier<br />
\usepackage{times, fullpage}<br />
<br />
4. How can I typeset the addition-assignment (+=) operator in LaTeX? <br />
\mathrel{\mathop+}=<br />
<br />
5. How can I move all figures and tables to the end of the article?<br />
Use the package endfloat <a href="http://www.ctan.org/pkg/endfloat">http://www.ctan.org/pkg/endfloat</a><br />
<br />
6. How can I write and typeset documents in Chinese?<br />
Use the xelatex command; a simple template is available at https://github.com/songqiang/latex-templates/blob/master/latex-template-xelatex.tex <br />
<b>Good read:</b><br />
<ol>
<li>Chen Shuo (陈硕): Typesetting technical books with LaTeX <a href="https://github.com/chenshuo/typeset">https://github.com/chenshuo/typeset</a></li>
<li>无有的笔记空间 (Wuyou's notebook): LaTeX typesetting study notes <a href="http://zoho.is-programmer.com/posts/30662.html">http://zoho.is-programmer.com/posts/30662.html </a></li>
</ol>
<br />
<br />Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-45443054498438012492008-12-04T15:06:00.000-08:002009-09-22T18:56:30.235-07:00Installing Ubuntu on HP Pavilion dv 4 1114nrI got a new HP Pavilion dv4 1114nr this holiday season. It has Windows Vista Home Edition pre-installed. Here is a log of how I installed Ubuntu on this laptop.<br /><br />1. Create Windows Vista recovery disks<br />Boot into Windows Vista. Since HP no longer provides recovery disks with new laptops, you need to create your own in case you need Windows Vista in the future. Go to Start -> Recovery Disk Creation and follow the instructions.<br /><br />2. Repartition the hard drive<br />Windows Vista comes with hard drive resizing and repartitioning utilities. That's cool! It saves us the trouble of searching for third-party software.<br />Follow the instructions in the following documents:<span style="font-size:100%;"><br /></span><a href="http://lifehacker.com/software/vista/screenshot-tour--repartition-your-hard-drive-in-windows-vista-231613.php" class="top">1.</a><span style="font-size:100%;"><a href="http://lifehacker.com/software/vista/screenshot-tour--repartition-your-hard-drive-in-windows-vista-231613.php" class="top"> Screenshot Tour: Repartition your hard drive in Windows Vista</a></span><br />2.<a href="http://windowshelp.microsoft.com/Windows/en-US/Help/f2e9a502-e63c-413d-8804-87326ef4f4cc1033.mspx" logredir=" CTT=InContent"> Can I repartition my hard disk?</a><span id="E6"></span><br /><br />3. Download<br />Don't bother downloading the Ubuntu installation ISO and burning your own installation CD. If you have internet access (a fairly weak condition, isn't it?), you can use Unetbootin (<a href="http://en.wikipedia.org/wiki/UNetbootin">http://en.wikipedia.org/wiki/UNetbootin</a>).<br /><br />I am not exactly sure. 
There seems to be a bug in Unetbootin.<br />I partitioned my hard drive into three partitions: C: the Windows system partition; D: the HP recovery partition; F: an unformatted free partition, intended for the Linux installation.<br /><br />But when I select Hard Drive mode, only the C: partition is displayed; I have to select USB Live mode and select the F: partition there. I am not sure what this implies and am still waiting for the result.<br /><br />5. Sound issues<br />After the installation, the speaker and the microphone do not work. In particular, I could not use Skype :-(.<br /><br />Solution to the "no sound" problem<br />Open<br />sudo vi /etc/modprobe.d/alsa-base<br />Add the following line to the end of the file<br /><blockquote>options snd-hda-intel model=laptop enable_msi=1</blockquote><br />Solution to the microphone problem:<br />It is possibly because the mic is muted.<br />Open Volume Control by double-clicking the icon at the top-right corner. Select Preferences and select the devices for recording and playback, then unmute them.<br /><br />Solution to the Skype "Audio playback" problem<br />Execute the following commands in a terminal<br /><blockquote>killall pulseaudio<br />sudo apt-get remove pulseaudio # this seems not necessary<br />sudo apt-get install esound<br />sudo rm /etc/X11/Xsession.d/70pulseaudio<br /></blockquote>refer to <a href="http://www.econowics.com/news-from-the-net/170/skype-problem-with-audio-playback-ubuntu-810-intrepid-ibex/">http://www.econowics.com/news-from-the-net/170/skype-problem-with-audio-playback-ubuntu-810-intrepid-ibex/</a><br /><br />refer to<br /><a href="https://help.ubuntu.com/community/HdaIntelSoundHowto"></a><a href="https://bugs.launchpad.net/ubuntu/+bug/269586">https://bugs.launchpad.net/ubuntu/+bug/269586</a><a><br /></a>https://help.ubuntu.com/community/HdaIntelSoundHowto<br /><br /><br />6. install skype<br />7. install songbird<br />8. 
install Java Runtime Environment<br />9. install Open Office 3.0<br />10. install Mac4lin<br />11. install VLC and other codecs<br />12. install sopcast and gsopcast (online TV channels)<br />13. install fcitx Chinese input<br />First remove the default scim framework and install fcitx<pre></pre><blockquote><pre>sudo apt-get autoremove scim<br />sudo apt-get install fcitx<br /></pre></blockquote>Next, modify Xsession to start fcitx automatically for all users. Open<span style="font-family: monospace;"><br /></span><blockquote>sudo gedit /etc/X11/Xsession.d/95xinput<br /> </blockquote>and change it to<pre><blockquote>export XMODIFIERS=@im=fcitx<br />export XIM=fcitx<br />export XIM_PROGRAM=fcitx<br />export GTK_IM_MODULE=fcitx<br />export QT_IM_MODULE=XIM<br />fcitx<br /></blockquote>Open</pre><blockquote><div class="codecontent">sudo vim /usr/lib/gtk-2.0/2.10.0/immodule-files.d/libgtk2.0-0.immodules</div></blockquote>Change the line about xim to<br /><blockquote>"xim" "X Input Method" "gtk20" "/usr/share/locale" "en:ko:ja:th:zh" </blockquote>======<br />Well, I have come back to update this post. I just returned this HP laptop. This was the first time I bought a laptop from HP, and unfortunately it was a disappointing experience. I have two complaints. The CPU fan is too noisy: even after I disabled the "Keep fan running" feature in the BIOS, the fan still makes too much noise. The CD-ROM drive is not quiet either; it feels like an earthquake when the drive is working.<br /><br />The recovery tool is also annoying. I could not restore my laptop to the factory configuration, either via the hard drive recovery tool or via the recovery CDs. 
It failed with "error 1002", and HP customer service could not provide any useful help (customer service is outsourced to India, so we have to adapt to Indian English).<br /><br />Anyway, I will blacklist this model from HP: HP Pavilion dv4.<br /><br />Reference:<br /><a href="http://lifehacker.com/software/vista/screenshot-tour--repartition-your-hard-drive-in-windows-vista-231613.php" class="top"></a>1. <span style="font-size:100%;"><a href="http://lifehacker.com/software/vista/screenshot-tour--repartition-your-hard-drive-in-windows-vista-231613.php" class="top">Screenshot Tour: Repartition your hard drive in Windows Vista</a></span><br />2.<a href="http://windowshelp.microsoft.com/Windows/en-US/Help/f2e9a502-e63c-413d-8804-87326ef4f4cc1033.mspx" logredir=" CTT=InContent"> Can I repartition my hard disk?</a><span id="E6"></span><br />3. Unetbootin <a href="http://unetbootin.sourceforge.net/">http://unetbootin.sourceforge.net/</a><br />4. Tutorial: Ubuntu Linux on HP Pavilion<br /><a href="http://aldeby.org/blog/index.php/howto-ubuntu-linux-on-hp-pavilion-dv2000-dv6000-dv9000-series-laptops">http://aldeby.org/blog/index.php/howto-ubuntu-linux-on-hp-pavilion-dv2000-dv6000-dv9000-series-laptops</a><br />5. <a href="http://www.dailygyan.com/2008/11/10-things-you-should-do-immediately.html">http://www.dailygyan.com/2008/11/10-things-you-should-do-immediately.html</a><br />6. Top 10 Ubuntu downloads <a href="http://lifehacker.com/5227309/top-10-ubuntu-downloads">http://lifehacker.com/5227309/top-10-ubuntu-downloads</a><br />7. <a href="http://theindexer.wordpress.com/2009/04/24/to-do-list-after-installing-ubuntu-904-aka-jaunty-jackalope/">http://theindexer.wordpress.com/2009/04/24/to-do-list-after-installing-ubuntu-904-aka-jaunty-jackalope/</a><br />8. 
Install Microsoft YaHei font <a href="http://hi.baidu.com/zzy011/blog/item/6651e3ed44a9c62f63d09f37.html">http://hi.baidu.com/zzy011/blog/item/6651e3ed44a9c62f63d09f37.html</a><br /><h2 style="text-align: center;"><span style="font-size:14;"><strong><strong></strong></strong></span></h2>Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com4tag:blogger.com,1999:blog-25528959.post-60844558987833272132008-11-08T12:53:00.001-08:002011-12-05T19:31:59.581-08:00<R>andom Notes<h4>
<b>1. How to measure the running time of an R function?</b></h4>
R provides the function proc.time() (<a href="http://rweb.stat.umn.edu/R/library/base/html/proc.time.html" id="j0ew" title="http://rweb.stat.umn.edu/R/library/base/html/proc.time.html">http://rweb.stat.umn.edu/R/library/base/html/proc.time.html</a>), which reports how much CPU and elapsed time the current R process has used.<br />
Sample code:<br />
<pre>## a way to time an R expression: system.time is preferred
> ptm <- proc.time()
> for (i in 1:50) mad(stats::runif(500))
> proc.time() - ptm
   user  system elapsed
  0.039   0.001   0.052
</pre>
<h4>
<b>2. String manipulation in R</b></h4>
Define a string<br />
> s = "some characters"<br />
<br />
Convert a value of another type into a string<br />
> s = as.character(some_variable_in_other_type)<br />
<br />
Convert a string into a number (the result is stored in x so that the built-in constant pi is not masked)<br />
> x = as.numeric("3.14159")<br />
<br />
<br />
String length<br />
> nchar(s)<br />
<br />
string concatenation<br />
> s1 = "string1"<br />
> s2 = "string2"<br />
> paste(s1, s2, sep = "")<br />
<br />
given a vector of strings, vs, return a single string that joins vs's elements with a separator<br />
> vs = c("song", "qiang")<br />
> paste(vs, collapse = " ")<br />
[1] "song qiang"<br />
<br />
string slicing<br />
Suppose s is a string. To extract a substring given its starting and ending positions, use substr(). Both start and stop are required (stop has no default value). If stop is larger than the total length of the string, it is truncated to the length of the string.<br />
> substr(s, start = 1, stop = 12)<br />
<br />
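string split<br />
strsplit() returns a list with one element per input string, so the pieces are found in its first element.<br />
> strsplit("song qiang", split=" ")<br />
[[1]]<br />
[1] "song"  "qiang"<br />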
<br />
<br />
<h4>
<b>3. When making figures with a legend box, the text overflows the box after dev.copy2eps() converts the figure to an EPS file</b></h4>
This problem comes from differences in how font sizes are specified on different devices. An ugly workaround is to pass text.width=strwidth("some string") to legend(), <br />
where "some string" is the longest legend text plus a few extra characters. The optimal number of extra characters has to be determined by trial and error.<br />
<br />
<b>4. How to handle exceptions in R?</b><br />
Read about the two functions try and tryCatch (<a href="http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-save-the-result-of-each-iteration-in-a-loop-into-a-separate-file_003f">R FAQ 7.32</a>). In the example below, try catches the error from a failing call, so the loop skips to the next iteration instead of stopping:<br />
<blockquote class="tr_bq">
for(i in 1:16)<br />{<br /> result <- try(nonlinear_modeling(i));<br /> if(inherits(result, "try-error")) next;<br />}</blockquote>Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-735068609396554332008-11-08T12:51:00.001-08:002013-07-31T10:44:54.339-07:00GNU/Linux NotesGNU/Linux Notes<br />
<br />
<b>1. How to speed up my Linux booting?</b><br />
Use Bootchart <a href="http://www.bootchart.org/index.html" id="p6vo" title="http://www.bootchart.org/index.html">http://www.bootchart.org/index.html</a> to see where the boot time goes,<br />
and then disable the unnecessary services in the boot process.<br />
<br />
<br />
<span style="font-weight: bold;">2. One important thing to remember when creating an SVN repository</span><br />
In Subversion 1.1, a repository is created with a Berkeley<br />
DB back-end by default. This behavior may change in future<br />
releases. Regardless, the type can be explicitly chosen with<br />
the <tt class="option">--fs-type</tt> argument:<br />
<pre class="screen">$ svnadmin create --fs-type fsfs /path/to/repos
$ svnadmin create --fs-type bdb /path/to/other/repos
</pre>
<div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;">
<br />
Do not create a Berkeley DB repository on a network<br />
share—it <i>cannot</i> exist on a remote<br />
filesystem such as NFS, AFS, or Windows SMB. Berkeley DB<br />
requires that the underlying filesystem implement strict POSIX<br />
locking semantics, and more importantly, the ability to map<br />
files directly into process memory. Almost no network<br />
filesystems provide these features. If you attempt to use<br />
Berkeley DB on a network share, the results are<br />
unpredictable—you may see mysterious errors right away,<br />
or it may be months before you discover that your repository<br />
database is subtly corrupted.<br />
If you need multiple computers to access the repository,<br />
you create an FSFS repository on the network share, not a<br />
Berkeley DB repository. Or better yet, set up a real server<br />
process (such as Apache or <b class="command">svnserve</b>), store<br />
the repository on a local filesystem which the server can<br />
access, and make the repository available over a network.<br />
<a href="http://svnbook.red-bean.com/en/1.1/ch06.html" title="Chapter 6. Server Configuration">Chapter 6, <i>Server Configuration</i></a> covers this process in<br />
detail.</div>
<span style="font-weight: bold;">3. Count the number of files in a directory and its subdirectories</span><br />
<br />
total number of files (-type f keeps directories themselves out of the count)<br />
<span style="font-style: italic;">find some_directory -type f | wc -l</span><br />
<br />
list number of files in each directory in detail<br />
<blockquote>
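<pre>#!/usr/bin/env python3
"""Print the number of files under each directory, recursively."""
import os
import sys


def count(p):
    # A plain file counts as one.
    if not os.path.isdir(p):
        print("%s\t%d" % (p, 1))
        return 1
    s = 0
    for d in os.listdir(p):
        # listdir() returns bare names, so join them with the parent path.
        child = os.path.join(p, d)
        if os.path.isdir(child):
            s += count(child)
        else:
            s += 1
    print("%s\t%d" % (p, s))
    return s


if __name__ == "__main__":
    count(sys.argv[1])
</pre>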
<br /></blockquote>
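As an alternative sketch of the same per-directory count, the standard library's os.walk can do the directory traversal. The function name count_files is made up for this example, and symlinked directories are simply not followed, which is an assumption rather than something the original script specifies.<br />

```python
import os


def count_files(root):
    """Return {directory: number of non-directory entries at or below it}."""
    totals = {}
    # Walk bottom-up so each subdirectory's total is known before its parent's.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = len(filenames)
        for name in dirnames:
            # Symlinked directories are not walked, so they contribute 0 here.
            total += totals.get(os.path.join(dirpath, name), 0)
        totals[dirpath] = total
    return totals
```

Calling count_files(".") and printing the sorted items gives one line per directory, much like the recursive script.<br />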
<br />
<b>4. Ubuntu DNS Server Problem</b><br />
Problem description: I run Ubuntu 9.04 on my computer and use Wicd (a wired and wireless network manager) to configure network settings. Sometimes, on a wireless network, Wicd connects to the router (the router can be pinged) but domain names fail to resolve, which points to a problem with the DNS server settings.<br />
<br />
Tentative solution: 1) disable all DNS-related settings inside Wicd, i.e. use neither a static nor a global DNS server; 2) edit /etc/resolv.conf and add working DNS servers; 3) restart the computer; 4) [Optional] if Wicd is configured to connect automatically and to use a static DNS server, it sometimes freezes while setting the static server. In that case, edit /etc/wireless-settings.conf to disable automatic connection and the static DNS server.<br />
<br />
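To check what ended up in /etc/resolv.conf after editing it, a small sketch like the following lists the configured servers. The function name nameservers and the sample addresses (192.0.2.x is a documentation-only range) are made up for illustration, and the comment handling is a simplification.<br />

```python
def nameservers(resolv_conf_text):
    """Return the addresses found on 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments, which start with '#' or ';'.
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Example content; the addresses are placeholders, not real DNS servers.
example = """\
# written by hand after disabling DNS in Wicd
nameserver 192.0.2.1
nameserver 192.0.2.2
search example.org
"""
print(nameservers(example))
```

On a real system the text would come from open("/etc/resolv.conf").read(); an empty result means no DNS server is configured at all.<br />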
<b>5. How to rename files or directories in order to remove white spaces in the filename?</b><br />
<blockquote>
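# loop over names that contain a space; the glob keeps each name intact<br />
for i in *\ *; do<br />
[ -e "$i" ] || continue<br />
mv -- "$i" "$(printf '%s' "$i" | sed 's/ /-/g')"<br />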
done</blockquote>
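The same renaming can be sketched in Python, which sidesteps shell quoting entirely. The function name replace_spaces is made up for this example, and skipping a rename when the target name already exists is my own safety choice, not part of the shell loop.<br />

```python
import os


def replace_spaces(directory="."):
    """Rename entries in `directory` so spaces become hyphens; return the renames."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        if " " not in name:
            continue
        new_name = name.replace(" ", "-")
        src = os.path.join(directory, name)
        dst = os.path.join(directory, new_name)
        if os.path.exists(dst):
            # Never overwrite an existing entry; leave this one untouched.
            continue
        os.rename(src, dst)
        renamed.append((name, new_name))
    return renamed
```

It only touches the given directory itself, like the shell loop, and returns the list of (old, new) pairs so the result can be reviewed.<br />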
<br />
<b>6. How to backup files (or directories) with tar and 7-zip?</b><br />
First create a tarball with the tar utility, then compress it with the 7z program. If the content is sensitive, you can encrypt it with 7z's built-in encryption (the -p switch) or with GPG. The code is as follows:<br />
<blockquote>
for i in *; do<br />
tar cfv "$i.tar" "$i" && \<br />
7z a "$i.tar.7z" "$i.tar"<br />
# Uncomment to delete the original and the intermediate tarball afterwards:<br />
# rm -rf "$i" "$i.tar"<br />
done</blockquote>
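For comparison, here is a rough Python sketch of the archive step. It is not the same tool chain: the standard library cannot write 7z archives, so this swaps in tarfile with xz (LZMA) compression and produces a .tar.xz file, and it does no encryption. The function name backup is made up for this example.<br />

```python
import os
import tarfile


def backup(path):
    """Write `path` into `<path>.tar.xz` next to it and return the archive name."""
    path = path.rstrip(os.sep)
    archive = path + ".tar.xz"
    # "w:xz" streams the tar through LZMA, so no intermediate .tar file is left.
    with tarfile.open(archive, "w:xz") as tar:
        # Store entries relative to the item's own name, not its full path.
        tar.add(path, arcname=os.path.basename(path))
    return archive
```

If the backup is sensitive, the resulting file would still need to be encrypted separately, for example with GPG as mentioned above.<br />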
<br />
<b>7. How do I output only the part of a line that matches a regex pattern?</b><br />
Use grep -o PATTERN: it prints only the matched parts of each matching line, one match per output line.<br />
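A small Python sketch of the same idea, for cases where the matches are needed inside a script. The function name only_matching and the sample lines are made up for illustration, and Python's re syntax differs from grep's default basic regular expressions.<br />

```python
import re


def only_matching(pattern, lines):
    """Yield each non-overlapping match of `pattern`, the way grep -o prints them."""
    regex = re.compile(pattern)
    for line in lines:
        for match in regex.finditer(line):
            if match.group(0):  # skip empty matches, as grep -o does
                yield match.group(0)


sample = ["id=42 id=7", "no ids here", "id=1000"]
print(list(only_matching(r"id=[0-9]+", sample)))
```

A line with several matches yields several results, and a line without any match yields nothing.<br />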
Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0tag:blogger.com,1999:blog-25528959.post-13638197977399384462008-05-07T23:59:00.000-07:002013-02-12T14:45:17.234-08:00Connecting USC VPN Network in Ubuntu[Update 2013-02-12]<br />
Surprisingly, this old post still receives visitors occasionally. Right now, if you just want to browse the internet and download some papers, you may try the web VPN service: <a href="http://sslvpn1.usc.edu/">sslvpn1.usc.edu</a>.<br />
<br />
[Original Post:]<br />
At USC, computers on campus can directly use electronic resources such as databases and electronic journals, because they are inside the USC private network. Now suppose you go back to your apartment off campus, or you travel away from USC: how can you get access to the electronic resources that USC pays for? That is where VPN comes in. VPN stands for "virtual private network"; it is a secure method, also described as IP tunneling, for accessing computer resources in a private network. Generally speaking, USC runs a VPN server that listens for your connection and access requests. You run a VPN client on your own computer, which connects to the server and gives you access to USC resources as if you were inside the USC private network.<br />
<br />
However, ITS only provides official VPN client support for Windows (<a href="http://www.usc.edu/its/vpn/vpn3k47win.html#help">link</a>) and Mac OS (<a href="http://www.usc.edu/its/vpn/macosx.html">link</a>). Here we give a VPN solution for Linux users (taking Ubuntu 8.04 as the example).<br />
<br />
1. Install <strong>Network Manager Applet</strong> through <strong>Add/Remove</strong> in the Ubuntu menu. (Most of the time this applet is installed by default; if so, skip to step 2.)<br />
<br />
2. Install the VPN plug-in <strong>network-manager-vpnc</strong>. Open <span style="font-weight: bold;">Synaptic Package Manager</span>, search for the package <span style="font-weight: bold;">network-manager-vpnc</span> and install it.<br />
<br />
3. Left-click the <span style="font-weight: bold;">network manager applet</span> (usually in the top right corner of your screen) and select <em>VPN Connections</em>-><em>Configure VPN</em>-><em>Add</em>. Type a name in the <span style="font-weight: bold;">Connection Name</span> box, USC VPN for example. In the <span style="font-weight: bold;">Gateway</span> field, type <span style="font-weight: bold;">vpn3k.usc.edu</span>. In the <span style="font-weight: bold;">Group Name</span> field, type <span style="font-weight: bold;">USC</span>. Click the <span style="font-weight: bold;">Optional</span> tab, select <span style="font-weight: bold;">Override user name</span>, and type your USC account (the same as your USC email) in the text box below. Click <span style="font-weight: bold;">Apply</span>, then close the window titled <span style="font-weight: bold;">VPN Connections</span>.<br />
<br />
<div style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRDIp-WGV7_8CosyV6kgEBq1m0pQy40CRIjOykrL5fxRHzf8ctCD-oh1bTcP5zCrV9kP2jiSDOON3e4QDET_0BkZbo2J9IXSC95JUMkqnF4tcOt2K2rpms_4n7qF6WXQFW8FIW/s1600-h/Screenshot-Edit+VPN+Connection.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5197911738394698114" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRDIp-WGV7_8CosyV6kgEBq1m0pQy40CRIjOykrL5fxRHzf8ctCD-oh1bTcP5zCrV9kP2jiSDOON3e4QDET_0BkZbo2J9IXSC95JUMkqnF4tcOt2K2rpms_4n7qF6WXQFW8FIW/s400/Screenshot-Edit+VPN+Connection.png" style="cursor: pointer; margin: 0px auto 10px; text-align: center;" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_09OV2g6SANXdBdwJulSpPZWPhTbkf02sCWDltCfCswZ8u1HzEe2soiJ2v9Tko1jCjGEE09h5X6-V8P-HhMa_kUKjUJtStPgEoeuTXMv1oImJlYCi028BfaizbJsKnAJINRNM/s1600-h/Screenshot-Edit+VPN+Connection-1.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"> <img alt="" border="0" id="BLOGGER_PHOTO_ID_5197912464244171170" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_09OV2g6SANXdBdwJulSpPZWPhTbkf02sCWDltCfCswZ8u1HzEe2soiJ2v9Tko1jCjGEE09h5X6-V8P-HhMa_kUKjUJtStPgEoeuTXMv1oImJlYCi028BfaizbJsKnAJINRNM/s400/Screenshot-Edit+VPN+Connection-1.png" style="cursor: pointer; margin: 0px auto 10px; text-align: center;" /></a></div>
<br />
4. Left-click the <span style="font-weight: bold;">network manager applet</span>, select <em style="font-weight: bold;">VPN Connections</em>, then click the USC connection (USC VPN) to connect. In the upper <span style="font-weight: bold;">password</span> box, type the password of your USC account; in the lower <span style="font-weight: bold;">Group password</span> box, type <span style="font-weight: bold;">GoTrojan</span>. OK, we are done!<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXRt3eSGglRFXVIDK46Eutl6P5wo8cMt10FAHDqdkG1XDDjB3i702VsX_f2ZMtqLMBLQ0bbG8Iy3OXJFYVN_apIX6R7TAzOLTo66qbsGf0wfRWyVVWpU8WrpPXSM9qZcf3bFt3/s1600-h/Screenshot-Authenticate+VPN.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5197911064084832626" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXRt3eSGglRFXVIDK46Eutl6P5wo8cMt10FAHDqdkG1XDDjB3i702VsX_f2ZMtqLMBLQ0bbG8Iy3OXJFYVN_apIX6R7TAzOLTo66qbsGf0wfRWyVVWpU8WrpPXSM9qZcf3bFt3/s400/Screenshot-Authenticate+VPN.png" style="cursor: pointer; display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
This tutorial is based on Ubuntu. I think you can also configure a VPN client in a similar way on Debian, Fedora, openSUSE and other Linux distributions.<br />
<br />
References:<br />
1. VPN Client on Ubuntu <a href="https://help.ubuntu.com/community/VPNClient">https://help.ubuntu.com/community/VPNClient</a><br />
2. Configuring the Cisco VPN 3000 Client (Windows 2000/XP/Vista) <a href="http://www.usc.edu/its/vpn/vpn3k47win.html#help">http://www.usc.edu/its/vpn/vpn3k47win.html#help</a>Song Qianghttp://www.blogger.com/profile/12615028439413983120noreply@blogger.com0