To convert a file (input.txt) to all lower case (output.txt), choose any ONE of the following:
The Beautiful science
As usual in Linux, there is more than one way to accomplish a task.
dd
$ dd if=input.txt of=output.txt conv=lcase
tr
$ tr '[:upper:]' '[:lower:]' < input.txt > output.txt
awk
$ awk '{ print tolower($0) }' input.txt > output.txt
perl
$ perl -pe '$_= lc($_)' input.txt > output.txt
sed
$ sed -e 's/\(.*\)/\L\1/' input.txt > output.txt
We use the backreference \1 to refer to the entire line and \L to convert it to lower case. Note that \L (and \U below) are GNU sed extensions.
To convert a file (input.txt) to all upper case (output.txt):
dd
$ dd if=input.txt of=output.txt conv=ucase
tr
$ tr '[:lower:]' '[:upper:]' < input.txt > output.txt
awk
$ awk '{ print toupper($0) }' input.txt > output.txt
perl
$ perl -pe '$_= uc($_)' input.txt > output.txt
sed
$ sed -e 's/\(.*\)/\U\1/' input.txt > output.txt
These one-liners can be used, for example, to convert the lowercase characters in a FASTA file to uppercase and vice versa. Note that they change every line, header lines included. Cheers
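If the FASTA headers should keep their original case, the awk variant can be restricted to the sequence lines. This is only a minimal sketch: the function name fasta_upper and the sample record are made up for illustration.

```shell
# Upper-case only the sequence lines of a FASTA stream; header lines
# (those starting with '>') are passed through unchanged.
# fasta_upper is a hypothetical helper name, not a standard tool.
fasta_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$@"
}

# Example on a tiny made-up record read from stdin:
printf '>seq1 mouse\nacgtnnacgt\n' | fasta_upper
```

Swapping toupper for tolower gives the reverse conversion; a file name can be passed as an argument instead of piping.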
(Source: linuxcommando.blogspot.de)
Fire up the terminal, replace the username and password with your login details, and you can download a heavy file or a list of files in the background.
Here, files is a text file containing the links of the files to be downloaded.
Many of you might have run into problems removing files that have special characters in their names, for instance $1.txt or -bg.
# removing $1.txt
rm \$1.txt
# or, if it is the only .txt file in the directory
rm *.txt
# removing -bg is a little tricky, as a leading - marks an option for rm. Following the manual, we use '--': a -- signals the end of options and disables further option processing, so any arguments after it are treated as file names.
rm -- -bg
For other awkward characters, you can always use a grep match. Consider the file name r.34$@
# command (assumes exactly one file name ends in @)
rm -- "$(ls | grep -E '@$')"
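The three cases above can be tried safely in a scratch directory. The snippet below is just a sketch: it creates the awkward names from this post in a temporary directory (demo_dir is a made-up variable name) and removes them with quoting, a ./ prefix and a quoted glob, without parsing ls output.

```shell
# Work in a throw-away directory so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create three awkwardly named files (names taken from the post).
touch -- '$1.txt' '-bg' 'r.34$@'

rm -- '$1.txt'     # single quotes stop the shell from expanding $1
rm ./-bg           # a leading ./ keeps rm from reading -bg as options
rm -- *'$@'        # glob with the special part quoted; matches r.34$@

# Count what is left in the scratch directory.
remaining=$(ls -A | wc -l)
echo "files left: $remaining"
```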
I hope that helps.
Cheers
Yo guys, suppose you are running someone else's Perl code and get an error during execution like:
Can't locate Parallel/ForkManager.pm in @INC (@INC contains: blah blah)
It means you are missing ForkManager.pm (of course), but how do you install it, and which package provides it?
To find the right package, use the 'apt-file' utility, but install it first:
sudo apt-get update
sudo apt-get install apt-file
sudo apt-file update
Now search the module
sudo apt-file search ForkManager.pm
which gives
libparallel-forkmanager-perl: /usr/share/perl5/Parallel/ForkManager.pm
so just install that package
sudo apt-get install libparallel-forkmanager-perl
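As a small aid for the search step, the module path can be pulled straight out of the error message and handed to apt-file. This is a sketch only: missing_module_path is a made-up helper name, and the @INC contents in the sample message are invented.

```shell
# Pull the missing module's file path out of a Perl "Can't locate ..."
# error so it can be passed to `apt-file search`.
# missing_module_path is a hypothetical helper, not part of apt-file.
missing_module_path() {
    sed -n "s/^Can't locate \([^ ]*\.pm\) in @INC.*/\1/p"
}

# Sample error message (the @INC contents are made up):
err="Can't locate Parallel/ForkManager.pm in @INC (@INC contains: /etc/perl)"
printf '%s\n' "$err" | missing_module_path
```

Searching for the full path (Parallel/ForkManager.pm) rather than just the file name should narrow the apt-file results to the matching package.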
That's it, go and have fun. Cheers
(Source: vdr-portal.de)
Hola!
Want to fetch a lot of files from different links with one command?
Command
wget -b -i links.txt
* put all the links in a text file called links.txt, one per line
** -b puts the wget process in the background, which is cool if you don't want your terminal flooded with text.
More examples here : http://www.thegeekstuff.com/2009/09/the-ultimate-wget-download-guide-with-15-awesome-examples/
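If links.txt was pasted together by hand, it may help to tidy it before handing it to wget. The following is a rough sketch, not part of wget: clean_links is a made-up helper and the URLs are placeholders.

```shell
# Keep only lines that look like http(s) or ftp URLs, strip Windows
# carriage returns, and drop duplicates while preserving order.
# clean_links is a hypothetical helper name.
clean_links() {
    tr -d '\r' | grep -E '^(https?|ftp)://' | awk '!seen[$0]++'
}

# Example with placeholder URLs: one blank line, one note, one duplicate.
printf '%s\n' 'http://example.org/a.gz' '' '# notes' \
    'http://example.org/a.gz' 'ftp://example.org/b.gz' | clean_links
```

The cleaned list can then be saved (clean_links < links.txt > links.clean.txt) and passed to wget -b -i as above.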
Cheers
Hola!
Opening a file, adding '>' and a newline character "\n" before each line "$0", and saving the output.
awk '{ print ">\n" $0 }' file.seq > file.fasta
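A bare '>' gives every record an empty name, which some tools may not like. A possible variation, sketched here with a made-up function name and a made-up "seq" prefix, numbers the headers and skips blank lines:

```shell
# Turn one-sequence-per-line input into FASTA with numbered headers
# (>seq1, >seq2, ...). Blank lines are skipped.
# seq_to_fasta and the "seq" prefix are illustrative choices.
seq_to_fasta() {
    awk 'NF { n++; print ">seq" n; print $0 }' "$@"
}

# Example on two made-up sequences separated by a blank line:
printf 'ACGT\n\nGGCC\n' | seq_to_fasta
```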
Cheers
Hola!
When you exceed the disk quota, you sometimes get an error while removing files.
Trick:
Copy /dev/null over the files and then you can remove them.
# removing all files at once in a script
for i in *; do cp /dev/null "$i"; rm "$i"; done
Explanation:
ZFS is a copy-on-write filesystem, so a file deletion transiently takes slightly more space on disk before a file is actually deleted. It has to write the metadata involved with the file deletion before it removes the allocation for the file being deleted. This is how ZFS is able to always be consistent on disk, even in the event of a crash.
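The same trick can be wrapped in a small function that copes with spaces in file names. This is only a sketch under the assumption that truncating in place (: > file) frees the space just like copying /dev/null does; truncate_and_remove is a made-up name and the demo runs in a scratch directory.

```shell
# Truncate each regular file to zero bytes, then unlink it.
# truncate_and_remove is a hypothetical helper name.
truncate_and_remove() {
    for f in "$@"; do
        [ -f "$f" ] || continue      # skip directories and missing names
        : > "$f" && rm -- "$f"       # empty the file first, then remove it
    done
}

# Demo in a scratch directory with two made-up files:
tmp=$(mktemp -d)
printf 'data\n' > "$tmp/a file.txt"
printf 'data\n' > "$tmp/b.txt"
truncate_and_remove "$tmp"/*
```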
Cheers
(Source: bfg.uni-freiburg.de)
Hola people!
After extending the reads directionally (+ and -) in a BED (or bedGraph) file, the most likely error is that the extension pushes some reads to positions that lie outside the real genomic coordinates.
So, to fix this, there's a utility called bedClip.
Usage:
bedClip input.bed mm9 output.bed
mm9 - a text file listing the total length of each mm9 chromosome (a chrom.sizes file).
Or use the following awk script; it replaces negative coordinates in the 2nd column with 0 but doesn't remove those lines.
awk 'BEGIN { FS = OFS = "\t" } $2 < 0 { $2 = 0 } { print }' file.bed > out.bed
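The awk idea can be stretched to the other end of the chromosome as well. The sketch below clamps starts below 0 and ends beyond the chromosome length, using the same kind of per-chromosome length file. Note that this clamps coordinates rather than dropping lines, so it is not a re-implementation of bedClip; clamp_bed and the file names are made up.

```shell
# Clamp BED coordinates to [0, chromosome length].
# usage: clamp_bed chrom.sizes file.bed   (hypothetical helper)
clamp_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         NR == FNR { size[$1] = $2 + 0; next }   # first file: chromosome lengths
         {
             if ($2 + 0 < 0) $2 = 0
             if (($1 in size) && $3 + 0 > size[$1]) $3 = size[$1]
             print
         }' "$1" "$2"
}

# Demo with a made-up 100 bp chromosome and two out-of-range reads:
work=$(mktemp -d)
printf 'chr1\t100\n' > "$work/sizes.txt"
printf 'chr1\t-5\t20\nchr1\t90\t120\n' > "$work/reads.bed"
clamp_bed "$work/sizes.txt" "$work/reads.bed"
```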
Have Fun
Sukhdeep Singh
Hola!
A little scientific post: converting a newly mapped ChIP-seq BAM file to bedGraph with strand information. A slightly better-looking modification of the original post by Aaron Quinlan.
# getting '+' strand coverage
bamToBed -i file.bam | awk '$6=="+"' | genomeCoverageBed -i stdin -bg -g mm9 > plus.bg
# getting '-' strand coverage
bamToBed -i file.bam | awk '$6=="-"' | genomeCoverageBed -i stdin -bg -g mm9 > minus.bg
# adding strand information to the respective bedGraph files from the previous step, as we already know which strand each file came from
cat plus.bg | awk '{print $0"\t+"}' > plus_strand.bg
cat minus.bg | awk '{print $0"\t-"}' > minus_strand.bg
# concatenating these two files into a single file
cat plus_strand.bg minus_strand.bg > full_strand.bg
# sorting the file by chromosome and coordinate
sort -k1,1 -k2,2n full_strand.bg > full_strand_sorted.bg
# adding a fourth column with some id, in this case the row number, to make it a 6-column BED file
awk '{OFS="\t"; print $1,$2,$3,NR,$4,$5}' full_strand_sorted.bg > wanted.bg
Output
chr1 3001029 3001080 1 1 -
chr1 3001447 3001498 2 1 +
chr1 3002155 3002206 3 1 -
chr1 3002351 3002402 4 1 -
chr1 3004372 3004423 5 1 +
chr1 3004950 3005001 6 1 -
chr1 3005014 3005065 7 1 -
chr1 3006174 3006225 8 1 -
chr1 3006337 3006388 9 1 -
chr1 3006445 3006496 10 1 +
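The last four steps (tagging, concatenating, sorting, numbering) can also be done in one go without the intermediate files. This is a sketch that assumes plus.bg and minus.bg already exist as tab-separated bedGraph files; merge_stranded is a made-up name, and the demo uses the first two rows of the output above.

```shell
# Merge per-strand bedGraph files into one sorted 6-column BED stream.
# usage: merge_stranded plus.bg minus.bg   (hypothetical helper)
merge_stranded() {
    {
        awk 'BEGIN { FS = OFS = "\t" } { print $0, "+" }' "$1"
        awk 'BEGIN { FS = OFS = "\t" } { print $0, "-" }' "$2"
    } | LC_ALL=C sort -k1,1 -k2,2n |
        awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, $3, NR, $4, $5 }'
}

# Demo with two rows taken from the output above:
bg_dir=$(mktemp -d)
printf 'chr1\t3001447\t3001498\t1\n' > "$bg_dir/plus.bg"
printf 'chr1\t3001029\t3001080\t1\n' > "$bg_dir/minus.bg"
merge_stranded "$bg_dir/plus.bg" "$bg_dir/minus.bg"
```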
(Source: groups.google.com)
Leute!!
Suppose you have a list of 20 data frames; each data frame has 10 columns but a different number of rows. You want to know the number of rows in each data frame in a single line, without using a loop.
sapply(x,function(y)length(y[[1]]))
where x is a list of data frames
If you are after a specific column, just replace the 1 in [[1]] with its index. (sapply(x, nrow) gives the same row counts.)
Cheers
Sukhi