There are many scripts written in Perl, PHP, Python, etc. that will do this for you,
but the way you are about to see will make you smile at the simplicity of it.
Instead of going over the file line by line and searching inside it, I am going to use
a tool that does that for me. This tool is lynx, the console browser,
and here is how it works:
lynx -dump file_name.html
Now, let's say our table looks like this:
1 | 2 | 3 | 4 |
5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 |
To create a CSV file from it, one would do something like this:
Use the 'tr' command to squeeze every run of spaces into a single space:
tr -s " "
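As a quick check of what the squeeze does, here is a minimal sketch. The sample line is only an assumption about what a row of lynx output might look like, not real lynx output:

```shell
# Assumed sample row with uneven spacing (not taken from real lynx output)
line='   1    2     3   4'

# -s squeezes each run of the listed character (a space) into one occurrence
squeezed=$(printf '%s\n' "$line" | tr -s ' ')

# Brackets make the remaining leading space visible
printf '[%s]\n' "$squeezed"
```

Note that one leading space is still left on the line, which is why the sed step that follows is needed.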
Now, let's use sed to do the rest of the work for us.
This sed command removes any leading spaces and tabs from the beginning of each line:
sed 's/^[ \t]*//'
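Here is a small sketch of the trimming step on a made-up line that starts with a space and a tab. It uses the POSIX class [[:blank:]] (space or tab) instead of a tab escape; that is my own substitution, chosen because not every sed understands a tab escape inside brackets:

```shell
# Assumed sample: a line beginning with a space and a real tab
raw=$(printf ' \t1 2 3 4')

# [[:blank:]] matches a space or a tab; ^...* strips all of them at line start
trimmed=$(printf '%s\n' "$raw" | sed 's/^[[:blank:]]*//')

printf '[%s]\n' "$trimmed"
```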
This sed command replaces every remaining space with a comma "," as the delimiter:
sed 's/ /,/g'
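To see the substitution on its own, here is a tiny sketch with an assumed, already cleaned-up row:

```shell
# Assumed input: one row whose cells are separated by single spaces
csv_line=$(printf '1 2 3 4\n' | sed 's/ /,/g')

# The g flag replaces every space on the line, not just the first one
printf '%s\n' "$csv_line"
```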
So in the end we have a simple one-line command that creates a CSV file from HTML:
lynx -dump file_name.html | tr -s " " | sed -e 's/^[ \t]*//' -e 's/ /,/g' > file_name.csv
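If you want to try the filters without lynx installed, the sketch below feeds them a simulated dump. The function name dump_to_csv and the sample text are my own assumptions for illustration, and [[:blank:]] stands in for the space/tab bracket to keep it portable:

```shell
# Hypothetical helper: the same filters as the one-liner, reading from stdin
dump_to_csv() {
    tr -s ' ' | sed -e 's/^[[:blank:]]*//' -e 's/ /,/g'
}

# Simulated dump of the 3x4 table (assumed shape, not real lynx output)
csv=$(printf '   1   2   3   4\n   5   6   7   8\n   9  10  11  12\n' | dump_to_csv)

printf '%s\n' "$csv"
```

With lynx available, the same helper could be used as: lynx -dump file_name.html | dump_to_csv > file_name.csv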
* Note: the method shown here works only as long as there are no spaces in the cell data.
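To make the limitation concrete, here is a sketch with an assumed row whose first cell contains a space. The two-cell row comes out as three fields:

```shell
# Assumed row: the cells are "New York" and "10"
broken=$(printf '   New York   10\n' | tr -s ' ' | sed -e 's/^[[:blank:]]*//' -e 's/ /,/g')

# The space inside "New York" is turned into a comma as well
printf '%s\n' "$broken"
```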