Welcome to the Linux Foundation Forum!

iconv and sed help

Hi,

I have a UTF-8 file which I need to convert to ISO-8859-1.

The UTF-8 file contains characters like å/ä/ö, and I don't want these characters.

So I apply the sed command.

$ sed "s/å/aa/g; s/ä/aaa/g; s/ö/ooo/g" utf8.txt > output.txt


Now when I view this file, there are no characters like å/ä/ö.

Then,

I use the iconv command to convert that UTF-8 file (output.txt) to ISO-8859-1:

$ iconv -c -f UTF-8 -t ISO-8859-1 < output.txt > newfile


BUT

when I check the file type using the file command, it says the file is ASCII, not ISO-8859-1:

$ file newfile
newfile: ASCII text, with CRLF line terminators


I don't understand what went wrong. I have also attached that UTF-8 file with this post.

Please help.

usmangt
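
One way to check what iconv actually produced is to count the bytes above the 7-bit ASCII range in the result. ISO-8859-1 and ASCII are identical for bytes 0 to 127, so a file with no higher bytes is typically reported as ASCII by file, whichever target encoding was given to iconv. The following is only a minimal sketch: the sample line, the temporary directory and the variable names are made up for illustration and are not taken from the attached file.

```shell
# Hypothetical sample data; the real attachment is not reproduced here.
# \303\266 = ö, \303\244 = ä, \303\245 = å (UTF-8 bytes written in octal).
workdir=$(mktemp -d)
printf 'Malm\303\266 - G\303\244vle - Ume\303\245\r\n' > "$workdir/utf8.txt"

# Same substitutions and conversion as in the post above.
sed 's/å/aa/g; s/ä/aaa/g; s/ö/ooo/g' "$workdir/utf8.txt" > "$workdir/output.txt"
iconv -c -f UTF-8 -t ISO-8859-1 < "$workdir/output.txt" > "$workdir/newfile"

# Delete every 7-bit byte and count what is left (bytes 0x80-0xFF).
non_ascii=$(LC_ALL=C tr -d '\000-\177' < "$workdir/newfile" | wc -c | tr -d ' ')
echo "non-ASCII bytes in newfile: $non_ascii"

# cmp -s exits 0 when both files are byte-for-byte identical.
if cmp -s "$workdir/output.txt" "$workdir/newfile"; then same=yes; else same=no; fi
echo "iconv left the bytes unchanged: $same"
```

If the count is zero, sed has already replaced every character that would differ between the two encodings, and the converted file is valid ISO-8859-1 and valid ASCII at the same time.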


Comments

  • Posts: 2,177
    I went through your exact procedure on Slackware 13.1, and my output file shows as:
    ut3.txt: ISO-8859 text, with very long lines

    How the data is read and displayed may be controlled by a deeper configuration within your OS. Can you share which distro you use, so those familiar with it can tell you where those settings are?
  • I am using the Fedora 13 Linux distribution.
  • Hi,

    I am so sorry, I attached the wrong file (both have the same name but are in different folders on my machine).

    This is the one which is causing the problem.
  • Here is the file.

    I don't know why it got such a long name when uploading.

    [file name=utf8-7a6351909c73ba4a81575d6ad10cf46f.txt size=1131]http://www.linux.com/media/kunena/attachments/legacy/files/utf8-7a6351909c73ba4a81575d6ad10cf46f.txt[/file]
  • Posts: 2,177
    Now that I have processed your original file, I am getting the same issue; it appears that something is different between the files.

    The two files are very different. I have combined your commands into
    sed "s/å/aa/g; s/ä/aaa/g; s/ö/ooo/g" utf8.txt|iconv -c -f UTF-8 -t ISO-8859-1 -o out.txt

    When I ran that command against both files I got the following output:
    matt:~/Desktop$rm *.txt.txt;for i in `ls|grep utf|grep -v "txt\.txt"`;do sed "s/å/aa/g; s/ä/aaa/g; s/ö/ooo/g" $i|iconv -c -f UTF-8 -t ISO-8859-1 -o $i.txt ;file $i;file $i.txt;done
    utf8.txt: UTF-8 Unicode text, with very long lines, with CRLF line terminators
    utf8.txt.txt: ISO-8859 text, with very long lines, with CRLF line terminators
    utf82.txt: UTF-8 Unicode text
    utf82.txt.txt: ASCII text

    Based on the output, it looks as though the line terminators in the second file are not ISO-8859-1 compliant, and the iconv application does not correct them.
  • Thank you for analyzing and checking it. Yes, I suspect the same thing, and I am also concerned about the ' - ' (minus symbol/character) in the file.

    Do you think there is a solution for this?


    Thank you

    usmangt
  • Posts: 2,177
    Can you tell me if the two files were created on different platforms, such as file1 being created on Windows and file2 being created on Linux?
  • Well, both were created on Linux.
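
The different results for the two files can be reproduced with a small experiment. The sketch below is an assumption about what may be going on, not a confirmed diagnosis of the attachments: it uses two invented stand-in files (first.txt and second.txt, neither taken from the thread). The first keeps one character that the sed script does not replace (é) and has a CRLF line ending; the second contains only characters that sed replaces. Counting high bytes and carriage returns in the converted output shows which properties survive the pipeline.

```shell
# Hypothetical stand-ins for the two attachments; contents are invented.
scratch=$(mktemp -d)
# first.txt: "café å" with CRLF; é (\303\251) is not touched by the sed script.
printf 'caf\303\251 \303\245\r\n' > "$scratch/first.txt"
# second.txt: "smörgås" with LF; every non-ASCII character is replaced by sed.
printf 'sm\303\266rg\303\245s\n' > "$scratch/second.txt"

# Same pipeline as in the thread, with a redirect instead of -o.
for name in first second; do
  sed 's/å/aa/g; s/ä/aaa/g; s/ö/ooo/g' "$scratch/$name.txt" |
    iconv -c -f UTF-8 -t ISO-8859-1 > "$scratch/$name.out"
done

# Bytes above 0x7F that remain after conversion.
high_first=$(LC_ALL=C tr -d '\000-\177' < "$scratch/first.out" | wc -c | tr -d ' ')
high_second=$(LC_ALL=C tr -d '\000-\177' < "$scratch/second.out" | wc -c | tr -d ' ')
# Carriage returns that passed through iconv untouched.
cr_first=$(LC_ALL=C tr -cd '\r' < "$scratch/first.out" | wc -c | tr -d ' ')

echo "first.out:  high bytes=$high_first, CR bytes=$cr_first"
echo "second.out: high bytes=$high_second"
```

In this sketch only first.out keeps a byte above 0x7F (the single-byte ISO-8859-1 form of é), which is the kind of content that would lead file to report ISO-8859 rather than ASCII. The carriage return is an ordinary ASCII byte in both encodings, so iconv passes it through unchanged.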
