At first I tried a procedure using iconv.
This procedure is described below and it was based on one found at http:/bryan-murdock.blogspot.com/2008/11/convert-postgresql-database-from-latin1.html
but unfortunately it didn't work for me.
So, I decided to try another alternative, that should be simple and fast, which is described here.
Procedure using editor's encoding and manual settings
Dump the database:
pg_dump -U postgres rental > rental/rental.dbs.out
Edit the file dumped on the prior step using an editor which has encoding features.
My default editor is SciTE and it was the editor that I used to do this.
Note: this editor has Windows and Linux versions.
On the editor, change the original encoding used for the file, switching to UTF-8, and "save as" the document with a new name.
Here, on this procedure I've saved it as "rental/rental.dbs.out.utf8".
After that, change the PostgreSQL's encoding on the same file.
You'll find something like this:
--
-- PostgreSQL database dump
--
SET client_encoding = 'LATIN1';
SET standard_conforming_strings = off;
...
Switch to:
--
-- PostgreSQL database dump
--
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
...
Save the document and exit.
Remember, save it as UTF-8.
NOTE: if the dump file opens on the editor and shows gibberish and goofy signals instead of the normal data,
you must check your editor encoding or the procedure used to generate the dump file.
The text must appear legible as it should be.
Drop the old latin1 database:
dropdb rental
Log as postgres or another user with the necessary privileges:
psql -U posgres
Create the new utf-8 database:
create database rental with template = template0 owner = postgres encoding = 'utf8';
Connect to the new database created:
\c rental
Restore the converted dumped file - the one which you edited and saved as UTF-8 changing its encoding:
\i rental/rental.dbs.out.utf8
It's done.
Below, you find the final checking topic with the result obtained.
Handling Huge Files
But if the file is huge, for instance bigger than 300MB?
Using editors with huge files become difficult.
The solution is to split the file and when the procedure is finished, join the split files again.
To split you can you use cutf, a simple command line tool.
For help, just type cutf.
To append the files back, you can use:
On linux:
cat file_to_append >> file_to_join
On windows:
type file_to_append >> file_to_join
That's it.
Final checking
rental=# \c postgres
You are now connected to database "postgres".
postgres=# create database rental with template = template0 owner = postgres encoding = 'utf8';
CREATE DATABASE
postgres=# \c rental
You are now connected to database "rental".
rental=# show client_encoding;
client_encoding
-----------------
UTF8
(1 row)
rental=# show server_encoding;
server_encoding
-----------------
UTF8
(1 row)
rental=# select * from category;
cod_category | description
---------------+-------------------
1 | ação
2 | aventura
3 | biográficos
4 | comédia
5 | drama
6 | épicos
7 | espionagem
8 | fantasia
9 | ficção científica
10 | ficção histórica
11 | guerra
Attempt using iconv
Dump the database:
pg_dump -U postgres rental > rental/rental.dbs.out
Convert it to utf-8 with iconv:
iconv --from-code latin1 --to-code utf-8 rental/rental.dbs.out > rental/rental.dbs.out.utf8
Drop the old latin1 database:
dropdb rental
Created the new utf-8 database:
psql -U posgres
create database rental with template = template0 owner = postgres encoding = 'utf8';
Restore the converted backup:
psql -U postgres rental < rental/rental.dbs.out.utf8
>Before
postgres=# \l
List of databases
Name | Owner | Encoding
----------------+----------+----------
rental | postgres | LATIN1
rental=# select * from category;
cod_category | description
---------------+----------------------
1 | ação
2 | aventura
3 | biográficos
4 | comédia
5 | drama
6 | épicos
7 | espionagem
8 | fantasia
9 | ficção científica
10 | ficção histórica
11 | guerra
>After
rental=# \l
List of databases
Name | Owner | Encoding
----------------+----------+----------
rental | postgres | UTF8
rental=# select * from category;
cod_category | description
---------------+-------------------------------------------
1 | aÃ\u0083§Ã\u0083£o
2 | aventura
3 | biogrÃ\u0083¡ficos
4 | comÃ\u0083©dia
5 | drama
6 | Ã\u0083©picos
7 | espionagem
8 | fantasia
9 | ficÃ\u0083§Ã\u0083£o cientÃ\u0083Âfica
10 | ficÃ\u0083§Ã\u0083£o histÃ\u0083³rica
11 | guerra
No comments:
Post a Comment