Rob Hills
2017-06-10 07:08:57 UTC
Hi,
I am debugging a project to convert a Forum from WebWiz to phpBB.
WebWiz stores its data in an Access DB and our phpBB forum will be on MySQL.
I am using mdb-tools version 0.7.1 on Ubuntu 16.04LTS 64-bit. According
to mdb-ver, the mdb file I am working with is JET4.
The problem I am trying to solve involves forum post text that includes
some characters outside the basic ASCII range. A specific example is
the "half space" (thin space, U+2009) whose UTF-8 representation I believe is the
3-byte sequence E2 80 89. My problem is that when I use mdb-export,
these characters end up being converted to â€‰ (Hex: C3 A2 E2 82 AC E2
80 B0).
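The two byte sequences above are consistent with double encoding: if each of the three UTF-8 bytes E2 80 89 is read as a separate CP1252 character and the result is written out again as UTF-8, the six bytes quoted above are what comes out. A minimal Python sketch of that check follows. It only illustrates the suspected mechanism; it is an assumption that this is what happens during the export, and it says nothing about where in mdb-tools the conversion would occur.

```python
# Expected UTF-8 bytes for U+2009 (THIN SPACE), as produced by the Access export.
expected = bytes.fromhex("E28089")

# Suspected mechanism (an assumption): each byte is treated as one CP1252 character...
misread = expected.decode("cp1252")

# ...and the resulting three characters are then encoded as UTF-8 again.
double_encoded = misread.encode("utf-8")

# Print the result in the same "XX XX XX" form used in the post.
print(" ".join(f"{b:02X}" for b in double_encoded))
# -> C3 A2 E2 82 AC E2 80 B0
```

The printed bytes match the hex dump of the mdb-export output, which is why the result looks like UTF-8 text that was interpreted as CP1252 and encoded a second time.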
If I open this database in MS Access and use its export tool, I end up
with the expected UTF-8 representation of these characters in my output
file (E2 80 89).
I've Googled extensively and tried various permutations of the
MDB_JET3_CHARSET and MDBICONV environment variables without any change
to the output.
For example, the following command:
mdb-export test-forum.mdb tblThread
produces exactly the same output as:
MDB_JET3_CHARSET="UTF-8" mdb-export test-forum.mdb tblThread
Other attempts include:
MDB_JET3_CHARSET=UTF-8 mdb-export test-forum.mdb tblThread
MDB_JET3_CHARSET="utf-8" mdb-export test-forum.mdb tblThread
MDB_JET3_CHARSET=utf-8 mdb-export test-forum.mdb tblThread
MDB_JET3_CHARSET="CP1252" mdb-export test-forum.mdb tblThread
MDB_JET3_CHARSET=CP1252 mdb-export test-forum.mdb tblThread
In each case, the output is the same: normal text is exported correctly,
but the extended characters seem to be double-encoded. As the original
DB is 200MB, I have created a stripped-down copy containing just one row
in this table with the "message" field containing text that includes a
number of these special characters.
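If the export really is double-encoded as described, the damage can in principle be reversed after the fact by encoding the garbled text back to CP1252 bytes and decoding those bytes as UTF-8. The sketch below is a possible post-processing workaround, not a fix for mdb-export itself; the function name `undo_double_encoding` is made up for illustration, and it assumes the whole string went through exactly one extra UTF-8 -> CP1252 -> UTF-8 round.

```python
def undo_double_encoding(text: str) -> str:
    """Reverse one round of UTF-8 -> CP1252 -> UTF-8 double encoding.

    Returns the input unchanged if it cannot be cleanly reversed.
    """
    try:
        # Map each character back to the single byte it was misread from,
        # then decode the recovered bytes as the UTF-8 they originally were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters that have no CP1252 byte, or
        # the recovered bytes are not valid UTF-8: leave the text alone.
        return text


# The garbled form reported by mdb-export (bytes C3 A2 E2 82 AC E2 80 B0).
garbled = bytes.fromhex("C3A2E282ACE280B0").decode("utf-8")
repaired = undo_double_encoding("half" + garbled + "space")
print(repaired == "half\u2009space")  # -> True
```

Plain ASCII text passes through unchanged. One known limitation: Python's cp1252 codec leaves bytes 81, 8D, 8F, 90 and 9D undefined, so text whose original UTF-8 encoding contained those bytes would not be repaired by this sketch and is returned as-is.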
Is this a bug?
I'm happy to PM a copy of my test DB (377K) if anyone wants to
investigate further.
Cheers,
--
Rob Hills
Waikiki, Western Australia
Mobile: +61 (412) 904-357