אריאל קלגסבלד Ariel Klagsbald
2011-09-21 09:55:12 UTC
I hope this is the right place to post such a problem. I also hope my
diagnosis is correct (that it really is an encoding problem; I'm not
sure).
Well, I have a large mdb file, in which one of the fields contains strings like
0007-20101223-214033-שמות-בגדר_שם.mp3
or
0007-20110714-213442-יום_טוב_שני_של_גלויות.mp3
That is, part English, part numbers and part Hebrew (yes, that's
Hebrew, in case you can't see it in your browser).
When I use mdb-export to extract data from this file, I get the
numbers correctly, but only those. The Hebrew and English parts are
simply missing (even the '3' in the 'mp3' suffix). That is, when I
extract the latter example I get only
0007-20110714-213442
I'll add that other fields contain only Hebrew (e.g.
יום טוב שני של גלויות, יב' תמוז, תשע'א
("the second festival day of the Diaspora, 12 Tammuz, 5771")
in the example above), and they seem to be extracted correctly. That
is, I get some gibberish which I guess is the correct data, only my
terminal can't display it.
I thought it might be an encoding problem, so I've played a bit with
MDB_ICONV, MDB_JET_CHARSET, MDB_JET3_CHARSET and MDB_JET4_CHARSET, but
it made no difference.
The file seems to be JET4 (so mdb-ver claims). I have no idea what
encoding it uses (I don't know how to find out; any ideas?), but I
guess it's UTF-8 (only a guess).
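On the question of which encoding a JET4 file uses: below is a minimal Python sketch of how a raw JET4 text value might be decoded. It assumes the commonly described Jet4 layout rather than anything confirmed in this thread. Text is stored as UCS-2 (UTF-16LE), not UTF-8, and a value may use "Unicode compression". Such a value starts with the bytes 0xFF 0xFE, after which each single byte is a character in U+0000 to U+00FF, and a 0x00 byte toggles into (and out of) runs of raw UTF-16LE pairs. The function name `decode_jet4_text` is hypothetical and is not part of mdbtools.

```python
def decode_jet4_text(raw: bytes) -> str:
    """Decode a raw Jet4 text value (hypothetical helper, not an mdbtools API).

    Assumed layout: a value beginning with 0xFF 0xFE is "Unicode compressed".
    After the marker, bytes start in compressed mode (one byte per character,
    high byte implicitly zero); a 0x00 byte toggles between compressed mode
    and uncompressed mode (two-byte UTF-16LE code units).
    Values without the marker are treated as plain UTF-16LE.
    """
    if len(raw) < 2 or raw[0] != 0xFF or raw[1] != 0xFE:
        # No compression marker: assume plain UTF-16LE (UCS-2) text.
        return raw.decode("utf-16-le", errors="replace")

    out = bytearray()   # rebuilt UTF-16LE byte stream
    compressed = True   # data after the marker starts in compressed mode
    i = 2               # skip the 0xFF 0xFE marker
    while i < len(raw):
        b = raw[i]
        if b == 0x00:
            # A zero byte switches between compressed and uncompressed runs.
            compressed = not compressed
            i += 1
        elif compressed:
            # One byte per character; append it with a zero high byte.
            out += bytes((b, 0x00))
            i += 1
        elif i + 1 < len(raw):
            # Uncompressed run: copy one little-endian 16-bit code unit.
            out += raw[i:i + 2]
            i += 2
        else:
            # Odd trailing byte inside an uncompressed run: stop here.
            break
    return out.decode("utf-16-le", errors="replace")
```

For a made-up value that stores the ASCII prefix `0007-` compressed, then a Hebrew run as UTF-16LE pairs, then `.mp3` compressed again, the function should return the full string `0007-שם.mp3`. A decoder that only honoured the compressed bytes would drop the Hebrew run, which is the kind of partial output one would expect if compression were being mishandled.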
I'll be grateful for any help!
Ariel.