Title: Input encoding
Post by: sciurius on December 27, 2018, 12:32:56 PM

A footnote in the documentation reads:
Quote
MMA is pretty open about the “encoding” of the file, but to keep Python 3.x happy you should use “cp1252” (a standard Windows format).

Can you elaborate on this apparent restriction?

Title: Re: Input encoding
Post by: bvdp on December 27, 2018, 04:02:53 PM

It's pretty much just a matter of what the various routines that do character conversion will accept. If you prepare your input files with what we used to call ASCII or Latin8, you'll be fine. If you want more details than my little brain can provide, a starting point is: https://en.wikipedia.org/wiki/Windows-1252

MMA will get upset and probably crash and burn and delete all the data on the servers in Washington, DC if it encounters non-ASCII data in its input ... but, for the most part, it's nothing to worry about :)

Title: Re: Input encoding
Post by: sciurius on December 28, 2018, 02:35:13 PM

So far I've been unable to delete all the data on the servers in Washington 8).
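A minimal sketch of what "non-ASCII data" means in practice, assuming MMA reads its files through Python 3's built-in cp1252 codec with the default strict error handling (the thread doesn't show the actual calls). The sample byte strings and the `decode_cp1252` helper are made up for illustration.

```python
# Sketch: how Python 3's cp1252 codec treats a few byte sequences.
# The sample bytes are illustrative; they are not taken from MMA.


def decode_cp1252(data: bytes) -> str:
    """Decode strictly, as open(..., encoding="cp1252") does by default."""
    return data.decode("cp1252")  # errors="strict" is the default


# Plain ASCII decodes to exactly the same characters.
ascii_text = decode_cp1252(b"Chord C Am F G")

# A byte such as 0xE9 maps to a single accented letter ("é").
latin_text = decode_cp1252(b"caf\xe9")

# The same word saved as UTF-8 uses two bytes for the accent, so a cp1252
# reader silently turns it into two unrelated characters ("mojibake").
utf8_misread = decode_cp1252("café".encode("utf-8"))

# Five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) have no cp1252 mapping,
# so strict decoding raises UnicodeDecodeError instead of returning text.
try:
    decode_cp1252(b"\x81")
    undefined_raises = False
except UnicodeDecodeError:
    undefined_raises = True

if __name__ == "__main__":
    print(ascii_text)
    print(latin_text)
    print(utf8_misread)
    print(undefined_raises)
```

So an accented letter saved as cp1252 or Latin-1 reads back correctly. The same letter saved as UTF-8 usually comes through garbled rather than crashing, while a UTF-8 sequence that happens to contain one of the unmapped bytes (for example "Á", encoded as C3 81) raises the error instead.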
There are three places where the cp1252 encoding is enforced:
I'll try to work out some enhancements. Now if only we had a git repo ;D

Title: Re: Input encoding
Post by: bvdp on December 28, 2018, 05:31:22 PM

Until I converted MMA to work in both Python 2 and 3, there was no encoding at all. It's really just a "problem" with Python 3.x :)
However, I don't see it really being that much of a problem.

- When opening source files in Python 3, one really does need to guess at the nature of the file. I don't think that restricting to a "latin 8" type of character set is a big deal. If non-English characters are needed, they can be inserted as multibyte sequences.
- I really don't have any access to non-latin8 data.

But it might be a thought to have an environment variable "MMA_ENCODING" and to insert that for the encoding values in the 3 locations where it is used. At least I'd be off the hook if there are any problems :) Easy enough to do at this end: just look for the variable, save it in globals, and then insert it when needed. I think I picked cp1252 as a "reasonable value to use".
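A minimal sketch of the MMA_ENCODING idea, assuming the three call sites use Python's built-in `open()`. Only the variable name MMA_ENCODING and the cp1252 default come from the thread. `get_encoding`, `ENCODING` and `read_source` are hypothetical names, not MMA's actual code or its globals module.

```python
import codecs
import os
from typing import List, Mapping, Optional

DEFAULT_ENCODING = "cp1252"  # the value bvdp says he picked


def get_encoding(environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: return the codec named by MMA_ENCODING, else cp1252.

    The name is checked with codecs.lookup so a typo fails once, at start-up,
    instead of at the first file open. An empty value means "use the default".
    """
    name = environ.get("MMA_ENCODING", "").strip() or DEFAULT_ENCODING
    try:
        return codecs.lookup(name).name  # normalised, e.g. "UTF8" -> "utf-8"
    except LookupError:
        raise ValueError(f"MMA_ENCODING={name!r} is not a known encoding")


# Looked up once and kept at module level, mirroring "save it in globals".
ENCODING = get_encoding()


def read_source(path: str, encoding: Optional[str] = None) -> List[str]:
    """Hypothetical stand-in for one of the three places that open files."""
    with open(path, encoding=encoding or ENCODING) as f:
        return f.readlines()
```

Someone with UTF-8 files could then set MMA_ENCODING=utf-8 in the shell before running MMA, while everyone else keeps the cp1252 behaviour without changing anything. Normalising the name through `codecs.lookup` also means aliases such as "UTF8" and "utf-8" end up as the same value.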