Kara-Moon Forum

Developers & Technology => Musical MIDI Accompaniment (MMA) => Topic started by: sciurius on December 27, 2018, 12:32:56 PM



Title: Input encoding
Post by: sciurius on December 27, 2018, 12:32:56 PM
A footnote in the documentation reads:

Quote
MMA is pretty open about the “encoding” of the file, but to keep Python 3.x happy you should use “cp1252” (a standard
Windows format).

Can you elaborate on this apparent restriction?


Title: Re: Input encoding
Post by: bvdp on December 27, 2018, 04:02:53 PM

It's pretty much just a matter of what the various routines that do character conversions are happy with. If you prepare your input files in what we used to call ASCII or Latin-1, you'll be fine. If you want more details than my little brain can provide, a starting point is: https://en.wikipedia.org/wiki/Windows-1252

MMA will get upset and probably crash and burn and delete all the data on the servers in Washington, DC if it encounters non-ASCII data in its input... but, for the most part, it's nothing to worry about :)
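To make the "crash and burn" part concrete, here is a small Python 3 sketch (not MMA's own code) that decodes a few byte strings as cp1252. Most bytes above 0x7F map to printable characters (0x80 is the euro sign), but Python's cp1252 codec leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) and raises `UnicodeDecodeError` on them. UTF-8 input therefore either comes out as mojibake or, when one of its bytes happens to be undefined, fails outright. The helper name `try_cp1252` is illustrative only.

```python
def try_cp1252(data: bytes) -> str:
    """Decode bytes as cp1252, returning an error report instead of raising."""
    try:
        return data.decode("cp1252")
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason} at byte {exc.start}"


samples = {
    "ASCII": b"Tempo 120",
    "cp1252 euro sign": b"\x80",
    "UTF-8 'e-acute'": "é".encode("utf-8"),   # b'\xc3\xa9' decodes, but as mojibake
    "UTF-8 'A-acute'": "Á".encode("utf-8"),   # b'\xc3\x81': 0x81 is undefined in cp1252
}

for label, raw in samples.items():
    print(f"{label:18} {raw!r:22} -> {try_cp1252(raw)}")
```

So plain ASCII and genuine cp1252 files are safe, while UTF-8 files may be silently garbled or rejected depending on which characters they contain.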


Title: Re: Input encoding
Post by: sciurius on December 28, 2018, 02:35:13 PM
So far I've been unable to delete all the data on the servers in Washington 8).

There are three places where the cp1252 encoding is enforced:

  • When opening the .mma source
    This can be dealt with by opening the file in raw (binary) mode and trying to decode it as UTF-8 first; if that fails, fall back to cp1252.
  • When decoding strings read from MIDI
  • When encoding strings written to MIDI
    Unfortunately, there is no officially defined way to declare an encoding in a MIDI file, but there are some ways to deal with this. Think of the popularity of karaoke in Japan.
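The first item above (decode as UTF-8, fall back to cp1252) could look roughly like this in Python 3. `read_source` is a hypothetical helper, not part of MMA, and MMA's real file-reading code may be structured differently. Trying UTF-8 first is reasonably safe because pure ASCII is valid UTF-8, while cp1252 text containing accented letters rarely forms valid UTF-8 byte sequences, so a decode error is a good hint to fall back.

```python
from pathlib import Path


def read_source(path: str, fallback: str = "cp1252") -> str:
    """Read an .mma file, trying UTF-8 first and falling back to cp1252.

    Hypothetical helper for illustration; not MMA's actual API.
    """
    raw = Path(path).read_bytes()          # open in binary ("raw") mode
    try:
        return raw.decode("utf-8-sig")     # strict UTF-8; also strips a BOM if present
    except UnicodeDecodeError:
        # Not valid UTF-8: assume 8-bit Windows/Latin-1 style data.
        # Note: this can still raise on the five bytes cp1252 leaves undefined.
        return raw.decode(fallback)
```

Keeping the fallback strict means truly broken input still fails loudly; passing `errors="replace"` to the second `decode` would trade that for silently substituted characters.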

I'll try to work out some enhancements. Now if only we had a git repo ;D.


Title: Re: Input encoding
Post by: bvdp on December 28, 2018, 05:31:22 PM
Until I converted MMA to work with both Python 2 and 3, there was no encoding at all. It's really just a "problem" with Python 3.x :)

However, I don't see it really being that much of a problem.

 - When opening source files in Python 3, one really does need to guess the encoding of the file. I don't think that restricting input to a Latin-1 type of character set is a big deal. If non-English characters are needed, they can be inserted as multi-byte sequences.

 - I don't really have access to any non-Latin-1 data. But it might be worth adding an environment variable "MMA_ENCODING" and using its value as the encoding in the three places where one is currently used. At least I'd be off the hook if there are any problems :) It's easy enough to do at this end: just read the variable, save it in globals, and use it where needed. I think I picked cp1252 as a "reasonable value to use".
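A minimal sketch of the proposed `MMA_ENCODING` idea, assuming Python 3. The variable is only a proposal in this thread, not an existing MMA option, and `get_mma_encoding` and `text_meta_event` are hypothetical names. The sketch reads the variable once, validates it with `codecs.lookup` (falling back to cp1252 if it is unset or unknown, which is a design choice), and then uses it for one of the three locations: encoding a MIDI text meta-event (`FF 01 <length> <bytes>`), whose length field is a MIDI variable-length quantity. The delta time that precedes each event in a track is left out.

```python
import codecs
import os

DEFAULT_ENCODING = "cp1252"   # MMA's current choice, per this thread


def get_mma_encoding() -> str:
    """Return the codec named by MMA_ENCODING, or cp1252 if unset or unknown."""
    name = os.environ.get("MMA_ENCODING", DEFAULT_ENCODING)
    try:
        return codecs.lookup(name).name     # normalise, e.g. "UTF8" -> "utf-8"
    except LookupError:
        return DEFAULT_ENCODING


def text_meta_event(text: str, encoding: str) -> bytes:
    """Build a MIDI text meta-event (FF 01 <length> <bytes>), without delta time."""
    data = text.encode(encoding)
    # Variable-length quantity: 7 bits per byte, most significant group first,
    # high bit set on every byte except the last.
    length = len(data)
    vlq = [length & 0x7F]
    length >>= 7
    while length:
        vlq.insert(0, (length & 0x7F) | 0x80)
        length >>= 7
    return bytes([0xFF, 0x01, *vlq]) + data
```

For example, with `MMA_ENCODING=utf-8` the text "Café" is written as 5 data bytes, whereas cp1252 writes it as 4. Because the MIDI file itself does not record which encoding was used, whoever reads the file back has to decode it with the same setting.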