[Q&A Friday] Choose the encoding charset for zip filenames

Fri 20 July 2012 By Laurent Doguin


Here's a question from Michal: how can I change the encoding of the zip file created by Nuxeo?

He explains that when he downloads a ZIP containing the documents in his WorkList, the Polish characters in the filenames disappear. To fix this, he set the zip.entry.encoding property in nuxeo.conf to UTF-8. That is already the default value used by Nuxeo, so it doesn't change anything. The real problem is that the ZIP format has historically had no single, consistently applied charset for entry names, so the result varies from one ZIP client to another. Unfortunately, there's not much you can do about it. As Florent explained to me a while ago:

This is a well-known problem, Windows and MacOS/Unix use different conventions for filename encoding in ZIP created from their desktop zippers/unzippers. Create a ZIP containing a file with a non-ASCII filename on Windows and it comes out garbled on MacOS/Unix and vice versa - this has nothing to do with Nuxeo.
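To make the mismatch concrete, here is a minimal Java sketch. It is not Nuxeo's code: the class and method names (ZipNameEncodingDemo, roundTripName, misdecode) are made up for illustration, and ISO-8859-1 merely stands in for whatever charset a given unzipper might assume. It writes a one-entry ZIP in memory with an explicit charset, reads the name back with the same charset, and then shows what happens to the same name when its bytes are decoded with a different one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of Nuxeo.
class ZipNameEncodingDemo {

    // Writes a one-entry ZIP in memory and reads the entry name back,
    // using the same charset for writing and reading.
    static String roundTripName(String name, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer, charset)) {
            out.putNextEntry(new ZipEntry(name));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        try (ZipInputStream in = new ZipInputStream(
                new ByteArrayInputStream(buffer.toByteArray()), charset)) {
            ZipEntry entry = in.getNextEntry();
            return entry.getName();
        }
    }

    // Simulates a client that decodes the raw name bytes with a charset
    // other than the one they were written in.
    static String misdecode(String name, Charset written, Charset assumed) {
        return new String(name.getBytes(written), assumed);
    }

    public static void main(String[] args) throws IOException {
        // A filename with Polish letters, written with Unicode escapes so the
        // source file's own encoding does not matter.
        String name = "za\u017C\u00F3\u0142\u0107.txt";

        // Same charset on both sides: the name survives.
        System.out.println(
                roundTripName(name, StandardCharsets.UTF_8).equals(name));

        // UTF-8 bytes read as ISO-8859-1: the name no longer matches.
        System.out.println(
                misdecode(name, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)
                        .equals(name));
    }
}
```

The first check holds only because both sides agree on the charset. A desktop unzipper that guesses differently is in the position of the second check.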

That's why we've added the option to choose ASCII instead of UTF-8 for the ZIP charset. Support for other charsets is still a work in progress and can be tracked in this Jira ticket. If you want to help, the code is on GitHub, and pull requests are always welcome :-)

Category: Product & Development
Tagged: Java, Q&A