
File upload and character sets / encoding

By Gert-Jan van de Streek / 1 min read

When working with file uploads from a browser, it is good to realize that you don't know what is coming, character set wise. You simply do not get a hint from the browser that says "here is a UTF-8 encoded Unicode text file" or "beware, this document was created on a Windows machine using the windows-1252 character set".

Why don't browsers do this? The answer is rather simple: they don't have a clue either. The file is read from disk as raw bytes, and most file systems don't store metadata about character set or encoding.

How do we deal with that correctly? There is only one valid option: the person uploading the file must tell us what it is. If you have a form with a file upload, put a drop-down next to it with a list of character sets and let the user indicate what they are sending. If a system is sending in files via REST, make sure you know what it is sending, or give it a parameter to indicate the character set used.
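As a minimal sketch of the server side of that drop-down or REST parameter, assuming Python and a hypothetical helper named `decode_upload` (neither the name nor the list of allowed charsets comes from the original post): the uploaded bytes are decoded strictly with the declared charset, and the upload is rejected when the bytes do not fit that declaration.

```python
# Hypothetical allow-list; it mirrors the options offered in the drop-down.
ALLOWED_CHARSETS = ("utf-8", "windows-1252", "iso-8859-1")


def decode_upload(data: bytes, declared_charset: str) -> str:
    """Decode uploaded bytes with the charset the uploader declared."""
    name = declared_charset.strip().lower()
    if name not in ALLOWED_CHARSETS:
        raise ValueError(f"unsupported charset: {declared_charset!r}")
    try:
        # Strict decoding: fail loudly instead of silently producing garbage.
        return data.decode(name, errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"upload is not valid {name}: {exc.reason}") from exc


if __name__ == "__main__":
    windows_bytes = "café".encode("windows-1252")
    print(decode_upload(windows_bytes, "windows-1252"))
```

Rejecting a mismatch catches some wrong declarations, for example windows-1252 bytes declared as UTF-8, but not all of them: iso-8859-1 accepts every byte sequence, so a wrong declaration there decodes without error.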

That is the only solution that is 100% guaranteed. If you want to try something more advanced, look at the ICU project (originally from IBM; available for Java and C/C++). It has functionality that detects the charset or encoding of character data in an unknown format, but the results cannot be guaranteed to always be correct.

Software Development

By Gert-Jan van de Streek / Oct 2024
