Channel: Detect encoding of URL in Java - Stack Overflow

Detect encoding of URL in Java

I have mixed data in a database, and I am trying to find out whether this problem can be solved. What I have is a partial URL in one of three formats:

/some/path?ugly=häßlich               // case 1, Encoding: UTF-8 (plain)
/some/path?ugly=h%C3%A4%C3%9Flich     // case 2, Encoding: UTF-8 (URL-encoded)
/some/path?ugly=h%E4%DFlich           // case 3, Encoding: ISO-8859-1 (URL-encoded)

What I need in my application is the URL-encoded UTF-8 version:

/some/path?ugly=h%C3%A4%C3%9Flich // Encoding: UTF-8 (URL-encoded)

The strings in the DB are all stored as UTF-8, but the URL-encoding may or may not be present, and if present it may use either charset.

I have a method a that encodes plain UTF-8 to URL-encoded UTF-8, and a method b that decodes URL-encoded ISO-8859-1 to plain UTF-8, so what I plan to do is:

case 1:

String output = a(input);

case 2:

String output = input;

case 3:

String output = a(b(input));
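
For illustration, the two helpers could look roughly like the sketch below. The names a and b come from the description above; the class name UrlRecodingSketch and the choice to leave ASCII characters and ASCII escapes (such as %2F) untouched are assumptions made for this example, not the actual implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; only the method names a and b come from the question.
class UrlRecodingSketch {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // a: percent-encodes every non-ASCII code point as its UTF-8 bytes.
    // ASCII characters, including '/', '?', '=' and '%', are copied unchanged.
    static String a(String plain) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < plain.length()) {
            int cp = plain.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte octet : utf8) {
                    out.append('%').append(HEX[(octet >> 4) & 0xF]).append(HEX[octet & 0xF]);
                }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // b: replaces each %XX escape with XX >= 0x80 by the ISO-8859-1 character
    // of that value (identical to the Unicode code point U+00XX).
    // Escapes below 0x80 and malformed escapes are copied unchanged.
    static String b(String encoded) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                int hi = Character.digit(encoded.charAt(i + 1), 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0 && (hi << 4 | lo) >= 0x80) {
                    out.append((char) (hi << 4 | lo));
                    i += 3;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

With helpers like these, a(b("/some/path?ugly=h%E4%DFlich")) yields /some/path?ugly=h%C3%A4%C3%9Flich. A generic decoder such as java.net.URLDecoder would also decode ASCII escapes and turn '+' into a space, which is why this sketch only touches bytes above 0x7F.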

All of these cases work fine if I know which is which, but is there a safe way to detect whether such a string is case 2 or case 3? (I can limit the languages used in the parameters to European languages: German, English, French, Dutch, Polish, Russian, Danish, Norwegian, Swedish and Turkish, if that is any help.)
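
One heuristic that might help here, sketched under assumptions rather than offered as a guaranteed answer: decode the percent-escapes to raw bytes and check whether the result is strictly valid UTF-8. ISO-8859-1 text rarely happens to form valid UTF-8 multi-byte sequences, so a failed strict decode points to case 3, while a successful one points to case 2. The class name UrlEncodingGuess, the enum Kind and the method guess are hypothetical names invented for this example. The check is not fully safe: a Latin-1 byte pair such as %C3%A4 (which would read "Ã¤") is also valid UTF-8 and would be classified as case 2. Note also that ISO-8859-1 cannot represent Cyrillic or several Polish and Turkish letters, so escaped data in those languages may come from a different 8-bit charset, which this sketch cannot tell apart.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; all names here are illustrative assumptions.
class UrlEncodingGuess {

    enum Kind { PLAIN_OR_ASCII, UTF8_ESCAPED, LATIN1_ESCAPED }

    static Kind guess(String input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        boolean sawHighEscape = false;
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) == '%' && i + 2 < input.length()) {
                int hi = Character.digit(input.charAt(i + 1), 16);
                int lo = Character.digit(input.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    int value = hi << 4 | lo;
                    sawHighEscape |= value >= 0x80;
                    bytes.write(value); // raw byte from the escape
                    i += 3;
                    continue;
                }
            }
            // Literal character: append its UTF-8 bytes so that the byte
            // stream keeps the original ordering around the escapes.
            int cp = input.codePointAt(i);
            byte[] literal = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            bytes.write(literal, 0, literal.length);
            i += Character.charCount(cp);
        }
        if (!sawHighEscape) {
            return Kind.PLAIN_OR_ASCII; // case 1, or nothing to convert
        }
        try {
            // Strict decoding: malformed input raises an exception
            // instead of being replaced silently.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes.toByteArray()));
            return Kind.UTF8_ESCAPED; // likely case 2
        } catch (CharacterCodingException e) {
            return Kind.LATIN1_ESCAPED; // likely case 3
        }
    }
}
```

The result of guess(input) could then select between the three branches listed above: encode for PLAIN_OR_ASCII, pass through for UTF8_ESCAPED, and decode then re-encode for LATIN1_ESCAPED.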

I know the obvious solution would be to clean up the data, but unfortunately I do not create the data myself, and the people who do lack the necessary technical understanding (and there is plenty of legacy data that needs to keep working).

