Chinese URLs turned into 404
My site lost a bunch of traffic lately, because iPhone/iPad users are hitting 404 pages.
Since 8/28/2014, traffic from iPhone/iPad Safari coming to my website through Facebook has somehow been ending up at a wrong URL (404 error). For example, if an iPhone user clicks the shared link (that has Chinese characters in the URL) on
However, it appears not all iPhone users have the problem.
I've checked my error log and I'm not seeing anything suspicious. Once I changed my URL to English, the problem went away (tested). Any ideas?
Thanks,
Allen
4 Replies
It is possible that the iPhones experiencing this issue are using browsers (or browser versions) that do not support Unicode characters in URLs.
When a URL contains Unicode characters, the browser usually converts those characters into their percent-encoded (%) form, the same way it converts a space in a URL to %20.
It is likely that the browsers failing to reach those pages are not converting those URLs to the correct percent-encoded form. That being said, there's not much that you can do from the server side.
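To make the percent-encoding point concrete, here is a small Python sketch. The path "/你好" is only a made-up example, not taken from the actual site. It shows how the same characters come out as different percent sequences depending on which charset the client used, which is one way a server expecting UTF-8 could end up returning a 404.

```python
from urllib.parse import quote, unquote

# Hypothetical path containing Chinese characters (not from the real site).
path = "/你好"

# What a browser normally sends: the UTF-8 bytes, percent-encoded.
utf8_encoded = quote(path)  # quote() defaults to UTF-8
print(utf8_encoded)         # /%E4%BD%A0%E5%A5%BD

# The same characters encoded with a legacy Chinese charset instead.
gbk_encoded = quote(path, encoding="gbk")
print(gbk_encoded)          # /%C4%E3%BA%C3

# A server that only knows the UTF-8 form will not recognise the GBK form.
print(unquote(utf8_encoded) == path)  # True
print(utf8_encoded == gbk_encoded)    # False
```

Comparing the request paths in the access log against these two forms would be one way to check whether this is what is actually happening.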
Thanks,
Dave.
@drussell:
It is likely that the browsers failing to reach those pages are not converting those URLs to the correct percent-encoded form. That being said, there's not much that you can do from the server side.
Well, if you really needed that traffic, you could probably write a few lines of code to detect when the URL has been encoded with a different character set (e.g. GB or EUC-JP instead of UTF-8) and convert it back to your website's charset. With some clever mod_rewrite tricks, you could then redirect those users to the URL in the proper charset. But let's leave that as an exercise for OP…
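For anyone who wants a starting point, the detection step could look roughly like the Python sketch below. It assumes the site serves UTF-8 URLs. The function name `to_utf8_path` and the list of fallback charsets are made up for illustration, and the mod_rewrite side is not shown.

```python
from urllib.parse import quote, unquote_to_bytes

# Assumption: legacy charsets a client might have used instead of UTF-8.
# Both the list and its order are guesses; adjust them for your audience.
FALLBACK_CHARSETS = ("gb18030", "euc_jp")


def to_utf8_path(raw_path):
    """Return the UTF-8 percent-encoded path to redirect to, or None.

    None means the path is already valid UTF-8 or could not be decoded
    with any of the fallback charsets.
    """
    raw_bytes = unquote_to_bytes(raw_path)

    # Already valid UTF-8: nothing to fix.
    try:
        raw_bytes.decode("utf-8")
        return None
    except UnicodeDecodeError:
        pass

    # Try each legacy charset in turn and re-encode the result as UTF-8.
    for charset in FALLBACK_CHARSETS:
        try:
            text = raw_bytes.decode(charset)
        except UnicodeDecodeError:
            continue
        return quote(text)  # quote() percent-encodes as UTF-8 by default

    return None


print(to_utf8_path("/%C4%E3%BA%C3"))        # /%E4%BD%A0%E5%A5%BD
print(to_utf8_path("/%E4%BD%A0%E5%A5%BD"))  # None
```

Keep in mind that this is a guess rather than a reliable detection: many byte sequences decode without error in more than one legacy charset, so the first charset in the list wins. It would be safest to redirect only when the converted path matches a page that actually exists.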
@drussell:
Thanks for your reply! The browser that is having this issue is Safari (mobile). I suspected the same thing (Unicode characters), but everything was fine before 8/28/2014, which seems a little odd to me.
@hybinet:
Thank you, hybinet! As a front-end guy, this almost sounds too dreamy to be true. I hate to ask, but any further instruction on this method (or any other suggestions) would be greatly appreciated.