Chinese URLs turned into 404

Hello,

My site lost a bunch of traffic lately, because iPhone/iPad users are hitting 404 pages.

Since 8/28/2014, traffic from iPhone/iPad Safari that reaches my website through Facebook has somehow been ending up at a wrong URL (404 error). For example, if an iPhone user clicks the shared link (which has Chinese characters in the URL) on https://www.facebook.com/onefunnyjoke/posts/777440155632338, it goes to a page that does not exist.

However, it appears not all iPhone users have the problem.

I've checked my error log and don't see anything suspicious. Once I changed my URL to English, the problem went away (tested). Any ideas?

Thanks,

Allen

4 Replies

Allen,

It is possible that the affected iPhones are using browsers (or browser versions) that do not properly handle Unicode characters in URLs.

When a URL contains Unicode characters, the browser usually converts those characters into their percent-encoded (%) values. For example, when you go to a URL with a space in it, the browser changes the space to %20.

It is likely that the browsers that cannot reach those pages are not converting those URLs to the correct percent-encoding. That being said, there's not much you can do from the server side.
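To illustrate the percent-encoding described above, here is a minimal Python sketch. The path `/笑话/123` is a made-up example, not one of the site's real URLs, and the sketch assumes the site expects UTF-8, which is what most browsers send for the path part of a URL.

```python
from urllib.parse import quote, unquote

# Hypothetical path containing Chinese characters (not from the actual site).
path = "/笑话/123"

# quote() percent-encodes using UTF-8 by default and leaves "/" untouched.
encoded = quote(path)
print(encoded)  # /%E7%AC%91%E8%AF%9D/123

# A space becomes %20, as in the example above.
print(quote("my page"))  # my%20page

# The server reverses the process to recover the original path.
print(unquote(encoded) == path)  # True
```

If a client encodes the same characters with a different charset, or mangles the bytes, the server receives a different byte sequence and finds no matching page, which would show up as a 404.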

Thanks,

Dave.

@drussell:

It is likely that the browsers that are not able to hit those pages are not changing those URLs to the correct "percent" (%) formatting. With this being said, there's not much that you can do from the server-side.

Well, if you really needed that traffic, you could probably write a few lines of code to detect when the URL has been encoded with a different character set (e.g. GB or EUC-JP instead of UTF-8) and convert it back to your website's charset. With some clever mod_rewrite tricks, you could then redirect those users to a URL with the proper charset. But let's leave that as an exercise for OP…
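One rough way the detection step could look, sketched in Python rather than mod_rewrite. Everything here is an assumption for illustration: the function name `normalize_path`, the fallback charset list, and the idea that the site's canonical URLs are UTF-8. It also assumes the broken requests really do arrive in another charset, which nobody in this thread has confirmed.

```python
from urllib.parse import quote, unquote_to_bytes

# Assumed fallback charsets, tried in order; adjust to your audience.
FALLBACK_CHARSETS = ("gb18030", "euc_jp")

def normalize_path(raw_path):
    """Return (utf8_percent_encoded_path, needs_redirect).

    raw_path is the percent-encoded path exactly as the server received it.
    """
    raw_bytes = unquote_to_bytes(raw_path)

    # If the bytes are already valid UTF-8, leave the URL alone.
    try:
        raw_bytes.decode("utf-8")
        return raw_path, False
    except UnicodeDecodeError:
        pass

    # Otherwise try each fallback charset and re-encode the text as UTF-8.
    for charset in FALLBACK_CHARSETS:
        try:
            text = raw_bytes.decode(charset)
        except UnicodeDecodeError:
            continue
        return quote(text), True

    # Nothing matched: give up and let the normal 404 handling run.
    return raw_path, False
```

When `needs_redirect` is true, the application (or a rewrite rule fed by a script like this) could answer with a redirect to the returned path. Note that this kind of detection is a heuristic: some byte sequences are valid in more than one charset, so it is worth checking that the converted path actually exists before redirecting.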

@drussell:

Allen,

It is possible that the iPhones that are receiving this issue are using browsers (or versions of browsers) that do not support unicode formatting in URLs.

When a URL does have unicode characters, the browser usually changes those characters into their "percent" (%) value. Such as when you go to a URL with a space in it, the browser will change it to %20 for the space.

It is likely that the browsers that are not able to hit those pages are not changing those URLs to the correct "percent" (%) formatting. With this being said, there's not much that you can do from the server-side.

Thanks,

Dave.

Thanks for your reply! The browser having this issue is Safari (mobile). I suspected the same thing (Unicode characters), but everything was fine before 8/28/2014, which seems a little odd to me.

@hybinet:

@drussell:

It is likely that the browsers that are not able to hit those pages are not changing those URLs to the correct "percent" (%) formatting. With this being said, there's not much that you can do from the server-side.

Well, if you really needed that traffic, you could probably write a few lines of code to detect when the URL has been encoded with a different character set (e.g. GB or EUC-JP instead of UTF-8) and convert them back to your website's charset. With some clever mod_rewrite tricks, you could then redirect those users to a URL with the proper charset. But let's leave that as an exercise for OP…

Thank you, hybinet! As a front-end guy, this almost sounds too dreamy to be true. I hate to ask, but any further instruction on this method (or any other suggestions) would be greatly appreciated.
