My observations on this bug.
1) It happens only when the text is entered for the first time after opening the Notepad program.
2) When the file is saved in any encoding other than ANSI, it does not happen.
3) It happens only when the first 18 characters are a four-letter word, a space, two three-letter words separated by a space, another space, and then a five-letter word. It happens with digits as well. Beyond that, it happens if the total number of characters is even.
4) If the phrase contains any uppercase letters other than the first and last characters, it does not happen.
The explanation I have heard is: "
1) You are saving to 8-bit Extended ASCII (Look at the Save As / Encoding format)
2) You are reading as 16-bit UNICODE (You guessed it, look at the Save As / Encoding format)
This is why the 18 8-bit characters are being displayed as 9 (obviously not supported by your codepage) 16-bit UNICODE characters"
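To check the 18-bytes-become-9-characters part of that explanation, here is a minimal Python sketch. It assumes "Unicode" in Notepad's Save As dialog means UTF-16 little-endian (the Windows default) and uses cp1252 to stand in for "ANSI"; for plain ASCII text like this phrase the bytes are the same either way.

```python
text = "Bush hid the facts"

# "ANSI" save: one byte per character (cp1252 leaves plain ASCII unchanged).
raw = text.encode("cp1252")
print(len(raw))  # 18

# Misreading as UTF-16 little-endian: every two bytes become one code unit.
misread = raw.decode("utf-16-le")
print(len(misread))  # 9

# Show each pair of original characters and the code point it turns into.
for i, ch in enumerate(misread):
    pair = text[2 * i:2 * i + 2]
    print(repr(pair), "->", f"U+{ord(ch):04X}")
```

For this phrase every resulting code point (for example "Bu" becomes U+7542) lands in the CJK Unified Ideographs block (U+4E00 to U+9FFF), which is why the "junk" shows up as Chinese-looking characters, or as boxes if the font lacks them.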
What I would like to know is: why does it happen only for such a short piece of text?
2) If we erase the junk characters and retype the phrase "Bush hid the facts", "aaaa aaa aaa aaaaa" or "1111 111 111 1111", it appears correctly.
The explanation I have heard for this is: "Text files containing UTF-16 are supposed to start with a BOM (byte order mark), so the application can read those two bytes and know the file is UTF-16. But many applications do not write one (you have probably noticed the two small characters Notepad sometimes adds to the start of a file).
So what should poor Notepad do? Well, it can always use the IsTextUnicode() Win32 API. You pass it some text, and it tries to guess whether it is Unicode or not. But what if you give it very little text, maybe even all lowercase? Then it is not easy to tell whether it really was Unicode. And with the short strings you have found, it does indeed break. Poor Notepad gets the blame for a bad API and other faulty apps."
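To see why a guess is needed at all, here is a rough Python sketch of the "BOM first, then guess" approach described above. The `looks_like_utf16le` heuristic is entirely hypothetical and is NOT how IsTextUnicode() actually works; it only shows how 18 lowercase ASCII bytes can look just as plausible as UTF-16 as they do as ANSI.

```python
import codecs

CJK_START, CJK_END = 0x4E00, 0x9FFF  # CJK Unified Ideographs block


def looks_like_utf16le(data: bytes) -> bool:
    """Toy heuristic (hypothetical, not the real IsTextUnicode logic)."""
    # UTF-16 needs an even number of bytes, so odd lengths are ruled out.
    if len(data) < 2 or len(data) % 2 != 0:
        return False
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "little")
        # Accept ordinary ASCII text stored as UTF-16, or CJK ideographs.
        is_ascii_text = unit in (0x09, 0x0A, 0x0D) or 0x20 <= unit <= 0x7E
        is_cjk = CJK_START <= unit <= CJK_END
        if not (is_ascii_text or is_cjk):
            return False
    return True


def guess_encoding(data: bytes) -> str:
    # A byte order mark settles the question without any guessing.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le (BOM)"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (BOM)"
    # No BOM: fall back to the heuristic.
    if looks_like_utf16le(data):
        return "utf-16-le (guessed)"
    return "ansi"


phrase = "Bush hid the facts"
samples = {
    "ANSI": phrase.encode("cp1252"),
    "Unicode with BOM": codecs.BOM_UTF16_LE + phrase.encode("utf-16-le"),
    "UTF-8 with BOM": codecs.BOM_UTF8 + phrase.encode("utf-8"),
    "longer ANSI": b"This is a longer sentence.",
}
for label, data in samples.items():
    print(label, "->", guess_encoding(data))
```

With the BOM present there is nothing to guess, which fits observation 2 above. Without it, the toy check rejects odd byte counts and any byte pair that is neither ASCII-as-UTF-16 nor a CJK ideograph, so it misfires on the short ANSI phrase but not on the longer sentence.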
More interesting links:
http://blogs.msdn.com/michkap/archive/2006/06/14/631016.aspx