What Is Text Character Corruption?
People who work extensively with plain text files (files with the .txt extension) will occasionally come across documents that show garbled text instead of the expected content. This frequently happens when the document is written in a language that does not use the Latin alphabet, but it can happen to any file if the encoding used to save it differs from the encoding used to open it.
Character corruption happens when a file is saved in one character encoding and then opened by a program that reads it using a different one. Many programs now use UTF-8 by default, but text in many languages is also stored in older, language-specific encodings. Chinese, Japanese and Korean text, for example, is often saved in legacy double-byte encodings such as GBK, Big5 or Shift_JIS. UTF-8 stores the same characters as different byte sequences (three bytes for most of these characters). So when such a file is read as UTF-8, or with a Western encoding, the bytes are interpreted as the wrong characters and the text is replaced with garbled symbols.
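To make the mismatch concrete, here is a small Python sketch. It is an illustration only, not part of the Word procedure. It assumes an example Japanese sentence saved in Shift_JIS, then reads the same bytes as UTF-8, as the Western code page cp1252, and finally as Shift_JIS. Only the last reading recovers the text, which is also why the original content is usually not lost.

```python
# Illustration only: the sample sentence and the choice of Shift_JIS
# are assumptions made for this example.
original = "日本語のテキスト"  # "Japanese text"

# Save the text in a legacy Japanese encoding: two bytes per character here.
raw_bytes = original.encode("shift_jis")

# UTF-8 stores the same characters as three bytes each.
utf8_bytes = original.encode("utf-8")

# A program that assumes UTF-8 finds invalid byte sequences;
# errors="replace" shows each one as the U+FFFD replacement character.
as_utf8 = raw_bytes.decode("utf-8", errors="replace")

# A program that assumes a Western code page (cp1252) maps each byte
# to some Western character or symbol instead, producing garbled text.
as_cp1252 = raw_bytes.decode("cp1252", errors="replace")

# Reading the bytes with the encoding they were saved in recovers the text.
recovered = raw_bytes.decode("shift_jis")

print(len(raw_bytes), len(utf8_bytes))
print(as_utf8)
print(as_cp1252)
print(recovered)
```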
Rest assured, however, that the text is usually not lost. There are many ways to fix corrupted character encoding, including specialised software made for this exact scenario. However, if you only need to fix one or two documents, downloading and installing new software can be a hassle. Here I will show you how to fix these corrupted text files in Microsoft Word, which is likely already installed on computers running Windows.
Fixing Corrupted Text in Microsoft Word
Microsoft Word has a built-in character encoding converter that can open a file using the correct encoding and then save it in the encoding you want.
This fix will work with Microsoft Word 2003 and up.
Step 1: Open Up the Document in Microsoft Word
Windows will open plain text files (.txt extension) using the Notepad program by default. To open the corrupt document in Microsoft Word:
1. Right-click the document
2. Select "Open with"
3. Choose "Word"
Step 2: Convert Files from Encoded Text
The Convert File dialogue box should open automatically when Word detects a file with corrupted encoding. Choose "Encoded Text" from the list of options and click "OK".
If the dialogue box does not appear, you will need to trigger it manually. Go to "File" -> "Options" -> "Advanced" and scroll down to the "General" section. There, check the box labelled "Confirm file format conversion on open." Exit Word and reopen the corrupt document; the dialogue box will now appear.
Step 3: Choosing the Correct Encoding
The encoding selection dialogue box should automatically suggest the correct encoding. If it does not, you can select one manually from the list of encodings.
Choose "Auto-Select" if you are unsure of the source encoding, or choose from the list if you know which language the file is in. The preview window shows whether the text now displays correctly.
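Word's detection logic is not described in this article, so the following is only a naive stand-in: a hypothetical helper, candidate_encodings, that tries each encoding in a list and keeps those that decode the bytes without errors. It also shows why checking the preview matters: in this example, GBK-encoded Chinese text also decodes without errors as Shift_JIS, just into the wrong characters.

```python
def candidate_encodings(raw: bytes, candidates: list[str]) -> list[str]:
    """Hypothetical helper: return the candidates that decode raw without errors.

    This is a naive heuristic for illustration, not Word's Auto-Select logic.
    Decoding without errors does not prove an encoding is correct.
    """
    working = []
    for name in candidates:
        try:
            raw.decode(name)  # strict decoding raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        working.append(name)
    return working


# Example: Chinese text saved in GBK (an assumption for this demo).
sample = "中文文本".encode("gbk")
matches = candidate_encodings(sample, ["utf-8", "shift_jis", "gbk"])

# Print a short "preview" for each encoding that decoded cleanly.
for name in matches:
    print(f"{name:10} {sample.decode(name)}")
```

Both Shift_JIS and GBK pass the test here, but only GBK gives readable Chinese, so a human still has to judge the preview.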
Final Step: Saving the Document as a Readable Plain Text File
The recovered text can now be read in Microsoft Word, but it may still appear corrupted in other plain text programs, since many of them cannot read less common character encodings. To prevent this, it is best to save the document in a widely supported encoding, such as UTF-8 or UTF-16.
To do this, click the "File" tab in the top left corner of your document and choose "Save As". Choose the folder to save to, select "Plain Text" as the file format, and click "Save".
A "File Conversion" dialogue box will open. From the list, choose an encoding for the final document. The preview box highlights in red any text that will not be saved correctly, so take care to choose an encoding that suits the document. When in doubt, use a Unicode format such as UTF-8, as Unicode is designed to accommodate all of the world's writing systems.
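The red highlighting can be approximated in a few lines of Python. The sketch below uses a hypothetical helper, unencodable_chars, that tries to encode each character of a made-up sample string and lists the ones a given encoding cannot represent. It is an illustration, not how Word implements its preview; note that UTF-8, a Unicode encoding, reports nothing missing.

```python
def unencodable_chars(text: str, encoding: str) -> list[str]:
    """Hypothetical helper: list the characters `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:  # report each character only once
                missing.append(ch)
    return missing


# A made-up sample mixing Latin, Japanese and Greek text.
mixed = "Café, 日本語, Ελληνικά"
for enc in ["cp1252", "shift_jis", "utf-8"]:
    print(enc, unencodable_chars(mixed, enc))
```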
Finally, click "OK" to save your corrected document.
Your document should now display correctly in your chosen plain text processing software, such as Notepad.
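If you are comfortable with a little scripting, the same read-then-save conversion can also be done outside Word. This Python sketch assumes you already know the source encoding (EUC-KR for a Korean file in this example); convert_to_utf8 and the file names are hypothetical. It reads the file with the source encoding and writes the text back out as UTF-8.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Hypothetical helper: decode src with source_encoding, save dst as UTF-8."""
    # Raises UnicodeDecodeError on bytes that are invalid in source_encoding.
    text = src.read_text(encoding=source_encoding)
    dst.write_text(text, encoding="utf-8")


# Demo in a temporary folder with a made-up Korean sample saved as EUC-KR.
sample_text = "안녕하세요"
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "korean_euc_kr.txt"
    dst = Path(tmp) / "korean_utf8.txt"
    src.write_bytes(sample_text.encode("euc_kr"))

    convert_to_utf8(src, dst, "euc_kr")
    converted_bytes = dst.read_bytes()
    converted_text = dst.read_text(encoding="utf-8")

print(converted_text)
```

A wrong source encoding does not always raise an error, so it is still worth opening the converted file and checking that it reads correctly.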
© 2018 Ivy Gao