
Thread: Finding/fixing EOL & Encoding problems in Notepad++

  1. #1
    Withwnar's Avatar Script To The Waist
    Join Date
    Oct 2008
    Location
    Earth
    Posts
    6,329


    EOL (end of line) characters are invisible characters that mark the end of a line in text files; they define the line breaks. Windows uses a combination of CR and LF characters (carriage return and line feed) whereas Mac and Linux use only LF. As M2TW is a Windows game it requires the CR+LF format.

    Sometimes when editing a text file the editor uses the wrong EOL characters: LF instead of CR+LF. As a result the file looks perfectly fine but the game doesn't work correctly, for no apparent reason.
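
    For anyone who wants to check a file outside the editor, here is a minimal Python sketch that counts each kind of line ending in the raw bytes. The function name is made up for illustration, and it assumes a single-byte ("ANSI") or UTF-8 file, not the UCS-2 files in the "text" folder.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR+LF, bare LF and bare CR line endings in raw file bytes.

    Assumes a single-byte or UTF-8 encoding (not UCS-2/UTF-16).
    """
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # LFs that are not preceded by a CR
    cr_only = data.count(b"\r") - crlf  # CRs that are not followed by an LF
    return {"CRLF": crlf, "LF": lf_only, "CR": cr_only}


sample = b"line one\r\nline two\nline three\r\n"
print(count_line_endings(sample))  # {'CRLF': 2, 'LF': 1, 'CR': 0}
```

    A file that is safe for the game should report zero for both "LF" and "CR".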

    Notepad++ is an excellent editor and includes some handy tools to find and fix these problems. It can also be the cause of the problems but so can any other editor.

    Clicking the "Show All Characters" button on the toolbar displays the EOL characters...

    [Image: EOL1.jpg]

    In that image we can see that lines end with LF. It should look like this instead...

    [Image: EOL2.jpg]

    The status bar at the bottom shows the EOL format being used. "Dos\Windows" indicates that the correct format is in use: CR+LF. (But not always! See below.)

    Fixing is simple. If the status bar does not say "Dos\Windows" then use Edit > EOL Conversion > Windows format from the menu...

    [Image: EOL3.jpg]

    After fixing the file, save it.
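
    The same conversion can be sketched in Python for batch use. This is only an illustration of the idea, not what Notepad++ does internally; the function names are hypothetical and it assumes single-byte ("ANSI") or UTF-8 files. Do not run it on the UCS-2 files in the "text" folder, since inserting single bytes would corrupt them.

```python
import re
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Return data with every line ending (CR+LF, bare CR or bare LF) as CR+LF."""
    # CR+LF is listed first so an existing pair is treated as one line ending.
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)


def convert_file(path: str) -> bool:
    """Rewrite the file with CR+LF endings; return True if anything changed."""
    target = Path(path)
    original = target.read_bytes()
    fixed = to_crlf(original)
    if fixed != original:
        target.write_bytes(fixed)
    return fixed != original
```

    Working on bytes rather than text avoids Python's own newline translation, so the file comes out exactly as intended.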


    Not Always...

    I have found that sometimes Notepad++ says "Dos\Windows" but some lines still have LF instead of CR+LF. There is a way to hunt them down...

    Using Notepad++'s "Find" function, search the file for [^\r]\n using the Regular expression option...

    [Image: EOL4.jpg]

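    It will find every line that ends with LF only. The one exception is a bare LF that is the very first character of the file, because the pattern needs a preceding character to match.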

    (NOTE: earlier versions of Notepad++ wouldn't find them. I don't know in what version it started working but v6.6.7 works.)
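
    If your Notepad++ version is one of those that can't find them, a rough Python equivalent is below. It is a sketch under assumptions: the function name is invented, the file is single-byte or UTF-8, and line numbers are counted by LF only (a bare CR is not treated as a line break).

```python
def lf_only_lines(data: bytes) -> list:
    """Return 1-based numbers of lines that end with LF instead of CR+LF."""
    bad = []
    # Every piece except the last was terminated by an LF in the original data.
    for number, line in enumerate(data.split(b"\n")[:-1], start=1):
        if not line.endswith(b"\r"):
            bad.append(number)
    return bad


sample = b"a\r\nb\nc\r\n\nd"
print(lf_only_lines(sample))  # [2, 4]
```

    Unlike the [^\r]\n search, this also reports a bare LF on the very first line.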

    Even better is to use the "Find in Files" function...

    [Image: EOL5.jpg]

    ...because that will search every file at once when you hit the Find All button.
    • Directory: use the mod folder and be sure to tick the In all sub-folders option.
    • Filters: not required but *.txt *.xml *.modeldb will limit the search to text files only, avoiding image files etc. which will only slow it down.
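
    A scripted version of the same "Find in Files" idea, as a hedged sketch: the function names are hypothetical, the filters are the ones suggested above, and files that start with the FF FE byte order mark are assumed to be the UCS-2 Little Endian "text" folder files and are decoded accordingly.

```python
from pathlib import Path

PATTERNS = ("*.txt", "*.xml", "*.modeldb")  # same filters as in the list above


def has_bare_lf(data: bytes) -> bool:
    """True if the data contains an LF that is not part of a CR+LF pair."""
    if data.startswith(b"\xff\xfe"):  # UTF-16 LE byte order mark
        text = data.decode("utf-16", errors="replace")
    else:
        text = data.decode("latin-1")  # maps every byte to exactly one character
    return text.count("\n") > text.count("\r\n")


def files_with_bare_lf(mod_folder: str) -> list:
    """Search mod_folder and all sub-folders; return the offending paths."""
    hits = []
    for pattern in PATTERNS:
        for path in Path(mod_folder).rglob(pattern):
            if path.is_file() and has_bare_lf(path.read_bytes()):
                hits.append(path)
    return sorted(hits)
```

    Calling files_with_bare_lf() on the mod folder returns a list of paths to open and fix in Notepad++.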




    Encoding

    While we're talking Notepad++, notice the encoding display in the status bar. For files in the "text" folder this should be "UCS-2 Little Endian" and for all other files it should be "ANSI". The Encoding menu can be used to change it if it is not correct. (I have read that the "Convert to..." options should be used instead of the "Encode in..." ones.)
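
    A quick way to check the encoding without opening each file is to look at the byte order mark (BOM) at the start. This is a minimal sketch, assuming the "text" folder files were saved with the usual FF FE mark; the function name and the returned labels are my own. Without a BOM there is no reliable way to tell ANSI from UTF-8 by the first bytes alone.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess the encoding family from the byte order mark, if there is one."""
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "UTF-8 with BOM"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "UCS-2 Little Endian"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "UCS-2 Big Endian"
    return "no BOM (ANSI or UTF-8 without BOM)"


print(guess_encoding(b"\xff\xfeh\x00i\x00"))  # UCS-2 Little Endian
print(guess_encoding(b"plain text"))          # no BOM (ANSI or UTF-8 without BOM)
```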

    This has nothing to do with EOL characters but incorrect encoding can be another source of mysterious game issues.

    Note that ANSI files should contain only 'plain' (7-bit ASCII) characters, not "extended ASCII" characters such as ë. For example, "Finwë" should not be used in script, export_descr_ancillaries.txt, etc. (even in comments) because ë is not a plain ASCII character: its byte value depends on the code page in use. How the game deals with that I do not know but editors might convert that character into something else so it is best to avoid having them in there to begin with. (They are fine in "text" folder files because they are legal characters in UCS-2 Little Endian encoding.)

    To find non-ASCII characters in your files you can search for them in Notepad++ using [^\x00-\x7F] with the Regular expression option.
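
    The same search can be sketched in Python, using the identical character class on the raw bytes. The function name is hypothetical, and the sample assumes "ANSI" means Windows-1252, which is typical on western systems but still an assumption.

```python
import re

NON_ASCII = re.compile(rb"[^\x00-\x7F]")  # same character class as the search above


def non_ascii_positions(data: bytes) -> list:
    """Return (line number, column, byte value) for every byte above 0x7F."""
    found = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for match in NON_ASCII.finditer(line):
            found.append((line_no, match.start() + 1, match.group()[0]))
    return found


sample = "; Finw\u00eb\r\nplain line\r\n".encode("cp1252")
print(non_ascii_positions(sample))  # [(1, 7, 235)]
```

    Byte 235 (0xEB) is how Windows-1252 stores ë; the column is a byte offset, which matches the character column in a single-byte file.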



    Lastly, if you're not already aware of the brilliant Syntax Highlighter for TW files in Notepad++ then check this out:

    http://www.twcenter.net/forums/downl...o=file&id=1867 (download)
    http://www.twcenter.net/forums/showt...tom-Word-Files (info thread)

  2. #2


    Another page bookmarked written by Withwnar! I was dying to know why some files showed up with the LF symbol as opposed to the CRLF symbol.

  3. #3
    Withwnar

    Added some more info to the Encoding section.

  4. #4
    Gigantus's Avatar I am not special - I am a limited edition.
    Patrician took an arrow to the knee spy of the council

    Join Date
    Aug 2006
    Location
    Goa - India
    Posts
    53,125
    Blog Entries
    35


    Thanks for the link to the highlighter - lost the bugger in the last drive crash.

    Hope I'll remember this setting\search the next time I am hunting bugs.










  5. #5


    I have saved every file in my data folder (except the text folder) after encoding it with "Convert to ANSI". After I have played a few turns and open a file (any non-text folder file) it almost always opens as UTF-8 w/o BOM. Is this a result of the settings I have in Notepad++, or does the game somehow convert these files after playing a turn or two?

  6. #6
    Withwnar

    Have you tried converting, saving, closing then re-opening a file without running the game in between? Somehow I doubt that the game is affecting those files.

    My guess would have been that those files contain non-ASCII characters but the convert should have taken care of that. Did you try the search for non-ASCII characters ([^\x00-\x7F])?

    Or maybe it is a setting in Notepad++. Don't know what or where but it's a possibility.

  7. #7
    Gigantus

    I think there is a default encoding setting in Notepad++ for TXT files - will check once I have access to my machine.










  8. #8


    I just checked, saved a file with no irregular characters after converting to ANSI, it ALWAYS opens back up UTF-8 w/o BOM.

    EDIT: Just googled the answer. If Notepad++ keeps opening your files as UTF-8 w/o BOM, do the following:

    1) open the Settings menu
    2) open Preferences
    3) uncheck "Apply to opened ANSI files"

    Files now open up as ANSI.
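
    A likely reason the editor cannot decide on its own: a file containing only plain characters is byte-for-byte identical in ANSI and in UTF-8, so nothing in the file says which one it is. A small Python demonstration, assuming "ANSI" means Windows-1252 (the sample strings are made up):

```python
plain = "faction england, balanced smith"
accented = "Finw\u00eb"  # "Finwë"

# Plain ASCII text produces exactly the same bytes in both encodings...
print(plain.encode("cp1252") == plain.encode("utf-8"))  # True

# ...but an accented character does not: one byte in ANSI, two in UTF-8.
print(accented.encode("cp1252"))  # b'Finw\xeb'
print(accented.encode("utf-8"))   # b'Finw\xc3\xab'
```

    So for files with no extended characters, the label in the status bar depends on the editor's setting rather than on anything stored in the file.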

    In what format should .xml files, an edited .bat file or the battle_models.modeldb file be saved?
    Last edited by MIKE GOLF; September 01, 2014 at 04:22 PM.

  9. #9
    Gigantus

    I just love that warm feeling when someone confirms that I am right










  10. #10
    Withwnar

    Makes sense.

    Quote Originally Posted by MIKE GOLF:
        1) open the settings tab
        2) open the preferences
        3) uncheck the "apply to opened ANSI files"

    The exact location is: Settings > Preferences > New Document > Encoding > UTF-8 without BOM > Apply to opened ANSI files

    Quote Originally Posted by MIKE GOLF:
        In what format should .xml files, an edited .bat file or the battle_models.modeldb file be saved in?

    Not sure. All my XML files seem to be ANSI except TATW's config_ai_battle.xml which is UTF-8 and has encoding="utf-8" in the <?xml ...?> declaration at the top.
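
    To see which encoding an XML file declares for itself, something like the following Python sketch can read the declaration at the top. The function name is hypothetical; it only reports what the file claims, which may not match how the bytes were actually saved.

```python
import re

# Matches the encoding="..." attribute inside a leading <?xml ... ?> declaration.
DECLARATION = re.compile(rb'^\s*<\?xml[^>]*\bencoding\s*=\s*["\']([^"\']+)["\']')


def declared_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    if data.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        data = data[3:]
    match = DECLARATION.match(data)
    return match.group(1).decode("ascii") if match else None


print(declared_encoding(b'<?xml version="1.0" encoding="utf-8"?>\r\n<root/>'))  # utf-8
print(declared_encoding(b"<root/>"))                                            # None
```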

  11. #11
    irishron's Avatar Cura Palatii
    Join Date
    Feb 2005
    Location
    Cirith Ungol
    Posts
    47,023


    The last few times I've edited descr_strat, EDB, CS or whichever, I've left the file in whatever encoding Notepad++ saves it in. I believe that's UTF-8 w/o BOM, and the game has worked fine.

  12. #12
    Gigantus

    I have once more encountered this case:

    Quote Originally Posted by Withwnar:
        I have found that sometimes Notepad++ says "Dos\Windows" but some lines still have LF instead of CR+LF. There is a way to hunt them down...
        Using Notepad++'s "Find" function, search the file for [^\r]\n using the Regular expression option...

    It's a good number of lines and I am wondering if there isn't a 'replace' entry one could use?

    Replacing with \r\n turns the LF into CR+LF but also removes the preceding character. I suppose this is due to the search pattern, as that character is highlighted as part of the match.

    In the end I simply copy\pasted the whole content into a new document and then copied it back into the original document - that fixed it on a fast track. Would still be nice to have the replace option.
    Last edited by Gigantus; May 06, 2015 at 04:13 AM.










  13. #13
    Withwnar

    That's right, if the line was this:

    Hello thereLF

    Then "[^\r]\n" will find "eLF" and your replace would make it "CRLF", thus "Hello therCRLF". So yeah, the point of the [^\r]\n search is only to tell you whether you have a problem, not a fix for it.
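
    The effect is easy to reproduce outside Notepad++. Here is a small Python illustration using the same pattern with Python's re module (the sample line is the one from this post):

```python
import re

line = "Hello there\n"

# The pattern matches the character before the LF as well as the LF itself,
# so replacing the whole match throws that character away.
broken = re.sub(r"[^\r]\n", "\r\n", line)
print(repr(broken))  # 'Hello ther\r\n'
```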

    Using Notepad++'s menu option should fix all of them in one hit: Edit > EOL Conversion > Windows format. If "Windows format" is disabled, because Notepad++ already thinks that it is Windows format, then use one of the other formats first (e.g. Unix) and then the Windows one.

    Alternatively, this should work:

    Find: \n
    Replace: \r\n

    ...then...

    Find: \r\r\n
    Replace: \r\n

    ...using the "Extended" Search Mode option. The Regular Expression option would work as well.
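
    The two-step replace can be sketched in Python like this, to show why the second step is needed. The function name is made up for illustration.

```python
def fix_two_step(text: str) -> str:
    """Apply the two find/replace steps described above."""
    # Step 1: every LF becomes CR+LF, so lines that were already
    # CR+LF temporarily become CR+CR+LF.
    step1 = text.replace("\n", "\r\n")
    # Step 2: collapse the doubled CR back to a single CR+LF.
    return step1.replace("\r\r\n", "\r\n")


mixed = "a\r\nb\nc\n\n"
print(repr(fix_two_step(mixed)))  # 'a\r\nb\r\nc\r\n\r\n'
```

    One side effect to be aware of: if a file already contained CR+CR+LF somewhere, step 2 would collapse that as well.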
    Last edited by Withwnar; May 06, 2015 at 04:24 AM.

  14. #14
    Withwnar

    I just found a new way to do it. Instead of what I said in the previous post...

    Find: \n
    Replace: \r\n

    ...then...

    Find: \r\r\n
    Replace: \r\n

    ...you can just do this, using the Regular Expression search mode option...

    Find: ([^\r])(\n)
    Replace: \1\r\n

    That would be useful if multiple files have the EOL problem because you could use it in the "Find in Files" tab, thereby fixing all files in one hit.

    Note: you should do the Replace twice because if a line is blank except for a LF then it isn't picked up and replaced the first time. EDIT: ah, but it still doesn't fix the case where the first line is blank except for a LF.
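
    Both limitations can be reproduced in Python, along with one possible way around them: a negative lookbehind, (?<!\r)\n, which matches an LF not preceded by CR without consuming any character. This is a sketch using Python's re module. I believe Notepad++'s regex engine accepts the same lookbehind syntax, but treat that as an assumption and test it on a copy first.

```python
import re

text = "\nfirst\n\n\nlast\r\n"  # blank first line, two blank lines in the middle


def one_pass(s: str) -> str:
    """The capture-group replace from this post."""
    return re.sub(r"([^\r])(\n)", r"\1\r\n", s)


def lookbehind(s: str) -> str:
    """Alternative: replace only the LF, checking (not consuming) what precedes it."""
    return re.sub(r"(?<!\r)\n", "\r\n", s)


print(repr(one_pass(text)))            # '\nfirst\r\n\n\r\nlast\r\n'
print(repr(one_pass(one_pass(text))))  # '\nfirst\r\n\r\n\r\nlast\r\n'
print(repr(lookbehind(text)))          # '\r\nfirst\r\n\r\n\r\nlast\r\n'
```

    The first pass leaves one of the consecutive blank lines untouched, the second pass catches it, and the leading LF survives both. The lookbehind version handles all of them in a single pass.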
    Last edited by Withwnar; May 06, 2015 at 04:33 AM.

  15. #15
    Gigantus

    I edited my prior post with my 'lazy Sue' solution as I still had the window open without refreshing. Stumbled across it when I wanted to copy that section into a new document for further experimenting.

    In the end I simply copy\pasted the whole content into a new document and then copied it back into the original document - that fixed it on a fast track. Would still be nice to have the replace option.










  16. #16
    Withwnar

    That's another option. I added a second post, which you might have missed while posting that response.

  17. #17
    Gigantus

    I'll keep this in my box of gold nuggets:

    Find: ([^\r])(\n)
    Replace: \1\r\n










  18. #18
    Squid's Avatar Opifex
    Patrician Artifex Technical Staff

    Join Date
    Feb 2007
    Location
    Frozen waste lands of the north
    Posts
    17,760
    Blog Entries
    3


    You have another option if the file is set to Windows EOL but you still have Linux/Mac LF-only line endings in the document. Use Edit > EOL Conversion to change the EOL from Windows to any other format and then change it back to Windows.
