File Utilities
Table of Contents
[ReplaceRegex]
[RenameRegex]
[FileSplit]
[FileLineFilter]
[HtmlFix2Unix]
[Unix2Dos]
[Dos2Unix]
[FiXML]

Bestcode is pleased to provide the following file utilities to make bulk processing of files easier through simple, scriptable command line utilities and powerful regular expression functionality when needed.

The File Utilities pack contains the ReplaceRegex, RenameRegex, FileSplit, HtmlFix2Unix, FileLineFilterRegex, FiXML, Dos2Unix and Unix2Dos utilities.

ReplaceRegex

The ReplaceRegex command line program finds a given string (or a regular expression pattern) in a batch of files, replaces each occurrence with another string, and places the output files in a separate directory if one is specified. Each file must be small enough to fit in available memory.

Command line parameters are:

-srcdir   adirectory    directory of the original files. By default, this is the current directory.

-destdir   adirectory    destination directory to save modified files. By default, this is the current directory. When this is the same as the source directory, the source files will be overwritten.

-find     atext        this is the simple text to replace.

-regex     atext       this is the regular expression pattern to replace. You can use capturing groups as () and you can refer to these groups using $1, $2 etc syntax in the replacement text. You can also have escaped characters like \t, \r, \n.

-replace   atext       replace the text that was found with this one. Captured regular expression groups can be used as $1, $2 and so on.

-fname     apattern    the file name pattern to search, for example *.*

-casesens presence of this flag means search is case sensitive.

-quotes   atext is used to represent double quotes in the -find, -regex and -replace parameters. This is to help escaping quotes inside your -find, -regex and -replace parameters.

-tab       atext is used to represent tab (\t) character in the -find and -replace parameters. This is to help escaping tab inside your -find and -replace parameters. Use regular expression escapes for -regex.

-cr       atext is used to represent carriage return (\r) character in the -find and -replace parameters. This is to help escaping carriage return inside your -find and -replace parameters.  Use regular expression escapes for -regex.

-lf       atext is used to represent new line character (\n) in the -find and -replace parameters. This is to help escaping new line inside your -find and -replace parameters. Use regular expression escapes for -regex.

-r         recursively process sub directories.

You can read more about ReplaceRegex and regular expression find replace use cases in files and usage examples here.
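
To make the $1, $2 group syntax concrete, here is a minimal Python sketch of the same kind of find and replace. It is not the ReplaceRegex source code. The function names (dollar_to_python, replace_in_text, replace_in_files) are hypothetical, and UTF-8 file encoding is an assumption. The default is case insensitive because -casesens is described above as an opt-in flag.

```python
import re
from pathlib import Path


def dollar_to_python(replacement: str) -> str:
    """Translate $1-style group references into Python's \\g<1> form."""
    # Double any backslashes so re.sub treats them as literal characters.
    escaped = replacement.replace("\\", "\\\\")
    return re.sub(r"\$(\d+)", lambda m: "\\g<" + m.group(1) + ">", escaped)


def replace_in_text(text, pattern, replacement, case_sensitive=False):
    """Replace every match of pattern in text; $1, $2 refer to captured groups."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.sub(pattern, dollar_to_python(replacement), text, flags=flags)


def replace_in_files(src_dir, dest_dir, name_glob, pattern, replacement,
                     case_sensitive=False, recursive=False):
    """Apply replace_in_text to every matching file and write the results to dest_dir."""
    src, dest = Path(src_dir), Path(dest_dir)
    candidates = src.rglob(name_glob) if recursive else src.glob(name_glob)
    written = []
    # sorted() materialises the file list before anything is written.
    for path in sorted(p for p in candidates if p.is_file()):
        target = dest / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        text = path.read_text(encoding="utf-8")  # the whole file is held in memory
        target.write_text(
            replace_in_text(text, pattern, replacement, case_sensitive),
            encoding="utf-8",
        )
        written.append(target)
    return written
```

For example, replace_in_text("John Smith", r"(\w+) (\w+)", "$2, $1") swaps the two captured words. Like the tool, the sketch reads each file fully into memory, which is why very large files are better split first.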

RenameRegex

The RenameRegex command line program renames files whose names match a regular expression and places the output files in a separate directory if one is specified.

Command line parameters are:

-srcdir   adirectory     directory of the original files. By default, this is the current directory.

-destdir   adirectory    destination directory to save modified files. By default, this is the current directory. When this is the same as the source directory, the source files will be overwritten.

-renameto   atext     New file name. Captured regular expression groups can be used as $1, $2 and so on. For example: $1.txt

-renamefrom   apattern       the file name pattern to find to rename. For example: (report)_[0-9]*.txt

-casesens presence of this flag means search is case sensitive.

-quotes   atext       atext is used to represent double quotes in the -renamefrom and -renameto parameters. This helps with escaping quotes inside those parameters.

-r         recursively process sub directories.

You can read more about RenameRegex and regular expression file rename use cases and usage examples here.
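
The following Python sketch illustrates how a -renamefrom pattern and a -renameto template could combine. It only computes the planned renames and does not touch the disk. The function name plan_renames is hypothetical, and matching the pattern against the whole file name is an assumption, not a documented behavior of RenameRegex.

```python
import re


def plan_renames(names, rename_from, rename_to, case_sensitive=False):
    """Return (old_name, new_name) pairs for every name that matches rename_from."""
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(rename_from, flags)
    # Convert $1, $2 ... into the \g<1>, \g<2> form that Match.expand understands.
    template = re.sub(r"\$(\d+)", lambda m: "\\g<" + m.group(1) + ">", rename_to)
    plan = []
    for name in names:
        match = regex.fullmatch(name)  # assumption: the whole name must match
        if match:
            plan.append((name, match.expand(template)))
    return plan
```

With the example values from the parameter list, plan_renames(["report_2021.txt", "notes.txt"], r"(report)_[0-9]*.txt", "$1.txt") pairs report_2021.txt with report.txt and leaves notes.txt alone. Note that this pattern would map several report files to the same new name, so a real run would need to decide how to handle such collisions.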

FileSplit

The FileSplit.exe utility breaks a very large text file into smaller files so that they can be opened and inspected in common text editors (such as Notepad). Most of the time, such very large files contain database exports or server activity logs. When it is time to locate a piece of information in such a big file, everyday editors fall short. FileSplit comes to the rescue by producing usable chunks of small files so that you can continue with your work, open them in editors, and forward small files over the network.

Command line parameters are:

-fname     filename    the file name to split. For example: webserver.log (You will get files like webserver_1.log, webserver_2.log, ... )

-numlines   anumber     Number of lines in each split file. Default is 10000 lines per file.

You can read more about FileSplit.exe File Splitter Tool and see examples of it in use.
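
As an illustration of the splitting scheme described above (webserver.log becoming webserver_1.log, webserver_2.log and so on), here is a small Python sketch. It is not the FileSplit.exe implementation. The function name split_file is hypothetical, and reading in binary mode so that line endings are copied unchanged is an assumption.

```python
from itertools import islice
from pathlib import Path


def split_file(file_name, num_lines=10000):
    """Split file_name into numbered parts of at most num_lines lines each."""
    source = Path(file_name)
    parts = []
    # Binary mode streams the file line by line and keeps line endings as they are.
    with source.open("rb") as reader:
        index = 0
        while True:
            chunk = list(islice(reader, num_lines))
            if not chunk:
                break
            index += 1
            part = source.with_name(f"{source.stem}_{index}{source.suffix}")
            part.write_bytes(b"".join(chunk))
            parts.append(part)
    return parts
```

Only one chunk of lines is held in memory at a time, so this approach does not need the whole file to fit in memory.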

 

FileLineFilterRegex

FileLineFilterRegex command line program eliminates text lines from text files based on regular expression patterns.

This program removes all lines that do not match a regular expression from text files and places the output files in a separate directory if one is specified. Each file must be small enough to fit in available memory.

This utility is useful when you have a large file and you are only interested in a subset of its lines. For example, in web server logs you may only want to see a certain user agent, a certain IP address, or perhaps a certain date and time. Another example is an application log where the thread id appears at the beginning of each line and, for troubleshooting purposes, we only want to see the lines logged by a certain thread. FileLineFilterRegex conveniently removes all other lines so that we can focus on what matters.

Command line parameters are:

-srcdir   adirectory     directory of the original files. By default, this is the current directory.

-destdir   adirectory    destination directory to save modified files. By default, this is the current directory. When this is the same as the source directory, the source files will be overwritten.

-regex     atext       this is the regular expression pattern to match lines to keep. If a line in the source file does not match this pattern, that line will not be copied to the destination file.

-fname     apattern    the file name pattern to search. Default is *.*

-casesens presence of this flag means search is case sensitive.

-quotes   atext       atext is used to represent double quotes in the -regex parameter. This helps with escaping quotes inside that parameter.

-r         recursively process sub directories.
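
The thread id example above can be sketched in Python as follows. This is an illustration of the idea only. The function name keep_matching_lines is hypothetical, and treating a line as kept when the pattern matches anywhere in it is an assumption about how the -regex parameter is applied.

```python
import re


def keep_matching_lines(text, pattern, case_sensitive=False):
    """Return text with every line that does not match pattern removed."""
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    # keepends=True preserves each line's original line ending.
    lines = text.splitlines(keepends=True)
    return "".join(line for line in lines if regex.search(line))
```

For a log where each line starts with a thread id such as [t1], keep_matching_lines(log, r"^\[t1\]") keeps only the lines logged by that thread.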

Fixing Invalid XML Characters

FiXML (invalid XML characters fixer) is a command line tool that removes or replaces invalid XML characters in an XML document. The ranges of invalid characters are defined by the XML specification, and documents that contain them cannot be parsed in most cases.

Invalid XML characters should not be confused with special XML characters. Special XML characters are those that require escaping such as <, & etc.

The most common invalid XML characters are the control characters below ASCII 32. Except for tab (\t), line feed (\n) and carriage return (\r), these control characters cannot appear in an XML document. There are also a few Unicode character ranges that should not appear in XML documents per the XML specification.

FiXML helps you convert your XML to a valid form. The original file is not touched; the output is saved to {filename}.fixed.

Usage:

FiXML.exe [-encoding encname] [-replace text] afilename.xml
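
To show which characters count as invalid, the Python sketch below builds a character class from the Char production of the XML 1.0 specification and removes or replaces everything outside it. This is not the FiXML source code. The names INVALID_XML_CHARS, fix_xml_text and fix_xml_file are hypothetical, and the default encoding of UTF-8 is an assumption.

```python
import re
from pathlib import Path

# Anything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def fix_xml_text(text, replacement=""):
    """Remove invalid XML characters, or substitute replacement for each one."""
    # A function is used so that replacement is always inserted literally.
    return INVALID_XML_CHARS.sub(lambda match: replacement, text)


def fix_xml_file(file_name, encoding="utf-8", replacement=""):
    """Write a cleaned copy next to the original as {filename}.fixed."""
    source = Path(file_name)
    target = Path(str(source) + ".fixed")
    # newline="" keeps the original line endings untouched.
    with source.open("r", encoding=encoding, newline="") as reader:
        text = reader.read()
    with target.open("w", encoding=encoding, newline="") as writer:
        writer.write(fix_xml_text(text, replacement))
    return target
```

Special characters such as < and & are left alone by this sketch, since they are valid XML characters that only need escaping.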

 

HtmlFix2Unix

The HtmlFix2Unix utility converts case insensitive html links to case sensitive versions by lowercasing or uppercasing them, so that the html files can be used on a Unix-like, case sensitive system.

We needed this tool to convert our Math Parser API documentation. The html documents were generated on Windows by Visual Studio and they contained html links, images and JavaScript that were not case sensitive. For example, the html could point to an image <img src="Abc.jpg"> whereas on disk the image name was actually "abc.jpg". This kind of discrepancy made it impossible to place this documentation on our Linux servers, so we developed HtmlFix2Unix to convert the relevant portions of our html to lowercase.

You can read more about HtmlFix2Unix here.
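
The lowercasing idea from the example above can be sketched in Python as follows. This is a rough illustration, not how HtmlFix2Unix is implemented. The names LINK_ATTR and lowercase_links are hypothetical, only quoted href and src attributes are handled, and leaving absolute URLs (those containing ://) untouched is an assumption made for this sketch.

```python
import re

# A quoted href or src attribute: prefix, quote character, value, same quote.
LINK_ATTR = re.compile(r"""\b((?:href|src)\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def lowercase_links(html):
    """Lowercase the values of relative href and src attributes."""
    def fix(match):
        prefix, quote, value = match.group(1), match.group(2), match.group(3)
        if "://" in value:
            # Assumption: external links are left exactly as written.
            return match.group(0)
        return prefix + quote + value.lower() + quote
    return LINK_ATTR.sub(fix, html)
```

For the links to resolve on a case sensitive system, the files on disk would also need lowercase names.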

Dos2Unix

Dos2Unix command line program converts carriage return, line feed pairs (\r\n) to line feeds (\n) and places the output files in a separate directory if such directory is specified. The file size must be small enough to fit in available memory.

Command line parameters are:

-srcdir   adirectory     directory of the original files. By default, this is the current directory.

-destdir   adirectory    destination directory to save modified files. By default, this is the current directory. When this is the same as the source directory, the source files will be overwritten.

-fname     apattern    the file name pattern to search, for example *.*

-r         recursively process sub directories.

Unix2Dos

Unix2Dos command line program converts line feeds (\n) to carriage return, line feed pairs (\r\n) and places the output files in a separate directory if such directory is specified. The file size must be small enough to fit in available memory.

Command line parameters are:

-srcdir   adirectory     directory of the original files. By default, this is the current directory.

-destdir   adirectory    destination directory to save modified files. By default, this is the current directory. When this is the same as the source directory, the source files will be overwritten.

-fname     apattern    the file name pattern to search, for example *.*

-r         recursively process sub directories.

The Unix2Dos and Dos2Unix file utilities convert line breaks in your text files to the format required by your destination operating system, for best compatibility between DOS (Windows) and Unix (Linux) operating systems.
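
The two conversions can be sketched in a few lines of Python. This is an illustration of the idea rather than the code of the utilities. The function names are hypothetical, and normalising existing \r\n pairs before adding carriage returns is an assumption made here to avoid producing \r\r\n.

```python
from pathlib import Path


def dos2unix(data: bytes) -> bytes:
    """Convert carriage return, line feed pairs (\\r\\n) to line feeds (\\n)."""
    return data.replace(b"\r\n", b"\n")


def unix2dos(data: bytes) -> bytes:
    """Convert line feeds (\\n) to carriage return, line feed pairs (\\r\\n)."""
    # Normalise first so lines that already end in \r\n are not doubled up.
    return dos2unix(data).replace(b"\n", b"\r\n")


def convert_file(src_path, dest_path, convert):
    """Read src_path fully into memory, convert it, and write it to dest_path."""
    Path(dest_path).write_bytes(convert(Path(src_path).read_bytes()))
```

Working on bytes instead of decoded text means the sketch does not need to know the encoding of the files.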

Purchasing Bestcode File Utilities Pack

You can pay with a credit card and download the Bestcode File Utilities Pack immediately from our online store for only $14.95. Licensing is per user.

Online Order Form

System Requirements: Windows with .NET Framework 1.1 or higher.

For technical questions please contact support@bestcode.com

webmaster@bestcode.com