Thursday, May 08, 2008

One record on two lines.

A real-life work problem cropped up the other day, one that needed a quick hack to fix. We had a fixed-length file, but every record was broken across two lines.

The file load was going to be done with an SSIS package in SQL Server, but SSIS doesn't handle two-line records nicely. We had to combine each pair of rows into one.

One of the developers did this quickly in VB.Net 2005 because that's what he knows, but it got me thinking. How else could I solve this? What other language do I already know that I could use to fix this problem?

Ruby? Perl? Java? VB6? Could I do it with a macro in Ultra Edit?

Here are several solutions to the same problem: read 2 lines, write 1. You can also think of it as removing every other newline character.
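The "read 2 lines, write 1" view maps directly onto Ruby's `each_slice(2)`. This is a minimal sketch, not one of the versions below: it assumes the whole file fits comfortably in memory as a String, and the method name `join_pairs` is made up for illustration.

```ruby
# join_pairs is a hypothetical helper name, not from the original post.
# Splits the text into lines, takes them two at a time, strips each half's
# line ending (chomp handles "\n", "\r\n" and "\r"), and writes each pair
# back out as a single line ending in "\n".
def join_pairs(text)
  text.each_line.each_slice(2).map { |pair| pair.map(&:chomp).join + "\n" }.join
end
```

Run against the files used throughout this post, that would be something like `File.write('c:/writefile.txt', join_pairs(File.read('c:/readfile.txt')))`. An odd last line simply comes out on its own.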

VB.Net 2005
Imports System.IO

Public Class Form1
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' Using blocks close both files even if an exception is thrown
        Using rdr As New StreamReader("C:\readfile.txt")
            Using wtr As New StreamWriter("C:\writefile.txt")
                Dim IStr1 As String = rdr.ReadLine()

                Do While IStr1 IsNot Nothing
                    ' ReadLine already strips the line terminator, so no Replace calls are needed.
                    ' At end of file ReadLine returns Nothing, which String.Concat treats as "".
                    Dim IStr2 As String = rdr.ReadLine()
                    ' WriteLine ends each joined record with exactly one newline
                    wtr.WriteLine(String.Concat(IStr1, IStr2))
                    IStr1 = rdr.ReadLine()
                Loop
            End Using
        End Using
    End Sub
End Class


This works, apart from the initial trouble of telling .NET that I actually wanted the program to have access to the file. That took me about 30 minutes to sort out.

Ruby
counter = 1
outfile = File.open('c:\writefile.txt', 'w')
open('c:\readfile.txt') do |readfile|
  readfile.each do |line|
    outfile.write line.chomp
    # Only add a newline after every second line, and not after the last line of the file.
    outfile.write "\n" if( counter % 2 == 0 && line != line.chomp)
    counter += 1
  end
end

outfile.close

I spent about 15 minutes writing the Ruby code for this, mostly because I felt there should be a cleaner way to solve the problem. I had been using a fairly ugly check to either write the entire line or write the line chomped (without the newline). I decided to always write the line chomped and add my own newline when it was needed. It's the same number of lines with fewer checks, and it looks cleaner.
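The same "always chomp, add the newline yourself" idea can drop the hand-maintained counter by using `each_line.with_index`. This is a sketch, assuming any IO-like objects (a File, a StringIO, `$stdin`); `join_pairs_io` is a hypothetical name. It differs slightly from the version above at the end of the file: it always writes a newline after an even-numbered line, and never after an odd-numbered last line.

```ruby
# join_pairs_io is a hypothetical helper name, not from the original post.
# input and output can be any IO-like objects (File, StringIO, $stdin/$stdout).
def join_pairs_io(input, output)
  # with_index(1) numbers the lines starting at 1, replacing the manual counter
  input.each_line.with_index(1) do |line, number|
    output.write line.chomp            # always write the line without its newline
    output.write "\n" if number.even?  # end the record after every second line
  end
end
```

With real files, the block form of `File.open` closes both handles automatically: `File.open('c:\readfile.txt') { |i| File.open('c:\writefile.txt', 'w') { |o| join_pairs_io(i, o) } }`.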


Perl Version 5.0
use strict;
use warnings;

open(INPUTFILE, "c:\\readfile.txt") || die "system can't open file for reading: $!";
open(OUTPUTFILE, ">c:\\outfile.txt") || die "system can't open file for writing: $!";

my $count = 1;
while (<INPUTFILE>) { # reads a line into the default variable $_
    if( $count % 2 == 1 ) {
        chomp $_; # only chomp the odd-numbered lines
    }
    print OUTPUTFILE; # writes the value of $_ to OUTPUTFILE
    $count++;
}
close(INPUTFILE);
close(OUTPUTFILE);


I don't use enough hard-core Perl to keep up with the most recent version. However, I like to tweak files in Perl occasionally, just to keep it fresh enough in my mind that I could use it for something if I had to.


Ultra Edit
Open the file.
Put the cursor in the first row.
Start recording a macro, then perform these keystrokes:
{end} {del} {down arrow}
Stop recording the macro.

Play the macro, repeating until the end of the file.

I can perform this operation with the macro in Ultra Edit faster than I could write even the simple programs above. It's not always the best solution, but knowing the strengths of your own tools can sometimes save you the hassle of writing a program at all. For a one-time operation this is fine; it's the jobs that need to be repeated that it doesn't work well for.
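To see why those three keystrokes solve the problem, here is a small Ruby model of the macro working on an array of lines. It is only a simulation for illustration (`run_macro` is a made-up name, and nothing here is Ultra Edit's own macro language): `{end}` plus `{del}` deletes the line break at the end of the current row, which pulls the next row up onto it, and `{down arrow}` moves to the following row, so every other newline disappears.

```ruby
# run_macro is a hypothetical model of the recorded macro, not Ultra Edit code.
# Each element of lines is one row of the editor buffer, without its newline.
def run_macro(lines)
  buffer = lines.dup
  row = 0
  # "Repeat until end of file": stop once there is no row below the cursor
  while row < buffer.length - 1
    # {end} {del}: deleting the line break joins the next row onto this one
    buffer[row] += buffer.delete_at(row + 1)
    # {down arrow}: move to the next row, which has not been joined yet
    row += 1
  end
  buffer
end
```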

Java

import java.io.*;

public class Converter {

    public static void main(String[] args) {
        try
        {
            PrintStream p = new PrintStream( new FileOutputStream("myfile.txt") );

            FileInputStream fstream = new FileInputStream("c:\\readfile.txt");
            BufferedReader d = new BufferedReader(new InputStreamReader(fstream));

            String s = null;
            long lineNumber = 1;
            while ( (s = d.readLine()) != null ) {
                // Start a new output line before every odd-numbered input line
                // except the first, so each pair of input lines becomes one record.
                if( lineNumber % 2 == 1 && lineNumber != 1)
                    p.print("\n");
                // readLine() has already removed the line terminator
                p.print( s );
                lineNumber++;
            }

            d.close();
            p.close();
        }
        catch (IOException e)
        {
            System.err.println ("Error converting file: " + e.getMessage());
        }
    }
}


The Java version is a little longer, but it works just as well.

This was a fun exercise. I love actually thinking a problem through while solving it. Different languages bring different strengths to the table. In this instance, the Perl and Ruby code was the easiest to write. Java was probably the most difficult, because file access is buried under more layers of classes than in the other languages.

2 comments:

AriT93 said...

Another option would be to read in the number of bytes in the two-line record, then replace the newlines.

#! perl -w
use strict;
use warnings;

open(IN,"testdata.txt")||die "$!";
open(OUT,">testout.txt")||die "$!";

my $REC_LEN = 82;
my $data;
while(read(IN,$data,$REC_LEN) != 0){
    $data =~ s/\n//;  # no /g: removes only the first newline, the one splitting the record
    print OUT $data;
}
close(IN);
close(OUT);

Perl also can use seek to

AriT93 said...

My example assumes an 80-byte record plus 2 newlines.
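The fixed-length approach from the comments translates to Ruby's `IO#read(length)`, which returns `nil` once the end of the input is reached. This is a sketch assuming, as the comment does, 80 data bytes split across two lines with a single "\n" ending each line (82 bytes per record); `join_fixed_records` is a hypothetical name, and Windows "\r\n" line endings would change the record length.

```ruby
REC_LEN = 82 # assumption from the comment: 80 data bytes + 2 newline bytes

# join_fixed_records is a hypothetical helper; input and output are IO-like objects.
def join_fixed_records(input, output, rec_len = REC_LEN)
  # IO#read(n) returns nil once the end of the input has been reached
  while (record = input.read(rec_len))
    # sub (not gsub) removes only the first "\n", the one splitting the record,
    # and keeps the newline that ends it, like the Perl s/\n// above
    output.write record.sub("\n", "")
  end
end
```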