The Gamer Corner
We play games alright. They're just with your mind. Or are they?

OOP Discussion

OOP Discussion – December 21, 2010 9:03 PM (edited 12/21/10 4:03 PM)
Cuzzdog (1522 posts) Head of Gamer Corner R&D
Rating: Not Rated
chaoscat wrote...
I define a set of transformation primitives not unlike cut, grep, uniq -c and the rest, some of which are combinations, e.g. "cut | uniq -c", and then define a driver which applies these transformations sequentially. Each transformation is handed a hook to the framework (like a DB connection), and its input and output labels. So the framework gets something like

"cat > a; grep a > b; cut b | uniq -c > c;"

and the framework is smart enough to know when it needs to actually write one of those to disk and when it can just "stream" it, essentially. Then I tell the framework "run this and give me an iterator over the results", which I can then either load into the db or do whatever I need with. It's very flexible in that the function objects can be combined in any sequence on any input or output data, and they can all be chained together. Each function specifies a bit of business logic, so for example, today I'm modifying a rollup that 7 different inputs use to go from daily granularity to hourly granularity. I just need to change the one function object, and all the rollups that use it will now roll up to hours instead of days. I just don't see what other infrastructure could exist - you talk about a class to hold a record, but nothing in this ever directly touches a record until the very end, where it outputs to the DB, and the records for that are in a class (essentially a wrapper on ArrayList) that I get back from the framework.

I thought I'd take this into a thread to make it easier to work with. I'm probably going to say this all wrong, but in my mind the difference between coming at a problem functionally vs. OOP is that a functional program reads like an itemized list of things that need to be done to get the answer. You look at things from the perspective of the transformations that need to be done. OOP comes at it from the perspective of analyzing what the data is and how you want to model it. Ideally, if the OOP design is good, putting together the sequential steps to achieve the business logic should be pretty easy, since you've already done the bulk of the work in your data modeling.


So, let's take a step back from what you described and look at things from the point of view of the data you're working with. Essentially, you have a black box which is going to do the grunt work of data transformations for you. Your program is going to get some raw data to pass to the black box (possibly already stored in a flat file or DB that the black box has access to?), as well as a list of transformations that need to be done on the raw data. The output from the black box is going to be...what? A collection of records you may want to do one final output-preparation transformation or data-analysis rollup on?

Re: OOP Discussion – December 21, 2010 10:09 PM (edited 12/21/10 5:09 PM)
chaoscat (452 posts) Ambassador of Good Will
Rating: Not Rated
I'm not sure I'd say my program is going to get some raw data to pass to the framework (aka "black box"). More like I'm going to tell it which data file already stored in the framework to operate on (or, in point of fact, which set of data files). Essentially, my program then says "Load this file, label it X", then "Apply that transformation to X, call the result Y", and so on. At some point, it says "Give me the contents of Z as an iterator (of something that can be treated like an array)". Each field in the result array corresponds to a database column, and I bulk load the contents of the iterator into the database as the final step.

A key feature is that the transformations should be reusable - so for example there would be a "roll all date time units up to the day level, while keeping all other fields broken out" transform, perhaps called "RollupToDay". This should be able to be applied to any input file and reused in multiple transformation sequences. I want to be able to add a sequence of transformations or change an existing sequence without recompiling, so the sequences will be read from a config file and the driver will build out the appropriate sequence of methods to call (that's the design, I haven't built that yet. The current driver runs one hard coded pipeline.)
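Roughly, I picture the driver looking something like this. This is a sketch only - Framework, TransformStep, and Driver are made-up names standing in for the real framework API, just to show the shape of the config-driven sequencing:

import java.util.List;
import java.util.Map;

interface Framework { /* stand-in for the real framework connection */ }

interface TransformStep {
    // Apply this transformation to the data labeled inputLabel and
    // store the result under outputLabel.
    void apply(Framework framework, String inputLabel, String outputLabel);
}

class Driver {
    // Each config line parses to {transformName, inputLabel, outputLabel}.
    public void run(Framework framework, List<String[]> steps,
                    Map<String, TransformStep> registry) {
        for (String[] step : steps) {
            registry.get(step[0]).apply(framework, step[1], step[2]);
        }
    }
}

So a config line like "apply 'RollupToDay' to X store in Y" just becomes a lookup in the registry followed by one apply call.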

The decision between having all the methods on one object and having each method be its own object is not critical to the point here, I think. I like having each one as its own object since it makes each easy to find, and the code is shorter. I don't like single 1000+ line code files when I can avoid them, and since there is no shared state between the methods (indeed, the methods are totally stateless), there's no benefit I can see from cramming them all into one file. I think that statelessness is the key point of divergence in our models, but I simply don't see what state they would have.

If we pretend for a moment that the framework is the unix shell, which isn't a bad way to think of it, a typical transformation class might look like:

public class RollUpFirstThreeFields {
    // Append this transformation's stage to the pipeline being built.
    public void applyTransformation(StringBuilder command) {
        command.append("| cut -f1,2,3 | sort | uniq -c");
    }
}

Currently those classes are independent, but I think they should eventually all implement an interface to make it easy to iterate over a list of them and execute each in turn. Something kind of like Runnable.
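Something like this, say (just a sketch of the idea - ShellTransformation and Pipeline are invented names):

import java.util.List;

interface ShellTransformation {
    void applyTransformation(StringBuilder command);
}

class Pipeline {
    public static String build(String input, List<ShellTransformation> transforms) {
        StringBuilder command = new StringBuilder("cat " + input);
        for (ShellTransformation t : transforms) {
            t.applyTransformation(command);  // each appends its "| ..." stage
        }
        return command.toString();
    }
}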

Needless to say, the implementation code is a little more complex, but most of that consists of figuring out which fields to keep in the cut, and the fact that "uniq -c" is actually more like three commands in the framework. Also, the framework doesn't use pipes, so I specify the input and output transformation names, where the input for one transformation is the output from the previous one. The framework takes care of optimizing away intermediate results that can be omitted (which is desirable, as their optimizer is being actively worked on, so my pipelines get better for free without me having to do anything). The application finishes by taking the final result and loading it into a database table. The process is designed to run with zero user interaction - essentially it's a batch-mode process.

I simply don't see any more natural object construction here than having each transformation as its own object with no stored state. The only member variables I have at the moment are a few static constants, which are only member variables to avoid paying for an instantiation every time a method is called. They are never changed once set up. The only "state" is the initial input, which specifies which files to run on.

_________________________________________________
Syllabic (4:14 PM): tozzi are you like dowd's jiminy cricket
Re: OOP Discussion – December 21, 2010 10:57 PM (edited 12/21/10 5:57 PM)
Talraen (2373 posts) Doesn't Play with Others
Rating: Not Rated
I just wanted to chime in on something I read in the chat earlier (I figured this was a better place to do so, if you've gone to the trouble of making a thread).

In my experience, you generally do not want to use static variables unless they fall into one of two categories: something there can, by definition, only ever be one of, or something that serves as some kind of flag or setting. A database connection meets neither criterion, so I would not make it static. If you were going to go that route, I agree with Cuzzo - make it a normal class with normal functions that work on an object.

For an example of what I mean, consider if you have a static class that contains methods for outputting error messages. You might want the output location to be a static variable, so you can change where the logs are created across all methods without passing any crazy variables. This is the flag/setting style I referred to. You could also have a static variable that refers to something like the headers being sent by a web page, since by definition there is only one set of headers. Beyond that, I tend to avoid static variables unless I'm just taking shortcuts.
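A quick sketch of what I mean (invented example, not code from anywhere in particular):

class ErrorLog {
    // Flag/setting style static: change it once and every logging
    // method picks it up, with no caller passing the location around.
    private static String outputPath = "/var/log/app/errors.log";

    public static void setOutputPath(String path) {
        outputPath = path;
    }

    public static void logError(String message) {
        // Actual file writing omitted; the point is the shared setting.
        System.err.println(outputPath + ": " + message);
    }
}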


--
There is no Mythril Sword in Elfheim
Re: OOP Discussion – December 21, 2010 11:05 PM (edited 12/21/10 6:05 PM)
chaoscat (452 posts) Ambassador of Good Will
Rating: Not Rated
Yeah, I don't remember what I was thinking when I was talking about using a static for the DB connection. It's a parameter to the runner method. The main method instantiates one (based on the config file) and passes it in. The unit test passes in a connection to the test DB. I agree that having the DB static would be a Bad Thing. The connection to the framework is handled the same way.

_________________________________________________
Syllabic (4:14 PM): tozzi are you like dowd's jiminy cricket
Re: OOP Discussion – December 22, 2010 3:47 PM (edited 12/22/10 10:47 AM)
Cuzzdog (1522 posts) Head of Gamer Corner R&D
Rating: Not Rated
Ok, first of all, you need to be careful when you use static. Static means that no matter how many instances of an object I create, I want this function/variable to work exactly the same regardless of the local instance data. It does not mean "I only anticipate instantiating the object once, so I may as well call it static." The reason is that if your requirements ever do change, and all of a sudden your function does have stored state, you're going to have a pain-in-the-ass time fixing everywhere that's calling the function statically. And calling something static isn't really buying you much processing-wise, just ease of coding.
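Here's a made-up example of the trap:

// Invented example: a static method that later needs per-instance state.
class Tax {
    public static double rate() { return 0.05; }  // fine while there's one rate
}

// Later requirement: rates vary by region, so rate() needs instance state:
// class Tax {
//     private final double rate;
//     public Tax(double rate) { this.rate = rate; }
//     public double rate() { return rate; }
// }
// Now every call site written as Tax.rate() no longer compiles and has
// to be hunted down and rewritten against an instance.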

Now, on to the meat of this discussion. This class you wrote:

chaoscat wrote...

If we pretend for a moment that the framework is the unix shell, which isn't a bad way to think of it, a typical transformation class might look like:

public class RollUpFirstThreeFields {
    public void applyTransformation(StringBuilder command) {
        command.append("| cut -f1,2,3 | sort | uniq -c");
    }
}

is pretty telling about your frame of mind when looking at OOP languages. You're coming at things from a functional programming perspective, where you think "I have to DO A, B, and C, so those are my classes". In an OOP environment, you need to think one step above that and ask "What data am I working with? How do I want to interact with that data? How might the data interact with each other?"

It sounds like what you have is a bunch of classes, where each class represents one specific type of transformation you may want to pass to your black box. Basically, you came at it thinking, "I need to do transformations A, B, and C, so I'd better code them". What you really need to be thinking is that the data you're working with is the transformations themselves. Just based on our conversation, here's how I would model your "RollUpFirstThreeFields" transformation:


public class Transformation {
    private String _trans;

    public Transformation() {
        _trans = "";
    }

    public Transformation(String t) {
        if (validateTrans(t)) {
            _trans = t;
        } else {
            _trans = "";
        }
    }

    public boolean validateTrans(String t) {
        // Stub: in real code, check that t is a well-formed transformation.
        return t != null && !t.isEmpty();
    }

    public void setTrans(String t) {
        if (validateTrans(t)) {
            _trans = t;
        } else {
            _trans = "";
        }
    }

    public String getTrans() {
        return _trans;
    }

    public void appendToTrans(String t) {
        if (validateTrans(t)) {
            if (_trans.isEmpty()) {
                _trans = t;
            } else {
                _trans += " | " + t;
            }
        }
    }
}


Now you have one powerful, flexible class that will build all your transformations for you. This one class replaces all of those one-off classes you were building and is flexible enough that it shouldn't need to change much in the future. You said you were going to build in future functionality to load a sequence of transformations from a file? Well, here's your hook to do that. Just add a new function loadFromFile(fileName) and you're done. Need to do unit testing? No problem. Just hard-code an instance and set it up in the different ways you want to test. Now you're not testing that your individual transformations work, but that your program is properly handling transformations. Once you're confident all your classes are handling data properly, THEN you go and test individual transformations. But by then your code should be done, and all you're doing is tweaking an input file that's setting up your instances of the Transformation class.
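For instance, building a pipeline would look something like this (a sketch using the class above - the input file and stages are made up):

// Assemble a pipeline by chaining stages onto one Transformation.
Transformation t = new Transformation("cat input.dat");
t.appendToTrans("grep ERROR");
t.appendToTrans("cut -f1,2,3 | sort | uniq -c");
System.out.println(t.getTrans());
// prints: cat input.dat | grep ERROR | cut -f1,2,3 | sort | uniq -c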

Re: OOP Discussion – December 22, 2010 4:43 PM (edited 12/22/10 11:43 AM)
chaoscat (452 posts) Ambassador of Good Will
Rating: Not Rated
Ok, so a few details. validateTrans is both very hard to write and unnecessary, as the framework will balk if the transformation is not well formed. It's hard to write because the transformations are in what is essentially their own language, and the parser for that is a not-insignificant part of the framework code. Even in our simple unix shell command model, you'd have to do a lot of work to make sure the commands were correct (quick: is the default delimiter for cut the same on Solaris as it is on GNU?). I avoid this in my model since each transformation is individually tested, and if it passes the tests it is by definition valid. In this case, you're reading the actual transformation code in from your source file, which is exactly what I don't want to do.

Basically, in this case, your source file would literally have (making up some pseudo code syntax here):


apply 'cut -f 1,2,3' to X store in Y;
apply 'uniq -c' to Y store in Z;

I don't want to do that. I want to have my source file say

apply 'RollupMainSummary' to X store in Z;

The reason I want this is there will be multiple inputs and multiple data paths that all follow the same rules:

apply 'RollupMainSummary' to A store in A2;
apply 'RollupMainSummary' to B store in B2;
apply 'RollupMainSummary' to C store in C2;
...


Those could potentially be spread across several files, depending on how I organize my scripts. It's possible, and indeed I've already had this happen once, that the requirement could change so that instead of the "main summary" needing the first three fields, it needs the first four and the seventh fields. I want to be able to change that in a single place. In other words, I want to be able to phrase the control files in the logical language of the "end user" (which in this case is essentially the database designer), not in the detail language of the implementation.

Additionally, this implementation would limit my ability to tweak the transformation details in Java. So for this example, we've been using the naive assumption that we want to roll up the first three fields no matter what the data looks like. In reality, what I more often want to do is roll up all fields except a few, and remap the field names in the result (and yes, I realize I didn't list that as a requirement at the beginning). Under this model, I would need to do that by hand for each input file format. In my model, the transformation code figures that out (without looking at the data - there's still no row object) based on the metadata the framework has about the input files. Essentially, the transformation code for this would ask the framework for the schema for that file, delete the fields it wasn't interested in from the list of field names, and add a 'group by' transformation to the queue on the remaining field names (and do a little additional schema manipulation to keep things neat). As long as I define the schemas correctly when I load the files, I know (with confidence from my unit tests) that the output will look how I want it to.
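The rough shape of that code is below. Schema, Framework, their methods, and the field names are stand-ins I'm inventing for illustration, not the real framework API:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

interface Schema {
    List<String> fieldNames();
}

interface Framework {
    Schema schemaFor(String label);
    void addGroupBy(String inputLabel, String outputLabel, List<String> groupFields);
}

class RollupMainSummary {
    public void addTransformation(Framework framework, String in, String out) {
        // Ask the framework for the input's schema (metadata only; no row
        // objects), drop the fields we don't group by, queue a group-by.
        List<String> fields = new ArrayList<String>(framework.schemaFor(in).fieldNames());
        fields.removeAll(Arrays.asList("hour", "minute"));  // invented field names
        framework.addGroupBy(in, out, fields);
    }
}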

What's more, the code you have seems to miss the point. That class is doing all the easy work (if we ignore validation, which as I said is reasonable) and leaving the hard work (How do I transform A into Q?) to the config file. You say "Once you're confident all your classes are handling data properly, THEN you go and test individual transformations" but that doesn't make any sense. My classes don't handle data. The framework's classes handle data, I tell the framework what to do with that data. This seems like the most obvious thing in the world to me, but apparently it's not. Maybe this would help - Imagine writing a bunch of "insert into ... select" queries, i.e. queries that move data from one table to another table in the DB, presumably changing it in some way in the process. You don't need a row object to implement these. The RDBMS needs a row object, sure, but you can be blissfully unaware of it. In fact, having a row object, and slurping the data into your program, would be a bad thing, because you are almost sure to do it less efficiently than the RDBMS.

I could write a class like what you have there for the framework I'm using in a few minutes, sure, but I'd still have days of work setting up the input files, and they would be less reusable and less testable than what I have now. In fact, the framework provides a "script mode" in addition to the API I'm using, which does essentially what you have there - it lets you write a plain text file with a list of transformations that it loads, validates and runs. I'm not using it because the files are hard to reuse, hard to refactor and hard to test.

_________________________________________________
Syllabic (4:14 PM): tozzi are you like dowd's jiminy cricket
Re: OOP Discussion – December 22, 2010 7:02 PM (edited 12/22/10 2:02 PM)
Cuzzdog (1522 posts) Head of Gamer Corner R&D
Rating: Not Rated
I hear what you're saying, and I understood that the sample we were working with was a very simplified one. However, my point still stands. What you're working on may not have data in the sense of processing database rows, but all programs deal with some kind of data. Good OOP design stems from figuring out what that data is and how to manipulate it. In this case, the transformations themselves are the data you're working with. Sure, hard coding the immediately needed transformations works, but it will very quickly become unwieldy the way your OOP design is going.

That code I had was just a very, very simplified idea of where you should be headed, not a final approach by any means. The validateTrans function was just meant to be a sample of the kind of logic you could build that would be easy to apply across all instances of transformations.

I understand what you're saying about the given, simplified Transformation class. You would need tons of specialized input program files the way it's laid out, but that's just assuming you go with the basic layout given. What if you used two types of files: a specialized "function" file, and a master control file? So it would look like this:

./Functions/RollupMainSummary
apply 'cut -f 1,2,3' to %1 store in %2;
apply 'uniq -c' to %2 store in %3;

./Functions/DataFilter
apply 'grep %i1' to %1 store in %2;
apply 'cut | uniq -c' to %2 store in %3;

./MyPrograms/Prog1
RollupMainSummary -inFile FileA -outFile FileB
DataFilter -inFile FileB -outFile FileC -i1 "test"

Then, in your Transformation class, when you call the loadFromFile function, it's going to pull together all the needed logic to assemble your finalized transformation. That way, when you need to write new transformation "functions", you're not mucking with the main code and having to recompile and deploy; you just write new function files. And if you do ever need some crazy one-off transformation, it still works without having to make a whole new hard-coded function.
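A loadFromFile along those lines wouldn't have to be much more than this sketch (FunctionLoader is a made-up name, and a real version would want smarter placeholder handling than plain string replacement):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class FunctionLoader {
    // Read a "function" file and substitute %1, %2, ... with the
    // arguments supplied by the master control file.
    public String loadFromFile(String fileName, String... args) throws IOException {
        String body = new String(Files.readAllBytes(Paths.get(fileName)));
        for (int i = 0; i < args.length; i++) {
            body = body.replace("%" + (i + 1), args[i]);  // %1 -> args[0], etc.
        }
        return body;
    }
}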

Does that syntax I laid out work exactly as-is? Maybe not. Maybe you need to pass in that metadata you were talking about before to do the dynamic transformation building. Is this simple and quick to build? No, it's not. But OOP design is typically about long-term power and flexibility, not getting off the ground quick. The bottom line is, if you want to grok OOP design, you need to get into the frame of mind of identifying what data you're working with and how you can best model it to make it easy to manipulate, not focusing on the immediate steps 1, 2, 3 of what needs to be done right away.

Re: OOP Discussion – December 22, 2010 7:45 PM (edited 12/22/10 2:45 PM)
chaoscat (452 posts) Ambassador of Good Will
Rating: Not Rated
So that design is almost exactly what I have, except instead of being hard-to-test plain text files, the transformations are testable (and tested) classes. Making them classes lets me use Java to do other manipulation, like specifying "all fields except 4 and 5" or even "all fields except the ones holding the hour and minute values". To get that same functionality out of the plain text files, I'd need to write a very robust parser, and I simply don't see the need for creating another language and parser to solve this problem. Also, very specifically, I want to recompile if the transformations change. Recompiling means re-running the test suite, which will catch errors in the transformations. Treating the transformations themselves as data would make it far too easy for errors to slip through.

I understand this may not look like it to you, but I am trying to design for long-term maintainability and not for a quick get-it-out-the-door fix. It would have been much faster to just hand-write the script files and feed them to the framework's script runner, but as I said, that approach was rigid and hard to test. My main design goals when building were as follows:

1: The hardest part of this is making sure each transformation produces the correct output for all possible input formats. Therefore, the design should make that easy to test and verify.
2: The process input or output requirements could change at any time. All transformations should be written to work from known states to other known states, and only the initial loader and final outputter should care what the required input and output formats are.
3: Transformations should loosely correspond to business requirements - e.g. don't have a "RollUpDay" transform, have a "RollUpReportTimeGranularity" transform, so if the requirement changes from having daily rollups to having hourly rollups only one place needs to be changed to have all inputs follow the new model.
4: Easy code reuse - transforms are hard to write correctly. Once written, a transform should be able to be applied in as many situations as possible. This means each transform should make a minimal change to the data without side effects.

Really, we're nitpicking granularity right now. You're saying I should have only "primitive" operations in my control files; I'm saying I want to define higher-order operations composed of several primitives plus some logic from outside the available operations, and compose those.

It feels to me, and this has been my thesis from the beginning, that the problem I am working with is not well suited to object modeling. As I've said before, OOP is great for highly non-linear applications, especially user interfaces and game programming, where you have lots of reasonably independent objects interacting with one another. Likewise, transactional data models where there is a straightforward relationship between a row in a database and an object instance make sense. What I am modeling is an inherently linear process (I mean, we even call it a pipeline) with little to no state. In essence, there are very few nouns in this problem. Think of it as function composition in math: F(G(Q(R(x)))) has a lot of verbs (the functions) and only one noun (the variable). My job is to find the right function definitions and order to turn x into something useful. That's essentially what I'm building here. It sounds like a lot of what you're saying is "the hard part is how to call those functions in the right order. Once you solve that, the functions themselves should be easy", but in fact that's a solved problem, and the hard part is what F and G and the rest should actually do. To put it in DB terms, it's like saying "what you really need is a good way to load the results of a query into an object. Once you've got that, you can just toss some queries in a config file and you're done."

_________________________________________________
Syllabic (4:14 PM): tozzi are you like dowd's jiminy cricket
Re: OOP Discussion – December 22, 2010 8:32 PM (edited 12/22/10 3:32 PM)
Cuzzdog (1522 posts) Head of Gamer Corner R&D
Rating: Not Rated
Well, obviously I'm not going to stumble on the perfect solution to your exact programming needs in this high-level discussion, and that's not really my agenda here anyway. I'm just trying to show you how you might go about thinking of this problem from an OOP perspective. You say that what I'm suggesting is as crazy as passing SQL calls into the program. And I respond that yes, in the right circumstance, I would pass the SQL into my program. Obviously not necessarily exact SQL calls, but if there were a way to pull out key pieces of the needed SQL logic so that it could be passed into the program and altered on the fly via plain text files, why wouldn't I? I don't want to have to go in, change potentially complex working code, and go through the process of compiling and deploying (which, depending on your environment, can be a huge pain in the ass) just because the schema changed in my database. Personally, I don't see much benefit in figuring out the needed transformation logic and hard coding it into your program vs. storing it in a flat file where it's easy to change or make new ones when needed.

Let me run this by you now: you said you have a "RollUpReportTimeGranularity" so that if the requirements go from day rollups to hour rollups, you only need to make a coding(?) change in one place? Couldn't you use the OOP model instead to make it so you never have to make a code change if the requirements change? So, you would have a class "TimeBasedTrans" with a member variable "TimePrecision". In the constructor, set your precision to something passed in, or some default value. Then define every time-based transformation as a function here, configured off that TimePrecision variable. This way, even within one execution of your program, you could do rollups on both an hourly and daily level by instantiating two TimeBasedTrans objects.
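As a sketch (the command string is made up, just to show the shape):

class TimeBasedTrans {
    public static final String DEFAULT_PRECISION = "day";
    private final String timePrecision;

    public TimeBasedTrans() {
        this(DEFAULT_PRECISION);
    }

    public TimeBasedTrans(String precision) {
        this.timePrecision = precision;
    }

    // Every time-based transformation keys off timePrecision.
    public String rollUpOnTime() {
        return "| rollup --granularity " + timePrecision;  // placeholder command
    }
}

// Two precisions in one execution:
// TimeBasedTrans daily = new TimeBasedTrans();
// TimeBasedTrans hourly = new TimeBasedTrans("hour");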

Re: OOP Discussion – December 22, 2010 9:45 PM (edited 12/22/10 4:45 PM)
chaoscat (452 posts) Ambassador of Good Will
Rating: Not Rated
I'm not expecting you to stumble on a perfect solution to this. I'm going to stick with my functional approach, not least because it's largely built already, but you've done an admirable job of illustrating a different way of looking at the problem. Just to pick on your example, though, the TimeBasedTrans with a TimePrecision parameter doesn't appeal to me for two reasons.

1 - ease of changing. Yes, when I make a change now, I need to recompile and redeploy, which is potentially expensive. If the change were in a config file, I could skip the recompile step, but I'd still need to redeploy it, which is the more expensive part, as that involves other people.

2 - If I have this in 20 places, I'd need to change each one from "TimeBasedTrans(day)" to "TimeBasedTrans(hour)" in the config files. The chances of missing one are high. I could solve this by having a multi-layered config system, where in 20 places I have "TimeBasedTrans(%TIMEUNIT)" and define "TIMEUNIT=day" somewhere else. The problem is that if you continue down that road, and there are a lot of similar parameters, you end up writing a mini-language. Now don't get me wrong, I like mini-languages, but I don't feel one is appropriate here, not least because it would involve a lot of string manipulation, which is not the easiest thing to do in Java.

It may turn out that what I'm doing will cause a lot of maintenance needs, but I don't think so. If it ends up becoming a pain, I'll come back here and look over these ideas, but it seems to me that moving more of the core logic out of the code and into config files is a recipe for disaster right now. That's essentially the model we used at my last gig, and testing it was a nightmare.

Once again, thanks for your input, and I will give it some more thought as I go further on this application. This has definitely confirmed my theory that I don't really grok object oriented programming, although I think I might rather delve deeper into the land of the lambda than the oceans of objects. Actually, what I really think is that I want to spend more time working with Ruby, since it seems to be a hybrid object oriented and functional language.

_________________________________________________
Syllabic (4:14 PM): tozzi are you like dowd's jiminy cricket
Re: OOP Discussion – December 23, 2010 1:13 PM (edited 12/23/10 9:45 AM)
Cuzzdog (1522 posts) Head of Gamer Corner R&D
Rating: Not Rated
One last shot at this :)

1)
chaoscat wrote...
1 - ease of changing. Yes, when I make a change now, I need to recompile and redeploy, which is potentially expensive. If the change were in a config file, I could skip the recompile step, but I'd still need to redeploy it, which is the more expensive part, as that involves other people.

The thing is, once you don't have to compile the code, you shouldn't need a deployment to change a config file. You could write a simple shell/Perl script, set on a scheduler you do control, that goes in and makes that change for you. Or make two versions of the config file and pass the right one in as a parameter when you kick off your application (which I assume you would be doing anyway). And laying in that kind of framework now will make it very easy for some kind of GUI to go in and set that on the fly. I don't know if you'll ever have users hook into this system, but maybe you'll want an admin GUI for yourself to run reports or something.

Trust me, you learn a lot of tricks for changing systems on the fly when it takes 2 weeks and a mountain of paperwork to do a deployment :D

2) One last idea to show what kind of added benefit you could get from putting all your functions in a single class:

In your example function:

chaoscat wrote...
public class RollUpFirstThreeFields {
    public void applyTransformation(StringBuilder command) {
        command.append("| cut -f1,2,3 | sort | uniq -c");
    }
}

You pass the transformation being built into the function itself. This means all of your transformation functions need to mirror this signature. It also means the code that keeps track of the transformation being built lives apart from the tools used to build it, which can make it a pain to trace back exactly where to look for the tools that control the transformation. Instead, think of it this way:


import java.util.ArrayList;

public class Transformation {
    private ArrayList<String> _trans;

    public Transformation() {
        _trans = new ArrayList<String>();
    }

    public StringBuilder getTrans() {
        // Loop through _trans to concatenate all transformations into s.
        StringBuilder s = new StringBuilder();
        for (String t : _trans) {
            s.append(t);
        }
        return s;
    }

    public void addRollUpFirstThreeFields() {
        _trans.add("| cut -f1,2,3 | sort | uniq -c");
    }

    public void addFilterData(String s) {
        _trans.add("| grep " + s + " ");
    }

    public void removeTran(int i) {
        _trans.remove(i);
    }
}

With that, the guts of how you're putting together and maintaining a transformation are hidden from the rest of the code. Now in your main, instead of thinking "OK, I want to build a transformation. What do I need for that? Well, I need a StringBuilder, and that needs to be set up a certain way first. Then call the static functions from all these different classes. And an integer to keep track of how many transformations I've pushed onto the final transformation, and..." all you have to do is:
Transformation t = new Transformation();
t.addRollUpFirstThreeFields();
t.addFilterData("Hello");
myBlackBox.run(t.getTrans());

In other words, your business logic is just worried about what needs to be done, and not the ugly how of doing it. And, incidentally, if you go with this structure, making your TimeBasedTrans class becomes even easier, since all you'll need to do is extend your Transformation class with TimeBasedTrans, and you carry all those high-level transformation tools over to your specialized class.

And I think this would take care of your point #2, where you thought you would need to create a mini-language to make effective use of that time variable. Now in main you would just have:

String precision = <Read time precision from passed in config file>
TimeBasedTrans t = new TimeBasedTrans(precision);
t.addRollUpFirstThreeFields();
t.addFilterData("Hello");
t.addRollUpOnTime();
myBlackBox.run(t.getTrans());
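Spelled out, the subclass could be as small as this sketch. Note it assumes the Transformation base exposes a protected addTran helper for subclasses (the version above keeps its list private), and the rollup command is invented:

import java.util.ArrayList;

class Transformation {
    private final ArrayList<String> _trans = new ArrayList<String>();

    // Assumed helper so subclasses can add stages without seeing the list.
    protected void addTran(String t) {
        _trans.add(t);
    }

    public StringBuilder getTrans() {
        StringBuilder s = new StringBuilder();
        for (String t : _trans) s.append(t);
        return s;
    }
}

class TimeBasedTrans extends Transformation {
    private final String precision;

    public TimeBasedTrans(String precision) {
        this.precision = precision;
    }

    public void addRollUpOnTime() {
        addTran("| rollup --granularity " + precision);  // placeholder command
    }
}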


Re: OOP Discussion – December 24, 2010 4:19 PM (edited 12/24/10 11:19 AM)
chaoscat (452 posts) Ambassador of Good Will
Rating: Not Rated
Cuzzdog wrote...

The thing is, once you don't have to compile the code, you shouldn't need a deployment to change a config file. You could write a simple shell/Perl script, set on a scheduler you do control, that goes in and makes that change for you. Or make two versions of the config file and pass the right one in as a parameter when you kick off your application (which I assume you would be doing anyway). And laying in that kind of framework now will make it very easy for some kind of GUI to go in and set that on the fly. I don't know if you'll ever have users hook into this system, but maybe you'll want an admin GUI for yourself to run reports or something.

It would require a deployment because I have zero access to the production system. We are managing deployments via Debian packages, so for me to make a change like that would require building a new Debian package and pushing it out. The thing is, it's not equivalent to putting in a new report (which is sort of user-level data); it's like putting in a new database schema (since the ultimate output is data in the database, most substantive changes would involve a schema change as well). Everything that touches it on both ends needs to be aware of that change. The code that manages the actual reports is much more friendly to GUI administration.

Even presuming I had a nice web GUI to tweak the data flow (which I think is a step in the wrong direction, since text boxes are vastly inferior editors to vi/vim/emacs/eclipse/etc), other things that aren't easy to tweak via web UI would need to be modified as well, not all of which are within my control. Also, there's the matter of versioning. I suppose I could also whip up some web interface to SVN that would allow me to commit as my user and such, but there's a lot of reinventing the wheel there for no apparent gain. I have a tool suite that does all that already, that I have invested years into learning. Why write (or install if something already exists) a brand new tool suite to solve the same problem?

Cuzzdog wrote...

With that, the guts of how you're putting together and maintaining a transformation are hidden from the rest of the code. Now in your main, instead of thinking "OK, I want to build a transformation. What do I need for that? Well, I need a StringBuilder, and that needs to be set up a certain way first. Then call the static functions from all these different classes. And an integer to keep track of how many transformations I've pushed onto the final transformation, and..." all you have to do is:
Transformation t = new Transformation();
t.addRollUpFirstThreeFields();
t.addFilterData("Hello");
myBlackBox.run(t.getTrans());

Yes, that is a good idea, but in point of fact the framework already does that for me. Literally, my code looks like:

FrameworkServer fw = new FrameworkServer(options);
rollup1.addTransaction(fw, in1, out1);
rollup2.addTransaction(fw, out1, out2);
// etc.

The reason I like passing in the framework like that is so I can run the tests against the test settings in the framework (e.g. run locally and not on the cluster) without having to touch anything in the transaction code. I do the same thing with my database connections for the same reason, and have generally seen this suggested as good practice. I suppose there's no reason I couldn't hand the configuration info to a constructor and let that build the framework server object, but then I'd really have to worry about making things static and managing instances of that object so everything ended up going to the same instance of the framework. As it is, I know that the only place new FrameworkServer() gets called is in initialization functions (either the test setUp method or the main method of the driver), so it's easy to see that there is no duplication. The framework server itself doesn't implement a singleton pattern because it's reasonable to think you might want more than one framework connection at a time - possibly to different clusters, for example, or as different users - that's just not the use case I have.
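Concretely, the test side looks something like this (a sketch - Options.fromConfig, the config file name, and the labels are made up; FrameworkServer and addTransaction are from the snippet above):

// JUnit-style setUp builds a FrameworkServer from test settings (local
// mode, not the cluster); the transformation code never knows the difference.
protected void setUp() {
    options = Options.fromConfig("test.properties");  // hypothetical test config
    fw = new FrameworkServer(options);
}

public void testRollupMainSummary() {
    rollup.addTransaction(fw, "testIn", "testOut");
    // assert on the contents of "testOut" here
}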

As for putting the time precision in a config file, I agree it's doable, but I don't see the point. It's not something that should change frequently or trivially (by the same token, I wouldn't have some script which automatically updated my DB schema based on a config file - one false move could accidentally delete a column of data), so requiring a code change seems reasonable for that. I would (and do) use config files for things like "we're changing the path for your input files" or "Run this against the backup DB until the upgrade on the production DB is finished" type things. I guess it's really just a style preference, but I find the division you are describing hard to debug. In essence, I move from one place where I can look at all the logic for a particular transformation to having to say "well, is the transformation wrong in the config, or am I parsing the config wrong, or am I doing the wrong thing with the result of the parse", which amounts to three places to look for an error.

Thanks for the discussion, I will definitely think about this more as I move forward with the project.

_________________________________________________
Syllabic (4:14 PM): tozzi are you like dowd's jiminy cricket