Data Understanding - a first glance at the data
More than 2 weeks have passed since the beginning of the Data Mining Cup 2007 and I know you guys are pretty far along by now. I guess many of you have already assembled dozens of prediction rules and models and are fine-tuning your systems.
But as I know from myself, we all tend to draw conclusions much too fast. Let’s see if we have spent enough time on understanding and restructuring the data.
This is the very first statistic you should look at: how often customers responded to coupon A, to coupon B, or not at all. Remember from the scenario that each of the 50k customers received coupons A and B by mail. Here are the responses:
After performing some simple grouping operations on the customers you can see groups with significantly different frequencies.
Shouldn’t we treat them differently?
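To make the idea concrete, here is a minimal sketch of such a grouping, not the actual analysis behind the charts. It assumes each record is a pair of a group label and a response ("A", "B", or "N" for no response); the toy records and the function name `response_frequencies` are made up for illustration and do not come from the DMC 2007 data. It reports absolute counts next to the shares, so a small group with an extreme frequency is easy to spot.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (group label, response).
# "N" stands for "no response". These are NOT real DMC 2007 figures.
records = [
    ("group_1", "A"), ("group_1", "N"), ("group_1", "N"), ("group_1", "B"),
    ("group_2", "N"), ("group_2", "N"), ("group_2", "N"), ("group_2", "N"),
]

def response_frequencies(rows):
    """Return {group: {response: (count, share)}}, plus an "ALL" entry.

    Assumes no real group is itself labelled "ALL".
    """
    counts = defaultdict(Counter)
    for group, response in rows:
        counts[group][response] += 1
        counts["ALL"][response] += 1

    table = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        table[group] = {
            r: (counter[r], counter[r] / total) for r in ("A", "B", "N")
        }
    return table

table = response_frequencies(records)
for group in sorted(table):
    cells = ", ".join(
        f"{r}: {count} ({share:.0%})" for r, (count, share) in table[group].items()
    )
    print(f"{group}: {cells}")
```

Comparing each group's row with the "ALL" row shows which groups deviate from the overall frequencies, and the counts show how many customers each percentage actually rests on.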
Next time more on simple and not so simple grouping.
5 Comments so far
It’s nice to see some charts, but without any absolute values this is useless. I can easily give you a group with 100% N (containing 200 out of 50,000 records).
cheers!
ernst
It’s not my intention to broadcast the solutions but to give some hints
And how is your ranking?
Hi!
Nice blog!
I, a student from Uni Kassel, also took part in the DMC 2007.
Now I’m writing a short text about my approach (something about Genetic Programming).
I need some other approaches from competitors to name them in my text.
Maybe you could also say something about your ranking.
So if you like, mail me :-), or post it in your blog.
Kind regards, Stefan
Contact me via email. You will find it on the “about” page.