BryanK -> RE: A sneaky little addition to v1.1 (10/23/2005 10:09:45 AM)
Totally past my bedtime (thanks, Shaun ;) but here goes...

The data normalization process in the Anakit database messes with Puresim's import algorithms something fierce. Lots of people come back with maxed-out range and arm values -- including people who shouldn't (like Greg Vaughn). Potential and speed values are also whacked. At heart, these problems arise from the fact that the Anakit database is correcting for things that Puresim itself has already taken into account, like the difference in eras over time. OOTP doesn't do that explicitly, so the database was intended to back into it. In effect, Puresim is overcorrecting when it uses the Anakit database, and since basically all the alternative databases build off Anakit as a base, this might be a nonstarter at present.

Having said that, I was able to cut out Anakit's updated debut year info and meld it into the standard Lahman database correctly. But there is a catch. Either Puresim is keying off the first year a player has a stat line, OR the debut year tells Puresim which players it should look to import, yet there are no stats for those players until they make their major league debut -- which is usually a few years later. As such, they all get skipped for having insufficient stats.

There are a couple of ways around this.

1. Have someone (me?) go in and manually adjust the stat lines to be in sync with the debut years. Not gonna happen. First, I don't have the time to do that. Second, it's very time-intensive and would have to be done each time a new database is released. Third, this amounts to moving up a player's rookie season so that it occurs when they're younger, and that's going to make them too good too fast.

2. Refine the import process so that Puresim uses the debut year to find who it should import in any given year, and then independently builds the player model based on that player's statistics -- even if they're from a few years down the road.
That might require a tweak to the way potential and ratings are assigned when there is a gap between debut and the first usable stat line, though, since it stands to cause the same too-good-too-fast problem that solution (1) had. Still, I think this is probably the best way.

One possible hiccup is that the debut year data is not standard across databases: the original Lahman database has the day and month info as well, while the fixed Anakit info is just the draft year. That could make it a little tricky to code so that Puresim could work with both forms; but I don't think this is always in the same form even in the default Lahman database, so it might not be an issue.

Here are links to the databases, in case someone who knows more than I do wants to take a look ;)

Gambo Anakit Debut=Draft Year
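
For anyone who'd rather see option 2 as code, here's a rough Python sketch of the idea. To be clear, this is not how Puresim actually does its import (I have no idea what the real code looks like); `Player`, `debut_year` and `players_to_import` are names I made up, and the two debut formats it handles (a full date with a four-digit year, or a bare year) are my assumption about what's in the databases.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


def debut_year(raw) -> Optional[int]:
    """Pull a four-digit year (1800-2099) out of a debut value.

    Assumed inputs: a full date such as "4/15/1989" or a bare year
    such as "1986" or 1986. Anything else returns None.
    """
    match = re.search(r"(?<!\d)(1[89]\d{2}|20\d{2})(?!\d)", str(raw))
    return int(match.group(1)) if match else None


@dataclass
class Player:  # hypothetical record, not Puresim's real player model
    name: str
    debut: object          # raw debut value as stored in the database
    stat_years: List[int]  # seasons that have a usable stat line


def players_to_import(players: List[Player], season: int) -> List[Tuple[Player, int, int]]:
    """Pick players by debut year, then point at their first real stat line.

    Returns (player, first_stat_year, gap) tuples, where gap is the number
    of years between the debut year and the first usable stat line.
    """
    picked = []
    for player in players:
        year = debut_year(player.debut)
        if year != season or not player.stat_years:
            continue  # wrong season, unreadable debut, or no stats at all
        first_stat_year = min(player.stat_years)
        gap = max(0, first_stat_year - year)
        picked.append((player, first_stat_year, gap))
    return picked
```

The point of handing back `gap` is that whatever assigns potential and ratings could use it to slow a player's development when the stat line comes from a few years after the debut year. How exactly that adjustment should work is the part I'd leave to someone who knows the rating model.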