Hey everyone! Long time no see. There haven’t been many developments with the site the past couple months (since I removed the ads), but there are a couple things I want to bring to light today.
1. We need Dark Explorers scans!
Word on the street is that there were prereleases last weekend. If you went to one (or are going to one during an upcoming weekend), it would be awesome if you could help contribute card scans! You can find instructions on how to calibrate your scanner for optimal results here.
You do not need to crop the scans; I’ll be able to find someone to take care of that, unless you’re good at it. Last time Spike P. helped out with the image editing, and he did a stellar job.
I’ve got all the cards in the system as drafts right now; we just need the images and I’ll be able to publish them. I know there are some low quality scans floating around, but I’d rather wait for better ones since I know I’d eventually have to replace them.
2. New Rating System
I mentioned in the past that I thought the current rating system was going pretty well, and that the ratings would even out over time to become fairly accurate. This, however, hasn’t really panned out, as it seems like the first couple people that rate a card have a huge influence on its score.
For example, if a card starts off with a couple 10 ratings, subsequent raters usually assume the card is good, and give it a good rating as well (even if the card actually sucks). Vice-versa happens as well (bad ratings for good cards).
It pisses me off that the rating system hasn’t worked out because I really wanted this to be a good resource for newer players, so they could actually get a decent idea how good or bad a card really is. Right now, the ratings are a pretty lousy indicator of a card’s playability.
However, I do have another (better) rating system in mind. This is an early idea I had for how to get the most accurate ratings, but it was going to be more complicated to implement, so I took the easy way out and went with a pre-packaged ratings plugin (which is what we’re using now).
While I’m lying in bed, trying to fall asleep, I often philosophize and think about deep topics which I don’t quite yet understand. One night last week I thought about the alternate rating system, and actually figured out a way to build it. I ran the scenarios through my head, then scribbled down the components necessary to make it work (so I wouldn’t forget anything), and set the paper aside to let the plan marinate a little longer in my brain.
Here’s basically how it’ll work (and PLEASE give me input if you have any ideas on how to make it better):
A. Head-to-Head Matchups
Right now you are shown one card, and you pick a score from 1 to 10. Numbers are extremely arbitrary, and even I, who have played this game for years, don’t know how to accurately rate a card. There are so many variables to consider, and numbers don’t really mean anything.
How do you quantify the “goodness” of a card, you know? “Gyarados SF is a 9/10.” Ok… what does that mean?
Instead, I think it’s a lot better to compare cards. If you show me Gyarados SF vs Volbeat TM, I can say with certainty that overall, Gyarados SF is the better card. Maybe not in every single game situation, but overall, Gyarados is the more playable card. Gyarados SF vs Professor Juniper? That’s a little more difficult, but I would say Juniper is the more playable (and overall stronger or game changing) card.
You can be a lot more certain about which cards are good when you do one-on-one comparisons. If you do enough of these comparisons (maybe a few hundred for each card), I think you can start to build an accurate picture of where cards rank in terms of playability.
B. Click to Rate
The way this will work is that each card will have a link that says “Click to Rate,” and when you click it, an overlay window will display with the current card vs a random card. You will be prompted to click which card you think is better, then the window will refresh and a new random card will be matched up against our hero.
A new random card will be displayed until your mouse breaks, or you decide to click off the window. I want to get as much data as possible, so I’m not going to limit the number of matchups that show up. The more data we can collect, the better.
C. Random Card = ANY Random Card
I was considering limiting the random card that appears to be in the same set as the initial card, but the issue with doing that is not all sets are created equal. Power Keepers is a much weaker set than Next Destinies, for example.
If Power Keepers cards were only matched up against Power Keepers cards, then some cards would end up seeming a lot better than they really are. Vice-versa applies with Next Destinies (good cards would seem worse than they really are).
I know with the power creep that has infested the game, newer cards are for the most part going to seem better than older cards… but that’s accurate, I think. It might be better to compare cards only to others that are Modified-legal at the time, but cards always end up being in multiple formats, and either lose or gain power over time. Comparing vs any random card should be good enough.
Side note: I’m also considering maybe keeping track of both a card’s rating compared to all cards AND compared to its set. That might be the way to go.
D. Rating Calculations
Each time a card is displayed in a matchup, it will get +1 to its number of impressions. If a card is picked, it gets +1 to its score. The card that isn’t picked gets 0 added to its score. The rating will simply be displayed as the card’s score divided by its number of impressions, multiplied by 100 to get a percentage.
For example, if Gyarados SF is pitted in 100 matchups, and is picked as the better card 87 times, it will have a rating of 87%.
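Here's a minimal sketch of that bookkeeping in Python. It's only an illustration of the arithmetic described above, not the site's actual code; the `CardStats` class, its field names, and `record_matchup` are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class CardStats:
    """Running totals for one card (class and field names are assumptions)."""
    name: str
    score: int = 0        # times picked as the better card
    impressions: int = 0  # times shown in any matchup

    def rating(self) -> float:
        """Score divided by impressions, expressed as a percentage."""
        if self.impressions == 0:
            raise ValueError("no impressions yet")
        return 100.0 * self.score / self.impressions


def record_matchup(winner: CardStats, loser: CardStats) -> None:
    """Both cards get +1 impression; only the picked card gets +1 score."""
    winner.impressions += 1
    loser.impressions += 1
    winner.score += 1


# Reproduce the example: 100 matchups, picked as the better card 87 times.
gyarados = CardStats("Gyarados SF")
opponent = CardStats("placeholder opponent")
for i in range(100):
    if i < 87:
        record_matchup(gyarados, opponent)
    else:
        record_matchup(opponent, gyarados)

print(gyarados.rating())  # 87.0
```

In a real run the opponent would change every matchup; a single placeholder opponent is used here just to keep the example short.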
E. Minimum Number of Impressions
I think it’s important each card receives a minimum number of impressions before its rating is displayed, in order to prevent rating bias. If a card is rated once, and receives a 100% rating, then subsequent raters might think the card is godly, and keep picking it, even though it’s not that hot.
My initial thoughts were to make a minimum of 100 impressions before the rating is displayed. That seems like it should be a decent sample size, but I’m not sure. I should have paid better attention during statistics class… I forget how to tell what sample size makes a number “statistically significant.” I can tell you though that there are almost 6,000 cards in the database.
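For a rough feel of what 100 impressions buys, here's a sketch that hides a rating until it reaches the threshold and estimates the margin of error with the standard normal approximation for a proportion. The function names are invented for this example, and the estimate assumes each vote is independent and the opponents are random, which won't be exactly true in practice.

```python
import math
from typing import Optional

MIN_IMPRESSIONS = 100  # the threshold floated above


def displayed_rating(score: int, impressions: int) -> Optional[float]:
    """Return the percentage rating, or None while it should stay hidden."""
    if impressions < MIN_IMPRESSIONS:
        return None
    return 100.0 * score / impressions


def margin_of_error(impressions: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error, in percentage points.

    Uses z * sqrt(p * (1 - p) / n). p = 0.5 is the worst case, and
    z = 1.96 corresponds to a 95% confidence level.
    """
    return 100.0 * z * math.sqrt(p * (1.0 - p) / impressions)


print(displayed_rating(1, 1))             # None (hidden: only 1 impression)
print(displayed_rating(87, 100))          # 87.0
print(round(margin_of_error(100), 1))     # 9.8
print(round(margin_of_error(400), 1))     # 4.9
```

So under those assumptions, a rating shown at 100 impressions could be off by roughly 10 points either way, and it takes about four times as many impressions to cut that in half.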
F. Limit the Voters
Lastly, I’m considering only letting authorized registered users vote, at least to start off. I want to prevent trolls like J-Wittz from giving Hoppips perfect ratings. I don’t like making people register, but it might be the best way to keep things legitimate.
At first I thought the trolling might be funny and went along with it, but it’s bad for the site. The database becomes a lot more helpful when the ratings are accurate.
G. Potential Issues
The main thing I’m not sure how to deal with is repeat ratings. What I mean by that is the same two cards getting matched up against one another, repeatedly, before they’ve been matched up with unrated cards.
If Gyarados SF got matched up against Gust of Wind 9 times in a row, then finally got paired up with a meager Magikarp, it might have only a 10% rating when it’s really not that bad. Ideally, it would be matched up against every card out there one time, then repeat the cycle.
With a smaller database, I feel it would be a bigger issue, but with 6,000 cards, my theory is that things should even out. I’m sure there is some way to prevent the same two cards from being matched up before all 6,000 are cycled through, so I’ll have to look into this.
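One possible way to do that is a "shuffle bag": for each card, keep a shuffled list of every other card, deal opponents off it one at a time, and only reshuffle once the list is empty. The sketch below is an assumption about how it could work, not a description of what the site does; `OpponentCycler` and its methods are hypothetical names.

```python
import random
from typing import Dict, Iterable, List, Optional


class OpponentCycler:
    """Deals each card every other card once before any opponent repeats."""

    def __init__(self, card_ids: Iterable[int], seed: Optional[int] = None) -> None:
        self._card_ids: List[int] = list(card_ids)
        if len(self._card_ids) < 2:
            raise ValueError("need at least two cards to make a matchup")
        self._rng = random.Random(seed)
        # Remaining opponents per card, created lazily on first use.
        self._remaining: Dict[int, List[int]] = {}

    def next_opponent(self, card_id: int) -> int:
        queue = self._remaining.get(card_id)
        if not queue:
            # First use, or the previous cycle is finished: reshuffle.
            queue = [c for c in self._card_ids if c != card_id]
            self._rng.shuffle(queue)
            self._remaining[card_id] = queue
        return queue.pop()


cycler = OpponentCycler(range(1, 7), seed=1)
first_cycle = [cycler.next_opponent(1) for _ in range(5)]
second_cycle = [cycler.next_opponent(1) for _ in range(5)]
print(sorted(first_cycle))   # [2, 3, 4, 5, 6]
print(sorted(second_cycle))  # [2, 3, 4, 5, 6]
```

This only tracks the card being rated: if card 1 is shown against card 2, card 2's own list still contains card 1. It also cycles across all voters combined rather than per voter, and the lists would need to live in the database rather than in memory.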
I think that’s about it… I’ll try to start coding tonight, though playoff hockey has put a damper on my productivity the past week and a half. Expect it to be done sometime in May.
Please leave feedback if you have any, and thanks for reading!
What about if a card gets paired with a reprint of itself?
The deck-specific decisions are going to be tough, though. Collector vs Dual Ball? 30 HP vs 40 HP Tynamo? PONT vs Juniper (this is especially tough, since we usually just say “both”)?
Getting paired vs a reprint is bound to happen, though it’s unlikely. If the cards get enough ratings, then it’ll make little difference. It’s too much work for me to try and shore up every instance of reprints from showing.
The question is going to be “Which card has (had) the most impact on competitive play?”
I think that makes it fairly easy to decide between cards. 30 HP vs 40 HP Tynamo is pretty much a no-brainer in this case (30 HP has had way more of an effect). Collector vs Dual Ball? Collector, overall, has had the bigger impact through time. PONT vs Juniper? I’d have to say Juniper.
Why is it that now that I like 40 HP Tynamo, everyone switches to 30 HP? Especially with SAB.
The only issue I have with this is that oftentimes the competitive influence of a card might not perfectly correlate with how much potential the card had in its setting. This is particularly noteworthy among unevolved pokemon. A HUGE number of them would have no real impact on the competitive field at all. Stormfront machop vs. base set machop is a fairly apt comparison.
In its format, base set machop was one of the most powerful evolution capable basics out there. In fact, it’s arguably better, stat for stat, than stormfront machop! On the other end, base set hitmonchan was more dominant than the entire line.
Stormfront machop, on the other hand, could be evolved immediately into Machamp via rare candy, turning into one of the most stupefyingly powerful stage 2 pokemon in the entire tcg.
Stormfront machop was more important in competitive play, because its evolutionary line would actually get you somewhere… but the base set version was one that people could actually field and rely on. So, while base set machop was something that COULD be useful, stormfront machop was the card that had more competitive influence, and thus would receive a higher rating.
I’m fine with the way it is, but I’m not sure this metric is the best for casual players, and for people seeking to evaluate basic pokemon it might be a bit off.
Thanks for the feedback. No system will ever be perfect, but I think this is going to turn out to be pretty darn solid. The more ratings a card gets, the more accurate the rating should be. A few close matchups shouldn’t have that much of an effect on the rating.
If a card goes up against a reprint, rate them based on artwork/awesomeness.
For some of the more popular cards that directly compete with each other, maybe you could skew the RNG to match them up against each other? Like, make things like Typhlosion Prime have higher odds of matching up with Emboar and Eelektrik.
The idea is to match each card against every other card one time, then repeat. If Typhlo Prime got matched up against Eelektrik all the time, then the numbers would be skewed.
Eelektrik vs Typhlo 9 times, Eelektrik might be 9-0.
Typhlo finally gets matched up against a weaker card, and wins, it’s now 1-9. That’s not right.
Ohhhh, I get it now. This rating system should be much better than the current one then!
Problem: Newer player sees Hitmonchan Base Set vs. Hitmonlee UD (same card with 10 more HP and 20 more damage on 2nd attack) and doesn’t know what power creep is. Or someone just not knowing what power creep is in general.
Which is why I will likely limit who can vote!
Wait… I can vote, right?
I’ll have to get you and some others set up to vote, but yes! I’m still working on it though.
So, what happened to voting? Why not just limit it to a sixprizes account or something?
I’m working on it right now. I’ve hit some technical difficulties and am trying to fix it.
Thanks! I just wondered what happened. Glad you are working on it!
Good luck with this.
If you’re stuck at any point and you use PHP / AJAX, I think I can help if you ever need it :P (but I hope you won’t need it =P )
Hah, that’s exactly what I’m working with! I actually think I’ve got it working now, but want to fine tune a little bit with a couple things since the loads are slower when it’s live.
I could use a couple people to help beta test it, so send me an e-mail if you’d like to try it out!
Just a suggestion – I think it would be best if you picked the random cards from a pool of other sets that the main card was in rotation with at any time – there’s no point in comparing Mewtwo EX with Base Set Charizard; but certainly there is with say Darkrai EX.
The rating system would still connect all the cards indirectly because of how the rotation works.
Also a page that just paired two random cards would be great for people trying to just generally waste some time by rating cards!
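The rotation-pool suggestion above could be sketched roughly like this: give every set a window during which it was legal, and only draw opponents from sets whose windows overlap the main card's set. Everything in the example is HYPOTHETICAL, including the set names, the numbered "seasons", and the card names; real legality dates would have to come from the database.

```python
import random
from typing import Dict, List, Tuple

# HYPOTHETICAL placeholder data: set name -> (first legal season, last legal season).
LEGAL_WINDOWS: Dict[str, Tuple[int, int]] = {
    "Set A": (1, 3),
    "Set B": (2, 4),
    "Set C": (4, 6),
    "Set D": (7, 8),
}


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two inclusive legality windows share at least one season."""
    return a[0] <= b[1] and b[0] <= a[1]


def opponent_sets(card_set: str) -> List[str]:
    """Sets that were ever in rotation together with card_set (itself included)."""
    window = LEGAL_WINDOWS[card_set]
    return [name for name, other in LEGAL_WINDOWS.items() if overlaps(window, other)]


def pick_opponent(card: str, card_sets: Dict[str, str], rng: random.Random) -> str:
    """Pick a random opponent from the overlapping sets, never the card itself."""
    allowed = set(opponent_sets(card_sets[card]))
    pool = [c for c, s in card_sets.items() if s in allowed and c != card]
    if not pool:
        raise ValueError("no eligible opponents for " + card)
    return rng.choice(pool)


# HYPOTHETICAL cards, one or two per placeholder set.
CARD_SETS = {"card a1": "Set A", "card b1": "Set B", "card c1": "Set C", "card d1": "Set D"}

print(opponent_sets("Set B"))  # ['Set A', 'Set B', 'Set C']
print(opponent_sets("Set A"))  # ['Set A', 'Set B']
print(pick_opponent("card a1", CARD_SETS, random.Random(0)))  # card b1
```

In this toy data, Set A and Set C never meet directly, but both meet Set B, which is the indirect connection the comment describes. Set D overlaps nothing, so a real version would need a fallback for isolated sets.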
Good point about the cards overlapping. I’ll think about it… coding will be more difficult if I try to limit the sets, but it might be worth it.
And I love the idea of just having a page with 2 random cards! I’ll definitely try to whip something up.
I’m still in the basic stages of the project right now, just so everyone knows… I was hoping to get more done today, but 6P got hacked and I had to deal with that all afternoon.
My only question is… Would the cards be rated according to their playability at the time? I mean BS Hitmonchan was obviously a really good card, but compared even to today’s “okay” cards, it’s not that great… So?
The question is going to be “Which card has (had) the most impact on competitive play?”
I think that makes it pretty clear.
Out of all of the rating systems I’ve seen, I personally like the one on the Dan-Ball Powder Game rankings (link found here->http://dan-ball.jp/en/javagame/dust/search/). Basically how it works is that rather than a scale it has a total vote system. The more people endorse an entry, the higher the ranking is. To help keep unfair votes down, there is a one-vote per entry per IP address limit, I believe (not the best, but it keeps the average Joe Schmo away). There are different colors signifying how endorsed an entry is (although it hasn’t changed since its creation, and the player base has risen significantly since, so it’s not a very good scale as is). Lastly, one is only allowed to vote on an entry within three months of submission, which keeps older entries from staying at the top just because they’re the oldest and have had more chances to be endorsed. If I missed anything, I provided the link.
Couldn’t you show the Ratings only after the person voted on that card as a short term fix? That way they would be a little less biased.
That’s a great idea, but the software I’m using shows the rating by default and I’m not sure if that functionality can be easily changed. I wish I could work on this now, but I have probably 2-3 more weeks of 6P stuff I’ve got to work on first.
I just noticed that I can vote on each card an unlimited amount of times. Huh.
It lives… I need a few beta testers though! Message me on 6P if you want to help: http://www.sixprizes.com/forums/posts/83211/
I might be partly responsible for the thing with stupid ratings. My favorite pokemon is Riolu and I went through and rated them all with 10’s. Then other high ratings started and, well, sorry.