What's up with the absolutely tiny number of tests per day? 200 tests per day is tiny. My FE team only has two devs and the product is fairly small, but we still managed to amass a couple hundred E2E tests over several years. With tests run in CI on every PR or commit, this won't scale at all.
Presumably, this is backed by some sort of LLM-to-browser MCP integration. If so, how do you ensure tests don't randomly fail because of inherent unpredictability?
Good questions! You're right that the 200 test runs limit is too low. I badly want to increase it, but currently each test costs $0.05 to run, so I need to get costs down before I can.
But my next update will fix that – and also answer your second question. What I've realized is that we don't actually need to use the AI on every test run. We only need the AI on the initial run; subsequent runs can just replay the Playwright steps the AI chose.
And if a replay fails, before reporting a failure we can re-run the test with AI assistance to check whether it truly failed from a user's perspective. Hopefully when I add this functionality it will address your concerns around unpredictability :)
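To make that concrete, here's a rough sketch of the flow I have in mind (TypeScript with Playwright; the Step format and the loadCachedSteps/runWithAi helpers are hypothetical placeholders, not the actual implementation):

```ts
// Sketch: replay cached AI-chosen steps cheaply; only fall back to the AI
// on the first run or when a replay fails.
import { chromium, Page } from 'playwright';

// A step the AI chose on the initial run, e.g. { action: 'click', selector: '#submit' }.
type Step =
  | { action: 'goto'; url: string }
  | { action: 'click'; selector: string }
  | { action: 'fill'; selector: string; value: string };

async function replaySteps(page: Page, steps: Step[]): Promise<void> {
  for (const step of steps) {
    if (step.action === 'goto') await page.goto(step.url);
    else if (step.action === 'click') await page.click(step.selector);
    else await page.fill(step.selector, step.value);
  }
}

async function runTest(
  testName: string,
  loadCachedSteps: (name: string) => Step[] | null,          // hypothetical step cache
  runWithAi: (name: string, page: Page) => Promise<Step[]>,  // hypothetical AI-driven run
): Promise<boolean> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  try {
    const cached = loadCachedSteps(testName);
    if (cached) {
      try {
        await replaySteps(page, cached); // cheap, deterministic replay
        return true;
      } catch {
        // Replay failed: fall through to the AI to check whether the failure
        // is real from a user's perspective or just a stale selector.
      }
    }
    await runWithAi(testName, page); // expensive AI run (first run or fallback)
    return true;
  } catch {
    return false; // AI-assisted run also failed -> report a genuine failure
  } finally {
    await browser.close();
  }
}
```

The idea is that the per-run LLM cost only applies to first runs and to the small fraction of replays that break, which is also what should keep flakiness down.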
Looks interesting!
Looks pretty cool, is this a Show HN?