I just had a phone call from a Drobo senior engineer. He was very frank and direct. It was the sort of conversation two developers have when nobody from management is in the room.
Without going into detail, I have to say that I was impressed. They have, of course, been testing this setup thoroughly since the very first Lion developer previews. The Drobo engineer outlined for me the testing procedures they’re using right now to try to replicate the failures some of us are seeing. They haven’t been able to replicate them. If you can’t make something bleed, it’s hard to kill it.
If you’ve ever shipped software, you’ve faced this situation. A customer experiences some bug, maybe even an intermittent one, that you can’t reproduce yourself. It is maddeningly frustrating for both the developer and the customer.
We were on the phone for 45 minutes. He wanted very specific log files from my system. He laid out for me the plan they have for killing this problem, and the multiple approaches seem very sound to me.
They do indeed need the performance tests that first-level support has been asking us to run.
Based on what I learned today I’m going to hang in there for a while longer.
I have created a new Time Machine share on the FS, and I’m running a backup to it now from one of my Lion machines. It’s working fine. I’m going to give it a couple more hours, then kill it, and apply the procedure that Sébastien used. The only change I’ll make is to mount the shares manually using SMB, instead of having to play Beat The Clock.
One other update: my Snow Leopard machine, which was seeing absurdly low throughput immediately after the new firmware, is now functioning fine. I didn’t touch anything. I just let it work.