Tue 02 Dec 2008

Edited by Paul Hales

Published by Incisive Media Investments Ltd.

Nehalem and Larrabee dissected

IDF San Francisco One day married via QuickPath?

THE BIG NEHALEM show off was not unexpected this Intel Developer Forum, as you have read on our pages before. Yes, there still might be a chance of some last-minute performance quirks on the DP Gainestown platform to work out, but otherwise the stuff works fine.

We had a chance to play with several desktop platforms - both the widely present extreme Bloomfield and "far future" Lynnfield, as well as a few server and workstation setups on Intel, Asus and Supermicro mainboards. The Jones Farm, Hillsboro Oregon server testing team put together nice setups on which we got some really nice results in the new Sandra 2009 beta, including memory bandwidths exactly thrice the current Skulltrail platform ones - despite similar memory latency on these early mobos.

Sometimes, one gets really peed off when certain "partner" vendors, despite all the effort put in to get these early machines shown under so many veils of secrecy, can't even properly configure the number of memory DIMMs for the tests, not to mention the timings - like putting only one DIMM in there, huh.

Basic rule for Gainestown DP tests: six, or a multiple of six, identical - and I mean IDENTICAL - DIMMs, one per channel in the same first positions, of course with the best bandwidth and latency timings possible, at the lowest voltage. There will be mobos supporting 1.35 V DDR3-1333 and faster clocks, bringing the memory signal voltages much closer to the CPU voltage itself, which helps minimise the power differences across different parts of the Nehalem die and simplifies the setup overall.
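For anyone setting up such a test box, the population rule can be written down as a quick sanity check. This is only a rough sketch: the tuple layout (socket, channel, slot, part number) and the function name `check_population` are made up for illustration, and it assumes two sockets with three channels each.

```python
from collections import Counter

CHANNELS_PER_SOCKET = 3  # assumption: three DDR3 channels per Nehalem socket
SOCKETS = 2              # assumption: dual-processor (DP) board


def check_population(dimms):
    """Return a list of problems for a DIMM layout.

    dimms is a list of (socket, channel, slot, part_number) tuples.
    This layout is hypothetical, chosen only for the example.
    """
    problems = []
    total_channels = CHANNELS_PER_SOCKET * SOCKETS

    # Rule 1: six DIMMs, or a multiple of six.
    if len(dimms) == 0 or len(dimms) % total_channels != 0:
        problems.append(f"{len(dimms)} DIMMs: need a non-zero multiple of {total_channels}")

    # Rule 2: all DIMMs identical.
    if len({part for _, _, _, part in dimms}) > 1:
        problems.append("mixed DIMM part numbers")

    # Rule 3: the same number of DIMMs on every channel.
    expected = [(s, c) for s in range(SOCKETS) for c in range(CHANNELS_PER_SOCKET)]
    per_channel = Counter((s, c) for s, c, _, _ in dimms)
    if len({per_channel.get(ch, 0) for ch in expected}) > 1:
        problems.append("channels are unevenly populated")

    # Rule 4: each channel is filled from its first slot upward.
    for ch in expected:
        slots = sorted(slot for s, c, slot, _ in dimms if (s, c) == ch)
        if slots != list(range(len(slots))):
            problems.append(f"channel {ch}: slots not filled from position 0")

    return problems


good = [(s, c, 0, "DDR3-1333-A") for s in range(SOCKETS) for c in range(CHANNELS_PER_SOCKET)]
lonely = [(0, 0, 0, "DDR3-1333-A")]
print(check_population(good))
print(check_population(lonely))
```

The single-DIMM layout trips both the count rule and the balance rule, which is exactly the kind of setup complained about above.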

And oh yes, non-registered, non-ECC DIMMs do seem to work fine on the Asus DP Nehalem - perfect for making an easy Skulltrail 2 or similar platform with "normal" (read XMP DDR3-2000 and such) memory. Six channels of DDR3-2000 and above will give us around 100 GB/s of raw memory bandwidth (96 GB/s at exactly DDR3-2000) - a nice milestone to reach.
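The raw bandwidth figure is simple arithmetic, sketched below under the usual assumption of a 64-bit (8-byte) data bus per DDR3 channel. The function name is ours, and the result is a theoretical peak, not a measured number.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    # MT/s times bytes per transfer gives MB/s; divide by 1000 for GB/s.
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


print(peak_bandwidth_gb_s(6, 2000))  # 96.0
print(peak_bandwidth_gb_s(6, 1333))  # six channels of DDR3-1333
```

So six channels hit 96 GB/s at DDR3-2000, and anything a notch faster crosses the 100 GB/s line.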

According to our dear "All-round roadmap guru" friend Stephen Smith, there are even mobo vendors playing around with Gainestown setups using TWO X58 Tylersburg I/O Hubs (the former North Bridge, for the uninformed) for four x16 v2 PCI-E links without using those Nvidia "bridges". Yeah, you lose I/O symmetry, and God forbid the PCI-E cards sitting on the different IOHs try to talk to each other - three QPI hops across two CPUs and two IOHs would be needed: how about an, umm, "sideport" for some direct inter-IOH comms, like the one on ATI RV770? Unless, of course, you link the two spare unused QPI links on both X58s together.
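The hop count can be checked with a toy model. The sketch below assumes each IOH hangs off one CPU and the two CPUs are linked to each other; the node names and the `hops` helper are purely illustrative, not anything from Intel.

```python
from collections import deque


def hops(links, src, dst):
    """Fewest link traversals from src to dst, or None if unreachable."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([(src, 0)])
    # Plain breadth-first search: the first time we reach dst is the shortest path.
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Assumed dual-IOH layout: IOH0 - CPU0 - CPU1 - IOH1
dual_ioh = [("IOH0", "CPU0"), ("CPU0", "CPU1"), ("CPU1", "IOH1")]
print(hops(dual_ioh, "IOH0", "IOH1"))  # 3

# Same layout with the spare QPI links of the two IOHs tied together
linked = dual_ioh + [("IOH0", "IOH1")]
print(hops(linked, "IOH0", "IOH1"))  # 1
```

In this model, tying the spare links together cuts card-to-card traffic from three hops to one, which is the whole point of the "sideport" wish.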

On the Bloomfield UP side, the Intel Frenchman, Francois P, benchmark guru from the Satan Clara offices, was there to impress the press with his SSDed Nehalem uber-desktops running DeepViewer across 30-inch screens. It was fun looking at several gigapixels worth of RAW images plus some videos (displayed all in parallel) with real-time zoom, pan and scroll, all using hand movements detected via webcam. Yeah, you may need a Bloomfield Extreme - sorry, Core i7 900 Extreme Edition - running at one number above 3 GHz, with say 12 GB RAM and quad SSDs in RAID 0 setup to drive that fully.

Nevertheless, this is the best file manager I ever saw - no more icons and such, you see each file as it is, in real time. How's that for a new usage model?

In front of the (im)press audience, Francois P openly warned "those living in Singapore" - looking straight at yours truly - to ensure that, when running their new baby, the windows are closed and air-conditioning is on full blast. Now, this is actually not a joke: Nehalems seem to be more "environment aware" in the case of unstable or unfriendly temperature conditions. So, if playing with extreme Bloomfield, you should ensure an acceptable ambient temperature and not skimp on really good cooling, or alternatively, go for AMD graphics groups' favourite haunt, Iceland.

Overall, the desktop platform seems to be very stable, including the DX58SO "Smackover" Intel desktop mobo, but the memory setups might be more generous on Asus and other Taiwanese X58 mainboards: six DIMMs, spread equally across three channels, sound much better than four, where the one spare slot is better left unpopulated because of the memory performance drop it causes.

So, Nehalem has arrived and is here to stay, whether Core i7 desktop or Xeon workstation and server.

Interestingly, almost every Intellite we spoke to mentioned the word 'Larrabee' at some point. And it's not just that initial PCI-E card we're talking about. For better or worse, in the end, Larrabee is an X86 processor - yes, a very specialised one, but still an X86er which in theory could run that very same X86 code, and one day maybe even boot an X86 OS, depending on the MMU in there. An Intel compiler, say rev 11, could easily make a fat binary that runs on both X86 CPU-only and X86 CPU+GPU, or should we say, CPU+coprocessor - remember the 8087 days?

Hold on, coprocessors should operate in the same memory space as the main CPU for simple coding, after all - PCI-E may only allow shared virtual memory, even after all the tricks are exhausted. Those that use Quadrics QsNet supercomputer interconnect from Old Blighty, the only one that enables this at high speed over PCI, know this well.

Well, there is always Intel's closely-guarded crown jewel, QPI. Ultra high bandwidth, memory-level latency, and yes coherent shared physical memory enabling a Nehalem (Westmere or Sandy Bridge actually) and a Larrabee, say Gen2, to be seen as one logical CPU - now we talk business... or a marriage made in heaven?

Yeah, AMD can do exactly the same over HyperTransport 3.1 as a high-end 'Fusion' matching multiple CPUs and GPUs, and Nvidia can do too since they have HyperTransport as well - oh sorry, they got no CPUs... yet.

Too bad, Intel could have done all this six years ago if the previous "Only The Paranoid Survive" big boss had made use of the excellent Alpha platform that awaited Intel on a silver platter, rather than succumbing to the "Not Invented Here" syndrome and going for the bulky, slow cargo ship Itanic that sucked in billions. Alpha EV7 had all these interconnects, local memory, I/O buses and beautiful scaling - not to mention the most elegant RISC architecture ever, still able to run X86 code through that FX!32 masterpiece, and even an EV9 8,192-bit wide vector design readied by a Euro-American collaboration. And it was 2001.

Maybe Alpha is an ideal CPU for Nvidia, after all: it has no need for an X86 licence, can outclass everything else, and Nvidia is used to big dies anyway. µ

Comments

Larrabee is Intels Secret Weapon.

I Stated Case from when Mainboards where cpu, as Bit O' Honey was used as Fastner. However, Now all those larrabee teams & Billions funded to make/test/repeat may actually be Intels saving Grace.

AMD is Pushing Deep Serial Solution to 8 Cores Start Ups, yet 100 or 1,000 seem bit Large. Intel could push 8 Core from nest Before its ?Time or stuck in short track from developement standpoint.

Larrabee Gained Ground to Be Recognized Potentate. Probably Misty May of all around/long time in earnings Gold, Practical Model for Good long Set in paractical applications dept.?
While Complexity of AMD/Ati is about to spring into what could form 32X slot from one controller, just double 16X pins on slot & controller does rest. Call that 32X X2, as two gpus' seem in order, somewhere on thing or new 2X model(Like that NEW 8 Medalie Mike, Completely Capable of Setting MANY WR/Golds), usefully strong in present .

With So Much Scrubbing & Reinforcing its Hard NOt To believe AMD will Walk d' Walk on this ONE before larrabee gets Wind in Sails, Finally.From Land to Land Seek Great Whites....

Yea,AMD/Ati. Yea,Intel.

TS drashek

posted by : ThomasofLarrabee, 21 August 2008

Iceland

er... Greenland?
posted by : mike, 21 August 2008

Well Done!

A most insightful and telling report!
SSD Stripping RAID - I'll notify the vice squad. How will the new 6GB/s SATA (Will it première with QPI DPs?), benefit RAID 0 (or any for that matter), with respect to stripe width and size scalings for both SSDs and HDDs in terms of server, HPC, HTPC, and WS uses configured for each 32 and 64 bits stack up in both HT3.1 and QPI? Please contrapose with your answer as affects inter-IOH comms and real versus shared virtual memory bandwidth linkings and I/O symmetry, sideports, etc. I should also expect that you would be so good as to expound the Bavarian hops to suit the English market.

Oh, err, Nebojsa, please hurry; the Home Office has lost a computer memory stick containing personal details about tens of thousands of criminals, and the Beeb requires your answers to compile this Friday's 7 days News Quiz.
posted by : Adm. ₭arlsbad, 21 August 2008

Blablabla...

I was expecting to find the answer to the main question: did Intel succeed in clocking Nehalem substantially higher than AMD's almost similar design?
posted by : Slava, 23 August 2008