Single vs Multicore Processors
October 27, 2005 at 8:24 pm | In Technology
I was going to write a comparison between the Cell processor from Sony, Toshiba, and IBM and the recently announced PWRficient from PA Semi, but I felt I should first look at how the microprocessor industry has changed in recent years.
Until a few years ago, every processor targeted at the desktop and entry-level to mid-range server markets was designed and optimized to execute single-threaded applications sequentially, as fast as possible. The mantra was to increase the raw performance of the processor when running a single-threaded application. Even tasks that could be massively parallelized were mostly coded as single threads to optimize their execution on those processors. This was not because programmers lacked the skills, as some like to think, or because it is a hard task to break an application up into multiple threads that can run in parallel. Any self-respecting programmer should be able to do that.
This push for raw clock performance led to a quick jump in clock speeds that was poorly matched by developments in silicon manufacturing technology. By poorly matched I am not referring to gate switching speeds, which enable higher clock frequencies, but rather to preserving the power efficiency of those processors. This rather crazy race for raw performance through ever-higher clock speeds was finally stopped by astronomically high thermal dissipation: measured per unit of area, the power density of current microprocessors is approaching that of a nuclear reactor. Back in the mid-90s, leakage power accounted for about 5% of the total power a processor consumed, if not less. Now leakage accounts for over 60% of the total power that current processors require to operate, which leads to ridiculously large and often noisy cooling solutions that are needed just to keep them running. Even then, these processors run so hot that it would take only a few seconds to fry one if the fan on its 750g heat sink failed, despite the heat sink having a surface area of over 1 square meter.
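The link between clock speed and power can be sketched with the standard first-order formula for CMOS switching (dynamic) power, P = αCV²f. This is a rough illustration, not a model of any real chip: the activity factor, capacitance, and voltage values below are made-up assumptions, and the scaling helper further assumes that supply voltage has to rise in proportion to clock frequency, which makes dynamic power grow roughly with the cube of the clock. It also ignores the leakage power discussed above.

```python
# First-order CMOS dynamic power model: P = alpha * C * V^2 * f.
# All constants are illustrative assumptions, not data for any real processor.


def dynamic_power(alpha: float, capacitance_f: float, voltage_v: float, freq_hz: float) -> float:
    """Switching power in watts from activity factor, switched capacitance (farads),
    supply voltage (volts), and clock frequency (hertz)."""
    return alpha * capacitance_f * voltage_v ** 2 * freq_hz


def power_after_clock_scaling(base_power_w: float, clock_ratio: float) -> float:
    """Dynamic power after multiplying the clock by clock_ratio.

    Simplifying assumption: supply voltage must rise in proportion to the clock,
    so P scales as V^2 * f, i.e. as clock_ratio ** 3.
    """
    voltage_ratio = clock_ratio  # assumed: voltage tracks clock linearly
    return base_power_w * voltage_ratio ** 2 * clock_ratio


if __name__ == "__main__":
    base = dynamic_power(alpha=0.1, capacitance_f=1e-7, voltage_v=1.0, freq_hz=1e9)
    for ratio in (1.0, 1.5, 2.0):
        watts = power_after_clock_scaling(base, ratio)
        print(f"{ratio:.1f}x clock -> {watts:.1f} W")
```

Under these assumptions, doubling the clock multiplies dynamic power by eight (2³), which is the kind of superlinear growth behind the cooling problems described above.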
Hitting these simple physical barriers, or more precisely, realizing that they would hit them sooner than expected, the microprocessor giants went back to the drawing board and decided to change the way things work. The new marketing theme they introduced, albeit late (better late than never, right?), was parallelism. Even then, the goal was to tape out products as soon as possible without necessarily redesigning them the way they should be designed. Basically, all they did was glue together a pair of the processors they were already selling and let the marketing department find a way to convince the public that this was, as always, the way things were meant to be. There was no real return to the drawing board to design a true parallel processor that could take the computing world to the next level.
So I ask: why would anyone want a 3.8GHz single-core processor that demands a lot of power, and hence a lot of cooling, yet has poor instructions per cycle (IPC) figures, instead of a 400-500MHz processor made up of 8 or more cores, or execution engines, that are really and efficiently integrated (not glued) together? Think of the graphics cards on the market today and their impressive performance figures, cut the transistor count (and with it, the manufacturing cost) by a quarter, and you will get an idea of what I am talking about.
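A back-of-the-envelope way to frame this comparison is aggregate throughput = cores × IPC × clock. The `Chip` class and the IPC values below are hypothetical placeholders, not measurements of any real processor, and the calculation assumes the workload splits perfectly across all cores, which real applications only approach.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    cores: int
    clock_ghz: float
    ipc: float  # assumed average instructions per cycle per core (hypothetical)

    def peak_gips(self) -> float:
        """Aggregate throughput in billions of instructions per second,
        assuming the work splits perfectly across all cores."""
        return self.cores * self.ipc * self.clock_ghz


# Hypothetical configurations loosely matching the comparison in the text.
single_core = Chip("1 x 3.8GHz", cores=1, clock_ghz=3.8, ipc=1.0)
many_core = Chip("8 x 0.5GHz", cores=8, clock_ghz=0.5, ipc=1.0)

if __name__ == "__main__":
    for chip in (single_core, many_core):
        print(f"{chip.name}: {chip.peak_gips():.1f} billion instructions/s")
```

With equal (assumed) per-core IPC, eight 500MHz cores slightly exceed the single 3.8GHz core on paper, while each core runs at a far lower clock.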
Now, what would an old design like the 486 need to have redesigned to bring it up to today's standards? I don't think the starting point matters much, as long as the transistor count per core stays low. First, the instruction set would need to be updated with today's extensions, such as SSE, SSE2, and SSE3, and maybe even 64-bit extensions. The execution units would need to be redesigned to execute those instructions effectively. Add some extra registers and some power management abilities, like dynamic clock throttling and the ability to turn off unused functional units and entire cores. Throw in a nice 2MB cache shared between the cores and a wide memory interface for very high bandwidth, such as a dual-channel memory controller with 128 bits per channel (even with cheap DDR400, that would give 12.8GB/s of total memory bandwidth, though memory modules would have to be installed in sets of four), and you will have a very happy processor that could really give the best current desktop processors a run for their money.
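The 12.8GB/s figure follows from simple arithmetic: DDR400 performs 400 million transfers per second, and a 128-bit channel moves 16 bytes per transfer. The helper names below are only for illustration; the module count assumes standard DIMMs with a 64-bit data path, which is why two 128-bit channels need modules in sets of four.

```python
DDR_DIMM_WIDTH_BITS = 64  # data width of a standard DDR DIMM


def peak_bandwidth_gb_s(channels: int, channel_width_bits: int, transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = channel_width_bits // 8
    return channels * bytes_per_transfer * transfers_per_s / 1e9


def modules_needed(channels: int, channel_width_bits: int) -> int:
    """Number of 64-bit DIMMs required to populate every channel's full width."""
    return channels * channel_width_bits // DDR_DIMM_WIDTH_BITS


# DDR400: 200MHz clock, two transfers per clock -> 400 million transfers per second.
DDR400_TRANSFERS_PER_S = 400e6

if __name__ == "__main__":
    print(peak_bandwidth_gb_s(2, 128, DDR400_TRANSFERS_PER_S), "GB/s")
    print(modules_needed(2, 128), "modules")
```

For comparison, a single ordinary 64-bit DDR400 channel peaks at 3.2GB/s, so this layout quadruples it.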
Now, some people may argue that such a processor would be inadequate for some applications like games, office applications, and the like. My reply is that if the designers of those applications went back to the drawing board, they would find various ways to exploit parallelism in their applications. Take a word processor, for example: one core could handle spell checking, another text formatting, a third the user interface, and you could even hand grammar checking to a fourth core. None of those tasks is a heavy load on a processor by itself; it's the combination that is. Even games could benefit greatly from parallel machines; it's just that there was no push on the hardware side to drive game programmers to exploit parallelism on the software side. All those parts of an application can run in parallel while an extra core or two take care of the operating system to keep user responsiveness high, very high I would dare say.
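Here is a minimal sketch of the word processor idea, assuming Python's standard `concurrent.futures` module. The task functions (`check_spelling`, `format_text`, `check_grammar`) and the `analyze_document` driver are hypothetical toy stand-ins, not real editor components. One caveat: in standard CPython builds, threads do not run CPU-bound Python code truly in parallel because of the global interpreter lock, so to spread this kind of work across cores you could swap `ThreadPoolExecutor` for `ProcessPoolExecutor`, which offers the same `submit`/`result` interface.

```python
from concurrent.futures import ThreadPoolExecutor
import textwrap

# Toy dictionary for the hypothetical spell checker.
KNOWN_WORDS = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}


def _words(text: str) -> list[str]:
    """Lower-case words with trailing punctuation stripped."""
    return [w.strip(".,;:!?") for w in text.lower().split()]


def check_spelling(text: str) -> list[str]:
    """Hypothetical spell checker: report words missing from the toy dictionary."""
    return [w for w in _words(text) if w not in KNOWN_WORDS]


def format_text(text: str, width: int = 20) -> list[str]:
    """Hypothetical formatter: greedy word wrap to a fixed column width."""
    return textwrap.wrap(text, width=width)


def check_grammar(text: str) -> list[str]:
    """Hypothetical grammar checker: flag accidentally repeated words."""
    words = _words(text)
    return [f"repeated word: {a}" for a, b in zip(words, words[1:]) if a == b]


def analyze_document(text: str) -> dict[str, list[str]]:
    """Run the independent tasks concurrently, one worker per task."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        spelling = pool.submit(check_spelling, text)
        layout = pool.submit(format_text, text)
        grammar = pool.submit(check_grammar, text)
        return {
            "spelling": spelling.result(),
            "layout": layout.result(),
            "grammar": grammar.result(),
        }


if __name__ == "__main__":
    report = analyze_document("The quick brown fox jumps over the the lazy dgo.")
    for task, findings in report.items():
        print(task, findings)
```

Each task only reads the text, so the workers share no mutable state and need no locking; a real editor would also have to coordinate their results with the user interface.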
The technology to make such a product has long existed, and thanks to lower clock speeds, such a design would consume very little power compared to today's mainstream processors, even without implementing any form of power management.
In my next post, I want to talk about a new processor architecture that has been making the news for the past few days, which is PWRficient from PA Semi, and the way it approaches parallel thread execution in comparison to IBM’s Cell.