On the Samsung Series 9 (900X3C)

March 13, 2014 on 11:37 pm | In Technology | No Comments

Recently, I bought a used, mint-condition Samsung Series 9 ultra-portable, or Ultrabook as Intel likes to call them these days. Specifically, the model I bought is an NP900X3C-A04. This configuration is powered by a 17W Ivy Bridge Intel Core i7-3517U CPU, 4GB of 1600MHz DDR3L (low power) RAM, a standard mSATA LiteOn 256GB SSD, and a 13.3-inch 1600×900 Samsung PLS panel. There are plenty of reviews that detail benchmarks of this thin little machine. So instead of rehashing those, I’m sharing my impressions after a couple of weeks of living with it from a (power) user’s perspective.

The first thing that strikes the new owner is how quickly it boots. Power-on to login is dealt with in 10 seconds flat! And the first thing that strikes the user after logging in is how bright and vivid the display is. Colors are rich, contrast levels are very good, and viewing angles are impeccable. I find 30-40% brightness to be perfectly comfortable for daily use, roughly equivalent to 80% on most other laptops. Even during the day, unless you’re outside in direct sunlight, anything above 50% will start to hurt your eyes after a couple of minutes.

But what I like the most in this thin sliver of aluminum are the little details. Things most reviewers don’t even notice, or just skim past without much attention. Things like how the lid pulls itself shut over the last few millimeters before meeting the body, reassuring the user it IS indeed closed, or how the display hinge is stiff enough to hold the display steady without the slightest hint of wobble, yet still light enough that it can be opened with a single hand, without needing to hold the body down with the other for fear it would lift. The display bezel, while being one of the thinnest on any laptop, integrates a very thin rubber lip that goes around the sides and top of the display. Its job? Simply to provide a soft cushion for the lid when closed so it doesn’t rub against the deck and scratch it, while giving the closing action a soft, quiet feel. This one in particular I haven’t seen mentioned in any review, but it is certainly one of the nicest little details about this laptop.

900X3C
Despite being ridiculously thin at 12.9mm (0.51in), it feels very solid in the hand. The display lid doesn’t have any play, hardly has any give if you try to twist it with both hands, and even pressing hard against its back will not result in any ripples on the display. The story is pretty much the same with the body. The palmrest is rock solid, and so are the keyboard deck and bottom of the laptop. It’s almost as if it is made from a solid block of aluminum. The sides have a silver, polished cut to them that gives the profile a very slender look and the illusion that it’s even thinner than it already is.

The backlit chiclet keyboard is one of the best to be found in ultra-portables. It’s no ThinkPad, but it’s certainly one of the better typing experiences. The keys offer crisp feedback; there is no confusion as to whether a keystroke registered or not under your finger. The backlight is a subtle aqua tone that, even at its highest level, is still very comfortable to the eye in total darkness yet clearly visible in a well-lit room at night. It also goes very well with the dark navy color of the body. The keys themselves are very generously sized and well spaced. Both Shift keys, Tab, Enter, Caps Lock, and Backspace are standard sized. The only shrunken keys are the arrow and function keys. One nice feature of the keyboard is the Fn-lock key, nested between the F12 and Insert keys on the top row. When activated, the F1-F12 keys will lock to their alternative functions such as brightness, volume and backlight level controls. This key, along with the Wi-Fi, mute, and Caps Lock keys, has a tiny blue LED to indicate whether it’s on or off. These LEDs are bright enough to be easily noticed, but not so bright as to blind you like those on some other laptops. The same goes for the power and charging LEDs.

The Elan touchpad is very generously sized and has a very smooth, silky texture. It’s very precise and supports multi-finger gestures in Windows 8/8.1 that work quite well. It’s a click-pad with no physical buttons, so clicking anywhere on it will register a left click, while clicking on the lower right part (if you divide the touchpad area into two columns and four rows, it will be the bottom right cell) will register a right click.

Connectivity-wise, there are two USB ports, one on each side. The left one is USB 3.0, while the right one is of the 2.0 variety. The left side is also home to a Micro-HDMI port and a proprietary micro-Ethernet (Gigabit) port, while the right side houses a combo headphone/microphone 3.5mm jack and a micro-VGA port. Samsung includes the Ethernet adapter in the box, while the VGA adapter is an accessory sold separately.

Two stereo speakers are located on the bottom sides, below the palmrest. While they lack in bass, as can be expected from such a thin machine, they are surprisingly loud and clear.

There are two fans to cool the ULV CPU. They suck cool air in from the bottom and push the warm air towards two exhaust vents on the back of the body. Under normal use, only one will turn on when needed to keep things cool. Under battery operation, the laptop stays cool and quiet, and will barely get warm to the touch under load. In this mode, the CPU will top out at the advertised 1.9GHz under load. I haven’t tested how long a charge will last, but figures in reviews around the net range from 6 to 8.5 hours, depending on what you’re doing. Windows tells me I’ll get somewhere between 6 and 7 hours when I’m browsing the net on battery. Pretty decent for such a light and small machine. Plugged into the AC adapter, the CPU will Turbo-Boost to 3.0GHz under single-threaded loads, and 2.8GHz under multi-threaded loads. Here, under heavy CPU load, both fans will kick in and, while not loud by any measure (even for a meeting), they become quite noticeable. The palmrest will become warm, but not uncomfortably so. The bottom will become quite warm under load, but I haven’t found it too warm to be comfortable on the lap, let alone hot to the touch. I stress tested it a bit by rendering four quite heavy CAD designs in parallel. During the twenty-something minutes it took OpenSCAD to render them, all four logical cores (two physical cores with Hyper-Threading) were reporting 100% load in Task Manager. Yet, switching back and forth between the four instances of OpenSCAD and the seven open tabs in Chrome never felt anything but snappy. You can argue whether the credit for this goes to Intel, Samsung, or Microsoft, but the end result is that even under load this machine maintains a smooth user experience.

Clearly, I quite like this machine and I’m quite impressed by its design, build quality, attention to detail, and performance. But it’s not all roses and rainbows. For starters, brand new my configuration was $1800, way too much IMO for something that can’t serve as a primary computer for power users. The keyboard, while providing excellent feedback, is extremely shallow. I have gotten used to it, but it’s still not the most comfortable experience. The clickpad is pretty loud. Connectivity is anemic, to put it mildly. Why on earth Samsung bestowed only 2.0 speeds on the right side USB port is beyond me. The same goes for using single-channel memory for the 4GB of RAM. The lack of an integrated 3G/4G modem, or at least the option to add one, is also a bit of a letdown. It’s not like this is a budget laptop and they had to cut corners somewhere. The power adapter, while very compact and light, uses the bulky C5 connector instead of the smaller C7 with its much thinner cable. If C7 is good enough for the likes of Apple and Lenovo, I don’t see why it’s not good enough for Samsung. The power cord is as heavy as the adapter itself and about twice the size!

To sum it up, this is a very well thought out and very well made laptop. It has some downsides, but if you can get past them, it’s really a great little machine. It’s very stylish, but in a subtle and quiet (read: non-flashy) way. Brand new, at retail price, I would never consider buying it. But thanks to Samsung’s accelerated refresh cycles of the Series 9 (no less than three in the past 12 months!), which has since been renamed the ATIV 9, NP900X3C models powered by Ivy Bridge i5 and i7 processors are hitting the used market at a third, or even less, of what they cost barely a year ago, for very clean, near-mint units. At this price point, it becomes a whole other proposition. It becomes one of the best value ultra-portables out there in terms of size, weight, design, build quality, and performance, and should be at the top of the list of anyone looking for an ultra-portable on a budget.

Concurrent Blinkies

January 19, 2014 on 3:55 am | In Technology | No Comments

Now that we have a working, minimal Hello World, it’s time to take it to the next level. This time, we’ll write a concurrent program that blinks two LEDs from two threads on two ports using two timers. It’s twice the fun all around! :P

Create a new project, adding all the appropriate includes, or use the same one we created in the previous post. I’ll be using the same project and modifying the code inside HelloWorld.xc. Let’s take a quick look at the code:


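#include <xs1.h>

// The two green user LEDs on the StarterKit sit on 1-bit ports 1A and 1D.
out port p1 = XS1_PORT_1A;
out port p2 = XS1_PORT_1D;

// One timer per thread.
timer t1, t2;

// Toggles the LED on p1 every 300ms.
void blinky1(){
    unsigned int time;
    char state = 0;

    t1 :> time;                             // read the current timer value
    time += XS1_TIMER_MHZ * 1000 * 300;     // first deadline, 300ms from now
    while(1){
        t1 when timerafter(time) :> void;   // wait until the deadline has passed
        state = ~state;                     // flip every bit of state
        p1 <: state;                        // drive the LED port
        time += XS1_TIMER_MHZ * 1000 * 300; // schedule the next deadline
    }
}

// Toggles the LED on p2 every 200ms.
void blinky2(){
    unsigned int time;
    char state = 0;

    t2 :> time;
    time += XS1_TIMER_MHZ * 1000 * 200;
    while(1){
        t2 when timerafter(time) :> void;
        state = ~state;
        p2 <: state;
        time += XS1_TIMER_MHZ * 1000 * 200;
    }
}

int main(void){
    // Run both blinkers concurrently, each in its own thread.
    par{
        blinky1();
        blinky2();
    }
}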

We begin by including xs1.h, which enumerates the hardware resources on the XCore platform so they are easy to access, and also provides a lot of helper functions that simplify the setup and configuration of those resources.

Page 10 of the XMOS StarterKit hardware manual says we have two little green user LEDs connected to two 1-bit ports (1A and 1D). Those are the little guys we’ll be blinking. In order to drive those LEDs, we’ll need to access the two ports to which they are connected. Since ports have to be declared as global variables in XC, we next declare our two ports for output:

out port p1 = XS1_PORT_1A;
out port p2 = XS1_PORT_1D;

The definitions for XS1_PORT_1A and XS1_PORT_1D come from xs1.h. XMOS was nice enough to enumerate all ports available on their XCore using simple, descriptive names. They were also kind enough to list all those names in a table included in their Introduction to XS1 Ports.

Since we’ll be blinking two LEDs on two separate threads, we’ll need a way to track time on each thread so we know when to turn each LED on and off. Luckily for us, each XCore tile includes 10 timers we can use in our programs. For now, we’ll just declare a couple:

timer t1, t2;

Now, we come to our blinky thread functions. Let’s have a look at the first one:

void blinky1(){
    unsigned int time;
    char state =0;

    t1 :> time;
    time += XS1_TIMER_MHZ * 1000* 300;
    while(1){
        t1 when timerafter(time) :> void;
        state = ~state;
        p1 <: state;
        time += XS1_TIMER_MHZ * 1000* 300;
    }
}

We begin by declaring an unsigned 32-bit integer called time that we’ll use to hold our time readings from timer t1. Since XC doesn’t provide a native boolean data type, we’ll use a char named state to hold our port state and set its initial value to zero. Then we read timer t1’s current value into time using an input construct (:>). Next, we increment the value of time by 300ms and then we dive into an infinite loop. (XS1_TIMER_MHZ is the rate in MHz at which the timer counts, that is, the number of ticks per microsecond. Multiplying it by 1000 gives ticks per millisecond, and multiplying that by 300 gives a 300ms delay.)

The loop begins with a when statement that makes use of the timerafter() function. The combination of when and timerafter causes the input operation to wait until the timer has passed the value of time. Since we don’t care about the current value of the timer, we read that into void.

Next, we do a bitwise NOT operation on state to flip its bits before outputting its value to port p1. Note that only the least significant bit of state will actually be used since we are outputting to a 1-bit port.

Finally, we increment our time variable by the equivalent of another 300ms wait before repeating the loop again.

The second blinky function for the second thread is the same but uses timer t2 and has a cycle of 200ms instead.
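
To make the timing arithmetic a bit more concrete, here is a small host-side model in plain C (not XC, so it can be compiled on a PC). It is only a sketch under a few assumptions: that XS1_TIMER_MHZ is 100, matching a 100MHz reference clock, and that "after" is decided with a wraparound-safe signed comparison, which is my simplified reading of how timerafter() behaves rather than a statement from the XMOS documentation. The names TIMER_MHZ_ASSUMED, ms_to_ticks, is_after and toggles_in are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of XS1_TIMER_MHZ (100MHz reference clock). */
#define TIMER_MHZ_ASSUMED 100u

/* Convert milliseconds to timer ticks: MHz * 1000 is ticks per millisecond. */
uint32_t ms_to_ticks(uint32_t ms) {
    /* Keep delays below 2^31 ticks (about 21s at 100MHz) so that the
       signed comparison in is_after() stays meaningful. */
    assert(ms <= 20000u);
    return TIMER_MHZ_ASSUMED * 1000u * ms;
}

/* Hypothetical model of "now is after deadline" on a 32-bit timer that
   wraps around. The unsigned subtraction wraps, and the result is then
   reinterpreted as a signed distance. */
int is_after(uint32_t now, uint32_t deadline) {
    return (int32_t)(now - deadline) > 0;
}

/* Count how many times a blinky-style loop would toggle its LED when it
   starts at timer value `start` and runs for `elapsed_ticks` ticks. */
uint32_t toggles_in(uint32_t start, uint32_t elapsed_ticks, uint32_t period_ms) {
    uint32_t deadline = start + ms_to_ticks(period_ms); /* first deadline */
    uint32_t end = start + elapsed_ticks;
    uint32_t count = 0;
    while (is_after(end, deadline)) {
        count++;                              /* one toggle per deadline */
        deadline += ms_to_ticks(period_ms);   /* schedule the next one */
    }
    return count;
}
```

Under these assumptions, ms_to_ticks(300) is 30,000,000 ticks, and because the deadline is advanced by a fixed amount each time (instead of re-reading the timer), the blink period does not drift even when the 32-bit counter wraps around.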

The program concludes with its main function:

int main(void){
    par{
        blinky1();
        blinky2();
    }
}

The only thing we’re doing here is spawning the blinky1 and blinky2 functions in two concurrent threads by calling them both inside a par statement.

Build the program in xTIMEcomposer by hitting CTRL-B and make sure there are no errors. Once it builds successfully, connect your StarterKit to your computer and hit CTRL-F11 to upload and run the program. If everything went successfully, your StarterKit should begin blinking its two LEDs, one toggling every 300ms and the other every 200ms.

xTIMEcomposer Hello World!

January 12, 2014 on 5:22 am | In Technology | No Comments

So, now that we’ve been introduced to the XMOS StarterKit and covered the basics of the XCORE architecture and the xC programming language, it’s time to dive in and make our first program.

xC programs are developed, compiled, and debugged from within XMOS’ xTIMEcomposer IDE (Integrated Development Environment). The IDE is available for download freely at XMOS’ site after registration.

If you haven’t already downloaded and installed it, go ahead and do it. I’ll wait :)

Now that we all have xTIMEcomposer installed, the first thing we’ll need to know about it is that it is based on Eclipse. This means that if you ever want to figure out how to do something in xTIMEcomposer, just google how it is done in Eclipse. Chances are xTIMEcomposer will be the same.

So, launching xTIMEcomposer for the first time, we’ll be greeted with a small window asking where we want to locate our workspace. A workspace is the folder containing one or more projects that may comprise a bigger application. Like Eclipse, xTIMEcomposer will default to a folder named workspace inside your user’s home folder. I like to create a folder there that groups things based on type or category, so I created a folder for my XMOS projects. Hence, my workspace is located at c:\users\USER_NAME\XMOS\workspace. If you don’t plan on switching between several workspaces, you might also want to check the “Use this as the default and do not ask again” checkbox to skip this step every time you launch xTIMEcomposer.

01

When we launch xTIMEcomposer for the first time, it will also ask us to log in using the same username and password we used to register at the XMOS site before downloading it. Note that the username will be your email by default.

02

Once we are logged in, it’s time to create a new project for our Hello World program. To do this, click on the File menu, choose New, and select xTIMEcomposer Project.

03

The new project wizard will open. Under Project Name write “Hello World”. Under Location verify Create new project in workspace is selected. Under Create a new application based project radio button click on the Target Hardware drop down, and select “XMOS starterKIT”.

04

Then, make sure the Copy XN file into new application option is checked, and click Finish.

05

xTIMEcomposer will take a couple of seconds to create your new project, and then we will be presented with your project’s workspace:

06

For now, we are concerned with three areas in the workspace only:

  • The top left window titled Project Explorer: This is a tree view of your new project and all its resources (source files, make file, XN file, etc).
  • The top center big window: This is your code editor, where we will write all your program code. A code file is opened by double clicking on its name in the Project Explorer. Note that the code editor is tabbed, so we can have several code files open at the same time.
  • The bottom center window titled Console: This is where xTIMEcomposer will output everything it has to say, and where your StarterKit will output its messages (like print commands).

The first thing we want to do with our new project is include XMOS’ library files, so we can reference those libraries and use their functionalities. To do this, in the Project Explorer, right click on the project’s name (Hello World) and select Properties from the context menu (note the “Alt+Enter” shortcut next to it? This is the shortcut to access a project’s properties window quickly):

07

The project properties window is comprised of three areas:

  • On the left, a tree view of all the properties available for the project.
  • In the center, a list of all the include directories that will be referenced in our project.
  • And on the right, a details window showing all the options and the possible values for each.

From the tree view, expand the C/C++ General group, and click on Paths and Symbols. The properties window will update to show the properties and values of the paths and symbols of the project.

Under the Includes tab, click on C Source File under Languages, then click on the Add button on the right:

08

In the Add directory path window that will appear, we will write the full path of the directory (folder) we want to add to the list of include directories of our project under Directory.
Make sure to check both Add to all configurations and Add to all languages checkboxes before clicking OK.
There are four directories we want to add. Assuming you installed xTIMEcomposer under “C:\Program Files (x86)\XMOS\xTIMEcomposer”, they are:

  • C:\Program Files (x86)\XMOS\xTIMEcomposer\Community_13.0.1\target\include
  • C:\Program Files (x86)\XMOS\xTIMEcomposer\Community_13.0.1\target\include\c++\4.2.1
  • C:\Program Files (x86)\XMOS\xTIMEcomposer\Community_13.0.1\target\include\c++\4.2.1\xcore-xmos-elf
  • C:\Program Files (x86)\XMOS\xTIMEcomposer\Community_13.0.1\target\include\gcc

So, we will want to repeat this step four times:

09

After we’ve added all four directories, our Include directories list should look like this:

10

Click the Apply button to add those four directories to our project. Since we are updating the project’s references, xTIMEcomposer will ask if we want to rebuild our project in order for it to index the files in those directories and provide autocomplete (intellisense) for those new files. If you don’t want xTIMEcomposer to ask you this every time you add a new include directory, make sure to check the Remember my decision checkbox before clicking Yes:

11

After a couple of seconds, our project tree will be updated to reflect the new include directories that we just added. Those will appear under an Includes folder in the project tree:

12

Now, we can add our “Hello World” code to our project!
In the “HelloWorld.xc” file in the code editor area, type the following code:

#include <stdio.h>

void main(void) {
    printf("Hello, World!\n");
}

13

Now, let’s try to build our project to see if it has any errors. Click on the Project menu and select Build All (note the Ctrl+B shortcut)

14

xTIMEcomposer will begin building our project. Being based on Eclipse, it is possible to tell xTIMEcomposer to build our project in the background. In fact, we can tell it to always do that when building our projects:

15


If our project has no errors, the Console will say “Build Complete” at the end of the build process.

The only thing left for us to do now is run our project. But before doing that, make sure the StarterKit is connected to your computer and that Windows has finished installing its drivers.

To run our project, click on the Run menu, and select Run (guess what “Ctrl+F11” does?):

16


Since this is our first time running a project from xTIMEcomposer, a window will pop up asking us to select where we want to run our program. Click on XMOS starterKIT connected to… and click OK

17

If everything goes without errors, xTIMEcomposer will upload our program to the StarterKit, and the StarterKit will run it. Since the only thing our program does is print a “Hello, World!” message, it should appear in the console of xTIMEcomposer.

19


Congratulations! You have just written and run your first XCORE program :)

A general overview of the XCORE architecture

January 2, 2014 on 10:10 pm | In Technology | No Comments

DISCLAIMER: This is a very brief introduction to the XMOS XCore architecture. It is based on the XMOS XS1 Architecture documentation. Large portions and a lot of details have been omitted from the architecture document for the sake of brevity. Therefore, it is not meant in any way to be a substitute for the XS1 Architecture document.

The XS1 is a family of programmable, general purpose processors that can execute languages such as C. They have direct support for concurrent processing (multi-threading), communication and input-output. The XS1 products are intended to make it practical to use software to perform many functions which would normally be done by hardware; an important example is interfacing and input-output controllers.

  • The XCore Instruction Set:
    The main features of the instruction set used by the XCore processors are:

    • Efficient access to the stack and other data regions and efficient branching and subroutine calling is provided by the use of short instructions.
    • Memory is byte addressed; however all accesses must be aligned on natural boundaries so that, for example, the addresses used in 32-bit loads and stores must have the two least significant bits zero.
    • Each thread has its own set of registers.
    • Input and output instructions allow very fast communications between threads. They also support high speed, low-latency, input and output operations.
  • Threads and Concurrency:
    Each XCore tile has hardware support for executing a number of concurrent threads. This includes:

    • A set of registers for each thread.
    • A thread scheduler which dynamically selects which thread to execute.
    • A set of synchronisers to synchronise thread execution.
    • A set of channels used for communication with other threads.
    • A set of ports used for input and output.
    • A set of timers to control real-time execution.
    • A set of clock generators to enable synchronisation of the input-output with an external time domain.

    The set of threads on each XCore tile can be used to:

    • Implement input-output controllers executed concurrently with applications software.
    • Allow communications or input-output to progress concurrently with processing.
    • Hide latency in the interconnect by allowing some threads to continue whilst others are waiting for communication to or from other cores or other external hardware.

    The instruction set enables threads to communicate with each other and perform input and output operations. It provides event-driven communications and input-output operations. Waiting threads are automatically descheduled, making more processing power available to those that are running. The instruction set also supports streamed, packetised, or synchronous communication between threads. This enables the processor to idle with clocks disabled when all of its threads are waiting in order to save power, while allowing the interconnect to be pipelined and input-output to be buffered.

  • Instruction Execution And The Thread Scheduler:
    The processor is implemented using a short pipeline to maximise responsiveness. It is optimised to provide deterministic execution of multiple threads. Typically, over 80% of instructions executed are 16-bit, so the XS1 processor can fetch two instructions every cycle. As typically less than 30% of instructions require a memory access, the processor can run at full speed using a unified memory system.

    Threads on a tile are intended to be used to perform several simultaneous realtime tasks, so it is important that the performance of an individual thread can be guaranteed.

    The scheduling method used allows any number of threads to share a single unified memory system and input-output system whilst guaranteeing that with N threads able to execute, each will get at least 1/N processor cycles.

    This means that the minimum performance of a thread is the XCore tile’s processing power divided by the number of concurrent threads at a specific point in the program. In practice, performance will almost always be higher than this because individual threads can be delayed waiting for input or output and their unused processor cycles taken by other threads.

    The time taken to re-start a waiting thread is at most one thread cycle. The set of N threads can therefore be thought of as a set of virtual processors, each with a clock rate of at least 1/N of the clock rate of the processor itself. The only exception to this is that if the number of threads is less than the pipeline depth P, the clock rate of each thread is at most 1/P of the processor’s clock rate.

    Each thread has a 64-bit instruction buffer which is able to hold four short instructions or two long ones. Instructions are issued from the runnable threads in a round-robin manner, ignoring threads which are not in use or are paused waiting for a synchronisation or input/output operation.

    The pipeline has a memory access stage which is available to all instructions. If the instruction buffer is empty when an instruction should be issued, a special fetch no-op is issued; this will use its memory access stage to load the issuing thread’s instruction buffer.

    Certain instructions cause threads to become non-runnable, for example while waiting for an input channel that has no available data. When the data becomes available, the thread will resume from the point where it paused.

    The tile scheduler therefore allows threads to be treated as virtual processors with performance predicted by tools. There is no possibility that the performance can be reduced below these predicted levels when virtual processors are combined.

    Instruction execution from each thread is managed by the thread scheduler. This scheduler maintains a set of runnable threads from which it takes instructions in turn. When a thread is unable to continue, it is paused by removing it from the run set.

    A thread may be paused because:

    • Its registers are being initialised prior to it being able to run.
    • It is waiting to synchronise with another thread before continuing.
    • It is waiting to synchronise with another thread and terminate (a join).
    • It has attempted an input from a channel which has no data available, or a port which is not ready, or a timer which has not reached a specified time.
    • It has attempted an output to a channel or a port which has no room for the data.
    • It has executed an instruction causing it to wait for an event or interrupt which may be generated when channels, ports or timers become ready for input.
  • Communication:
    Communication between threads is performed using channels. Channels provide full-duplex data transfer between threads. Channels carry messages constructed from control and data tokens between the two channel ends. The control tokens are used to encode communication protocols. A channel end can be used to generate events and interrupts when data becomes available. This allows a thread to monitor several channels, ports, or timers, while only servicing those that are ready.

    Channel ends have a buffer able to hold sufficient tokens to allow at least one word to be buffered. If an output instruction is executed when the channel is too full to take the data then the thread which executed the instruction is paused. The thread is restarted when there is enough room in the channel for the instruction to successfully complete. Likewise, when an input instruction is executed and there is not enough data available then the thread is paused and will be restarted when enough data becomes available.
  • Timers and Clocks:
    XCore provides a reference clock output which ticks at a standard frequency of 100MHz. In addition to that, a set of programmable timers is provided that can be used by threads to provide timed program execution relative to the reference clock. Each timer can be used by a thread to read its current time or to wait until a specified time.

    A set of programmable clocks is also provided and each can be used to produce a clock output to control the action of one or more ports and their associated port timers. Each clock can use a one bit port as its clock source.

    The data output on the pins of an output port changes state synchronously with that port’s clock. If several output ports are driven from the same clock, they will appear to operate as a single output port, provided that the processor is able to supply new data to all of them during each clock cycle.

    Similarly, the data input by an input port from the port pins is sampled synchronously with that port’s clock. If several input ports are driven from the same clock they will appear to operate as a single input port provided that the processor is able to take the data from all of them during each clock cycle.

    The use of clocked ports therefore decouples the internal timing of input and output program execution from the operation of synchronous input and output interfaces.
  • Ports, Input and Output:
    Ports are interfaces to physical pins. They can be tri-state, or can be configured with pull-up or pull-downs. A port can be used for input or output. It can use the reference clock as its port clock or it can use one of the programmable clocks, which in turn can use an external clock source. Transfers to and from the pins can be synchronised with the execution of input and output instructions, or the port can be configured to buffer the transfers and to convert automatically between serial and parallel form (serialization and deserialization). Ports can also be timed to provide precise timing of values appearing on output pins or taken from input pins. When inputting, a condition can be used to delay the input until the data in the port meets the condition. When the condition is met the captured data is time stamped with the time at which it was captured.

    A port has an associated condition which can be used to prevent the processor from taking input from the port when the condition is not met. When the condition is met a timestamp is set and the port becomes ready for input. When a port is used for conditional input, the data which satisfied the condition is held in the transfer register and the timestamp is set. The value returned by a subsequent input on the port is guaranteed to meet the condition and to correspond to the timestamp even if the value on the port has changed.
  • Events and Interrupts:
    Events and interrupts allow timers, ports and channel ends to automatically transfer control to a pre-defined event handler. A thread normally enables one or more events and then waits for one of them to occur. The thread can perform input and output operations using the port, channel or timer which gave rise to an event whilst leaving some or all of the event information unchanged. This allows the thread to complete handling an event and immediately wait for another similar event.

    Timers, ports and channel ends all support events, the only difference being the ready conditions used to trigger the event.
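
The scheduling guarantee described above (each of N runnable threads gets at least 1/N of the cycles, capped at 1/P when fewer than P threads are running) can be sketched as a one-line calculation. The snippet below is plain C meant to be run on a PC, not on the XCore, and it is only a rough model: it assumes one instruction issued per cycle, and the function name min_thread_mips and the example figures (a 500MHz tile and a pipeline depth of 4) are assumptions made for illustration, not values taken from the architecture document.

```c
#include <assert.h>

/* Hypothetical helper: guaranteed minimum instruction rate of one thread,
   in MIPS (rounded down), assuming one instruction issued per cycle.
   core_mhz:       clock rate of the tile in MHz
   n_threads:      number of runnable threads (N)
   pipeline_depth: depth of the pipeline (P) */
unsigned min_thread_mips(unsigned core_mhz, unsigned n_threads,
                         unsigned pipeline_depth) {
    assert(n_threads > 0 && pipeline_depth > 0);
    /* With fewer than P threads, each thread is still limited to 1/P of
       the clock, so the divisor is whichever of N and P is larger. */
    unsigned divisor = n_threads > pipeline_depth ? n_threads : pipeline_depth;
    return core_mhz / divisor;
}
```

With those example figures, two threads and four threads both get the same guaranteed rate per thread (the 1/P cap), while eight threads each get half of that. The useful part is that the number depends only on how many threads are runnable, not on what the other threads are doing.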

An Introduction to XMOS’ XC language

January 2, 2014 on 1:07 am | In Technology | No Comments

This post is based on XMOS Programming Guide and XMOS Multicore Extensions to C. I tried to summarize the contents of the 66-page document to help other people learning about XMOS processors with the StarterKit. Please note the code snippets included are not complete examples, but only the parts that illustrate how a certain feature of XC or the XMOS hardware is declared/used. Throughout these notes, I am assuming the reader is already familiar with normal C-language syntax and programming.

Note: The code snippets are not clear. If anyone knows or can recommend a WordPress plugin that handles code better, please do!

So, here it goes:

  • The XC language:
    XC is an imperative programming language based on C. XC programs are composed of multiple tasks running concurrently in parallel. The concurrent tasks manage their own state and resources and interact by passing messages to each other. To use the extensions in a project, files containing XC code must have the .xc extension. It is possible to integrate C, C++ and XC files within the same project. The build system compiles each file based on its extension.

    Some of the differences between C/C++ and XC are:

    • case statements within a switch must be terminated with a break, i.e. flow can’t fall through from one case to the next (similar to C#)
    • XC supports optional, nullable, types: Resource types (ex: ports, timers, etc) and reference types can be made nullable. This means that a variable or function parameter can have a value or can be the special value null. The ? type operator creates a nullable type. In the following example, “paramC” may be sent as “null” when calling this function:
      void myFunction(int paramA, int paramB, int ?paramC, int paramD);
      

      The isnull function is used to test whether a variable of nullable type is null or not. Ex:

      void f(port ?p)
      {
      	if (!isnull(p)) {
      		printf("Outputting to port\n");
      		p <: 0;
      	}
      }
      
    • Multiple Return Functions: Functions can return multiple values without the need for additional call-by-reference parameters or definition of structs to encapsulate the return values. ex:
      {int, int} swap(int a, int b) {
      	return {b, a};
      }
      
    • Reinterpretation: Allows wrapping/unwrapping arrays of a type into those of a larger type (array of chars to array of int). This can be useful in data transmission (such as communications using xlinks). ex:
      void transmitMsg(char msg[], int numWords) {
      	for (int i = 0; i < numWords; i++)
      		transmitInt((msg, int[])[i]); // view msg as an array of int
      }
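
To make the reinterpretation above a bit more concrete, here is a rough plain-C model of what viewing a char array as an array of int amounts to. This is only a sketch: word_at and words_in are hypothetical helper names of mine, not part of XC or any XMOS library, and memcpy is used so that the read is well defined in standard C. The byte order of the resulting word is whatever the host uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read the i-th 32-bit word of a byte buffer,
 * similar in spirit to (msg, int[])[i] in XC.
 * memcpy keeps the read well defined regardless of alignment;
 * the byte order of the result is the host's. */
uint32_t word_at(const unsigned char *msg, size_t len, size_t i)
{
    uint32_t w;
    assert((i + 1) * sizeof w <= len);   /* the word must lie inside the buffer */
    memcpy(&w, msg + i * sizeof w, sizeof w);
    return w;
}

/* Number of whole 32-bit words contained in len bytes. */
size_t words_in(size_t len)
{
    return len / sizeof(uint32_t);
}
```

An 8-byte message therefore holds two words, and a transmit loop would call word_at(msg, 8, 0) and word_at(msg, 8, 1).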
      
  • Input & Output:
    The XCORE architecture provides flexible I/O ports to communicate externally. These ports have many features that enable fast I/O processing.

    • All ports must be declared as global variables.
    • Ports can be passed as function parameters.
    • Ports are declared using the port keyword.
    • An output port is declared as “out port” while an input port is declared as “in port”.
    • To output a value to a port, the <: operator is used. While to input a value from a port the :> operator is used. Ex:
      out port p = XS1_PORT_1A;
      
      p <: 1; // output the value 1 to port p
      p <: 0; // output the value 0 to port p
      

    • An input operation on a port can be made to wait for one of two conditions on the port: equal to (pinseq) and not equal to (pinsneq). These functions are used in conjunction with the when predicate to form a conditional input. Ex:
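      in port oneBit = XS1_PORT_1A;
      out port counter = XS1_PORT_4A;
      int i = 0;
      int x;
      
      oneBit :> x; // read the initial pin value
      
      while (1) {
      	oneBit when pinsneq (x) :> x; // wait until the pin value changes
      	counter <: ++i; // output the number of changes seen so far
      }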
      
    • The (quite powerful) select statement: When tasks are run in parallel, they execute their code independently of each other. However, during this execution they may need to react to external events from other tasks or the system environment. Tasks can react to events using the select construct which pauses the tasks and waits for an event to occur. A select can wait for several events and handles the event that occurs first.
      The syntax of the select statement is similar to that of the case statement:

      in port p1 = XS1_PORT_1A;
      in port p2 = XS1_PORT_1B;
      
      while (1) {
      	select {
      		case p1 when pinseq (0x1) :> int x:
      			// handle the event here
      			break;
      
      		case p2 when pinseq (0x1) :> int x:
      			// handle the event here
      			break;
      	}
      }
      

      This statement will pause until either of the events occur and then execute the code within the relevant case. Although the select waits on several events, only one of the events is handled by the statement when an event occurs. If both inputs occur at the same time, only one will be selected. The other remains ready on the next iteration of the while loop.

      case statements are not permitted to contain output operations as the XMOS architecture requires an output operation to complete but allows an input operation to wait until it sees a matching output before committing to its completion.

      Each port, timer, or other resource may appear in only one case in a select statement. The XMOS architecture restricts each resource to waiting for just one condition at a time.

    • Chapter 2 of XMOS Programming Guide concludes with an example on how to implement a UART using a single thread, and further demonstrates the power of the select statement (pages 26-29)
  • Concurrency: The bread and butter of what makes XCore so different. xC programs are comprised of tasks that run in parallel. Tasks are just code so you can define them as normal C functions.

    Tasks can be run in parallel from any function. However, it is only in the function main that tasks can be set up to run on multiple different xCORE cores.

    • Tasks (threads) running in parallel are declared and exist within the scope of a “par” statement. Ex:
      par {
      	task1();
      	task2();
      }
      

      This statement will run task1 and task2 in parallel to completion. It will wait for both tasks to complete before carrying on. Tasks run on separate logical cores run in parallel in the hardware so there is no notion of priority or scheduling between the tasks.

    • Thread Disjointness rules: There are two simple rules to data access across threads:
      • 1) A variable can be shared across threads only if every thread just reads its value. Once a thread modifies that variable, no other thread can access it.
      • 2) Only one thread can access/use any single port.
      • Some examples to show the thread Disjointness rules in action (pages 32-33). Those not familiar with the concepts of threading and thread safety should read those two pages to get a better understanding of those rules.
      • Ports must be declared on the tile which will use them, if the device has more than one tile. Ex:
        on tile[0] : out port tx = XS1_PORT_1A ;
        on tile[0] : in port rx = XS1_PORT_1B ;
        on tile[1] : out port lcdData = XS1_PORT_32A ;
        on tile[1] : in port keys = XS1_PORT_8B ;
        
    • There are various ways to communicate data between threads:
      • Channel Communication: Channels provide a primitive method of communication between tasks. They connect tasks together and provide blocking communication but do not define any types of message. The rules of channels are:
        • Channels are synchronous. Outputting data on a channel end will cause the sending thread to block until the receiving thread has consumed that data.
        • Channels are declared with the keyword chan
        • Each channel has two endpoints.
        • Channel endpoints are declared using the keyword chanend (ex: passed as parameters in functions running on separate threads)
        • Channels are lossless. So, each channel output must have a matching input.
        • The amount of data going out must be equal to that coming in.
        • Only one thread can use a given channel end (in or out). Hence, only two threads can use a given channel.
        • The output and input operators <: and :> are used to send and receive messages respectively. The operators send a value over the channel.
        • Ex:
          chan c;
          
          void task1 (chanend c) {
          	c <: 5;
          }
          
          void task2 (chanend c) {
          	select {
          		case c :> int i:
          			printintln(i);
          			break;
          	}
          }
          
          par{
          	task1 (c) ;
          	task2 (c) ;
          }
          
      • Transactions: Provides a means for channels to synchronize on the beginning and end of a transaction, while running asynchronously within the transaction. The main things to know about transactions are:
        • Transactions exist within a par statement.
        • A transaction consists of a “master” thread and a “slave” thread, running concurrently.
        • Each of the “master” and “slave” is a statement. It can be a block of code surrounded by curly braces (a bit like anonymous methods in javascript), or can be a function call.
        • Each transaction can communicate over one channel only.
        • The master thread blocks only if the channel buffer is full. While the slave thread blocks if there is no data to consume.
        • Ex:
          int send[10], receive[10];
          chan c;
          
          par {
          	master {
          		for (int i = 0; i < 10; i++)
          			c <: send[i];
          	}
          	slave {
          		for (int i = 0; i < 10; i++)
          			c :> receive[i];
          	}
          }
          
      • Streams: Streams are asynchronous, permanent, channels between two threads. Main points about streams:
        • They are declared as “streaming chan VarName”
        • They provide the highest data rates between threads.
        • Outputs to and inputs from streams take one instruction to complete as long as the channel buffer is not full.
        • Unlike transactions, streams can be processed concurrently, creating multi-threaded pipelines.
        • Ex:
          port dataIn = XS1_PORT_8A;
          port dataOut = XS1_PORT_8B;
          streaming chan s1, s2;
          
          par {
          	receiveData(dataIn, s1);
          	processData(s1, s2);
          	sendData(dataOut, s2);
          }
          
      • Parallel Replication: a variation on the par statement that permits running the same function or block of code across numerous threads. It uses a similar format to the for-loop statement in C, but substitutes the “par” word instead of “for”. Ex:
        chan c[4];
        int someData[4];
        
        par (int i=0; i<4; i++) {
        	runMyFunction(c[i], c[(i+1)%4], someData[i]);
        }
        
      • Services: Provide a means to communicate to external devices that implement xLinks, ex: FPGAs.
      • Interfaces: For people who are familiar with web services and data contracts, xC supports communication over predefined interfaces. XMOS Multicore Extensions to C Chapter 3 (pages 10-15) details the use of interfaces.
  • Clocks:
    • Clocks must be declared as global variables, and each initialized with a unique clock resource identifier.
    • To configure a clock rate call configure_clock_rate(clock clk, unsigned a, unsigned b) where “clk” is the clock to be configured, “a” is the dividend of the desired rate, and “b” is the divisor of the desired rate. The hardware supports rates of “ref” MHz and rates of the form (ref/2n) MHz where ref is the reference clock frequency and “n” is a number in the range 1 to 255 inclusive (copied from xs1.h).
    • To configure an output port as a clock port call configure_port_clock_output(void port p, const clock clk) where “p” is a 1-bit port, and “clk” is the clock block to output. If the port is not a 1-bit port, an exception is raised (from xs1.h).
    • To tie an output port to a clock port call configure_out_port(void port p, const clock clk, unsigned initial) where “p” is the output port, “clk” is the clock block, and “initial” is the initial value to output on the port. The port drives the initial value on its pins until an input or output statement changes the value driven.
    • Outputs are driven on the next falling edge of the clock and every port-width bits of data are held for one clock cycle. If the port is unbuffered, the direction of the port can be changed by performing an input. This change occurs on the falling edge of the clock after any pending outputs have been held for one clock period. Afterwards, the port behaves as an input port with no ready signals (from xs1.h).
    • To start a clock call start_clock(clock clk) where “clk” is the clock block to be put into running mode.
    • The ability to tie a clock port to an input/output port greatly simplifies the code for data input/output: ex:
      clock clk = XS1_CLKBLK_1;
      out port clkPort = XS1_PORT_1E;
      out port outPort = XS1_PORT_8A;
      
      configure_clock_rate(clk, 100, 1); //Drive the port at 100MHz?!
      configure_out_port(outPort, clk, 0); // clock the 8-bit data port from clk
      configure_port_clock_output(clkPort, clk); // output the clock on the 1-bit port
      start_clock(clk);
      
      for(int i=0; i<1000; i++)
      	outPort <: i;
      
    • Using an external clock source to drive a synchronous input port is very similar (page 45 of XMOS Programming Guide).
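
As a rough aid for picking the arguments to configure_clock_rate, the supported rates quoted above can be checked with a few lines of plain C. This is only a sketch: clock_divider is a hypothetical helper of mine, not an XMOS function, and it assumes that "(ref/2n)" means ref divided by 2·n and that the requested rate is a/b MHz.

```c
/* Hypothetical helper. Returns:
 *   0      if a/b MHz equals the reference rate itself,
 *   n      (1..255) if a/b MHz equals ref/(2*n) MHz,
 *  -1      if the rate cannot be produced exactly.
 * Assumes "(ref/2n)" means ref divided by 2*n. */
int clock_divider(unsigned ref_mhz, unsigned a, unsigned b)
{
    if (a == 0 || b == 0)
        return -1;
    if (a == ref_mhz * b)            /* a/b == ref */
        return 0;

    unsigned num = ref_mhz * b;      /* n = (ref * b) / (2 * a) */
    unsigned den = 2 * a;
    if (num % den != 0)
        return -1;                   /* not an exact divider */

    unsigned n = num / den;
    return (n >= 1 && n <= 255) ? (int)n : -1;
}
```

With a 100MHz reference, a request of (100, 1) is the reference rate itself, (100, 8) is 12.5MHz and corresponds to n = 4, while (100, 3) has no exact divider under this assumption.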
  • Timers: A timer is an xCORE resource with a 32-bit counter that is continually incremented at a rate of 100MHz.
    • Timers are declared using the keyword timer
    • Timers may be declared as local variables.
    • An input statement (:>) can be used to read the value of a timer’s counter.
    • Timers can be used to periodically perform an action using the when statement. Ex:
      timer t;
      unsigned int time;
      
      t :> time; // read the current timer value as the starting point
      for(int i=0; i<100; i++) {
      	t when timerafter(time) :> void;
      	time += XS1_TIMER_MHZ * 1000 * 1000;
      	printstr("This message is printed once a second \n");
      }
      

      Note in the above examples the time input from the timer is discarded in the loop by inputting it to void. Because the processor completes the input shortly after the time specified is reached, had we input the time to a variable, the input in the loop may actually increment the value of time by a small amount. This amount may be compounded over multiple loop iterations, leading to drift over time.
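
The drift argument can be checked with a little arithmetic. The plain-C sketch below is a host-side model only, not XC: both function names and the fixed per-iteration latency are my assumptions for illustration. It compares adding the period to the previous deadline (the ":> void" pattern) with adding it to the time the input actually completed.

```c
#include <stdint.h>

/* Deadline after n periods when the loop adds the period to the previous
 * deadline (the ":> void" pattern): wake-up latency never enters the sum. */
uint32_t deadline_fixed_step(uint32_t start, uint32_t period, unsigned n)
{
    uint32_t deadline = start;
    for (unsigned i = 0; i < n; i++)
        deadline += period;
    return deadline;
}

/* Deadline after n periods when each iteration instead stores the time at
 * which the input completed (deadline + latency) and adds the period to that.
 * The latency is modelled as a constant number of ticks (an assumption). */
uint32_t deadline_from_wakeup(uint32_t start, uint32_t period,
                              uint32_t latency, unsigned n)
{
    uint32_t deadline = start;
    for (unsigned i = 0; i < n; i++) {
        uint32_t woke_at = deadline + latency; /* input completes slightly late */
        deadline = woke_at + period;
    }
    return deadline;
}
```

With a period of 100,000,000 ticks and an assumed latency of 5 ticks, the second version is 500 ticks behind after 100 iterations, while the first stays exactly on schedule. Unsigned 32-bit arithmetic is used so the model wraps around the same way a 32-bit counter does.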

    • It is similarly possible to perform a periodic action inside a select statement. Ex:
      timer t;
      unsigned int time;
      
      t :> time; // read the current timer value as the starting point
      for(int i=0; i<100; i++) {
      	select {
      		case t when timerafter(time) :> void :
      			time += XS1_TIMER_MHZ * 1000 * 1000;
      			printstr("This message is printed once a second \n");
      			break;
      		// Insert cases to handle other events here ...
      	}
      }
      
  • Port Buffering: A buffer can hold data output by the processor until the next falling edge of the port’s clock, allowing the processor to execute other instructions during this time. It can also store data sampled by a port until the processor is ready to input it. Using buffers, a single thread can perform I/O on multiple ports in parallel. This decouples the sampling and driving of data on ports from a computation.
    • Buffered ports are declared by adding the keyword buffered in the port declaration:
      in buffered port:8 inP = XS1_PORT_8A;
      out buffered port:8 outP = XS1_PORT_8B;
      
    • A :x defines the width of the buffer for a port, in this case 8-bit.
    • Tying a clock to a buffered port causes the port’s buffer to hold input or output values until the next clock cycle, which simplifies the input/output code.
    • By tying more than one port to the same clock, it is possible to drive those ports synchronously with that clock, making those ports behave like one larger port.
    • Ex:
      out buffered port:4 p = XS1_PORT_4A;
      out buffered port:4 q = XS1_PORT_4B;
      out port clkPort = XS1_PORT_1E;
      clock clk = XS1_CLKBLK_1;
      
      configure_clock_rate(clk, 100, 8);
      configure_port_clock_output(clkPort, clk);
      configure_out_port(p, clk, 0);
      configure_out_port(q, clk, 0);
      start_clock(clk);
      
      p <: 0; // start an output
      sync(p); // synchronise to falling edge
      
      for (char c = 'A'; c <= 'Z'; c++) {
      	p <: (c & 0xF0) >> 4; // high nibble
      	q <: (c & 0x0F); // low nibble
      }
      
    • The “sync()” function synchronizes the clock to the start of a clock period, ensuring the maximum amount of time before the next falling edge. This causes the processor to wait until the next falling edge on which the last data in the buffer has been driven for a full period, ensuring that the next instruction is executed just after a falling edge.
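
The masking in the loop above is ordinary C, so it can be tried on a PC. The sketch below is mine, with hypothetical helper names: it splits a character into the two 4-bit values that would go to p and q, and joins them back the way a receiver treating the two ports as one 8-bit port would.

```c
#include <stdint.h>

/* Upper four bits of c, shifted down: the value sent to port p above. */
uint8_t high_nibble(uint8_t c)
{
    return (uint8_t)((c & 0xF0u) >> 4);
}

/* Lower four bits of c: the value sent to port q above. */
uint8_t low_nibble(uint8_t c)
{
    return (uint8_t)(c & 0x0Fu);
}

/* Recombine the two 4-bit values into the original byte. */
uint8_t join_nibbles(uint8_t hi, uint8_t lo)
{
    return (uint8_t)(((hi & 0x0Fu) << 4) | (lo & 0x0Fu));
}
```

For 'A' (0x41) the two ports would carry 0x4 and 0x1 on the same clock edge.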
  • Serialisation and strobing:
    • Strobing is just another way of saying clocking, as in, a synchronous port like SPI.
    • A port can be configured to perform serialisation in hardware, useful if data must be communicated over ports that are only a few bits wide, and strobing, useful if data is accompanied by a separate data valid signal. Offloading these tasks to the ports frees up more processor time for executing computations.
    • A simple 8-bit SPI out-only port can be as simple as:
      out buffered port:8 outP = XS1_PORT_1A;
      out port clkPort = XS1_PORT_1E;
      clock clk = XS1_CLKBLK_1;
      
      configure_clock_rate(clk, 100, 8);
      configure_port_clock_output(clkPort, clk);
      configure_out_port(outP, clk, 0);
      start_clock(clk);
      
      int x = 0xAA;
      outP <: x; // shifted out one bit per clock
      
    • Defining the width of the port’s buffer as 8 bits, while the port itself is one bit wide, causes the port to use a shift register to transfer/serialize 8 bits of data at a time.
    • The same code can be used to deserialize data by declaring the buffered port as an in port and using the :> (input) operator instead.
    • The XMOS Programming Guide provides a case study on how to implement a 100mbit Ethernet Media Independent Interface (MII) (pages 58-65)
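
For anyone who wants to picture what the shift register does, here is a small plain-C model of serialising and deserialising one byte. It is a sketch, not XMOS code: the function names are hypothetical, and the bit order (least significant bit first) is an assumption on my part that should be checked against the XMOS documentation.

```c
#include <stdint.h>

/* Shift an 8-bit value out one bit per clock, least significant bit first
 * (the bit order is an assumption). bits[k] is the pin level on clock k. */
void serialise8(uint8_t value, uint8_t bits[8])
{
    for (int k = 0; k < 8; k++) {
        bits[k] = (uint8_t)(value & 1u);
        value >>= 1;
    }
}

/* The reverse: rebuild the byte from eight sampled pin levels,
 * where bits[0] was sampled first and is the least significant bit. */
uint8_t deserialise8(const uint8_t bits[8])
{
    uint8_t value = 0;
    for (int k = 7; k >= 0; k--)
        value = (uint8_t)((value << 1) | (bits[k] & 1u));
    return value;
}
```

Under that assumption, 0xAA (binary 10101010) appears on the pin as 0, 1, 0, 1, 0, 1, 0, 1.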

Getting Started with XMOS’ Starter Kit

January 2, 2014 on 12:43 am | In Technology | No Comments

So, last week, I received the XMOS StarterKit I won during their recent StarterKit giveaway. Seeing the lack of simple tutorials that explain things and help get people started, I decided to take things into my own hands. I gathered every document I could find, started reading them, and took notes of the tidbits I found important and/or useful.

Hopefully, this will be the first post in a series of introductory posts about the XMOS StarterKit. So, stay tuned :)

So, What are XMOS’ XCore processors?

  • Eight-Core Multicore Microcontroller with Advanced Multi-Core RISC Architecture
    • 500 MIPS shared between up to 8 real-time logical cores
  • Each logical core has:
    • Guaranteed throughput of between 1/4 and 1/8 of tile MIPS
    • 16x32bit dedicated registers
  • 159 high-density 16/32-bit instructions
    • All have single clock-cycle execution (except for divide)
    • 32×32→64-bit MAC instructions for DSP, arithmetic and user-definable cryptographic functions
  • Programmable I/O:
    • Up to 64 General-purpose I/O pins, configurable as input or output (per XCore tile, depending on package)
    • 2 xCONNECT links
    • Port sampling rates of up to 60 MHz with respect to an external clock
    • 32 channel ends for communication with other cores, on or off-chip
  • Memory:
    • Up to 128KB internal single-cycle SRAM for code and data storage
    • 8KB internal OTP (One Time Programmable) for application boot code
  • Hardware resources
    • 6 clock blocks
    • 10 timers
    • 4 locks
  • JTAG Module for On-Chip Debug
  • 600mW at 500 MHz (typical)

There are a few other details like temp grade and the like, which I am omitting here for the sake of keeping things brief and simple. At the time of this writing, there is still no datasheet for the XCore chip used in the StarterKit. The info I listed above is a mixture of info from the datasheets of the XS1-U16A-128-FB217 and XS1-U8A-64-FB96
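
To put numbers on the "between 1/4 and 1/8 of tile MIPS" figure in the list above, here is a tiny C sketch. The sharing rule it encodes is my assumption (each core gets a quarter of the tile while four or fewer cores are active, and an equal share beyond that), and mips_per_core is a hypothetical name, not something from the datasheets.

```c
#include <assert.h>

/* Assumed rule: with up to 4 active logical cores each one gets 1/4 of the
 * tile's MIPS; with 5 to 8 active cores the tile is shared equally. */
double mips_per_core(double tile_mips, unsigned active_cores)
{
    assert(active_cores >= 1 && active_cores <= 8);
    unsigned share = active_cores < 4 ? 4 : active_cores;
    return tile_mips / share;
}
```

For a 500 MIPS tile this gives 125 MIPS per core with one to four cores active and 62.5 MIPS per core with all eight active, which matches the 1/4 and 1/8 bounds.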

A Lesson from the Kama Sutra – Mahmoud Darwish

September 30, 2011 on 4:01 am | In Technology | No Comments

Two days ago, I listened to this poem by Palestinian poet Mahmoud Darwish on YouTube and haven’t been able to get it out of my head since. Simply put, it’s epic.

With the drinking cup studded with lapis lazuli
wait for her,
by the pool of water, around the evening and the cologne blossoms
wait for her,
with the patience of a horse trained for mountain slopes
wait for her,
with the refined, exquisite taste of a prince
wait for her,
with seven pillows stuffed with light clouds
wait for her,
with the fire of womanly incense filling the place
wait for her,
and do not hurry; if she arrives after her appointed time
then wait for her,
and if she arrives before her time
then wait for her,
and do not startle the birds upon her braids
and wait for her,
so she may sit at ease, like a garden at the peak of its bloom
and wait for her,
so she may breathe this air, strange to her heart
and wait for her,
so she may lift her dress from her leg, cloud by cloud
and wait for her,
and offer her water before wine, and do not gaze at the twin partridges asleep on her chest
and wait for her,
and touch her hand slowly when she sets the cup on the marble
as if you were carrying the dew for her
and wait for her,
speak to her as a flute speaks to a frightened string in the violin
as if you two were witnesses to what tomorrow prepares for you
and wait for her,
and polish her night for her, ring by ring
and wait for her
until the night says to you:
no one remains in existence but the two of you
so take her, gently, to your desired death
and wait for her!

My take of the iPad announcement

January 28, 2010 on 12:20 am | In Technology | No Comments

Well, it was finally announced today, the long rumoured Apple Tablet, the iPad. To be honest, I was quite disappointed. Mostly because of the price, but also by some of the underwhelming specs of the device itself (especially the iPhone OS part, lack of multitasking, and the continuation of the orphan button philosophy). But that’s not what this post is about.

The most interesting part for me in the whole announcement shebang is that the iPad runs an ARM powered system-on-chip (SoC) manufactured by Apple-owned P.A. Semi. I’ve written before about P.A. Semi when they announced their PWRficient processors back in 2007. The PWRficient was basically the only piece of silicon announced by P.A. Semi before being acquired by Apple in April of 2008. Now we’re hearing that P.A. Semi designed the SoC in the iPad around the ARM architecture.

This, to me, is very significant in more than one way:
1- This is the first piece of silicon designed by Apple. This is a significant strategy shift from Apple that will go beyond the iPad. Entering the silicon design business is certainly no cheap endeavour and if Apple is to recoup its investment it will have to sell a hell of a lot of chips to justify it.
2- Apple chose to ditch P.A. Semi’s PowerPC ISA in favour of ARM. Licensing an ARM core is certainly no cheap thing.
3- The SoC in the iPad also includes an OpenGL ES 2.0 Graphics Processing Unit (GPU) licensed from Imagination Technologies (which confirmed Apple as a licensee back in December 2009), another not-so-cheap thing.

These three notes amount to a significant investment from Apple in a market that is VERY competitive. ARM said that in 2009 there were 1.1 Billion devices shipped with an ARM powered processor. The profit margins for such SoC’s are already very low, and unless you manage to ship very high volumes it just won’t be profitable.

My bet is the next iPhone/iPod Touch will run the same Apple SoC we just saw in the iPad. Probably tweaked a bit for lower power consumption to cope with the stringent power requirements of a phone. This in turn, raises another question. What other changes will the next iPhone bring to make it the next hot item that everyone wants to get (including current iPhone owners)? Technically, apart from the clock speed bump, the Apple A4 is very similar to the SoC in the current iPhone 3GS, so that won’t be enough to justify upgrading for most current 3GS owners. 4G is nowhere near seeing wide deployment this year, let alone having low power chipsets that are suitable for such phones, so it won’t be that. WiMax doesn’t have large market penetration, so not that either. CDMA is mostly irrelevant outside the US (and a few Asian countries, where the iPhone doesn’t enjoy the success it enjoys in the US and Europe). So, what will the next iPhone have to offer that will best the current 3GS???

But let’s assume Apple manages to cram in something that will make the next iPhone the must-have item that the previous ones were. Even if we assume 20M units sold annually, I don’t think it will be enough to justify the investment that went into developing this chip in house, not to mention manufacturing costs, instead of sourcing it from someone else (like Samsung, as in the previous iPhones). I doubt the iPad will see any success similar to what the iPhone is enjoying in terms of sales. The Apple A4 will have to go in many more high volume devices to be able to compete, in terms of cost, with sourcing similar offerings from Samsung, TI, or Marvell. With Apple being Apple, I doubt they’ll sell the chip to other companies for use in 3rd party devices.

So, what other Apple devices will we see that will use the Apple A4 to justify all the money that went into making this chip???


Kadim Al Sahir – Peace Be Upon You, Upon Your Two Rivers

June 7, 2008 on 11:34 am | In Technology | No Comments

Does it really need any additional comments?
