For the TIOBE index 016:[10]: «xC programming».
Observe that [[combinable]] and [[combine]] are xC keywords. «Combined», as in the title, is not. I assume that most of xC’s terms are known to the reader.
Fold handling with Collapse-O-Matic plugin
I am using Collapse-O-Matic (here) on many pages.
Intro
For a quick jump into this, you may want to read the Stop (after ran until..) chapter first. I guess it puts the whole note in context, as an in medias res opening, in the double meaning of that term.
Deadlock happens when there is a communication cycle between concurrent tasks. It may be between two tasks in a small cycle (like a handshake), or between several tasks in a larger cycle. Reasoning with just two tasks: if both want to communicate with the other at the same time, then (1) if that communication is synchronous, or (2) if that communication is asynchronous & buffered but one of the tasks will in addition not proceed before there is a reply – then neither will make any progress, and they are locked up in this deadlock or deadly embrace. The synchronous communication deadlocks the moment the situation occurs on one of the sides, at «link level» – it does so in the debugger. The asynchronous one, however, will deadlock after the sw has been shipped, kind of. Exaggerated? A little.
Other tasks may be running, but as soon as they have anything to communicate with those deadlocked, they too will freeze.
Deadlock is also seen if a pair of semaphores is used incorrectly. Two tasks have each correctly acquired a semaphore (T1 holds sem1 and T2 holds sem2), but when they then ask for the one that’s already held by the other task (T1 asks for sem2 and T2 for sem1) – it’s all over. This may happen.
This also goes for mutexes. Plus locks.
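Here is a minimal sketch of my own (not from any of the code discussed below) of the two-task deadly embrace in xC, assuming a single-tile target and plain synchronous (non-streaming) channels. Both tasks try to output first and input second, so both block on the output forever:

// Minimal deadly embrace sketch (not from the projects below).
// Both tasks output first on the same bidirectional channel, so neither
// ever reaches its input: a deadlock at «link level».
void Task_A (chanend c) {
    int y;
    c <: 1; // blocks: Task_B is also outputting, not inputting
    c :> y; // never reached
}

void Task_B (chanend c) {
    int y;
    c <: 2; // blocks: Task_A is also outputting, not inputting
    c :> y; // never reached
}

int main (void) {
    chan c;
    par {
        Task_A (c);
        Task_B (c);
    }
    return 0;
}

Swap the two lines in either task (input first, then output) and the handshake completes every time.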
These are some of the reasons why people think multi-threading is difficult. But doing bookkeeping of unrelated states isn’t simple either. It can lead to the same kind of complex problems, where a state should have been set, but the code that’s supposed to set it is never run in that state.
For me, concurrency has always made things simpler. Even considering the matter discussed here.
It is this kind of lock, with a key that’s not reachable, that I think may happen to your xC code when mutually communicating tasks are bundled up on the same logical core (we use just «core» here) with a [[combinable]] construct.
This is what this blog note is about. Here is some background stuff:
- CPA 2018 fringe – Unravelling xC concepts [[combine]], [[combinable]], [[distribute]], [[distributable]] and [[distributed(..)]] plus par and on..
- xC is C plus x. The References chapter there lists all the xC stuff I have found
- My XMOS pages. As mentioned on the top here
Black box investigations
This note contains black box investigations. From the outside, from what I see.
I really don’t know what’s inside. I don’t know what goes on at XMOS or inside their tools, like the xC compiler and the rest of the xTIMEcomposer tool suite. I am not at all knowledgeable on the xCore instruction set (141:[xCore instruction set]).
Therefore some of this is on the limit to philosophy or, frankly, speculation.
However, by doing this, and by discovering where I sometimes get it wrong, I learn. In that case, I update.
The error messages from this system, as I have managed to make them pop up, may be seen at 141:[Messages] – which also is a black box encounter.
You may still learn from these attempts, since I certainly have. While we wait for XMOS to reveal that inner light, somewhat visible in the picture.
I must still hurry to thank XMOS for pointing out the matters that caused me to write this note, in an update on a ticket I issued on some strange behaviour I had observed. This was not my first ticket related to such a problem – which makes that Jan2021 feedback the more valuable.
This note, like all of my notes, follows a kind of created-while-on-the-run, narrative style. See Style disclaimer. I guess, when it’s finished, maybe then would be the time to write it all over again, with (perhaps) another structure. But then I’d be off with a new project. Most of them cause a new note, but of course, not all.
Aside: I made the black box from some black paper board. Then I applied a stained-glass filter in Adobe Photoshop Elements. I liked it because it had a virtual light source.
I knew I had struggled with this before
Here is main of my aquarium code, which by now has controlled the light and the temperature for several years:
Main of aquarium code
int main() {
    chan c_analogue;
    button_if                      i_buttons[BUTTONS_NUM_CLIENTS];
    spi_master_if                  i_spi [NUM_SPI_CLIENT_USERS];
    radio_if_t                     i_radio;
    i2c_external_commands_if       i_i2c_external_commands [I2C_EXTERNAL_NUM_CLIENTS];
    i2c_internal_commands_if       i_i2c_internal_commands [I2C_INTERNAL_NUM_CLIENTS];
    startkit_adc_acquire_if        i_startkit_adc_acquire;
    lib_startkit_adc_commands_if   i_lib_startkit_adc_commands[ADC_STARTKIT_NUM_CLIENTS];
    port_heat_light_commands_if    i_port_heat_light_commands [PORT_HEAT_LIGHT_SERVER_NUM_CLIENTS];
    temperature_heater_commands_if i_temperature_heater_commands[HEATER_CONTROLLER_NUM_CLIENTS];
    temperature_water_commands_if  i_temperature_water_commands;

    par {
        on tile[0]: installExceptionHandler();
        par {
            startkit_adc (c_analogue);
            on tile[0]: My_startKIT_ADC_Task (
                i_startkit_adc_acquire,
                i_lib_startkit_adc_commands,
                NUM_STARTKIT_ADC_NEEDED_DATA_SETS);
            on tile[0]: System_Task (
                i_i2c_internal_commands[0],
                i_i2c_external_commands[0],
                i_lib_startkit_adc_commands[0],
                i_port_heat_light_commands[0],
                i_temperature_heater_commands[0],
                i_temperature_water_commands,
                p_display_notReset,
                i_buttons,
                i_radio,
                c_irq_update,
                null, null, 0);
            on tile[0]: adc_task (
                i_startkit_adc_acquire,
                c_analogue,
                ADC_PERIOD_TIME_USEC_ZERO_IS_ONY_QUERY_BASED);
            on tile[0]: Port_Pins_Heat_Light_Task (i_port_heat_light_commands);
        }
        on tile[0]: {
            [[combine]]
            par {
                Button_Task (IOF_BUTTON_LEFT,   inP_button_left,   i_buttons[IOF_BUTTON_LEFT]);
                Button_Task (IOF_BUTTON_CENTER, inP_button_center, i_buttons[IOF_BUTTON_CENTER]);
                Button_Task (IOF_BUTTON_RIGHT,  inP_button_right,  i_buttons[IOF_BUTTON_RIGHT]);
            }
        }
        on tile[0]: {
            [[combine]]
            par {
                I2C_Internal_Task (i_i2c_internal_commands);
                I2C_External_Task (i_i2c_external_commands);
                Temperature_Heater_Task (
                    i_temperature_heater_commands,
                    i_i2c_external_commands[1],
                    i_port_heat_light_commands[1]);
                Temperature_Water_Task (
                    i_temperature_water_commands,
                    i_temperature_heater_commands[1]);
            }
        }
        on tile[0]: {
            [[combine]]
            par {
                RFM69_driver       (i_radio, p_spi_aux, i_spi[SPI_CLIENT_0], SPI_CLIENT_0);
                IRQ_interrupt_task (c_irq_update, p_spi_irq, probe_led_d2, IRQ_HIGH_MAX_TIME_MILLIS);
                spi_master_3       (i_spi[0], p_sclk, p_mosi, p_miso, SPI_CLOCK, p_spi_cs_en, maskof_spi_and_probe_pins[0]);
            }
        }
    }
    return 0;
}
I struggled with where to use [[combine]] (coding tasks as [[combinable]] or not) to actually get the code placed in the first place. Now I know better. (The full code is at My xC code downloads page at row #174.)
Yes, I know better. Even now, though, the communication structure of the above code example is quite complex. But at the application level, with xC used correctly, it doesn’t deadlock. The communication structure of the examples I show below is much simpler, even 100% regular.
Combined and communication on the same core
Hypothesis I
My wish is that someone on the xCore Exchange forum would say that I can just forget all my complications, because it is as simple as «….». In the meantime, I’ll have to stay with my black box reasoning, which sometimes feels like it comes from the inside of that black box.
When we have [[combined]] tasks (one client and one server, which may serve more than one client) that thus share the same select on the same core, then that core is locked during the interface session. If all tasks share the same, single select, this locking will stop the code from having any sort of progress. We have a deadlock, which is not seen before the code is actually run. It will then run until the first global locking.
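A minimal sketch of the placement this hypothesis is about, reusing the declarations and task names of the topology code shown later (so this is the TOP_07-style placement): both ends of the interface connection end up in one combined select on one logical core:

// Client and server combined onto one logical core: one combined select
// serves both ends of the interface connection (TOP_07-style placement)
on tile[0]: [[combine]] par {
    Client_node_root (0, 0, all_conns[2], 0,0,2, all_conns[0], 0,0,0, all_conns[1], 0,0,1, outP4_leds);
    Server_node      (0, 1, all_conns);
}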
HW restraints
The xCore architecture has the following units implemented in HW – meaning there are not ∞ amounts of them:
- tiles
- cores
- chanends
- timers
- locks
- ports
- RAM memory
- FLASH memory
Locks are not in the xC language, thank goodness. But the compiler toolset uses them for the code, and there is a library. Even with a limited amount of these units, you may still count up the number of timers, chanends and cores in your code to larger counts than the HW seems to contain. That’s because the tools are extremely smart and try not to use such a unit when they see that it is not needed. But XMOS won’t reveal much of the inner details. But I have a black box..
In this note I try to cover how the number of cores (or the lack of cores) affects us, even before we start using [[combine]]. In fact, I guess that combinable tasks were invented to let me use more tasks than the single normal task that a logical core can run by birth. And I can use this beautiful task/process abstraction right out of the box, with the built-in scheduler and me, alone. No big brother FreeRTOS if it is not really needed, for some other reason.
The topologies
Courtesy of
The topologies studied here all constitute a torus topology where every node is connected to all of its three neighbours, wrapping around, so there are no edges. Courtesy of Dr. Larry (Lawrence) John Dickson (CA, USA). We have been doing a sort of mail-based hobby project to try to find out how xC on xCore compares with occam on transputers. He has been feeding me with quite itchy occam related issues which I have tried to remap to xC and xCore. Some of the results are seen here, but it may also lead to a joint fringe presentation for IEEE-COPA 2021. The topology of a torus is so much easier to relate to than, f.ex., my aquarium code.
Below is the PDF from which all of the later bitmaps have been generated:
Fig.1: 217_my_xc_combine_combinable_notes_oyvind_teig (Download PDF here)
Top_01 runs
TOP_01_HAS_1_TILE_002_NODE_T0_2N_RUNS because client and server run on one core each, not combined with any other task. (Later we shall see that also TOP_07_HAS_1_TILE_002_NODE_T0_2CN_RUNS.)
As explained above, I have not paid any attention here to the number of chanends and timers used. I may, should I come back with a very large torus structure.
I will explain this topology in more detail than the rest. From the log (here) we see that:
Constraint check for tile[0]:
  Cores available:        8, used:    2 .  OKAY
  Timers available:      10, used:    2 .  OKAY
  Chanends available:    32, used:    4 .  OKAY
  Memory available:  262144, used: 2916 .  OKAY
    (Stack: 616, Code: 2030, Data: 270)
Constraints checks PASSED.
Instead of showing this, which quantitatively lists the used resources, I will stay with the figure only, where the individual cores are shown. The main looks like this:
typedef interface conn_if_t {
    {pos_t, unsigned} do_io_server (const unsigned value_to);
} conn_if_t;

#define NUM_CONNS_PER_NODE 3

conn_if_t all_conns [NUM_CONNS_PER_NODE];

out buffered port:4 outP4_leds = on tile[0]: XS1_PORT_4F;

par {
    Client_node_root (0, 0, all_conns[2], 0,0,2, all_conns[0], 0,0,0, all_conns[1], 0,0,1, outP4_leds);
    Server_node      (0, 1, all_conns);
}
The interface is a single «remote procedure call» (RPC) from the Client_node_root to the Server_node, which just returns a value. When the client and the server have had comms on all of their three connected interface lines, they will Calculate a new value as the sum of the old plus all the three values received. The function of the «root» client is the same as that of the clients that are not root, except that the root also exercises some LEDs on the board.
This is the smallest torus topology there is. One client and one server only communicate and wrap their interface lines at the edges, both up/down (same column) and left/right (same row). All the other examples below follow the same topology.
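As an aside, the wrap-around at the edges is plain modulo arithmetic. A tiny helper of my own (hypothetical, not part of the downloadable code, where the connections are wired by hand in the par) could compute a neighbour index in one dimension like this:

// Hypothetical helper (not in the downloadable code): neighbour index with
// torus wrap-around in one dimension of size dim
static unsigned torus_neighbour (const unsigned iof, const int delta, const unsigned dim) {
    return (unsigned) (((int) iof + delta + (int) dim) % (int) dim);
}
// torus_neighbour (0, -1, 4) == 3 : the «left» neighbour of column 0 wraps to column 3
// torus_neighbour (3, +1, 4) == 0 : the «right» neighbour of column 3 wraps to column 0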
In xC, tasks which communicate as defined in an interface typedef must be a pair of client and server. Server-server or client-client are not allowed. (With chan and chanend the communicating tasks’ roles have to be handled by the programmer. However, channels cannot be used between combined tasks, which we will use lots of here, see 141:[1].) A task using channels or chanends may still be decorated with [[combinable]] and compile, but using it as a combinable with [[combine]] par is not allowed. Take a brief look at the xC compiler’s messages (141:[Messages]) and you may grasp that using xC for concurrency you’d be standing on giants’ shoulders. (But sometimes I still feel this gets the better of my balancing skills.)
There are more files. Even if they are not updated on every version of The code (download) they are certainly worth a look. The preprocessor output file here, the assembly file here and the map file here.
Invariants
- None of the tasks in any of these examples are changed for the different topologies. However,
- ..there are 4 different client tasks, but all call the shared Handle_client function
- ..there are 3 different server tasks, but all call the shared Handle_server function
- ..there are 2 different client and server tasks that contain a composite par to start a client and a server node, as a pair. These are not considered runnable tasks, since they only start other tasks
- All runnable tasks have been written such that they are [[combinable]], meaning that I can choose to use them as such, by adding [[combine]] to the par closest to the start of the task in main, or not use [[combine]], in which case I allow them to be treated (by xmap?) as not being [[combinable]]. A [[combinable]] task may also run alone on a core, but a standard task cannot be tagged with [[combine]] par
The differences have to do with the _root variants, which add a LED port, and the _alt variants, which instead of having an array of [3] interfaces have one single interface plus an array of [2].
There are these defines to start the above clients and servers with par, where one of them is used per topology:
// LEGEND of config name, rather informally listed
// '1_TILE'   : 1 tile
// '016_NODE' : 16 nodes (tasks) altogether
// 'T0'       : tile[0], then a list
// 'T1'       : tile[1], then a list
// '2N'       : 2 nodes on 2 cores, configured as normal task types
// '2CN'      : 2 combined nodes on 1 core
// '4P'       : 4 pairs of client and server
// 'CC'       : special config combined client and client on 1 core
// 'CS'       : special config combined client and server on 1 core
//
#define TOP_01_HAS_1_TILE_002_NODE_T0_2N_RUNS                   1
#define TOP_02_HAS_2_TILE_016_NODE_T0_4P_T1_4P_RUNS             2
#define TOP_03_HAS_2_TILE_016_NODE_T0_2N_AS_CC_6CN_T1_8CN_RUNS  3
#define TOP_04_HAS_1_TILE_016_NODE_T0_2N_AS_CS_6CN_8CN_RUNS     4
#define TOP_05_HAS_1_TILE_016_NODE_T0_2N_14CN_RUNS              5
#define TOP_06_HAS_1_TILE_032_NODE_T0_2N_30CN_RUNS              6
#define TOP_07_HAS_1_TILE_002_NODE_T0_2CN_RUNS                  7
//
#define TOP_10_HAS_1_TILE_016_NODE_T0_16CN_STOPS                10
#define TOP_11_HAS_2_TILE_016_NODE_T0_2CN_12CN_T1_2N_STOPS      11
#define TOP_12_HAS_1_TILE_012_NODE_T0_1N_2CN_8CN_1N_STOPS       12
#define TOP_13_HAS_1_TILE_016_NODE_T0_8CN_8CN_STOPS             13
#define TOP_14_HAS_1_TILE_016_NODE_T0_2N_AS_SS_6CN_8CN_STOPS    14
#define TOP_15_HAS_1_TILE_016_NODE_T0_2N_AS_SS_6CN_8CN_STOPS    15
//
#define TOP_03G_HAS_2_TILE_016_NODE_T0_2N_AS_CC_6CN_T1_8CN_NO_BUILD 103
#define TOP_03A_HAS_2_TILE_016_NODE_T0_2N_AS_CC_6CN_T1_8CN_STOPS    203
//
#define TOP_20_OCCAM_LIKE_IS_FAIR 20
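Here is a sketch of how one of these defines then selects which par placement gets compiled into main. This is an assumption of mine; the selection macro name TOPOLOGY_CONFIG is hypothetical, and the real file may do this differently:

// Hypothetical selection (the real code may differ): pick one topology
#define TOPOLOGY_CONFIG TOP_01_HAS_1_TILE_002_NODE_T0_2N_RUNS

#if (TOPOLOGY_CONFIG == TOP_01_HAS_1_TILE_002_NODE_T0_2N_RUNS)
    // the plain par { Client_node_root (..); Server_node (..); } shown above
#elif (TOPOLOGY_CONFIG == TOP_07_HAS_1_TILE_002_NODE_T0_2CN_RUNS)
    // the same two tasks, but as on tile[0]: [[combine]] par { .. }
#endif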
Basic client node task
I’m not going through client_context_t here, nor the internals of Handle_client. See The code (download) chapter. (Update: I have added some pseudo code for this further down.)
Client_node code
[[combinable]]
// void Client_node_alt      no param "pos_z_1"
// void Client_node_root_alt no param "pos_z_1", but this:
// void Client_node_root     added param "plus buffered port:4 ?outP4_leds"
void Client_node (
        const unsigned iof_row,
        const unsigned iof_col,
        client conn_if_t i_conn_0, const unsigned pos_x_0, const unsigned pos_y_0, const unsigned pos_z_0,
        client conn_if_t i_conn_1, const unsigned pos_x_1, const unsigned pos_y_1, const unsigned pos_z_1,
        client conn_if_t i_conn_2, const unsigned pos_x_2, const unsigned pos_y_2, const unsigned pos_z_2)
{
    client_context_t context;

    init_value (iof_row, iof_col, context.vals);

    debug_print ("(%u %u):C[%u] = %u.%u.%u - %u.%u.%u - %u.%u.%u\n",
        iof_row, iof_col, context.vals.value,
        pos_x_0, pos_y_0, pos_z_0,
        pos_x_1, pos_y_1, pos_z_1,
        pos_x_2, pos_y_2, pos_z_2);

    // un-const ok, since I don't change them anyhow:
    context.iof_row    = iof_row;
    context.iof_col    = iof_col;
    context.cycle_cnt  = 0;
    context.iof_server = 0;

    context.time_tmr :> context.timeout_ticks; // AFTER now = immediate
    context.time_previous_ticks = context.timeout_ticks;

    while (1) {
        select {
            case context.time_tmr when timerafter (context.timeout_ticks) :> void : {
                Handle_client (i_conn_0, i_conn_1, i_conn_2, context);
            } break;
        }
    }
} // Client_node
Basic server node task
I’m not going through server_context_t here, nor the internals of Handle_server. See The code (download) chapter. (Update: I have added some pseudo code for this further down.)
Server_node code
[[combinable]]
// void Server_node_alt     has params dim-1 interface plus dim[2]
// void Server_node_timeout same params as this:
void Server_node (
        const unsigned iof_row,
        const unsigned iof_col,
        server conn_if_t i_conns[NUM_CONNS_PER_NODE]) // dim[3]
{
    server_context_t context;

    init_value (iof_row, iof_col, context.vals);

    debug_print ("(%u %u):S[%u]\n", iof_row, iof_col, context.vals.value);

    // un-const ok, since I don't change them anyhow:
    context.iof_row   = iof_row;
    context.iof_col   = iof_col;
    context.cycle_cnt = 0;
    context.num_neighbours_seen = 0;

    context.time_tmr :> context.time_now_ticks;
    context.time_previous_ticks = context.time_now_ticks;

    while (1) {
        select {
            case i_conns[unsigned iof_connection].do_io_server (const unsigned value_to)
                    -> {pos_t pos_return, unsigned value_from} : {
                context.iof_client = (iof_connection % NUM_CONNS_PER_NODE);
                {pos_return, value_from} = Handle_server (value_to, context);
            } break;
        }
    }
} // Server_node
Basic composite par task
As already mentioned, the _alt tasks have arrays of connections conn_if_t that suit the pair thinking, where a box around a pair will have one interface in each of the four directions, covered by an array of just [2] interfaces. See fig. TOP_01 RUNS. But then we need one internal interface as well, because both clients and servers need an array of [3] interfaces (or each of them picked out individually, or one single plus a pair of [2]).
This code is only used by TOP_02_HAS_2_TILE_016_NODE_T0_4P_T1_4P_RUNS.
Client_Server_pair_root_task code
// void Client_Server_pair_task not param "outP4_leds"
void Client_Server_pair_root_task (
        const unsigned iof_row,
        const unsigned iof_col,
        client conn_if_t i_conn_hor, // Left or right
        const unsigned pos_x_0, const unsigned pos_y_0, const unsigned pos_z_0,
        server conn_if_t i_conns_server [NUM_CONNS_PER_PAIR], // DIM 2
        const unsigned pos_x_1, const unsigned pos_y_1,
        client conn_if_t i_conn_ver,
        const unsigned pos_x_2, const unsigned pos_y_2, const unsigned pos_z_2,
        out buffered port:4 ?outP4_leds)
{
    conn_if_t i_conn_internal; // DIM 1, sum DIM 3: hidden \i_conn_internal

    par {
        Client_node_root_alt (
            iof_row, iof_col,
            i_conn_hor,      pos_x_0, pos_y_0, pos_z_0,
            i_conn_internal, pos_x_1, pos_y_1,
            i_conn_ver,      pos_x_2, pos_y_2, pos_z_2,
            outP4_leds);
        Server_node_alt (
            iof_row, iof_col+1,
            i_conn_internal,
            i_conns_server);
    }
} // Client_Server_pair_task
Top_07 runs
This is «out of sequence» (01 → 07) because I only saw quite late that I needed to make this; I placed it here, but didn’t want to renumber the others.
TOP_07_HAS_1_TILE_002_NODE_T0_2CN_RUNS because both client and server never communicate out of their shared core. (Earlier we saw that also TOP_01_HAS_1_TILE_002_NODE_T0_2N_RUNS.)
I must admit that, having argued why TOP_01_HAS_1_TILE_002_NODE_T0_2N_RUNS, I was rather excited to find out whether this one ran. I introduced it after I had made all the other topologies, but still I wasn’t certain.
Why does this run when TOP_13_HAS_1_TILE_016_NODE_T0_8CN_8CN_STOPS (TOP_13) stops?
I assume that the compiler (or the xmapper) tool investigates the connections and sees that both ends are on the same core. All ends. I further assume that some locking will then not be necessary, since everything runs in the same, combined select.
Aside: This reminds me of SPoC (the Southampton Portable occam Compiler, from the early nineties) (search for it, like here; plus I have mentioned it in several notes because it made such a lasting impression on me), which flattened the occam process hierarchy down to individual while (1) loops that were scheduled by a scheduler. I’ve done the same on several occasions, like in New ALT for Application Timers and Synchronisation Point Scheduling. I don’t know how xC combines into one select, whether it really is «one select», or if it’s more complex than that. Please tell me!
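My mental model – and this is purely an assumption, nothing I have verified against the generated code – is one merged event loop, where the cases of the two original selects sit side by side and each case body updates its own task’s context:

// Assumed (unverified) model of a client and a server combined into one select.
// The case shapes are those of Client_node and Server_node above.
while (1) {
    select {
        // case originally belonging to the client task:
        case client_ctx.time_tmr when timerafter (client_ctx.timeout_ticks) :> void : {
            Handle_client (i_conn_0, i_conn_1, i_conn_2, client_ctx);
        } break;
        // case originally belonging to the server task:
        case i_conns[unsigned iof_connection].do_io_server (const unsigned value_to)
                -> {pos_t pos_return, unsigned value_from} : {
            server_ctx.iof_client = (iof_connection % NUM_CONNS_PER_NODE);
            {pos_return, value_from} = Handle_server (value_to, server_ctx);
        } break;
    }
}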
This config then has less concurrency than the above TOP_01 config. I have discussed some of this before I started working with xC: How much concurrency? Maybe I should update it with an xCore/xC chapter.
The oscilloscope picture shows the four LEDs that are exercised by the root. The top one goes when the root client has talked with all of its three servers. The three LEDs below are one for each server communicated with. There is exactly 1 ms between each. This is handled by the timing mechanisms in both the clients and the servers.
There are more files. Even if they are not updated on every version of The code (download) they are certainly worth a look. The preprocessor output file here, the assembly file here and the map file here.
Top_02 runs
TOP_02_HAS_2_TILE_016_NODE_T0_4P_T1_4P_RUNS because clients and servers run on one core each, not combined with any other task.
This is perhaps the placement that’s closest to the intention of the architecture. Most real-time systems don’t have more than 16 tasks. Let them be of the normal type, with full freedom to include code after all the select cases (but still inside the while(1)) for common handling of all cases – which combinable tasks in theory cannot do. 2*2 cannot be 5, kind of. It’s easy, under the hood, to combine several selects into one, since they are all triggered on events.
This is the only configuration that starts Client_Server_pair_root_task and Client_Server_pair_task. I have tried to show this with the red and green in-fills, shown as four pairs on each tile. This is also seen in how I have drawn the inner pars:

Client+Server pair:
    par
        Client core[..]
        Server core[..]
xC is thus able to start parallel tasks in a sub-task. However, these subtasks cannot be [[combinable]]. Trying to make them so gives this error message:
error: combinable function must end in a `while(1){select{..}}' or combined `par' statement
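In other words, a [[combinable]] task must have nothing after its final event loop. A minimal skeleton of mine of the shape the compiler accepts (consistent with Client_node and Server_node above):

// The shape required of a [[combinable]] task: it must end in
// while(1){select{..}} (or in a [[combine]] par), with no code after it
[[combinable]]
void Combinable_skeleton (void) {
    timer    tmr;
    unsigned timeout_ticks;

    tmr :> timeout_ticks; // first timeout is immediate

    while (1) {
        select {
            case tmr when timerafter (timeout_ticks) :> void : {
                timeout_ticks += 100000000; // once per second at 100 MHz
                // do the periodic work here
            } break;
        }
    }
    // no code may follow the while(1){select{..}} here
}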
There are more files. Even if they are not updated on every version of The code (download) they are certainly worth a look. The preprocessor output file here, the assembly file here and the map file here.
Top_03 runs
TOP_03_HAS_2_TILE_016_NODE_T0_2N_AS_CC_6CN_T1_8CN_RUNS even if 14 of the 16 tasks share cores. But two of the tasks run on a core each. I think what saves this situation is this fact, plus the communication pattern.
The communication situation goes like this:
Fig.2: 217_top_03_comms (Download PDF here)

The interface conn_if_t constitutes a blocking (atomic) remote procedure call (RPC):
1. tile[0] core[1]: one task; calls 3 (two tasks) and 4 (one task)
2. tile[0] core[0]: –«–
3. tile[0] core[2]: is only called, by all the others: 1, 2 and 4
4. tile[1] core[0]: calls 3, is called by 1 and 2
I am not sure where this takes me. Maybe it is that 1 and 2, which have one core each, will be able to call 3 whenever they need to, since 3 is not locked. The two «external» clients on 4 will also call 3, but will not stop or be stopped by 1 or 2, since all calls are atomic. 1 will also call the servers on 4 ok, and not be stopped, since they have the server role, and the nodes having the client roles on 4 will never have caused any lock of 4.

Maybe that rationale is more about application-level protection of the cores than about the HW locks of the cores. If I could look inside the black box, this could be modelled. Even if the XMOS toolset runs through a lot of rules (ref. their error messages), I don’t think they do any formal verification.

I have also not studied the fact that the communication between 4 and the others goes across the communication fabric (the router) between the two tiles.
There are more files. Even if they are not updated on every version of The code (download) they are certainly worth a look. The preprocessor output file here, the assembly file here and the map file here.
Fairness
This chapter is an aside; it has nothing to do with the problem of [[combinable]] with [[combine]] and whether the code runs or not.

Fairness is when a server listens on a set of interface connections and, provided that the connected clients are equally eager to use the server, the clients do get access to the server in a «fair» fashion. No single client should be able to use the server any more than the others.
I have discussed this in Nondeterminism – have a look above that ref. as well.
Top 03 is not fair
When I came to TOP_03 and was going to describe it thoroughly, I discovered something that looked like an error. I had to take a break and spend a couple of days, just to discover something I hadn’t seen before.
This log shows that TOP_03 is not fair! This is not an error, this is just how it is! (Or how I have coded it.) There is no requirement in the paradigm of the xCore or xC which would guarantee this. I don’t know this, but it would be very strange if this were not so.
Of all the code presented here, none is fair in theory! Except Top 20, which is fair.
Fig.3: 217_top_03_log_not_fair (Download PDF here)

Over the years I have been surprised at how fair xC code actually has been. If I look at longer logs than the one above, I see that the neighbours on the same tile[0] as (1 3), namely (1 0) and (1 2), get the server equally much (it looks very fair between them), but the neighbour on the other tile[1], namely (2 2), gets served about half as much. Like 400, 400 and 200 times. This is so interesting, and it’s only explainable by XMOS, I assume. It may have to do with the tile crossing, it may not.
There is no mechanism here to guarantee that the clients move in concert. Or: the orchestra of clients, with servers loosely connected between them, has no conductor. If each server is a bucket, we know that each client fills up ⅓, but how often an eager client may fill its ⅓ is not defined. The reason is that the servers just count the number of times the bucket is filled by ⅓; when 3 fillings have been done, the server says «ok, I’m full». But in theory all the thirds could come from one client.
Now, could this be solved in any way?
Extreme unfairness
I had not tested with DELTA_TIME_US 0 before on the TOP_08 version. I assume that all the topologies here would become extremely unfair (except Top 20, which is fair). It’s simply the architecture. See TOP_08_HAS_1_TILE_016_NODE_T0_2N_AS_CC_6CN_8CN_RUNS.
How to make code fair
The original code that this architecture stems from was written in the occam language (ref. Larry Dickson, above). Occam, until its demise, ran on transputers only. (Aside: xC is related to occam, the xCore architecture is related to the transputer, and the XMOS company is related to the INMOS company (ex-inmos = xmos?). The common factor has to do with Bristol in the UK and some reuse of very good people (like David May), with a topping of new, smart people. Plus, I guess, some new money and some years from the past, which takes us to the present. And then, a hope that their fates will not be related.) All tasks in occam (they were called processes) were normal tasks, in the xC idiomatic jargon, and timers and chanends were not HW resources per se. A transputer had one core (several needed to be connected to make them multi-core), and it was not made to make the code deterministic with ns or µs timing guarantees straight out of the box, like the xCore is. In that context it’s easier to use lots of processes in occam. Like this:
Some pseudo code
Below are 6 occam processes started in parallel: three inputs and three outputs. When all of them (inputs and outputs) are done, a new Calculate is done.
Aside: Observe that if all the channel outputs were indented one more level, we’d have three processes which would do the inputs first, then the outputs: our «server». Vice versa, with the outputs first and the inputs indented, we’d have – our «client». This is what I do in the Top 20 is fair chapter (below).
But the pseudo code below is the general occam case. Observe that this general case requires 6 processes per node, whereas the client/server solution just mentioned would have 3 processes per node. But the code below will indeed synchronise on 6 comms and then Calculate. Fairness is not an issue. This just runs like clockwork, and it won’t deadlock. All six parallel tasks can run at any time they want, no sequence required. When all the 6 comms have finished, then Calculate is done:
-- Fair by design
-- occam pseudo code
PROC node (x-y-pos and 6 chans here)
  -- locals and init
  WHILE TRUE
    SEQ
      loopnum := loopnum PLUS 1
      PAR
        in_0 ? vals[0] -- input
        out_0 ! val    -- output
        in_1 ? vals[1]
        out_1 ! val
        in_2 ? vals[2]
        out_2 ! val
      val := Calculate(val,vals[])
:
I’d say that there really is no xC version of the above code. See the Top 20 is fair chapter.
What is true to say is that, with xC we would, as listed in a previous chapter, be limited by the HW resources available. That’s really what this whole note is about.
I have shown my xC code proper earlier, but here is an attempt at some xC pseudo code of the basic topologies I have shown in this note. I have placed it here to pinpoint some of the problems with this code:
// Fairness not guaranteed
// xC pseudo code
[[combinable]]
void Client_node (x-y-pos and 3 interfaces here) {
    // locals and init
    while (1) {
        select {
            case timerafter (1 ms) : {
                ret_val[ix] = i_conns[ix].do_io_server (val);
                // Returned from the above do_io_server "RPC call" with Server_node
                ix++;
                if (ix == 3) {
                    ix = 0;
                    val = Calculate(val,vals[]);
                } else {}
            } break;
        }
    }
}

[[combinable]]
void Server_node (x-y-pos and 3 interfaces here) {
    // locals and init
    while (1) {
        select {
            case i_conns[ix].do_io_server(val) -> ret_val : { // "RPC call" from Client_node
                vals[ix] = val;
                ix++;
                if (ix == 3) {
                    loopnum = 0;
                    ret_val = Calculate(val,vals[]);
                } else {
                    ret_val = val;
                }
                wait (1 ms);
            } break;
        }
    }
}
I am only counting having received on three interfaces; there is no gatekeeper of which interface connection I have received on. That’s «the problem» with the code. That’s why I still would consider it «surprisingly» fair, at least when there is some delay between each round, like the standard 1 ms.
xC by design as fair
Top_20 is fair
TOP_20_OCCAM_LIKE_IS_FAIR. This code runs like a dream. I will start by showing the code of one of the tasks:
void Client_node_3_composite_par (
        const unsigned iof_row,
        const unsigned iof_col,
        chanend chan_0,
        chanend chan_1,
        chanend chan_2)
{
    node_context_t context;

    Init (iof_row, iof_col, context);
    DEBUG_PRINT_BANNER (context, CLIENT_STR, "Client_node_3_composite_par");

    while (1) { // Client_node_3_composite_par
        par {
            { // subtask 1
                chan_0 <: context.value;
                chan_0 :> context.values[0];
            }
            { // subtask 2
                chan_1 <: context.value;
                chan_1 :> context.values[1];
            }
            { // subtask 3
                chan_2 <: context.value;
                chan_2 :> context.values[2];
            }
        }
        Node_Calculate (context);
        DEBUG_PRINT_VALUES (context, CLIENT_STR);
    }
}
Since channels cannot be used between combined tasks, we use up much of the machine here. (On each tile: 6 of 8 cores, 6(!) of 10 timers and 7 of 32 chanends, with -O2 optimisation.) But it’s very nice to show the potential in xC, as compared to occam.
The communication pattern is seen in the figure below. The result of this pattern is that all the four tasks run synchronised, just like four cogwheels (tooth wheels, gears). The synchronised parts are the four yellow dotted lines. Communication is along the red dotted lines, from the client to the server, and back again. For each task there are three subtasks (each of them uses a full core) that do this communication. The whole point of this code, as compared to all the other topologies here, is that this communication is done without any sequence. We’ve already seen the alternative, where the clients send out on three connections (0 to one server, then 1 to another server, then 2 to yet another server) and the servers listen to any one of them (0 from one client, or 1 from one client, or 2 from one client). «From one client» means we don’t know from which, when things are not fair. This is fine until a server sees something on 0, then 1, then 0 again. That’s not fair, since 2 is left out in the shadow. Then, on the next rounds, it’s hard to get it averaged correctly. We cannot rely on such averages. The communication pattern below solves it: once a connection has been used, it cannot be repeated before all have done their three comms. It’s better than that: it doesn’t matter in which sequence the 6 comms happen. Any sequence is fine. But now each of them is done only once. This pattern is very possible in occam on transputers, and it’s possible for this test case also on an xCore. But it certainly doesn’t scale.
Fig.5: 217_top_20_occam_type_tasks_comm_pattern (Download PDF here)

A complete round trip of this, without printing or delay, I measured to 1.3 µs. This includes starting and stopping of 12 tasks, 12 communications (6 back and forth, they are independent) plus some 32 bit integer calculation and looping. I think this is much faster than in occam, even if the transputer 30 years ago was blazingly fast. (With more calculations and float math, the round trip is 8.7 µs. Floats go through a built-in library, so I must say that I am impressed.)
Now, how can I be sure that this works as expected? The code I started with had a delay in each of the four tasks, as does the rest of the code here. (They need this, to make them as fair as possible.) I removed the delay in 3 of the 4 tasks here, and left it in task [0,0], the root. Now, if I made the delay there 1 sec I should see this in my log. I did! So it works. Here is the start of that log:
// The print code first:
{
    time32_t time_now_ticks;
    time_tmr :> time_now_ticks;
    debug_print_extra ("(%u %u):c[%5u] from [%5u][%5u][%5u] at %u ms\n",
        iof_row, iof_col, value, rx_val[0], rx_val[1], rx_val[2],
        time_now_ticks/XS1_TIMER_KHZ);
}

// Then the log:
(0 1):S[2] Server_node_3_composite_pars
(1 0):S[3] Server_node_3_composite_pars
(1 1):C[4] Client_node_3_composite_pars
(0 0):C[1] Client_node_3_composite_par_root
(1 1):c[12] from [3][3][2]
(0 1):s[8] from [4][1][1]
(1 0):s[12] from [1][4][4]
(0 0):c[8] from [2][2][3] at 379 ms
(1 1):c[44] from [12][12][8]
(0 1):s[36] from [12][8][8]
(1 0):s[44] from [8][12][12]
(0 0):c[36] from [8][8][12] at 1379 ms
(1 1):c[168] from [44][44][36]
(1 0):s[168] from [36][44][44]
(0 1):s[152] from [44][36][36]
(0 0):c[152] from [36][36][44] at 2379 ms
I have also included seeing the LED on the xTIMEcomposer’s XScope. It’s in the sources and the config.xscope file. It’s offline mode, since the value is DISCRETE (else a pulse is seen as a sawtooth), so the log being made by the debugger during the run is in xscope.xmt. When I stop the debugger the XScope is automatically started. There is an XMOS application note for this called AN00196 Getting started with real time xSCOPE in xTIMEcomposer Studio, plus XScope is described in 141:[12]. Also see 143:[Analysis and xSCOPE].
This code is at top_20_occam_like_xc.h. I have pragmatically moved the code into an .h file to keep it out of the main .xc file. More at The code (download).
Using the protocol as guarded
I have not been able to pursue this to having running code. There is some problem with [[guarded]].
In order to make the xC code 100% fair by design, we could add a switch at each server input. We need three «switches». They are the guards of CSP (both occam and xC have them). They are boolean expressions: when true, the «switch» is closed and the comms is allowed. When false, any sender trying to enter will simply block. («OK» blocking, see Not so blocking after all.) In xC we will have to add [[guarded]] to the interface definition, like this:
// Fair by design
typedef interface g_conn_if_t {
    [[guarded]] {pos_t, unsigned} do_io_server (
        const unsigned value_to,
        const pos_t    pos,
        const unsigned call_cnt);
} g_conn_if_t;
Then, when we have received on an interface, we just set the corresponding guard to false. When all three have been received, we do Calculate and set all three guards to true again. It’s basically only the server that needs to be modified.
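To make the idea concrete, here is a sketch of mine of such a guarded server. This is an assumption only – it is not the code in top_03_guarded_xc.h, and the guard-on-case syntax «expr => case-event» is how I read the xC documentation, untested here:

// Sketch, untested: a server that refuses a second call on a connection it
// has already served this round, by guarding each case
[[combinable]]
void Server_node_guarded_sketch (server g_conn_if_t i_conns[NUM_CONNS_PER_NODE]) {
    unsigned open[NUM_CONNS_PER_NODE] = {1, 1, 1}; // 1: this connection may be taken this round
    unsigned num_seen = 0;
    unsigned vals[NUM_CONNS_PER_NODE];
    unsigned val = 0;

    while (1) {
        select {
            case (open[0]) => i_conns[0].do_io_server (const unsigned value_to, const pos_t pos, const unsigned call_cnt)
                    -> {pos_t pos_return, unsigned value_from} : {
                vals[0] = value_to;
                open[0] = 0; // close this guard until the round is complete
                num_seen++;
                pos_return = pos;  // simplified stand-in for Handle_server
                value_from = val;
                if (num_seen == NUM_CONNS_PER_NODE) {
                    val = Calculate (val, vals); // as in the pseudo code above
                    open[0] = open[1] = open[2] = 1; // open all guards again
                    num_seen = 0;
                }
            } break;
            // ..two more cases, identical apart from using open[1]/i_conns[1]
            //   and open[2]/i_conns[2]
        }
    }
}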
The problem with my code is that the present xTIMEcomposer (14.4.1) does compile the code, but there’s an internal crash when it tries to place these nodes on the cores. My compiled, but alas not tested, code attempt is at top_03_guarded_xc.h. I have pragmatically moved the code into an .h file to keep it out of the main .xc file. More at The code (download).
The same problem exists for the next solution, alas. I have reported this into the XMOS ticket system.
Top_03G as distributable
Update 28Mar2021. For TOP_03G_HAS_2_TILE_016_NODE_T0_2N_AS_CC_6CN_T1_8CN_NO_BUILD I have made a TOP_03G_AS_PARTLY_DISTRUBUTED==1 and then decorated Server_node_guarded as [[distributable]], and placed the code like below, but alas, it didn’t help. The compiler still crashed in «line 183» (search for «183» here). xTIMEcomposer 14.4.1.
Top_03G as distribute placed code
par {
    on tile[0]: par {
        // Two clients one core each
        //                          [R][P][C]                 [R][P][C]                 [R][P][C]
        Client_node_guarded (0,0, all_conns[0][1][1], 0,1,1, all_conns[0][0][2], 0,0,2, all_conns[3][0][0], 3,0,0);
        Client_node_guarded (0,2, all_conns[0][0][1], 0,0,1, all_conns[0][1][2], 0,1,2, all_conns[3][1][0], 3,1,0);
        [[combine]] par {
            //                          [R][P][C]                 [R][P][C]                 [R][P][C]
            Client_node_guarded (1,0, all_conns[1][1][1], 1,1,1, all_conns[1][0][2], 1,0,2, all_conns[0][0][0], 0,0,0);
            Client_node_guarded (1,2, all_conns[1][0][1], 1,0,1, all_conns[1][1][2], 1,1,2, all_conns[0][1][0], 0,1,0);
            Client_node_guarded (2,0, all_conns[2][1][1], 2,1,1, all_conns[2][0][2], 2,0,2, all_conns[1][0][0], 1,0,0);
            Client_node_guarded (2,2, all_conns[2][0][1], 2,0,1, all_conns[2][1][2], 2,1,2, all_conns[1][1][0], 1,1,0);
            Client_node_guarded (3,0, all_conns[3][1][1], 3,1,1, all_conns[3][0][2], 3,0,2, all_conns[2][0][0], 2,0,0);
            Client_node_guarded (3,2, all_conns[3][0][1], 3,0,1, all_conns[3][1][2], 3,1,2, all_conns[2][1][0], 2,1,0);
        }
    }
    on tile[0]: [[distribute]] Server_node_guarded (0,1, all_conns[0][0][0], all_conns[0][0][1], all_conns[0][0][2], 0);
    on tile[0]: [[distribute]] Server_node_guarded (0,3, all_conns[0][1][0], all_conns[0][1][1], all_conns[0][1][2], 1);
    on tile[0]: [[distribute]] Server_node_guarded (1,1, all_conns[1][0][0], all_conns[1][0][1], all_conns[1][0][2], 0);
    on tile[0]: [[distribute]] Server_node_guarded (1,3, all_conns[1][1][0], all_conns[1][1][1], all_conns[1][1][2], 1);
    on tile[0]: [[distribute]] Server_node_guarded (2,1, all_conns[2][0][0], all_conns[2][0][1], all_conns[2][0][2], 0);
    on tile[0]: [[distribute]] Server_node_guarded (2,3, all_conns[2][1][0], all_conns[2][1][1], all_conns[2][1][2], 1);
    on tile[0]: [[distribute]] Server_node_guarded (3,1, all_conns[3][0][0], all_conns[3][0][1], all_conns[3][0][2], 0);
    on tile[0]: [[distribute]] Server_node_guarded (3,3, all_conns[3][1][0], all_conns[3][1][1], all_conns[3][1][2], 1);
}
Extending the protocol to an asynch session
I have not been able to pursue this to having code that runs beyond a certain point.
This solution is probably not possible, since I now think that [[notification]] and [[clears_notification]] are designed for single-client usage only. See [[notification]] and [[clears_notification]] only for single client usage? on xCore Exchange. If so, you don’t have to read on.
Here is another attempt. I have extended the interface such that a client will not do any new interface «call» before the server has acknowledged having received all three interface calls, one from each client. A server will receive the three do_io_server calls, and none of the clients will (by coding pattern) try to snatch another ⅓ bucket filling. However, when the bucket is full, the server will notify this with an all_clients_seen notification to all clients, and each client then clears the notification with a get_result call. When a client has gone through three complete sessions, we have in effect controlled that there is one and only one call from each client for each server. Below is the interface definition.
// Fair by design
typedef interface a_conn_if_t {
    void do_io_server ( // [[guarded]] here not necessary and crashes compiler
        const unsigned value_to,
        const pos_t    pos,
        const unsigned call_cnt);
    [[notification]] slave void all_clients_seen (void);
    [[clears_notification]] {pos_t, unsigned} get_result (void);
} a_conn_if_t;
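A sketch of the intended usage (my assumption, untested – and per the forum question above it may only be meaningful with a single client): the server raises the notification on each connection when its bucket is full, and each client waits for it and then clears it:

// Server side, when all three clients have been seen this round (sketch):
//     for (unsigned j = 0; j < NUM_CONNS_PER_NODE; j++) {
//         i_conns[j].all_clients_seen (); // raise the [[notification]]
//     }

// Client side (sketch): wait for the notification, then clear it:
select {
    case i_conn.all_clients_seen () : {
        pos_t    pos;
        unsigned result;
        {pos, result} = i_conn.get_result (); // [[clears_notification]]
    } break;
}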
In this case both the client and the server nodes need a rewrite. My compiled, but alas not tested, code attempt is at top_03_asynch_xc.h. I have also moved this code into an .h file to keep it out of the main .xc file. More at The code (download).
Extending the protocol to being client-driven with a polling synch phase
Since writing the above I did find a solution! It’s completely client driven and then relies on polling to get in synch (fairness, synch step). The protocol’s interface description goes like this:
typedef interface conn_if_t {
    temp_degC_t        SET_GET_TEMP     (const temp_degC_t temp_degC);
    all_clients_seen_t POLL_ALL_SYNCHED (void);
} conn_if_t;
This is a CLIENT DRIVEN interface only; it therefore introduces polling (with POLL_ALL_SYNCHED), in order to avoid sliding of the tasks, which are required to run in harmony – in step, synchronized and fair (and by this ensure that no data is sent and then later overwritten by a newer data set).
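Here is a sketch of one client-driven round, as I use the protocol. It is simplified, with local names of my own (the real code is in the TOP_26 code excerpts referenced below), and I treat all_clients_seen_t as boolean-like, which is an assumption:

// One client-driven round with a polled synch phase (sketch, simplified)
void Client_round_sketch (
        client conn_if_t i_conn_0,
        client conn_if_t i_conn_1,
        client conn_if_t i_conn_2,
        temp_degC_t      &temp_degC)
{
    temp_degC_t temps[3];

    // Data phase: one SET_GET_TEMP call per server connection
    temps[0] = i_conn_0.SET_GET_TEMP (temp_degC);
    temps[1] = i_conn_1.SET_GET_TEMP (temp_degC);
    temps[2] = i_conn_2.SET_GET_TEMP (temp_degC);

    // Synch phase: poll each server until it reports that all of its clients
    // have been seen this round, so that no client slides ahead of the others
    while (!i_conn_0.POLL_ALL_SYNCHED ()) {}
    while (!i_conn_1.POLL_ALL_SYNCHED ()) {}
    while (!i_conn_2.POLL_ALL_SYNCHED ()) {}

    // New value for the next round (Calculate_temp is a stand-in name)
    temp_degC = Calculate_temp (temp_degC, temps);
}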
This polling solution should not have been necessary. There seems to be some kind of problem with xTIMEcomposer 14.4.1 (and earlier) where use of [[guarded]], [[notification]] and [[clears_notification]] is stopped by some internal compiler error when the OK compiled [[combinable]] code (!) is placed on cores, as [[combine]] par. That error is:
«xcc1: internal compiler error: Failed in jenkins/RELEASE_14_4/sb/tools_xcc1_c_llvm/FrontEnd/Lowering/lower_combined_pars.cpp, line 183»
Polling typically is a «non-CSP» solution. In a synchronous and event-driven system, where each conditional choice (like select/case) may be tagged with a boolean «guard» (if it evaluates to true the case switch «is on»), polling is never necessary.
However, the xC/xCore case is probably «the best» architecture in which to implement polling. Task context switch time either does not exist or is low (in the combinable case?), and communication is also very fast, with no operating system overhead. And in this particular case all the nodes are supposed to run in step anyhow; all of them have to wait for the slowest. And the slowest is the root (0 0), which actually may introduce an active waiting, by ROOT_DELAY_UNTIL_NEXT_ROUND_US. So the price isn’t terribly high. But using one of the mentioned mechanisms will be retried when xTIMEcomposer is updated. I have reported this to XMOS on three occasions (since 2018).
SET_GET_TEMP and POLL_ALL_SYNCHED are now in capital letters, to try to make the code more readable. These calls are often visually and cognitively «hidden» in a long line.
I will write this out in a joint «fringe» presentation at IEEE-COPA 2021. But I already now (21Mar2021) present the code examples at TOP_26 code excerpts.
Topologies (cont.)
Top_04 runs
Back to discussing the problem of [[combinable]] with [[combine]] and whether the code runs or not.
This TOP_04_HAS_1_TILE_016_NODE_T0_2N_AS_CS_6CN_8CN_RUNS is not fair for a few log lines at the start, but after this it’s actually fair. See fairness discussed above.
The further I write, or try to reason, the more assured I get that I really can’t tell why this one topology, as compared to another, will run and not stop. Here there is one server alone and one client alone, which sounds good. In TOP_03 there were two clients alone. Which makes me suggest this TODO: what if I replace the one client in [0,0] with a server? I must admit that I don’t know. Stay tuned.
And why will TOP_12_HAS_1_TILE_012_NODE_T0_1N_2CN_8CN_1N_STOPS (TOP_12) stop when this one runs?
There are more files. Even if they are not updated on every version of The code (download) they are certainly worth a look. The preprocessor output file here, the assembly file here and the map file here.
Top_08 runs
TOP_08_HAS_1_TILE_016_NODE_T0_2N_AS_CC_6CN_8CN_RUNS.
I will write more about this on 2Mar2021.
DELTA_TIME_US 1000 – schedulings:
  2402 : (0 0) (0 2) (2 0) (2 1) (2 2) (2 3) (3 0) (3 1) (3 2) (3 3)
  3603 : (0 1) (0 3)
  4803 : (1 1) (1 3)
  6004 : (1 0) (1 2)

DELTA_TIME_US 0 – schedulings:
  Never : (0 0) (0 2) (2 0) (2 1) (2 2) (2 3) (3 0) (3 1) (3 2) (3 3)
  7391  : (0 1) (0 3)
  14785 : (1 1) (1 3)
  22127 : (1 0) (1 2)
Top_05 runs
TOP_05_HAS_1_TILE_016_NODE_T0_2N_14CN_RUNS. I guess that if TOP_04 runs, then why shouldn’t this? I’ve only crammed all the other tasks into one core, instead of two. But those two tasks having their own core must have the same roles, I’d assume.
There are more files. Even if they are not updated on every version of The code (download) they are certainly worth a look. The preprocessor output file here, the assembly file here and the map file here.
Top_06 runs
TOP_06_HAS_1_TILE_032_NODE_T0_2N_30CN_RUNS. Next text coming here.. (28Feb2021?)
There are more files. Even if they are not updated on every version of The code (download) they are certainly worth a look. The preprocessor output file here, the assembly file here and the map file here.
Top_07 runs seen
See above: TOP_07 runs.
Stop (after ran until..)
When I started this note I thought, rather naïvely and optimistically, that I would crack the secret of when (and when not) xC combined [[combinable]] and [[combine]] code goes to a halt. I like the analogy of a balance wheel with a broken spring (here), or jammed cogwheels (here) – even if it probably classifies as a deadlock, or no progress, as well. I feel like this note gives ok examples, but I have had no progress on how to see (by the glimpse of anything, really) that my code won’t thrive forever.
Now, if you want to shortcut all my words, then jump directly to All combined.
Is it easier to spot a stopping xC core and tile placement using par («topology» in this context) than placement of the same code which runs forever? That’s what I intend to find out in the following, having perhaps become not as naïve and optimistic as at the beginning:
Top_10 stops
TOP_10_HAS_1_TILE_016_NODE_T0_16CN_STOPS because all the cores are filled with clients that need to lock (the core they run on?) to wait for the RPC call do_io_server (being taken or returned from), which always shares a core with a client, and.. STOP! Neither I nor you nor any neighbour is able to proceed, because the communication pattern gets into a deadly embraced cycle, and then no one is able to proceed.
I wonder if I can extract anything from a tabular view like the below?
|                                  | tile[1] | tile[0] |
| Client combined server on a core | …       | Eight   |
| Client alone on a core           | …       | …       |
| Server alone on a core           | …       | …       |
| Missing                          | …       |         |
There are more files. Even if they are not updated on every version of The code (download) they are certainly worth a look. The preprocessor output file here, the assembly file here and the map file here.
Top_11 stops
TOP_11_HAS_2_TILE_016_NODE_T0_2CN_12CN_T1_2N_STOPS because, since TOP_10_HAS_1_TILE_016_NODE_T0_16CN_STOPS (TOP_10), it does not help to move two tasks over to tile[1]. Since TOP_10 stopped with tile[0] alone, then tile[0] will stop here as well, and the tile[1] code is not enough to kick them all moving – since both tasks on tile[1] have communication with tile[0]. Was this self-referential arguing in a circle?
|                                  | tile[1] | tile[0] |
| Client combined server on a core | …       | Six     |
| Client alone on a core           | One     | …       |
| Server alone on a core           | One     | …       |
| Missing                          | …       |         |
There are more files. Even if they are not updated on every version of The code (download) they are certainly worth a look. The preprocessor output file here, the assembly file here and the map file here.
Top_12 stops
TOP_12_HAS_1_TILE_012_NODE_T0_1N_2CN_8CN_1N_STOPS because.. I don’t even want to argue, since TOP_04_HAS_1_TILE_016_NODE_T0_2N_AS_CS_6CN_8CN_RUNS (TOP_04) actually runs. Is it because TOP_04 is 4 * 4, where this is 4 * 3, causing a slightly different communication pattern?
|                                  | tile[1] | tile[0] |
| Client combined server on a core | …       | Five    |
| Client alone on a core           | …       | One     |
| Server alone on a core           | …       | One     |
| Missing                          | …       |         |
There are more files. Even if they are not updated on every version of The code (download) they are certainly worth a look. The preprocessor output file here, the assembly file here and the map file here.
Top_13 stops
TOP_13_HAS_1_TILE_016_NODE_T0_8CN_8CN_STOPS because.. Why does this stop when TOP_07_HAS_1_TILE_002_NODE_T0_2CN_RUNS (TOP_07) runs? Is it because with one core it cannot lock itself up when it tries to communicate with itself, but when two cores communicate then this is a possibility? Of all the reasoning I’ve done here, maybe this is closest to any correctness analysis.
|                                  | tile[1] | tile[0] |
| Client combined server on a core | …       | Two     |
| Client alone on a core           | …       | …       |
| Server alone on a core           | …       | …       |
| Missing                          | …       |         |
There are more files. Even if they are not updated on every version of The code (download) they are certainly worth a look. The preprocessor output file here, the assembly file here and the map file here.
Top_14 stops
TOP_14_HAS_1_TILE_016_NODE_T0_2N_AS_SS_6CN_8CN_STOPS
|                                  | tile[1] | tile[0] |
| Client combined server on a core | …       | Six     |
| Client alone on a core           | …       | …       |
| Server alone on a core           | …       | Two     |
| Missing                          | …       |         |
There are more files. Even if they are not updated on every version of The code (download) they are certainly worth a look. The preprocessor output file here, the assembly file here and the map file here.
Top_15 stops
TOP_15_HAS_1_TILE_016_NODE_T0_2N_AS_SS_6CN_8CN_STOPS.
|                                  | tile[1] | tile[0] |
| Client combined server on a core | …       | Six     |
| Client alone on a core           | …       | …       |
| Server alone on a core           | …       | Two     |
| Missing                          | …       |         |
There are more files. Even if they are not updated on every version of The code (download) they are certainly worth a look. The preprocessor output file here, the assembly file here and the map file here.
All combined
..but not all really combinable. It strikes me that the xTIMEcomposer toolset (14.4.1) does not have any error message telling me that the bottom row here will happen. There certainly are some «missing error messages» (chapter below). There may be a reason or two for it.
If anybody could extract the necessary info from this figure alone or from any of my other figures, fine. But I doubt that this is possible. I guess that after every build that has passed all the analysis that XMOS already does (reflected here), then it might be possible to run a formal analysis and verification of the build. I’ll suggest this to my contacts at the university here. But it’s for anybody who thinks this is interesting. Just feel free to mail me, if you have anything that you’d want reflected in this note. I’ll make sure to honour you.
Fig.4: 217_my_xc_combine_combinable_notes_overview_oyvind_teig (Download PDF here)
The code (download)
A fast look at the single code file is _xc_test_combinable.xc. (I’ve made it a .txt file so the browser may understand it better).
The full code and result from individual builds are at My xC code downloads page at row #217.
TOP_26 code excerpts
This is an aside in chapter Extending the protocol to being client-driven with a polling synch phase. I have placed this code as is on 20Mar2021 at (1) xc_test_Toroid_2.xc plus the server and client code at (2) code_out_of_project/top_26_xc.h. And the 4*4 topology is described at (3) 217_top_26.pdf.
Observe that this is an aside from the main theme of this blog note, which is about when [[combinable]] placed as [[combine]] par runs or not.
But the code runs both as standard tasks and combined. There are configs where it does not run (I think it has to do with speed); you should be able to see that documented in the code. Search for TOP_26_TESTING_SYNCH_PHASE in file (1).
Missing error messages
I wrote this chapter before I wrote this note. And I won’t rewrite it after. Let it stand for what it is:
The fact that we may deadlock is a result of how combinable tasks are realised, under the hood. It may not be an error per se, but the user should absolutely be issued a proper error message when this may happen. But then, it depends either on analysing usage rules or doing some formal analysis, neither of which may be possible to implement in the toolchain.
I have a feeling that this error message would be in the same group as «error: distributed task with client and server connection to same logical core is not supported» (see xC is C plus x):
- Error: No progress (deadlock) between tile[0] and tile[1]. Not all tasks can be combinable → 2Feb2021: With certain interfaces and certain communication patterns between xC tasks there may be a mutual waiting between the set of all tasks on one core on one tile and the set of mutually waiting tasks on one core on another tile. «Max combinable» to one core on both sides will in some situations deadlock. On the lower level it’s about only one select (= max combinable) on each tile, which together need to make the system proceed. This even goes for deadlock free interface session patterns with client and server roles defined. Here ‘+’ crosses tiles: 8 + 8 tasks run on 1 + 1 core on 1 + 1 tile may deadlock, when the same code configured as 16 tasks combined onto 1 core on one tile only will run. To make such a two-tile system proceed, there must at least be two independent cores running on one of the tiles. Thanks XMOS, for pointing this out to me! (14.4.1)
- Error: No progress (deadlock) on a single tile. Not all tasks can be combinable → 8Feb2021. More coming (14.4.1)
TODO
Done
- Make a topology almost like TOP_04 with the servers at the beginning. Done, see TOP_08_HAS_1_TILE_016_NODE_T0_2N_AS_CC_6CN_8CN_RUNS
- Check whether the fairness property of all the topologies is dependent on the built-in delays in the clients and servers. Done, also see TOP_08_HAS_1_TILE_016_NODE_T0_2N_AS_CC_6CN_8CN_RUNS
- Check whether [[distributable]] helps to avoid the «line 183» crash. It doesn’t. Done. See Top_03G as distributable
Further work
- Maybe it’s a good idea to check the code that stops after some time in the xTIMEcomposer simulator? Will the code show any different behaviour there?
References
Wiki-refs: Black-box testing. Deadlock