Multithreading theory...

ELCouz · Joined: 18 Jul 2007 Posts: 427 Location: Montreal,Quebec

Just wondering what is holding the low-end embedded market back from having multiple cores?

I mean, I know CCS has an RTOS (far from perfect) to deal with tasks, but why don't we see any interest from Microchip or Atmel in developing, let's say, a PIC18 with 2 cores?

I know that, just like PC threading, transferring and locking vars and synchronizing threads can be a PITA to debug, but it's easy to implement.

Why are we still stuck dealing with a single process with interrupts?

KISS principle in the embedded market?

Picture having 2 cores managing different pins at the same time, with code synchronization once in a while... Fewer interrupt collisions (no need to deal with priorities anymore).

Just saying...
_________________
Regards,
Laurent


drolleman · Joined: 03 Feb 2011 Posts: 116

I talked to Microchip about this a few years ago, and they said there is not enough interest in dual cores.

It seems that most programmers can't get their heads around multitasking. We all have multi-core processors, but rarely does an application use multiple cores. Even applications that could take advantage of them don't: the likes of Adobe and SolidWorks are poor users of multithreading.

temtronic · Posted: Sat Jun 20, 2015 5:01 am

Simple answer... How many would you buy? 1 million, 5 million, 20 million? As an OEM you can price your product based on the client's sales. If he buys 1,000,000 you can cut him a deal and design/build the product. If he only wants 5, either the price goes way up or you tell him to 'have a nice day'.
I've yet to have anyone truly need a multicore device where a PIC is used. That being said, if you really, really need it, then 'jump ship' and go to a 'PC' based tablet. For less than $50 you can have your 'multicore' brains WITH a lot of peripherals in a pretty package.
Think of a simple 4-function calculator product. There is no way you could design/build one with a PIC for less than the ready-made solutions coming from China.
To me the 'bottom line' is the bottom line. For Microchip (and others) the return on investment just isn't there.

Jay
BTW the Moto 6809 (of CoCo fame) was a multitasking chip; you could use several at once...

asmboy · Joined: 20 Nov 2007 Posts: 2128 Location: albany ny

drh · Posted: Sat Jun 20, 2015 7:33 am

If you need multi-core: https://www.parallax.com/catalog/microcontrollers/propeller
_________________
David

Ttelmah · Joined: 11 Mar 2010 Posts: 19881

Multi-core would not suit the PIC architecture.

The whole 'key' of the PIC low-cost approach is to get speed by not needing cache or any similar system, instead using the Harvard architecture, so that fetches from data memory can partially take place at the same time as fetches from program memory. To go multi-core, you would have to have an alternative way to handle this, or lose speed on the two cores...

Multi-core requires you to have high-speed cache memory, and a separate system to handle the memory I/O. The result is a lot of real estate on the chip, not the low gate count of a PIC...

The PIC suits a 'multi-processor' architecture better than multi-core. Here you give two processors (or more) their own separate ROM and RAM spaces, and a high-speed way of sending data between them, allowing each to do its own 'job' independent of the other. Most of the 'old hands' here have done this (you see it in the number of times it is suggested as a solution), adding extra PICs to handle extra functions. Using multiple completely separate processors like this is not as efficient as it would be if Microchip implemented it in a single unit, with a separate 'I/O' processor linked to a main chip.

I've actually done this in the past using a large FPGA (I've fairly often built "PICs" into FPGAs, and on a couple of occasions have implemented multiple PICs in a single array). The lovely thing with this is that it is easy to implement a really fast small area of 'linked' memory between the processors, and a semaphore between them. The last time I did this, I was using a 2 million gate array (the largest then available), adding extra RAM compared to the equivalent PICs, plus quite a few extra peripherals. I used external flash memory for the code, implemented a linear address architecture with both ROM spaces in the one linear area, and took advantage of the sheer speed of RAM the array could handle to cache the fetches as needed. Over the years since then, the sizes available have gone up by at least a factor of ten, and it would be quite easy now to implement (say) a 20-PIC array to do something.

However, the key thing really is that the PIC is not primarily a 'computational' device. It's not designed for doing lots of processing (if you want this, look at chips better built for it). The name stood for 'Peripheral Interface Controller'. Their 'target' is controlling interface lines, re-formatting data before it is fed on to other devices, or relatively simple jobs. Trying to make it into a device for fast computation is a bit like trying to use a nut to crack a sledgehammer...

Funnily though, I've said before that the biggest single 'jump' in performance that could be achieved on the PIC, for a low number of gates, would be to add two or three sets of mirror registers for all the processor registers. Then, when configuring interrupts, you could specify which register set is associated with each interrupt level. When an interrupt is called, all the primary registers are copied to the selected mirror set. The return copies them back. The result is no need to save _any_ registers to service interrupts, and effectively single-cycle latencies into the interrupt handlers. This existed in a limited way in the Z80, nearly 40 years ago (single operations that swapped to a duplicate set), and it is still annoying that a 48MHz PIC can only just match the interrupt latency of a 2.5MHz processor from all that time ago. I implemented a version of the Z80 register swap in one of my FPGA PICs, and this showed just how much of an improvement it could be...
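To make the mirror-register idea concrete, here is a minimal sketch in plain C that models it in software. It is only an illustration under stated assumptions: the register subset in `cpu_regs_t`, the names `irq_enter`/`irq_return`, and the two interrupt levels are all hypothetical, and no real PIC works this way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the core registers an interrupt would clobber. */
typedef struct {
    uint8_t  w;
    uint8_t  status;
    uint8_t  bsr;
    uint16_t fsr0;
} cpu_regs_t;

#define IRQ_LEVELS  2u
#define MIRROR_SETS 3u

static cpu_regs_t primary;                              /* live registers */
static cpu_regs_t mirror[MIRROR_SETS];                  /* mirror sets */
static const uint8_t set_for_level[IRQ_LEVELS] = {0, 1}; /* set chosen per level */

/* Interrupt entry: hardware would copy every register in one step. */
void irq_enter(uint8_t level)
{
    assert(level < IRQ_LEVELS);
    mirror[set_for_level[level]] = primary;
}

/* Interrupt return: copy the selected mirror set back. */
void irq_return(uint8_t level)
{
    assert(level < IRQ_LEVELS);
    primary = mirror[set_for_level[level]];
}

/* Nested low/high priority interrupts, with handlers that clobber registers.
   Returns 1 if every context came back intact, 0 otherwise. */
int mirror_demo(void)
{
    int nested_ok;

    primary = (cpu_regs_t){ .w = 0x12, .status = 0x04, .bsr = 1, .fsr0 = 0x0123 };

    irq_enter(0);              /* low priority interrupt fires */
    primary.w = 0xAA;          /* its handler uses W freely */

    irq_enter(1);              /* high priority interrupt nests inside it */
    primary.w = 0x55;
    primary.fsr0 = 0;
    irq_return(1);
    nested_ok = (primary.w == 0xAA);

    primary.fsr0 = 0xFFFF;     /* low priority handler carries on */
    irq_return(0);

    return nested_ok && primary.w == 0x12 && primary.fsr0 == 0x0123;
}
```

Because each level has its own set, the handlers contain no save or restore code at all, which is where the latency saving would come from.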

Realistically, the PIC16/18/24/33 is not the processor to suit a multi-core architecture. A lot would depend on what you actually want to 'do', but (for instance) the core processors used in most (even quite basic) video cards can be used to provide things like 64 parallel threads for simple operations. For a logical 'I/O versus processing' architecture, things like the TI Concerto implement the I/O processor versus main processor approach.

ckielstra · Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands

jeremiah · Joined: 20 Jul 2010 Posts: 1392

Depending on your definition of low-end MCUs, there are the Propeller chips (not Microchip):
https://www.parallax.com/catalog/microcontrollers/propeller

One of my coworkers uses those for a variety of small scale embedded projects. I think his has 8 cores (they refer to them as "cogs" in a lot of their documentation).

For PICs, I sometimes use protothreads. It's not true multithreading, but I do like the syntax involved as it is more thread-like. They are stackless though, so unlike regular threads, you have to keep track of your variables (either declare them static or pass them in as an environment variable). I posted a port of those in the code library for CCS, but I haven't extensively tested them as I tend to only use them for one-off projects. If interested, google protothreads to get a better idea. The CCS port doesn't use the fancy trick they normally do, but the CCS version allows more freedom in usage too.
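For anyone who hasn't seen the idea, here is a minimal sketch of a stackless, protothread-style task in standard C. It uses the well-known switch-on-line-number trick, so it is not the CCS port mentioned above (which works differently), and the macro names here are only modelled loosely on the protothreads idea rather than copied from any library. The tick flag is simulated; on real hardware a timer interrupt would set it.

```c
#include <assert.h>
#include <stdint.h>

/* Per-thread state: just the line to resume from. */
typedef struct { uint16_t resume_line; } pt_t;

/* Illustrative macros, not the real protothreads API. */
#define PT_BEGIN(pt) switch ((pt)->resume_line) { case 0:
#define PT_WAIT_UNTIL(pt, cond)                      \
    do {                                             \
        (pt)->resume_line = __LINE__; case __LINE__: \
        if (!(cond)) return 0;                       \
    } while (0)
#define PT_END(pt) } (pt)->resume_line = 0; return 1

static volatile uint8_t tick_flag;  /* a timer ISR would set this */
static uint8_t toggles;             /* stands in for toggling a pin */

/* Returns 0 while still running, 1 once finished. */
static int blink_thread(pt_t *pt)
{
    static uint8_t i;  /* static: ordinary locals are lost at every yield */

    PT_BEGIN(pt);
    for (i = 0; i < 3; i++) {
        PT_WAIT_UNTIL(pt, tick_flag);  /* yield until the next tick */
        tick_flag = 0;
        toggles++;
    }
    PT_END(pt);
}

/* Poll the thread from a main loop, faking a tick on every second pass. */
int protothread_demo(void)
{
    pt_t pt = { 0 };
    int polls = 0;

    toggles = 0;
    tick_flag = 0;
    while (!blink_thread(&pt)) {
        polls++;
        assert(polls < 100);  /* guard against a runaway loop */
        if (polls % 2 == 0)
            tick_flag = 1;
    }
    return toggles;  /* the thread toggled once per tick, three times in all */
}
```

Note the `static` loop counter: because the function returns at every wait, anything that must survive a yield cannot live in an ordinary local.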

SherpaDoug · Joined: 07 Sep 2003 Posts: 1640 Location: Cape Cod Mass USA

Debugging multiple cores would be a nightmare! If you want to do multiple things at the same time just use multiple PICs. That way you can debug each one separately.
_________________
The search for better is endless. Instead simply find very good and get the job done.

drolleman · Joined: 03 Feb 2011 Posts: 116

For years I used multithreading on the PC whenever possible, and I found it made large applications easier to debug, not harder.

On the PIC I created an OS to handle the multitasking; it is now included in every project I do. I would love an internal coprocessor. It would not need access to ports and such, it would just do number crunching. I do a lot of time-critical apps, so I sometimes have to fit time-intensive code in the middle of other functions, or break functions apart because they would take up too much time as a single function. Sometimes it makes for ugly code.

In one job, just saving the cost of the driver chips for a 7-segment display saved 30k in the first batch alone, so the driving of the display must not flicker while other tasks are done. It doesn't take much to make it pay to pack as much as possible into one chip. Yes, if you are doing a run of a hundred units or so, the component count is not as important. But the larger the run, the more every resistor counts, let alone adding another PIC; that would be insane. The PIC is a low-end processor suited to a particular market. If Microchip wanted to expand, adding functionality like multicore could be an asset. Look at the PC market: have they gone back to single core? No, they are adding more cores and threads. It's all about adding functionality.
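As a rough sketch of what "breaking functions apart" looks like in practice, the plain C below interleaves a display refresh with a long calculation done in small slices. Everything in it is hypothetical (the four-digit display, `display_tick`, `sum_job_step`, the slice size); it is not the OS described above, just one way the pattern can be laid out.

```c
#include <assert.h>
#include <stdint.h>

#define DIGITS 4u

static uint8_t digit_refreshes[DIGITS];  /* how often each digit was driven */
static uint8_t current_digit;

/* Refresh one digit per call. On real hardware this would run from a timer
   interrupt and drive the segment and digit-select pins directly. */
void display_tick(void)
{
    digit_refreshes[current_digit]++;
    current_digit = (uint8_t)((current_digit + 1u) % DIGITS);
}

/* A long calculation kept as explicit state so it can be done in pieces. */
typedef struct {
    uint32_t next;
    uint32_t limit;
    uint32_t sum;
} sum_job_t;

/* Do at most `slice` additions, then hand control back.
   Returns 1 when the whole job is finished, 0 otherwise. */
int sum_job_step(sum_job_t *job, uint32_t slice)
{
    while (slice > 0u && job->next <= job->limit) {
        job->sum += job->next++;
        slice--;
    }
    return job->next > job->limit;
}

/* Main loop: the display is serviced between every slice of the long job. */
uint32_t scheduler_demo(void)
{
    sum_job_t job = { 1u, 100u, 0u };
    int passes = 0;
    int done = 0;

    while (!done) {
        display_tick();
        done = sum_job_step(&job, 10u);
        passes++;
        assert(passes < 1000);  /* guard against a runaway loop */
    }
    return job.sum;  /* sum of 1..100 */
}
```

The cost is exactly the ugliness mentioned above: the calculation's loop variables have to be hoisted into a struct so that the work can stop and resume.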

Thomas Watson's son almost sank IBM building the System/360 general-purpose computer.

Thomas Watson, president of IBM, 1943
"I think there is a world market for maybe five computers."

Ken Olsen, founder of Digital Equipment Corporation, 1977
"There is no reason anyone would want a computer in their home."

If you make it they will come.

david

Ttelmah · Joined: 11 Mar 2010 Posts: 19881

I agree multi-threading is a very good way to work. I suspect most of the experienced programmers here are multi-threading their code to some extent.

However, you are missing the key point about complexity. It would actually be a lot easier to implement multiple processors than multiple cores.

Imagine a PIC that has two processors. A single ROM space, with (say) the low 128K being the ROM for processor #1, and the next 128K being for processor #2. Each has its own RAM. A single small block of 'shared RAM' exists (like the area used for USB I/O), and processor #2 has half a dozen interrupts that processor #1 can trigger, and vice versa.
Then, just as with #PIN_SELECT, I/O pins and peripherals can be programmed to connect to either core. With the connection come the interface registers for the peripheral.

The result is a fairly easy to use environment, with you being able to program each chip effectively separately (except that the first core has to do the initial configuration to set what is routed to the second). Data can be quickly sent from chip to chip (load the shared area, and trigger an interrupt). Each chip can service the peripherals it handles. If you wish, one could do all the I/O, while the other handles arithmetic.
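The "load the shared area, and trigger an interrupt" handshake can be sketched in plain C as below. This is a single-threaded simulation under stated assumptions: `mailbox_t`, `core1_send` and `core2_isr` are hypothetical names, the pending flag stands in for the cross-processor interrupt line, and no existing PIC exposes this hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SHARED_SIZE 16u

/* The small block of shared RAM, plus a flag modelling the interrupt line. */
typedef struct {
    uint8_t data[SHARED_SIZE];
    uint8_t length;
    volatile uint8_t irq_pending;
} mailbox_t;

static mailbox_t to_core2;  /* processor #1 writes, processor #2 reads */
static uint16_t core2_sum;  /* result computed on processor #2 */

/* Processor #1: load the shared area, then raise the interrupt.
   Returns 0 if the message is too big or the previous one is still unread. */
int core1_send(const uint8_t *msg, uint8_t len)
{
    if (len > SHARED_SIZE || to_core2.irq_pending)
        return 0;
    memcpy(to_core2.data, msg, len);
    to_core2.length = len;
    to_core2.irq_pending = 1;  /* "trigger the interrupt" */
    return 1;
}

/* Processor #2: interrupt handler that consumes the message (here it just
   adds up the bytes) and clears the flag to release the shared area. */
void core2_isr(void)
{
    uint8_t i;

    if (!to_core2.irq_pending)
        return;
    assert(to_core2.length <= SHARED_SIZE);
    core2_sum = 0;
    for (i = 0; i < to_core2.length; i++)
        core2_sum += to_core2.data[i];
    to_core2.irq_pending = 0;
}

/* One round trip: #1 hands four samples to #2, which does the arithmetic. */
uint16_t mailbox_demo(void)
{
    const uint8_t samples[4] = { 10, 20, 30, 40 };

    to_core2.irq_pending = 0;
    if (!core1_send(samples, 4))
        return 0;
    core2_isr();  /* on real hardware the interrupt would invoke this */
    return core2_sum;
}
```

The pending flag doubles as a crude semaphore: processor #1 may not touch the shared area again until processor #2 has cleared it.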

Now this, as an architecture, could be done by the PIC fairly easily. It would only involve effectively doubling the chip, and adding the extra interfaces/controls. Perhaps 2.5 to 3 times the chip estate used by the single processor.

However, trying to go multi-core with the PIC is a completely different kettle of fish. It means adding a memory manager to handle access to the ROM and RAM, and high-speed caching of the memory. Then the actual processor architecture has to be changed to include the ability to wait if there is a cache miss (the PIC as it stands does not implement any form of wait, always assuming it can fetch bytes from ROM and RAM as needed). I started sketching out a version of this on a fairly basic PIC, and found my chip real estate growing by over 10 times. Multi-core does not suit the basic architecture of the PIC. It is harder to implement multi-core on the PIC memory architecture than on the single linear memory architecture (von Neumann).

By the time you add this much real estate, you are no longer talking low-end embedded.

temtronic · Posted: Mon Jun 22, 2015 4:58 am

Hmm, a PIC FPU... maybe cut code for one and it can become your 'winning lottery ticket'!
Even the original PC needed a 'coprocessor' to do the 'math'. Perhaps 'reverse engineer' the PC chip and see IF it's viable to do for a PIC. I'd assume using SPI as the interface, so that every PIC could use it.

If viable, you should be able to 'retire' early!

Jay

RF_Developer · Joined: 07 Feb 2011 Posts: 839