Research advances in CyberPhysical Systems (CPS) promise to transform our world with systems that respondmore quickly, are more precise, work in dangerous or inaccessible environments,provide large-scale, distributed coordination, are highly efficient, augmenthuman capabilities, and enhance societal wellbeing. Some examples includeautonomous collision avoidance; robotic surgery and nano-tolerancemanufacturing; autonomous systems for search and rescue; firefighting, andexploration, automated traffic control; zero-net energy buildings; andassistive technologies and ubiquitous healthcare monitoring and delivery 1.
CPS is essentially the conjoining and coordination of physical andcomputational resources. However, current computing and networking abstractionsare not well suited for implementing CPS applications 2. New abstractionsmust be developed to incorporate computing and physical processes in a unifiedway. Current methods forcomputing, networking, and implementing software focus more on average caseperformance rather than timing predictability. Programming languages such as Cor Java have no methods to specify timing constraints.
That is, the languagedoes not associate timing with a program. Typically, system components aredesigned individually and later integrated. In the end, worst-case executiontime can be calculated and addressed. While current architectures that employmulti-level caches, branch predictors, deep pipelines, etc improve average caseperformance, they do not affect worst-case performance 3. Failures in certain2 applications that rely on worst-case execution times could be catastrophic,such as flight controllers for satellites, medical diagnostics equipment for apatient in intensive care, and guidance systems for missiles to name a few. CPSis primarily concerned with how time is addressed. In the physical world, timecannot pause in order to perform calculations, and CPS must operate in real-timein order to adhere to the fact that time is continuous in the physical world.
In an effort to achieve the timing requirements of CPS, new processorarchitectures have been identified to include Precision Timed (PRET)processors. PRET aims to address timingpredictability at the architecture level. Timing predictability is the abilityto predict timing properties of the system 3. Many of the basic elements of aRISC processor are kept while removing other sources of indeterminacy.
Conceptually, instructions are fetched, decoded, and executed the same way;however, methods for implementing these tasks can vary. For CPS research, anopen source PRET processor is being developed based on the OpenFire softprocessor 4. While many aspects of a processor that affect the execution oftasks can be addressed, such as pipelines, caches, inter-process communications(IPC), and resource allocation, thread management is addressed in this thesis.More specifically, hardware and software interfacing for the scheduling ofthreads is addressed. Different methods for implementing hardware schedulersare indentified, modeled, and analyzed. Motivations behind implementing ahardware management controller include: speedup over software, flexibility, andperformance gains directly rated to frequency of scheduling operation (to namea few). Other than their names,differences exist between software and hardware. Software can be thought of asa set of ordered instructions required to perform a specific task.
Theseinstructions can be a source of overhead, especially in real-time systems. If apatient’s blood 3 pressure significantly drops; but, ten instructions must beperformed in order to alert the nurse, vital time has been wasted if theinstructions take a considerably long time to execute. Hardware can potentiallyovercome a lot of pitfalls associated with software. Some calculations can beperformed in parallel with a CPU if implemented in hardware. Overhead isreduced by not performing a set of instructions. Moreover, hardware isinherently deterministic. That is, execution times in hardware can more easilybe calculated when compared to execution times in software.
While the softwareversion can execute the algorithm in 12 clock cycles, additional overhead canexist; however, the hardware version takes 2 clock cycles with little to noadditional overhead. While timing predictabilityis easier to gauge in hardware, many claim that processing time can also bereduced 6. Moreover, design costs and time-to-market are reduced by usingFPGAs.
Even though many processors can be reprogrammed to perform specificapplications, the reprogramming is done in software. That is, the number ofhardware devices, such as multipliers, adders, cache, etc, does not change.Reconfigurable computing devices such as FPGAs can be configured at thehardware level instead of software level. FPGA configuration is most oftenspecified using a hardware description language (HDL). The most commonly usedHDLs are Verilog and VHDL. Specific to this thesis, a soft processor has beencreated in the logic fabric and general-purpose memory of an FPGA and is usedto investigate the need for PRET architectures in CPS applications.
As previously stated, a softprocessor implemented on a FPGA is created in the logic fabric andgeneral-purpose memory of the FPGA. A soft processor is an intellectualproperty core that is implemented using logic gates that implement specificcircuit behaviors using programmable logic devices. It can be thought of as asoftware based microprocessor. Configurability, fast time to market, ease ofintergration, and avoiding obsolescence are features for using a softprocessor. An open-source soft processor for FPGA technology currently exists,OpenFire 9, and is being modified for CPS applications. Current modificationsto the OpenFire add additional stages to the pipeline.
This newer version,MultiFire, saw an increase in performance (36% in area and 69% in maximumfrequency) when compared to the OpenFire 4. While many areas need to beinvestigated for advancing PRET architecture research, the purpose of thisthesis is to identify, model, and simulate thread scheduling methods inhardware for multithreaded applications. 6 Threads are essentially aset of instructions used to carry-out a specific operation. What is more,multithreading is the process of executing multiple threads on a processor asconcurrently as possible. Certain tasks in the physical world are made up ofmultiple processes that can run in parallel.
For example, an autonomous systemmight interpret speed, temperature, height, and pressure in order to make adecision for a specified task. Each event measured can be considered a thread.That is, the instructions needed to make a speed measurement will beaccomplished using one thread, and the instructions needed to make atemperature measurement will be accomplished on another thread. Whilecorrectness is important, performing these tasks correctly in real-time isparamount. Specific events in the applications environment determine threadsignificance. In the example above, measuring temperature may be more criticalthan measuring speed. Moreover, speed may be measured more often thattemperature.
Therefore, when temperature is needed, the thread providingtemperature data may need to interrupt the thread providing speed data. Eachprocess represents a thread and the thread management block represents managingand scheduling. The purpose of this thesis is to model and simulate methods forscheduling threads in hardware.
Thread scheduling iscritical for ensuring certain applications operate in real-time. However, mostof the traditional scheduling schemes encountered for multithreadedapplications are inefficient for implementing real-time applications. Moreover,context switching adds overhead when using multiple threads. Therefore,efficiently implementing context switches and scheduling threads are paramountin real-time applications.
The most common schemes encountered for schedulingand implementing multithreaded applications 7 include: round-robin, first-infirst-out (FIFO), shortest time remaining, multilevel queue, and priority basedscheduling. Current implementation inthe MultiFire processor employs a round-robin scheduling scheme; a newinstruction is fetched each cycle from a different thread. Round-robin isconsidered the simplest solution for implementing a thread schedulingalgorithm. Equal timeslices are given to each thread and they cycle through thepipeline in a circular fashion 10.
Priorities are not established in around-robin scheme. Each thread is given the same amount of time to execute.Once a time-slice completes, thread state is saved and the next thread in thequeue is given the same amount of time to execute. Once the time-slice for thelast thread in the queue completes, the first thread is started again. Whilethis method is simple to understand and implement, real-time applicationsgreatly suffer as the number of threads needed increase. The number of threadsand the length of each time-slice affect how long one thread will take tocomplete. In real-time applications, 8 certain threads may require earliercompletion times.
While dividing the time equally to execute each thread mightseem fair, certain applications might need attention and results sooner. Forexample, if someone’s heart stops beating who is wearing a pacemaker, immediateattention should be given to the circuitry required to keep the person’s heartbeating. Therefore, more time (i.e. longer time-slices) for certain threadsmight be needed. Three other traditional scheduling methods (first-infirst-out, shortest remaining time, and fixed priority) exist to address theinefficiencies seen in round-robin scheduling schemes.
However, they too aresometimes inefficient for implementing real-time applications. In a broad sense, first-in first-out (FIFO) isa concept that allows the first element encountered of a process to beserviced. Some examples include: people waiting in line, a print queue, andelectronic buffers. Think of standing in line at a place of business. Theperson in 9 front of the line is serviced first. After they are serviced, thenext person in line is serviced. This process is repeated until all people havebeen serviced.
The same idea can be seen in a standard print queue. If fivepeople want to print a document at the same time, whoever gives their documentto the print queue first will be serviced first regardless of the size of thedocument. The first document could consist of one hundred pages while the nextdocument in the queue could consist of one page.
Regardless, the first documentin the print queue will be serviced first. While this method seems the mostfair, it proves very inefficient for real-time applications. Throughput istypically low due to long processes filling the queue. Moreover, deadlines arenot addressed when executing processes.
However, scheduling overhead istypically minimal since context switching only occurs at process termination. Shortest remaining timescheduling, sometimes called shortest job first, is a concept that allows aprocess with the shortest time to be serviced first. The same example ofstanding in line can be used for illustration. When two people enter a line ata business, occasionally the person needing more time allows the other personneeding shorter time to be serviced first. For example, occasionally a customerat a grocery store with a cart full of groceries will allow for another personwith one item to be serviced first. This easily proves subjective depending onthe people involved.
Moreover, a real-time application implementing a shortestremaining time scheduling scheme is not subjective. While a person can easilymake their plea to move to the front of the line, threads do not have thisluxury. A processor does not know the difference between a thread servicing apatient in the hospital and a thread reading a thermometer.
This method provesinefficient in a lot of ways. First, processes with longer times suffer. Theyconstantly must wait for shorter processes to complete before being considered.
The longer processes are effectively starved of processor time. Second, contextswitching adds overhead if a 10 currently running process is interrupted by aprocess with shorter time. If this happens, the longer process is split inhalf, potentially many times, allowing for shorter processes to execute. Last,deadlines are not considered.
The processing time is simply addressed andconsidered when determining which process to execute. In order to introducesome form of subjectivity to determine which thread is serviced first, a fixedpriority pre-emptive scheduling scheme can be implemented. Fixed priority pre-emptivescheduling is a concept that applies priorities to processes to determine theirorder for being serviced. A thread with a higher priority will move up in therun queue. If it is blocked, it will move to a wait queue and the next highestpriority thread will be executed in the run queue. Determining the priority foreach thread depends on the application.
For this report, scheduling prioritybased threads in hardware will be addressed. Moreover, it is assumed thatpriorities have already been established for the purpose of this report. Thatis, prioritizing threads is not of concern for the purpose of this report; Forthe purpose of this report, four priorities will be used.
The highest threepriorities are used for application specific operations, and the fourthpriority is used for an idle state. Priorities are needed in order to addressthe hierarchy of events for a specific process. For example, if a pacemakersenses that the user’s heart stops beating, the process needed for starting theuser’s heart should have highest priority. Moreover, other threads should behalted when this process is needed. For example, if the user’s temperature isbeing measured, it should be halted to allow the process servicing the user’sheart activity to run. Real time operating systems(RTOS) are closely related to PRET architectures and processors. While both areconcerned with time, the main difference is that RTOS has minimal interruptlatency where PRET processors must have zero interrupt latency.
Moreover, RTOSs11 add unnecessary overhead making the execution non-deterministic. In a FPGAapplication, the hardware portion would be synthesized on the FPGA and thesoftware portion would be performed through the RTOS. The interaction betweenhardware and software is accomplished through the programming language, whichis usually C 11. Implementing a hardwarethread management system on the MultiFire will attempt to eliminate overheadcommonly seen in software thread management systems.
Moreover, a hardwarethread management system will attempt to meet the timing demands for CPS. Adesign for a hardware thread scheduler will be modeled and compared to softwarethread management techniques. Scenarios will be used for thread simulation;moreover, threads will have one of four priorities associated with them andwill have already been configured. Chapter 1 introduces the reader to CPS, PRETarchitectures, Field Programmable Gate Array (FPGA) technology, threadscheduling, and a need for hardware thread management on the MultiFire.
Chapter2 presents literature for current and past research for thread management techniquesin both hardware and software