
Research advances in Cyber-Physical
Systems (CPS) promise to transform our world with systems that respond
more quickly, are more precise, work in dangerous or inaccessible environments,
provide large-scale distributed coordination, are highly efficient, augment
human capabilities, and enhance societal wellbeing. Examples include
autonomous collision avoidance; robotic surgery and nano-tolerance
manufacturing; autonomous systems for search and rescue, firefighting, and
exploration; automated traffic control; zero-net-energy buildings; and
assistive technologies and ubiquitous healthcare monitoring and delivery [1].
CPS is essentially the conjoining and coordination of physical and
computational resources. However, current computing and networking abstractions
are not well suited for implementing CPS applications [2]. New abstractions
must be developed that incorporate computing and physical processes in a unified
manner.


Current methods for
computing, networking, and implementing software focus on average-case
performance rather than timing predictability. Programming languages such as C
and Java provide no way to specify timing constraints; that is, the language
does not associate timing with a program. Typically, system components are
designed individually and integrated later, and only at the end is worst-case
execution time calculated and addressed. While current architectures that employ
multi-level caches, branch predictors, deep pipelines, and the like improve
average-case performance, they do not improve worst-case performance [3].
Failures in applications that rely on worst-case execution times could be
catastrophic: flight controllers for satellites, medical diagnostic equipment
for a patient in intensive care, and guidance systems for missiles, to name a
few. CPS is primarily concerned with how time is addressed. In the physical
world, time cannot pause while calculations are performed, so CPS must operate
in real time to match the continuity of time in the physical world. In an
effort to achieve the timing requirements of CPS, new processor architectures
have been proposed, including Precision Timed (PRET) machines.



PRET aims to address timing
predictability at the architecture level. Timing predictability is the ability
to predict timing properties of the system [3]. Many of the basic elements of a
RISC processor are kept while other sources of indeterminacy are removed.
Conceptually, instructions are fetched, decoded, and executed the same way;
however, the methods for implementing these tasks can vary. For CPS research, an
open-source PRET processor is being developed based on the OpenFire soft
processor [4]. While many aspects of a processor that affect the execution of
tasks could be addressed, such as pipelines, caches, inter-process
communication (IPC), and resource allocation, this thesis addresses thread
management. More specifically, it addresses hardware and software interfacing
for the scheduling of threads. Different methods for implementing hardware
schedulers are identified, modeled, and analyzed. Motivations for implementing
a hardware management controller include speedup over software, flexibility,
and performance gains directly related to the frequency of scheduling
operations, to name a few.


Beyond their names,
fundamental differences exist between software and hardware. Software can be
thought of as a set of ordered instructions required to perform a specific
task. These instructions can be a source of overhead, especially in real-time
systems. If a patient's blood pressure drops significantly but ten
instructions must execute before the nurse is alerted, vital time is wasted
whenever those instructions take a considerably long time to run. Hardware can
potentially overcome many of the pitfalls associated with software. If
implemented in hardware, some calculations can be performed in parallel with
the CPU, and overhead is reduced by not executing a sequence of instructions
at all. Moreover, hardware is inherently more deterministic: execution times
in hardware can be calculated more easily than execution times in software. A
software version might execute an algorithm in 12 clock cycles plus additional
overhead, whereas an equivalent hardware version might take 2 clock cycles
with little to no additional overhead.

While timing predictability
is easier to gauge in hardware, many claim that processing time can also be
reduced [6]. Moreover, design costs and time-to-market are reduced by using
FPGAs. Even though many processors can be reprogrammed to perform specific
applications, the reprogramming is done in software; the number of hardware
devices, such as multipliers, adders, and caches, does not change.
Reconfigurable computing devices such as FPGAs can instead be configured at
the hardware level. FPGA configurations are most often specified using a
hardware description language (HDL), the most common being Verilog and VHDL.
Specific to this thesis, a soft processor has been created in the logic fabric
and general-purpose memory of an FPGA and is used to investigate the need for
PRET architectures in CPS applications.

As previously stated, a soft
processor implemented on an FPGA is created in the logic fabric and
general-purpose memory of the FPGA. A soft processor is an intellectual
property core that implements specific circuit behaviors using the
programmable logic of the device; it can be thought of as a software-defined
microprocessor. Configurability, fast time-to-market, ease of integration, and
avoidance of obsolescence are reasons for using a soft processor. An
open-source soft processor for FPGA technology, OpenFire [9], currently exists
and is being modified for CPS applications. Current modifications to the
OpenFire add additional stages to the pipeline. This newer version, MultiFire,
saw improvements over the OpenFire (36% in area and 69% in maximum frequency)
[4]. While many areas need to be investigated to advance PRET architecture
research, the purpose of this thesis is to identify, model, and simulate
thread scheduling methods in hardware for multithreaded applications.

Threads are essentially a
set of instructions used to carry out a specific operation, and multithreading
is the process of executing multiple threads on a processor as concurrently as
possible. Certain tasks in the physical world are made up of multiple
processes that can run in parallel. For example, an autonomous system might
interpret speed, temperature, height, and pressure in order to make a decision
for a specified task. Each measured quantity can be handled by its own thread:
the instructions needed to make a speed measurement are carried out on one
thread, and the instructions needed to make a temperature measurement on
another. While correctness is important, performing these tasks correctly in
real time is paramount. Specific events in the application's environment
determine thread significance. In the example above, measuring temperature may
be more critical than measuring speed; moreover, speed may be measured more
often than temperature. Therefore, when temperature is needed, the thread
providing temperature data may need to interrupt the thread providing speed
data. In this model, each process is represented by a thread, and a thread
management block handles managing and scheduling them. The purpose of this
thesis is to model and simulate methods for scheduling threads in hardware.


Thread scheduling is
critical for ensuring that certain applications operate in real time. However,
most of the traditional scheduling schemes encountered for multithreaded
applications are inefficient for implementing real-time applications. Moreover,
context switching adds overhead when using multiple threads. Therefore,
efficiently implementing context switches and scheduling threads is paramount
in real-time applications. The most common schemes encountered for scheduling
and implementing multithreaded applications include round-robin, first-in
first-out (FIFO), shortest remaining time, multilevel queue, and
priority-based scheduling.

The current implementation in
the MultiFire processor employs a round-robin scheduling scheme: a new
instruction is fetched each cycle from a different thread. Round-robin is
considered the simplest solution for implementing a thread scheduling
algorithm. Equal time-slices are given to each thread, and the threads cycle
through the pipeline in a circular fashion [10]. Priorities are not
established in a round-robin scheme; each thread is given the same amount of
time to execute. Once a time-slice completes, the thread state is saved and
the next thread in the queue is given the same amount of time to execute. Once
the time-slice for the last thread in the queue completes, the first thread is
started again. While this method is simple to understand and implement,
real-time applications suffer greatly as the number of threads increases. The
number of threads and the length of each time-slice determine how long one
thread will take to complete. In real-time applications, certain threads may
require earlier completion times. While dividing the execution time equally
among threads might seem fair, certain tasks might need attention and results
sooner. For example, if the heart of someone wearing a pacemaker stops
beating, immediate attention should be given to the circuitry required to keep
the heart beating. Therefore, more time (i.e., longer time-slices) for certain
threads might be needed. Three other traditional scheduling methods (first-in
first-out, shortest remaining time, and fixed priority) exist to address the
inefficiencies seen in round-robin scheduling schemes. However, they too are
sometimes inefficient for implementing real-time applications.
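The circular rotation described above can be sketched in a few lines of C. This is an illustrative model only; the thread count and the names NUM_THREADS and next_thread are hypothetical, not taken from the MultiFire source. Each call returns the ID of the next thread in circular order, with no thread ever favored over another.

```c
#include <assert.h>

#define NUM_THREADS 4                  /* hypothetical thread count */

static int current = NUM_THREADS - 1;  /* index of last thread serviced */

/* Return the next thread ID in circular (round-robin) order. */
int next_thread(void)
{
    current = (current + 1) % NUM_THREADS;
    return current;
}
```

The model also shows the weakness noted above: the selection logic has no notion of priority or deadline, so a critical thread simply waits its turn like any other.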


 In a broad sense, first-in first-out (FIFO) is
a scheme in which the first element to arrive is the first to be serviced.
Some examples include people waiting in line, a print queue, and electronic
buffers. Think of standing in line at a place of business: the person at the
front of the line is serviced first; after they are serviced, the next person
in line is serviced, and this repeats until all people have been serviced. The
same idea can be seen in a standard print queue. If five people want to print
a document at the same time, whoever submits their document to the print queue
first will be serviced first, regardless of the size of the document. The
first document could consist of one hundred pages while the next document in
the queue consists of one page; regardless, the first document in the print
queue is serviced first. While this method seems the most fair, it proves very
inefficient for real-time applications. Throughput is typically low because
long processes fill the queue, and deadlines are not addressed when executing
processes. However, scheduling overhead is typically minimal, since context
switching occurs only at process termination.
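The print-queue behavior above can be modeled as a simple ring buffer. This is a minimal sketch under assumed conditions (fixed capacity, no overflow checking); the names fifo_t, fifo_push, and fifo_pop are illustrative, not taken from any scheduler discussed in this thesis. Jobs are popped strictly in arrival order, regardless of their length.

```c
#include <assert.h>

#define QCAP 8                /* assumed fixed queue capacity */

typedef struct {
    int jobs[QCAP];           /* job IDs stored in arrival order */
    int head, tail;           /* pop from head, push at tail */
} fifo_t;

/* Enqueue a job at the tail of the queue. */
void fifo_push(fifo_t *q, int job_id)
{
    q->jobs[q->tail % QCAP] = job_id;
    q->tail++;
}

/* Dequeue the oldest job: strict arrival order, length ignored. */
int fifo_pop(fifo_t *q)
{
    int job_id = q->jobs[q->head % QCAP];
    q->head++;
    return job_id;
}
```

Note that nothing in fifo_pop examines job size or deadline; that is exactly why a hundred-page document blocks a one-page document behind it.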


Shortest remaining time
scheduling, sometimes called shortest job first, is a concept that allows the
process with the shortest remaining time to be serviced first. The same
example of standing in line can be used for illustration. When two people
enter a line at a business, occasionally the person needing more time allows
the person needing less time to be serviced first; for example, a customer at
a grocery store with a cart full of groceries may let another person with one
item go ahead. This is clearly subjective, depending on the people involved. A
real-time application implementing a shortest-remaining-time scheduling
scheme, however, is not subjective. While a person can plead their way to the
front of the line, threads do not have this luxury: a processor does not know
the difference between a thread servicing a patient in the hospital and a
thread reading a thermometer. This method proves inefficient in several ways.
First, processes with longer times suffer; they must constantly wait for
shorter processes to complete before being considered, so longer processes are
effectively starved of processor time. Second, context switching adds overhead
if a currently running process is interrupted by a process with a shorter
remaining time; if this happens, the longer process is split, potentially many
times, to allow shorter processes to execute. Last, deadlines are not
considered; only the processing time is examined when determining which
process to execute. In order to introduce some form of subjectivity in
determining which thread is serviced first, a fixed-priority pre-emptive
scheduling scheme can be implemented.


Fixed-priority pre-emptive
scheduling is a concept that applies priorities to processes to determine
their order of service. A thread with a higher priority moves up in the run
queue; if it is blocked, it moves to a wait queue and the next-highest-priority
thread in the run queue is executed. Determining the priority for each thread
depends on the application. For this report, scheduling priority-based threads
in hardware will be addressed, and it is assumed that priorities have already
been established; prioritizing the threads themselves is not of concern here.
Four priorities will be used: the highest three are used for
application-specific operations, and the fourth is used for an idle state.
Priorities are needed in order to address the hierarchy of events for a
specific process. For example, if a pacemaker senses that the user's heart has
stopped beating, the process needed to restart the user's heart should have
the highest priority, and other threads should be halted while this process is
needed. If the user's temperature is being measured, for instance, that
measurement should be halted to allow the process servicing the user's heart
activity to run.


Real-time operating systems
(RTOSs) are closely related to PRET architectures and processors. While both
are concerned with time, the main difference is that an RTOS has minimal
interrupt latency, whereas PRET processors must have zero interrupt latency.
Moreover, RTOSs add unnecessary overhead, making execution non-deterministic.
In an FPGA application, the hardware portion would be synthesized on the FPGA
and the software portion would run under the RTOS; the interaction between
hardware and software is accomplished through the programming language, which
is usually C [11].


Implementing a hardware
thread management system on the MultiFire attempts to eliminate the overhead
commonly seen in software thread management systems and to meet the timing
demands of CPS. A design for a hardware thread scheduler will be modeled and
compared to software thread management techniques. Scenarios will be used for
thread simulation; threads will have one of four priorities associated with
them and will have already been configured.


 Chapter 1 introduces the reader to CPS, PRET
architectures, Field Programmable Gate Array (FPGA) technology, thread
scheduling, and the need for hardware thread management on the MultiFire.
Chapter 2 presents literature on current and past research into thread
management techniques in both hardware and software.