# An Asynchronous High-Performance FPGA Based on LEDR/Four-Phase-Dual-Rail Hybrid Architecture

Yoshiya Komatsu, Center for Innovative Integrated Electronic Systems, Tohoku University, Sendai, Japan, ykomatsu@cies.tohoku.ac.jp

Masanori Hariyama, Graduate School of Information Sciences, Tohoku University, Sendai, Japan, hariyama@ecei.tohoku.ac.jp

Michitaka Kameyama, Graduate School of Information Sciences, Tohoku University, Sendai, Japan, kameyama@ecei.tohoku.ac.jp

### ABSTRACT

This paper presents an asynchronous high-performance FPGA that combines the Four-Phase Dual-Rail (FPDR) protocol and the Level-Encoded Dual-Rail (LEDR) protocol. FPDR protocol is employed to achieve small area for logic blocks, while LEDR protocol is employed to obtain high bit rate and low power for data transfer. Each logic block consists of an LEDR-FPDR protocol converter, an FPDR-LEDR protocol converter and two pipelined FPDR LUTs that operate alternately. The proposed FPGA is designed using the e-Shuttle 65nm CMOS process, and the simulation result shows that the throughput is 3.91 GHz.

#### **Keywords**

Asynchronous circuit, reconfigurable VLSI, Four-Phase Dual-Rail (FPDR) protocol, Level-Encoded Dual-Rail (LEDR) protocol, domino logic.

# 1. INTRODUCTION

Field-programmable gate arrays (FPGAs) are widely used to implement special-purpose processors. FPGAs are cost-effective for small-lot production because the functions and interconnections of logic resources can be directly programmed by end users. Despite their design cost advantage, FPGAs operate more slowly than custom silicon alternatives [1] because of their programmable interconnects. Fine-grained pipelining is an effective approach to improving throughput, but it requires a lot of registers. Hence, it is difficult to apply fine-grained pipelining to conventional synchronous FPGAs, which would then need enormously large numbers of registers and complex clock distribution networks [2].

To solve this problem, asynchronous FPGAs have been proposed. Instead of using a clock, asynchronous FPGAs use a handshake protocol between their components to perform the necessary synchronization, communication, and sequencing of operations. Reference [3] proposes a 0.18 µm, 674 MHz FPGA that consists of a fine-grained asynchronous pipeline. Reference [4] proposes a 65 nm, 2.97 GHz FPGA that employs a pipelined cell and the dual pipeline architecture [5].

This work was presented in part at the international symposium on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART2014), Sendai, Japan, June 9-11, 2014.

Most high-throughput asynchronous FPGAs use the Four-Phase Dual-Rail (FPDR) protocol because the hardware of its function units is simple. However, in FPDR protocol, a spacer must be inserted between two consecutive data values. This results in low throughput and large power consumption for data transfer through the programmable interconnection resources because of the large number of signal transitions. Although the dual pipeline architecture [5] can conceal the overhead caused by spacer insertion, it requires five wires to transfer a 1-bit value.

Another well-known dual-rail protocol is the Level-Encoded Dual-Rail (LEDR) protocol, which requires no spacer [6]. Because no spacer is required, the number of signal transitions is half that of FPDR protocol. As a result, the throughput of data transfer through the programmable interconnection resources is high and its power consumption is small. However, the drawback of LEDR protocol is the complex hardware of its function units.

In this paper, we propose a high-throughput asynchronous FPGA that combines FPDR protocol and LEDR protocol. The basic idea was presented in our previous work [7], which focused on reducing energy consumption. The architecture proposed in this paper focuses on high performance and efficient data transfer. LEDR protocol is employed to achieve high throughput and efficient data transfer between Logic Blocks (LBs), while FPDR protocol is employed to achieve small LBs. In addition, a fine-grain pipelined LB, a pipelined interconnect and the dual pipeline technique are introduced to improve throughput. According to the evaluation result, the proposed FPGA operates at up to 3.91 GHz.

# 2. ASYNCHRONOUS PROTOCOLS

Asynchronous protocol schemes are mainly classified into two groups:

- Single-rail protocol (e.g., bundled-data protocol)
- Dual-rail protocol (e.g., FPDR protocol, LEDR protocol)

Bundled-data protocol is the most common single-rail protocol. Figure 1 shows a simple bundled-data pipeline. In bundled-data protocol, the request and the value are carried on separate wires. The value is encoded as in a synchronous circuit, using N wires to denote an N-bit number, and the request is encoded using a dedicated request wire denoted by REQ. Bundled-data protocol requires the explicit insertion of matching delays in REQ to ensure that a request is never received before the bundled value is valid. Bundled-data protocol is the most frequently used protocol in ASICs since its hardware overhead is relatively small. This is because the REQ wire is shared among all N data wires. Hence, to transfer an N-bit value, only N+2 wires are required. The major disadvantage is the timing constraint on the matching delay. If the data path is fixed in advance, it is relatively easy to meet the constraint by optimizing the wire layout. However, in FPGAs the data path is programmable, so complex programmable delay elements are required. As a result, bundled-data protocol is not suitable for FPGAs.

Figure 1: A simple bundled-data pipeline.

Figure 2: A simple dual-rail pipeline.

The dual-rail protocol encodes each bit onto two wires. Figure 2 shows a simple dual-rail pipeline. In the dual-rail protocol, the request is made implicit in the encoded value, and therefore no delay insertion is required [8]. Hence, the dual-rail protocol is the ideal one for reconfigurable VLSIs. In the dual-rail protocol, 2N + 1 wires are required to transfer an N-bit value.
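The wire counts quoted for the two protocol families can be compared with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and reading N+2 as N data wires plus one REQ and one acknowledge wire, and 2N+1 as two rails per bit plus one acknowledge wire, is an interpretation of the text rather than something it states explicitly.

```python
def bundled_data_wires(n_bits: int) -> int:
    """N + 2 wires: assumed to be N data wires, one REQ and one acknowledge."""
    return n_bits + 2


def dual_rail_wires(n_bits: int) -> int:
    """2N + 1 wires: assumed to be two rails per bit and one acknowledge."""
    return 2 * n_bits + 1


# Compare both protocols for a few word widths.
for n in (1, 4, 8, 32):
    print(f"N={n:2d}  bundled-data={bundled_data_wires(n):3d}  "
          f"dual-rail={dual_rail_wires(n):3d}")
```

For N = 1 the dual-rail count is three wires, which matches the three-wire channel that the architecture uses for a single data bit.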

There are two major methods for the dual-rail protocol:

- FPDR protocol
- LEDR protocol

FPDR protocol is the most common dual-rail protocol. Table 1 shows the code table of FPDR protocol. The codeword consists of a "True bit" and a "False bit". The data value "0" is encoded as (0, 1) and "1" is encoded as (1, 0). Moreover, the spacer is encoded as (0, 0). Figure 3 shows an example where data values "0" and "1" are transferred. The main feature is that the sender sends a spacer after each data value. The receiver detects the arrival of a data value by detecting a change of either bit from "0" to "1". The insertion of spacers makes the protocol rules simple, which results in simple hardware for the function unit. However, since each data value is followed by a spacer, only one data value is transferred per cycle of the acknowledge signal. In addition, the insertion of spacers increases the number of signal transitions.

As a result, the throughput of data transfer is low and the power consumption of the programmable interconnection resources is large.

Table 1: Code table of FPDR protocol.

Figure 3: Example of FPDR protocol.

LEDR protocol is one of the 2-phase dual-rail protocols. In LEDR protocol, no spacer is required. Table 2 shows the code table of LEDR protocol. Note that each data value has two codewords, one for each phase. For example, data value "0" is encoded as (0, 0) in phase 0 and (0, 1) in phase 1. The codeword consists of V (Value bit) and R (Redundant bit). The value V is encoded as in a synchronous circuit. The redundant bit R is defined by XOR-ing V and *Phase*, so that R carries the phase information and consecutive codewords always differ by a Hamming distance of 1. Figure 4 shows an example where data values "0", "1", "1" and "0" are transferred. The main feature is that the sender sends data values alternately in phase 0 and phase 1. The receiver detects the arrival of a data value by detecting the change of phase, and data values are transferred continuously between the sender and the receiver without any break. Because no spacer is required, two data values are transferred per cycle of the acknowledge signal and the number of signal transitions is half that of FPDR protocol. As a result, the throughput of data transfer is high and the power consumption of the programmable interconnection resources is small.

The disadvantage of LEDR protocol is the large hardware cost of the function unit. This is because LEDR protocol has two phases for each value. In fine-grain architectures, such as those in which the function unit of each LB is a 2-input, 1-output Look-Up Table (LUT), the hardware overhead is small. However, in more coarse-grain architectures, the hardware overhead is large.

Table 2: Code table of LEDR protocol.

Figure 4: Example of LEDR protocol.

Figure 6: Structures of CBs.
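The two encodings described in this section, and the claim that LEDR needs half as many signal transitions as FPDR, can be checked with a small behavioral model. The sketch below is an illustration only: the function names, the choice of phase 0 for the first LEDR codeword, and the initial wire states (a spacer for FPDR, a phase-1 codeword for LEDR) are assumptions made for the example, not details taken from the figures.

```python
from typing import List, Tuple

Codeword = Tuple[int, int]
FPDR_SPACER: Codeword = (0, 0)


def fpdr_encode(bits: List[int]) -> List[Codeword]:
    """Encode each value as (True bit, False bit), each followed by a spacer."""
    seq: List[Codeword] = []
    for b in bits:
        seq.append((1, 0) if b else (0, 1))
        seq.append(FPDR_SPACER)
    return seq


def ledr_encode(bits: List[int], first_phase: int = 0) -> List[Codeword]:
    """Encode each value as (V, R) with R = V XOR Phase; the phase alternates."""
    seq: List[Codeword] = []
    phase = first_phase
    for b in bits:
        seq.append((b, b ^ phase))
        phase ^= 1
    return seq


def count_transitions(seq: List[Codeword], initial: Codeword) -> int:
    """Count single-wire transitions over the sequence, starting from `initial`."""
    total = 0
    prev = initial
    for cw in seq:
        total += (prev[0] != cw[0]) + (prev[1] != cw[1])
        prev = cw
    return total


data = [0, 1, 1, 0]
fpdr = fpdr_encode(data)
ledr = ledr_encode(data)

# Assumed initial states: FPDR wires rest at the spacer; LEDR wires hold a
# phase-1 codeword so that the first phase-0 codeword is detectable.
fpdr_transitions = count_transitions(fpdr, FPDR_SPACER)
ledr_transitions = count_transitions(ledr, (0, 1))
print(fpdr_transitions, ledr_transitions)
```

Under these assumptions the four data values cost eight wire transitions in FPDR (one rising and one falling edge per value) and four in LEDR (exactly one edge per value), consistent with the factor of two stated in the text.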

# 3. ARCHITECTURE

Figure 5 shows the overall structure of the proposed FPGA. The FPGA comprises regularly arrayed cells. Each cell is connected to its eight adjacent cells through input and output channels. Since LEDR protocol is employed for data transfer, a channel requires three wires to transfer a single data bit, as mentioned in the previous section.

A cell consists of four Input Connection Blocks (CBs), an Output CB and an LB. Figure 6 shows the structures of the Input CB and the Output CB. The Input CB consists of MUXs and selects the LB's input from the input channels connected to the eight adjacent cells. The Output CB consists of DEMUXs and distributes the LB's output to the output channels. Since the MUXs and DEMUXs also work as LEDR registers, the Input CB and the Output CB behave as 2-stage and 3-stage pipelines, respectively.

Figure 7 shows the proposed LB structure. The proposed LB consists of an LEDR-FPDR converter, two 4-input FPDR LUTs and an FPDR-LEDR converter. As mentioned in the previous section, FPDR protocol is employed to achieve small area for the LUTs, while LEDR protocol is employed to achieve high throughput and low power for data transfer. In the LUT, the spacer of FPDR protocol is required. Therefore, in a straightforward implementation, the throughput of the hybrid architecture is almost the same as that of an FPDR-based architecture even though it employs LEDR protocol for fast data transfer. To solve this problem, the dual pipeline architecture is employed. As shown in Fig. 7, the FPDR LUT is duplicated and the two LUTs execute Boolean operations alternately. Figure 8 shows the time chart of an LEDR-FPDR converter and the LUTs. When the phase of LBIn is 0, the LEDR-FPDR converter sends valid data to LUT0 and a spacer to LUT1. When the phase of LBIn is 1, the LEDR-FPDR converter sends valid data to LUT1 and a spacer to LUT0. In a similar manner, the output data from the LUTs are converted continuously by the FPDR-LEDR converter. Therefore, the delay caused by the spacer is concealed.

Figure 8: Time chart of an LEDR-FPDR converter and LUTs.

Reference [4] also proposes a high-performance FPGA based on a dual pipeline architecture. However, it employs FPDR protocol and requires five wires (four wires for data and one wire for an acknowledge signal) to transfer a single data bit. Therefore, the proposed architecture is more efficient at data transfer between LBs.
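The phase-based steering performed by the LEDR-FPDR converter can be sketched as a behavioral model. The code below is a minimal illustration for a single input bit, not the circuit of Fig. 7: the function names are hypothetical, the merging rule in `fpdr_to_ledr` (LUT0 output is re-encoded in phase 0, LUT1 output in phase 1) is an assumption consistent with the description, and the LUTs are replaced by a pass-through so that only the converters are exercised.

```python
from typing import List, Tuple

Codeword = Tuple[int, int]
SPACER: Codeword = (0, 0)  # FPDR spacer


def ledr_to_fpdr(cw: Codeword) -> Tuple[Codeword, Codeword]:
    """Return the FPDR codewords sent to (LUT0, LUT1) for one LEDR codeword."""
    v, r = cw
    phase = v ^ r                        # R = V XOR Phase, so Phase = V XOR R
    data = (1, 0) if v else (0, 1)       # FPDR (True bit, False bit)
    # Phase 0: data to LUT0, spacer to LUT1. Phase 1: the other way round.
    return (data, SPACER) if phase == 0 else (SPACER, data)


def fpdr_to_ledr(lut0_out: Codeword, lut1_out: Codeword) -> Codeword:
    """Merge the two LUT outputs back into one LEDR codeword (assumed rule)."""
    if (lut0_out == SPACER) == (lut1_out == SPACER):
        raise ValueError("exactly one LUT output must carry valid data")
    phase = 0 if lut0_out != SPACER else 1
    true_bit, _false_bit = lut0_out if phase == 0 else lut1_out
    return (true_bit, true_bit ^ phase)


# LEDR stream carrying 0, 1, 1, 0 in phases 0, 1, 0, 1.
ledr_stream: List[Codeword] = [(0, 0), (1, 0), (1, 1), (0, 1)]
steered = [ledr_to_fpdr(cw) for cw in ledr_stream]
lut0_inputs = [pair[0] for pair in steered]
lut1_inputs = [pair[1] for pair in steered]

# With pass-through LUTs, merging should reproduce the LEDR input stream.
merged = [fpdr_to_ledr(a, b) for a, b in steered]
print(lut0_inputs, lut1_inputs, merged)
```

In this model each LUT sees valid data and spacers alternately, so each one individually follows the four-phase discipline, while the merged output again carries one data value per LEDR codeword.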

Figure 9 shows the structure of an FPDR LUT. The LUT executes an arbitrary 4-input, 1-output Boolean function. To achieve high throughput, the LUT consists of three pipeline stages.
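The logical behavior of a 4-input FPDR LUT can be illustrated with a purely functional model. This sketch does not model the three pipeline stages or the circuit of Fig. 9. The 16-bit `config` truth-table layout, the function name, and the rule that the output stays at the spacer until all four inputs carry valid data are assumptions made for the example.

```python
from typing import Sequence, Tuple

Codeword = Tuple[int, int]           # FPDR codeword (True bit, False bit)
SPACER: Codeword = (0, 0)
VALID = ((0, 1), (1, 0))             # data value 0, data value 1


def fpdr_lut(config: int, inputs: Sequence[Codeword]) -> Codeword:
    """Evaluate a 4-input LUT whose truth table is the 16-bit integer `config`.

    Bit i of `config` is the output for the input combination with index i,
    where input k contributes bit k of the index (assumed layout).
    """
    if len(inputs) != 4:
        raise ValueError("the LUT takes exactly four inputs")
    if any(cw == SPACER for cw in inputs):
        return SPACER                # assumed: no output until all inputs are valid
    if any(cw not in VALID for cw in inputs):
        raise ValueError("(1, 1) is not a valid FPDR codeword")
    index = 0
    for k, (true_bit, _false_bit) in enumerate(inputs):
        index |= true_bit << k
    out = (config >> index) & 1
    return (1, 0) if out else (0, 1)


AND4 = 0x8000                        # only input index 15 (all ones) gives 1
ONE, ZERO = (1, 0), (0, 1)
and_all_ones = fpdr_lut(AND4, [ONE, ONE, ONE, ONE])
and_one_zero = fpdr_lut(AND4, [ONE, ZERO, ONE, ONE])
during_spacer = fpdr_lut(AND4, [SPACER] * 4)
print(and_all_ones, and_one_zero, during_spacer)
```

Any other 4-input function is obtained by changing `config` only, which is the sense in which the LUT is programmable.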

# 4. EVALUATION

The proposed FPGA is implemented using the e-Shuttle 65nm CMOS process with a 1.2V supply. The circuit is evaluated using HSPICE simulation. Table 3 compares a cell of the previous hybrid architecture [7] with a cell of the proposed FPGA. Since the proposed FPGA is pipelined and employs the dual pipeline architecture, the transistor count of a cell and the energy consumption are increased by 191.4% and 203.5%, respectively. On the other hand, throughput is increased by 173.2% thanks to the pipelined architecture and the dual pipeline technique.



Figure 9: Structure of an LUT.

Table 3: Evaluation result of a cell in the previous hybrid architecture [7] and in the proposed FPGA.

|                           | Hybrid FPGA [7] | Proposed |
|---------------------------|-----------------|----------|
| Transistor count          | 1332            | 3882     |
| Throughput [G data set/s] | 1.43            | 3.91     |
| Energy [fJ/data set]      | 287             | 871      |
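The relative increases quoted in the evaluation can be recomputed from the entries of Table 3. The short script below is only a check of that arithmetic; the dictionary keys and the helper name are hypothetical, and the inputs are the rounded values as printed in the table.

```python
# Values copied from Table 3.
previous = {"transistor_count": 1332, "throughput": 1.43, "energy": 287}
proposed = {"transistor_count": 3882, "throughput": 3.91, "energy": 871}


def percent_increase(old: float, new: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


increase = {
    key: round(percent_increase(previous[key], proposed[key]), 1)
    for key in previous
}
for key, value in increase.items():
    print(f"{key}: +{value}%")
```

The transistor count and energy figures reproduce the 191.4% and 203.5% stated in the text. The throughput entries give about 173.4% rather than 173.2%. The small difference is presumably because the text was computed from unrounded measurements, although that explanation is an assumption.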

# 5. CONCLUSION

This paper proposed an asynchronous high-performance FPGA that combines FPDR protocol and LEDR protocol. LEDR protocol is employed to achieve high throughput and efficient data transfer between Logic Blocks (LBs), while FPDR protocol is employed to achieve small LBs. Moreover, a fine-grain pipelined LB, a pipelined interconnect and the dual pipeline technique are introduced to improve throughput. The simulation result shows that the maximum throughput of the proposed FPGA is 3.91 GHz. As future work, we are evaluating the proposed architecture on some practical benchmarks. Developing a CAD environment for asynchronous FPGAs is also an important topic.

#### Acknowledgment

This research is supported by JSPS KAKENHI Grant Number 25.5513 and METI through its "R&D Subsidiary Program for Promotion of Academia-industry Cooperation". Also, this work is supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with STARC, e-Shuttle, Inc., Fujitsu Ltd., Cadence Design Systems, Inc. and Synopsys, Inc.

# 6. REFERENCES

- [1] I. Kuon and J. Rose, "Measuring the Gap Between FPGAs and ASICs," in Proceedings of the International Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, 2006, pp. 21–31.
- [2] J. Lamoureux and S. J. E. Wilton, "FPGA clock network architecture: Flexibility vs. area and power," in Proceedings of the 2006 ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, ser. FPGA '06. New York, NY, USA: ACM, 2006, pp. 101–108. [Online]. Available: http://doi.acm.org/10.1145/1117201.1117216
- [3] R. Manohar, "Reconfigurable Asynchronous Logic," in Proceedings of IEEE Custom Integrated Circuits Conference, Sep. 2006, pp. 13–20.
- [4] B. Devlin, M. Ikeda, and K. Asada, "A 65 nm gate-level pipelined self-synchronous FPGA for high performance and variation robust operation," IEEE Journal of Solid-State Circuits, vol. 46, no. 11, pp. 2500–2513, Nov. 2011.
- [5] M. Jeong, T. Nakura, M. Ikeda, and K. Asada, "Moebius circuit: Dual-rail dynamic logic for logic gate level pipeline with error gate search feature," in Proceedings of the 19th ACM Great Lakes Symposium on VLSI, ser. GLSVLSI '09. New York, NY, USA: ACM, 2009, pp. 177–180. [Online]. Available: http://doi.acm.org/10.1145/1531542.1531587
- [6] M. E. Dean, T. E. Williams, and D. L. Dill, "Efficient self-timing with level-encoded 2-phase dual-rail (LEDR)," in Proceedings of University of California/Santa Cruz Conference on Advanced Research in VLSI, 1991, pp. 55–70.
- [7] S. Ishihara, Y. Komatsu, M. Hariyama, and M. Kameyama, "An asynchronous FPGA based on LEDR/4-phase-dual-rail hybrid architecture," IEICE Transactions on Electronics, vol. 93, no. 8, pp. 1338–1348, Aug. 2010.
- [8] J. Sparsø and S. Furber, Principles of Asynchronous Circuit Design: A Systems Perspective. Kluwer Academic Publishers, 2001.