Optogenetics in Silicon: A Neural Processor for Predicting Optically Active Neural Networks

We present a reconfigurable neural processor for real-time simulation and prediction of opto-neural behaviour. We combined a detailed Hodgkin–Huxley CA3 neuron integrated with a four-state Channelrhodopsin-2 (ChR2) model into reconfigurable silicon hardware. Our architecture consists of a Field Programmable Gated Array (FPGA) with a custom-built computing data-path, a separate data management system and a memory approach based router. Advancements over previous work include the incorporation of short and long-term calcium and light-dependent ion channels in reconfigurable hardware. Also, the developed processor is computationally efficient, requiring only 0.03 ms processing time per sub-frame for a single neuron and 9.7 ms for a fully connected network of 500 neurons with a given FPGA frequency of 56.7 MHz. It can therefore be utilized for exploration of closed loop processing and tuning of biologically realistic optogenetic circuitry.

The kinetics of the molecule, were previously explored from an engineering viewpoint by ourselves (Nikolic et al. [7] and Grossman et al. [8]) using data from optogenetically transfected hippocampal cells. However there are still challenges to be addressed, such as how to physically stimulate large numbers of neurons. More importantly, how can therapeutic or scientific network stimulation protocols be translated into a particular three-dimensional light pattern? Such questions will be application-specific and can be answered either empirically or through modelling. The latter would require accurate software models. To explore further, bio-silicon hybrid networks could be used, with the potential for exploring both basic science and downstream translation.
A range of methodologies exist to simulate and predict the state of neural networks. These differ in their accuracy of mathematical representation as well as their scope and range of biological features. Abstract models such as integrate-andfire [9], Izhikevich [10], and Hindemarsh-Rose [11] provide computational efficiency. This allows scaling to large network simulations (of many thousands of neurons) on commodity hardware. There is however a need for more moderate sizes of neural networks but with bio-realism and real-time operation. In particular, optogenetics can provide stimuli to relatively localized neuronal circuitry. This requires the combination of optogenetic models with spatially detailed Hodgkin-Huxley models of neurons [12]. Such a system could potentially interpret recordings and command stimulation equipment in real time (through closed loop control), and could be very useful to both the in vitro [13] and in vivo communities [14].
Previously, computer workstations have been used to achieve high speed simulation of moderately complex neural networks. This is particularly the case when Graphics Processing Units (GPU's) are used for their parallel processing capability: Fidjeland used a GPU kernel to simulate 55 000 neurons with 1000 connections per neuron under bio-plausible conditions [15]; Wang implemented a network with 1 million HH based neurons on a commodity GPU, achieving a 28× speed-up over CPU implementations [16], and Tadashi applied a cerebellum gain and timing control algorithm on a GPU for realtime processing. However, with this technique it is difficult to achieve accurately timed output states for stimulation in real time using computational systems with operating systems. Therefore further digital logic is required to provide buffering and timing accuracy in the stimulus. This work is motivated by the benefit for timing accuracy in putting the neural network processing in this digital logic layer, and using the computer for updating variables associated with the neurons and network. Fig. 1. An optogenetic-neuron architecture. The soma and dendrites contain three different types of ion channels: voltage-dependent ion channels, calcium dependent ion channels and light-dependent ion channels. The voltagedependent ion channels are: a sodium ion channel I Na , a calcium ion channel I Ca , the delayed rectifier K ion channel I K(DR) , and the A-type of transient K ion channel I K(A) which are modelled using the HH equations. The calciumdependent ion channels are a long duration Ca-dependent K ion channel I K(AHP ) , and a short duration Ca-dependent K ion channel I K(C) . The Ca-dependent ion channels depend on the current intracellular calcium concentrations, typically calculated only in the cytoplasmic shell near the cell membrane as described in Traub et al. [20]. The light-dependent ion channel is ChR2, based on the four-state Markov process of Nikolic et al. [7]. The synapses receive synaptic currents from the other neurons and generated action potentials are transmitted along the axon.
One of the most appealing solutions for creating such a digital implementation is via reconfigurable logic, and in particular with a Field Programmable Gated Array (FPGA). FPGA's consist of arrays of logic and memory elements which can be defined as particular digital elements and connected in highly parallelized forms. These allow for rapid bespoke prototyping of digital circuits and their relative connectivity. As they are reprogrammable, they can be re-tuned to whatever neural network configuration is required. The downside of FPGA's is that classically their relatively high power consumption means that their application is limited to the benchtop. This is still acceptable for in vitro applications however, and more recently non-volatile forms of FPGA's provide low power operation suitable for battery-based applications.
FPGA systems have already been used to implement the Hodgkin-Huxley (HH) model, albeit with only voltagedependent ion channels: Smaragdos implemented an olivocerebellar 92-neuron network using a three-compartment HH model [17]; Weinstein et al. developed a system level design flow for implementing voltage-dependent ion channels [18]; and Graas et al. presented a timing multiplexing technique to process multi-neuron activities sequentially [19].
In this work, we have developed an FPGA-based highly biologically plausible processor for real-time simulation of optogenetic neural networks. Fig. 1 depicts the opto-neural architecture.
The first key advancement of this work lies in how we implement a biologically realistic neuron model with our four-state ChR2 model [7]. In addition, we have incorporated calcium and calcium-dependent ion channel models from both Traub et al. [20] and Soto-Treviño et al. [21] Calcium is an important ion for neuronal adaptation (and also imaging).
Our model can be adapted to represent most forms of optogenetic channels (opsins) by modifying the time-constants, reversal potential and conductance to capture the dynamics of other variants. Therefore, compared to the other FPGA-based neural systems, the short-and long-term calcium-and lightdependent ion channels, allow the hardware to replicate more advanced neural characteristics (e.g., light-to-spike processes and calcium-related adaptation) in real-time.
The second key aspect of this system is its flexibility and computational efficiency. The data management system and configuration unit are separate to the computing data-path. Thus, the system application objectives can be easily updated by modifying corresponding model parameters (e.g., light irradiance, architecture, neural parameters and network sizes). For example, since each neuron's stimulation level is calculated sequentially, from pre-stored tables of different light levels in the data generation system, the hardware is able to simulate the effects of spatially varying illumination levels over a population of neurons. This is especially useful for investigating multi-site light stimulation strategies for optogenetics, such as for shaping the illumination levels from arrays of LEDs.
Furthermore, our pipelined parallel processing requires only 0.03 ms for a single neuron and 9.7 ms for a fully connected 500-neuron network to calculate a simulation sub-frame. Thus the applicability of this system for either open or closed loop interaction with tissue is where the neuron count is in the hundreds of thousands rather than millions. Examples of this include active pixel sensor neural recording systems [22] and stimulation systems (e.g., Wang et al. [23] and ourselves [24]).
It is also possible to directly translate the FPGA design into an Application Specific Integrated Circuit (ASIC) chip. In this instance, the chip would be sufficiently small and low power for in vivo applications.

II. MODELLING THE LIGHT-TO-SPIKE PROCESS
The optogenetic-neuron mathematical model has been adapted from previous work [7]. It combines a detailed Hodgkin-Huxley neuron model with parameters for a CA3 neuron [20], and integrates an additional ChR2 channel [7]. The structure is shown in Fig. 1, which consists of four compartments: the synapses, axon, dendrites and soma. In order to ensure hardware translation, we do not attempt to increase the number of compartments to reflect long neuronal arbors. Nevertheless, it is still significantly more accurate than for abstract point-neuron models.

A. Cell Model: Soma and Dendrites
Our cell model is essentially a two-compartment neuron model: one compartment emulates the complete dendritic tree including synaptic inputs and the other compartment models the cell soma. Nominally there is a third compartment-the axonbut in our model it is treated as a simple communication contact, hence a separate compartment was not associated with it. The common ion channels for both the soma and dendrites are: The light-dependent ion channels (ChR2) are assumed to be expressed only in the soma. We justify this as the surface area of the dendrites of any given cell is relatively small compared to the volume of tissue they inhabit, so optical stimulation is best targeted at the soma. We feel the computational cost is not justified by the small dendritic contribution of traditional ChR2, which has very low channel conductance. However, if a high conductance opsin were to be used, these effects could be incorporated.
Synapses are assumed to be only in the dendrites. Similarly, this is to simplify the model computationally, but again, this can be easily changed if required.
The neuronal model is based upon the traditional HH differential equations [12] which treat individual channels as having an individual conductance with a specific reversal potential. The traditional model contains potassium, sodium and leakage ion channel components. We have also incorporated calcium and rhodopsin channels.
Equations (1) and (2) describe the time evolution of the membrane potential for the soma compartment (dV soma /dt) and for the dendritic tree compartment (dV dend /dt) in terms of current flow through each channel The current terms are described in Table I. The last term in both equations describes the current between the compartments. g c = 0.02 nS/μm 2 is the conductance between the somatic and the dendritic compartments, C soma = 0.01 pF/μm 2 is the membrane capacitance of the soma compartment, and C dend = 0.01 pF/μm 2 is the membrane capacitance of the dendritic compartment.
The mathematical equations for the current flow through the voltage-dependent ion channels [20] are given by where I i is the ion channel current, g i is the ion conductance, m and h are gate variables (where h has the same form as m) and m(h) ∞ and m(h) τ are the gate-variable steady-state and time constant values respectively. Finally, v is the reduced membrane potential (v = V − V rest ) and E i is the reduced reversal potential.  [20] and shown here in Here, F = 3 is the scaling constant, and τ Ca = 13.33 ms is the time constant for the decay of intracellular calcium concentration, due to the rapid action of ion pumps which extrude calcium. The corresponding parameters are shown in Tables I and II.
The four-state model of Channelrhodopsin-2 was previously described by Nikolic et al. [7], which we believe to be optimal in terms of the balance between accuracy and simplicity. The model describes ChR2 as having four states; two closed states and two open (conductive) states, and is shown in Fig. 1 The retinal molecular core of the ChR2 rhodopsin complex absorbs a photon to switch from all-trans to 13 These relations can be described as four coupled differential equations where O1, O2, C1, and C2 are the proportions of ChR2 complexes in the open states (1 and 2), and closed states (1 and 2), which are conserved to sum to one. G d1 and G d2 are the deactivation rates O1→C1 and O2→C2 respectively, and e tc and e ct are the rates of transition between O1 and O2 and vice versa and G rd is the rate of thermal conversion of C2 to C1. G a1 and G a2 are the activation rates for C1 to O1 and C2 to O2 respectively [described in general terms in (10)], γ = 0.05 is the conductance ratio of O1 and O2. Light, F is flux in photons per ChR2 per millisecond and e is the quantum efficiency of the rhodopsin. V is the membrane potential of a neuron (in mV), v 0 and v 1 are empirical constants equal to 40 mV and 15 mV and E ChR2 is the channel reversal potential, equal to 0 mV. The ChR2 channel's maximum conductance per unit area,ḡ ChR2 = 2.5 pS/μm 2 is multiplied by the ChR2 expression area A ChR2 to find the total channel conductance for the cell. The corresponding rate parameters are given by Table III.

B. Synapses
The synapse model is described in where I i syn indicates the total synaptic currents received by the neuron i, n is the number of presynaptic neurons, indexed by j, with their train of spike times represented by t j ,ḡ i is the maximum synaptic conductance of each postsynaptic neuron, and e is the transmission efficiency. Spike events are represented by δ ij , a Dirac-delta function, which is 1 at the time of a presynaptic spike (i.e., when t − t j = 0) or 0 otherwise. Our intention here is to explore the network dynamics rather than learning processes, however they could be included later using synaptic potentiation/depression models from [25].

C. Axons
Cable theory such as described by Wilfrid [26] can be used to simulate axonal transmission. Its incorporation would allow for more detailed timing studies between synaptic connections, e.g., spike correlated timing. However, the partial derivative calculations would increase the required FPGA resources. In this instance we believe that the cost outweighs the benefits.
As with other neural network systems, we assume that the transmission channel efficiency is 100%, i.e., no spike loss between soma and synapse. The transmission delay is one clock cycle which occurs at the end of each computing frame.
If transmission delays are important to study, e.g., for rank [27] or phase coding [28], then they are best introduced as direct network delays. Our system can be reconfigured to interpret this behaviour, but at the cost of additional memory blocks, which would reduce the maximum implementable network size.

III. NEURAL PROCESSOR ARCHITECTURE
The neural processor mainly contains three components: the computing data-path, the data generation/reconfiguration units, and the router, which are shown in Fig. 2. The computing data-path is specifically designed for calculating the previously described mathematical equations (for details see Section II-A), the data generation system aims to deliver all the required neuronal fixed model parameters to the different data-paths at the corresponding time, the reconfiguration unit is to modify the computing data-path based on the models, and the router is for implementing the network's synaptic connections.
The FPGA design utilizes 40-bit fixed-point precision, with 22 fractional bits. Therefore, the parameter's dynamic range It consists of three main parts: data generation system, reconfiguration unit, computing data-path and routing system.

A. Computing Data-Path
The computing data-path has three separate algorithm logic units (ALUs), which are shown in Fig. 3. Here ALU1 is for calculating voltage-dependent ion channel (3) and (4), ALU2 is for calculating calcium-dependent functions (5) and ALU3 is for calculating the ChR2 state variables (7)- (9). (N.B. For simplicity of implementation, the fourth differential equation for C1 is eliminated by substitution, since by conservation of the states, it is equal to C1 = 1 − O1 − O2 − C2.) Each ALU receives two types of signal: the first are the data stream signals from the data generation systems, determined by the software model parameters. The second are the switch configuration link signals from the configuration unit, determined by the software model architecture and applications. The memory data register (MDR) is applied to maintain an equal latency for the different data-paths.
Since this architecture is pipelined, ion channels are calculated sequentially. These ALUs have to perform their calculations in a specific sequence to simulate the interactions between different types of ion channels. This timing diagram is shown in Fig. 4. In this design, ALU1 calculates the voltage-and calciumdependent ion channel activity in 14 clock cycles. When I Ca results are released at time-point t 1 , ALU2 receives the values to calculate the calcium concentration, and the outputs at time-point t 2 are feedback to ALU1 for computing calciumdependent ion channels. In parallel, ALU3 calculates the ChR2 current based on the current membrane potential and light stimulation. At the 15th clock cycle at time-point t 3 , the integrator sums the outputs from ALU1 and ALU3 for the final output and the system performs the next frame calculation.
The ALU1 hardware architecture is shown at Fig. 5, which implements (3) and (4). During the process, the neural parameters (e.g., G, V) are released sequentially for calculation. A complete frame comprises of 14 clock cycles. There are also 3 switch control signals for the gate variable exponential (Sel1) and calcium calculation styles (Sel2 and Sel3). In the system the forward and backward slices are for calculating the activation and inactivation rate equations as shown in Table I. As described in Fig. 6, five different gate variable calculation styles can be selected in a system depending on the select signals. Particularly, styles d0, d1 and d3 share the common data-paths. Overall ALU1 has 4 configurations and 10 data stream signals. Specifically, when I Ca is calculated from the ALU1, the Sel signal in Fig. 3 will activate and send it to ALU2 for calcium computing. At the same time, the Sel signal will send ALU2 into an inactive state.
The data-path of ALU3 is shown at Fig. 7. The values G a1 and G a2 are pre-calculated and depend on the light irradiance. The three coupled differential equations (7)-(9) are implemented to simulate the ChR2 four-state model's dynamic behaviour. The overall latency is optimized to 3 clock cycles, and the time-step for numerical integration is set to 50 μs. For each loop, the previous state values O1 , O2 and C2 are used with the current light stimulation levels to generate the ChR2 outputs. More importantly, ALU3 is only active for 1 clock cycle in a frame (14 clock cycles) due to the parallel implementation and computing pipeline.

B. Data Generation System
The data generation system is shown in Fig. 8. As can be seen, it contains n individual units and a Finite State Machine (FSM). Each unit has one RAM cell and two program counters  Fig. 1. E and G are ion channel reversal potential and maximum conductances, while V and Ca are neuron membrane potential and calcium concentrations used as inputs.  Table II. Here, a, b, . . . , g are the ion channel gate parameters. d0 is for calculating forward variables (m) of I Na,Ka,Kdr,Ca(m) , d1 is for h variables for I Na,Ka(h) ; d2 is for I Ca(m) ; d3 is for I Ca(h) and d4 is for I Kahp(m) . Additionally a Look-Up-Table block is employed for calculating short duration Ca-dependent ion channel gate variables. The meanings of the numerical values (0.005 and 0.01) are given in Table II. (PCs). The RAM is used for storing model parameters such as activation (inactivation) rate parameters (e. g., a, b, . . . , e) and ion channel conductance. PC1 is an index of the different parameters of a neuron, and PC2 is an index of different neurons in a network. An FSM is employed as a control signal to select corresponding RAM states as output values. Specifically, the FSM decides frame and sub-frame control signals. In addition, memory address registers (MAR) are implemented based on the latency in the computing data-path. Since the system uses different sub-block RAM rather than an entire one, data management becomes more efficient and controllable. In a similar manner, the reconfiguration unit shares this technique with the data generation system.    Table for storing network connectivity, RAM for updating synaptic events, and two registers for data management. In the synaptic connection LUT, location (1→2),(1→3) stores the maximum synaptic conductance of neuron index 1 to 2 and 1 to 3, and the RAM_networking LUT records the synaptic current values at time t for neuron index 1 to index 2 and index 3.
Using this approach we note that the simulator can handle biologically realistic situations which originate from uneven light distribution and/or ChR2 expression: different light intensities for different neurons can be stored in these units as well. The PCs at the address index are responsible for sending light-dependent variables at the correct times for modelling the spatial distribution of light, while the other PCs in the reconfiguration units turn on/off the ChR2 channels to implement different levels of opsin expression.

C. Routing System
The routing system is shown in Fig. 9. Spike events from three processors are sequentially sent into shift registers for processing, and the results are fed-back individually. The basic mechanism is as follows: when a neuron spike event (1 or 0) arrives, its corresponding post-synaptic neuron location (e.g., 1 → 1, 1 → 2, . . . , 1 → n) will be addressed by the neuron index. By multiplying the synaptic strength pre-stored in the LUT and the spike event, the updated synaptic inputs are stored in the RAM block at the same location (e.g., 1 → 1 t , 1 → 2 t , . . . , 1 → n t ). After calculating the states for all the neurons in the network, the accumulator adds all the received synaptic inputs per neuron for the next frame calculation (the process happens at the last sub-frame periods). For example, for neuron index 1, all the synaptic currents (1 → 1 t , 2 → 1 t , . . . , n → 1 t ) will be accumulated and represented by (:, 1) t . Two memory data registers are implemented for storing the accumulator results. One is for sending the previous frame's synaptic inputs (e.g., (:, 1) t−1 ) to the calculated neuron, the other one is for storing the currently summed synaptic inputs (e.g., (:, 1) t ) for computing in the next frame. The frame period is the product of the total number of neurons with the processing time per neuron.

A. ChR2 Ion Channel
The individual silicon ChR2 channel simulation results are shown in Fig. 10. Light pulses of seven durations are used in this experiment: 1, 2, 3, 5, 8, 10 and 20 ms.
The FPGA simulations indicate that the developed silicon ChR2-HH neuron model behaves similarly to its biological counterpart, on which the software model is based (data not shown but can be seen in [7]). However, there are some slight differences between the model and the FPGA implementation, especially at 2 and 3 ms light pulses, which are due to the digital truncation errors and fixed-step integration.

B. Hippocampal CA3 Neurons
Voltage-dependent and [ChR2-expressing + voltagedependent] hippocampal CA3 neurons have been simulated for comparison. Fig. 11(a) and (b), show oscilloscope readings of our neuron in response to constant and pulsed electrical stimulation (duty cycle = 50% with injected current 0.1 nA): the red line is the membrane potential and the blue line represents the electrical pulses. Fig. 11(c) and (d), show oscilloscope readings of our neuron's response to constant and pulsed light stimulation (duty cycle = 50% with light irradiance 0.4 mW/mm 2 ): the purple line is the membrane potential (showing action potentials) and the green line is the ChR2 current.
A comparison between software (simulated with Matlab) and hardware firing rates is shown in Fig. 12. For an electrical stimulus, as the stimulus strength increases, the firing rate increases accordingly. When the injected current exceeds 0.6 nA, the CA3 neuron approaches its saturation and the firing rate collapses. For the light-based stimulus, the firing rate increases with light intensity (from 0.01 to 10 mW/mm 2 ) and duty cycle (from 10% to 80%). In both conditions software and hardware systems show identical and biologically realistic results.
In the simulations we used a relatively low light irradiance level (as an effective threshold) of 0.4 mW/mm 2 . This value was experimentally found to be adequate to evoke opto-neuro activity for reasonably long pulses (≥ 50 ms), whereas the value of 1 mW/mm 2 tends to be used for short pulses (< 5 ms).  In other experiments, even lower light intensities have been found to evoke a response (e.g., Mattis et al. [29] used 0.1 mW/mm 2 ) so 0.4 mW/mm 2 was a compromise. This represents the power density reaching the neuron for in vitro experiments or simulations. However in more complex experimental setups, i.e., in vivo studies, the light will be absorbed and scattered by other (non-transfected) brain tissue, reducing the effective power density at the target neurons. In that case approximate calculations of the true power density just require a multiplicative correction factor, which would need to be determined by experimental measurements. Several studies have modelled these attenuating effects and produced software to simulate and calculate them [40], [41] and there is even an iPhone app called Optogenetics Pro for the purpose [42].

C. Optogenetically Transfected Neural Network
We simulated a 25-neuron opto-neural network. Each neuron receives different light stimulation as shown in Fig. 13(a). Each neuron randomly connects to 16-17 neurons on average with maximum synaptic conductance of 0.01 nS/μm 2 . The unconnected neural responses (i.e., no network connectivity) are shown in Fig. 13(b). As expected, this is, similar in response to that with the original irradiance patterns: only five neurons with light-stimulation above threshold (0.4 mW/mm 2 ) had significantly elevated firing rates, while the others remained silent.
The network dominating condition is shown in Fig. 13(c). In this case, the synapses are all excitatory, i.e., no negative feedback. It can be seen that the average firing rate is 45 Hz and the light pattern can no longer be seen in the spatial distribution of neural responses. In this scenario, the irradiance pattern has an effect, but on the overall firing rate rather than a spatial pattern of activity, which is dominated by the synaptic connections. Fig. 14 shows an interesting example of activity in Neuron (1, 1) where the two scenarios above are moderated such that the firing behaviour is determined by both surrounding network activity and the pattern of optical stimulation. That is, the optical stimulus on its own would not produce such significant neural activity.

A. System Scalability
We implemented different numbers of neuron on the FPGA processor to test the system's scalability by measuring the wall- Fig. 14. An example of an opto-neuron with (red trace) and without synaptic connectivity (grey trace) exhibiting super-threshold firing and sub-threshold behaviours respectively. Fig. 15. The speed performance of the neural processor implementing different network sizes. At cross point A, the network performance shows non-linear behaviour rather than linear behaviour. At cross point B, the individual router processing periods will be longer than the processor's. At cross point C, the maximum neuron number that can be implemented on the processor for realtime computing is found to be 500 and takes 9.7 ms (assuming that the fastest biologically-realistic firing frequency is 100 Hz).
time required for the system to generate a single spike (subframe). As shown in Fig. 15, the processor wall-time increases linearly with the number of neurons (blue line). This is because the calculations are sequential. In contrast, the router processing time depends exponentially on the number of neurons due to the memory based approach (where all the connections are prestored in the LUTs). At cross point B, the routing computing period exceeds that of the neural processing. At cross point C, the maximum number of neurons which can be implemented on the processor for real-time computing can be seen to be 500, for which the simulation time is 9.7 ms (assuming the fastest biologically-realistic firing frequency is 100 Hz).
Specifically, with fewer than 45 neurons, the network simulation time equals the processor time. This is because the processor and router compute in parallel in the hardware, and the routing period of a frame is less than a processor subframe period. However, with more than 45 neurons, indicated at cross point A, the system transitions from scaling linearly to non-linearly with the network size (neuron number). This is because the router requires more time for routing tasks compared to the processor's sub-frame periods at this stage, meaning that the processor has to wait until the router finishes its current frame tasks. Therefore, the system simulation performance will mainly depend on the router itself. Overall, the system performance exhibits a linear relationship to network size when it is below 45 neurons, and displays a non-linear relationship for more than 45 neurons (shown by the black line).

B. Comparison With Other Work
Comparisons between this work and previous FPGA neuron implementations are shown in Table IV. HH* indicates that a HH based model with three compartments, and HH+ represents our optogenetic-calcium enhanced model. Compared to the previous work, the major novelty of the presented work is that we include long-and short-term calcium-and light-dependent ion channels in the system. This enables our implementation to produce more biologically realistic behaviours when compared to other abstract models.
The neuron model itself exerts a major influence on the hardware architecture design. General models with strong biophysical meaning have smaller time steps than mathematically abstract models: Izhikevich [30] and LIF [31] models have 1 ms time step while HH [21], [22] based models have time-steps ranging from 0.001 to 0.05 ms. This is because complex neural models require higher integration step resolution to compute the detailed ionic dynamics. As a result, the number of hardware operations for 1 ms of biological time in bio-physical models is significantly larger than for the high-level phenomenological neuron models: LIF and Izhikevich hardware implementations take only 30 and 13 operations to simulate 1 ms of biological time, whereas the HH model with 3 compartments and our model require 22,200 and 11,880 operations respectively for the same period.
Another vital issue concerns the implementation of neural communication on the hardware. There are two major approaches for this: memory-based and routing-based. The memory-based approach uses on/off chip memory for prestoring network connections. For each computing loop iteration, the neuron spike events will be sent to their postsynaptic-neuron targets according to their address packages (e.g., neuron and synaptic indices). Similarly to Cheung et al. [30], our design also follows this principle. It enjoys low latency and simple hardware design, but memory resource/bandwidth limits will be reached when the neuron number exceeds a certain threshold (dependent upon the resources of a particular FPGA). The other approach is to use a network-on-chip architecture; a tailor-made routing strategy implemented to deliver multi-core spike events in a system such as the SpiNNaker platform [32]. Our previous work [33], [34] employed this approach to implement cerebellum model [30] connections. In addition, Randall et al. [18], Andrew et al. [31] and Smaragdos et al. [17] implemented an all-to-all connection through their custom-designed techniques.
In prior work, different designs have used different methods to assess their relative computing performances. It is therefore hard to directly compare system speed and efficiency: Graas et al. [19] proposed increasing the FPGA clock frequency and the step size for a speed up of 40× real-time; Cheung et al. [30] designed an event-driven and fully pipelined architecture for 2.48× real-time; Smaragdos et al. [17] optimized their HLS C-code for 12.5× real-time. Fully pipelining and shortening the critical path are employed in our system speed optimizations.
There are also several different hardware platforms such as Spinnaker [32], Neurogrid [36], IFAT [37], and GPU [16] for neural modelling. Each system has strengths and weaknesses in particular areas. For example, Neurogrid and IFAT are mixedsignal based architectures that are less reconfigurable but enjoy elegant design and efficient power consumption.

C. Applications
The developed hardware can serve as a multi-functional platform to investigate optogenetic related topics. Some of these potential applications are summarized in Table V.
The first application is the investigation of optogenetic actuators such as channelrhodopsin, halorhodopsin [38] and archaerhodopsin [39]. Depending on the required model, the ChR2 computing block can be easily re-configured to model other opsins by updating its parameters and configuration signals.
Also, since an optical-neural interface system [23] is hard to verify due to the complicated nature of the experiments, it would be useful to develop optogenetic hardware (e.g., optrode) functionality by using silicon networks at first. This will greatly speed-up development and improve the hardware success rates before investing time in biological experiments.
Finally, as mentioned, society faces important challenges in fully realizing the potential of optogenetics as a method, such as how to translate therapeutic or scientific network stimulation into a particular three-dimensional light pattern. We hope that the developed processor will prove to be a reliable tool with which to address those challenges.

D. Future Work
One of the main areas for further development will be in developing new techniques for system optimization. For example, the natural communication in biological systems tends to be asynchronous and event driven. Therefore, an asynchronous communication protocol [43] coupled with an event driven approach [44] may potentially make the system more power efficient. Furthermore, sharing the common computing-path [45] (e.g., ALU1) and optimization of the neural network modularity [46], [47] will result in utilizing less hardware resources. Finally, multi-core architectures [48] represent a promising way to scale the number of implemented neurons towards brainscale sizes with real-time computation.

VI. CONCLUSION
In this work we have designed and implemented an FPGAbased neural processor for real-time simulation of opto-neural behaviour. The developed neural processor can successfully reproduce the photo-kinetics of mammalian neurons expressing optically active ion channels [7] in a biologically realistic neural network model. It only requires 0.03 ms for a single neuron and 9.7 ms for a fully connected 500-neuron network to generate a spike. Therefore the system, with its real-time computing performance and highly biologically-realistic behaviour, can be applied in many ways as a powerful tool for multidisciplinary researchers in the field of optogenetics.