Using the Nios II processor system without the Nios II processor core

In the last block of articles about the Redd remote debugging complex, I showed that working with it is not only about FPGAs. Moreover, the FPGA is a very interesting but still very specific part of the complex; its main part is the FTDI bridges and other USB devices. That topic did not arouse much interest, but at least everyone now knows what hardware the complex contains. So we can return to a more interesting topic: the FPGA.

We continue the tradition of the previous block and keep looking for optional parts. Today we will learn how to do without the Nios II processor core. Yes, yes: in a Nios II processor system, the processor core itself is an important but not mandatory element. We will practice building a system without it, moving all control functions up to the central processor of the Redd complex.




Why the processor core is needed, and why it is not


As I already noted in the theoretical part of one of the previous articles, when working with the Redd complex, all custom equipment is auxiliary. It is used for a short time, only during the project, and is not handed over to the customer. Accordingly, it is unlikely that dedicated developers will be assigned to it, and it makes no sense to study too many additional things just to work with it.

That is why, instead of developing in Verilog (or VHDL, or some other specialized language), I suggested simply building a Nios II system from ready-made blocks connected by buses. Instead of control state machines, I chose the processor core, since its development language is more familiar to today's system programmers. Of course, if there are not enough building blocks for a system, the missing ones have to be written. But they can then be reused in other projects, so a library of ready-made blocks will accumulate quickly.

This approach has many advantages, and we have already tried it in practice. But when I think over new examples for discussion, the shortage of FPGA memory comes to the fore. I would like to give more memory to the FIFOs, but the processor's program and data also have to be stored somewhere, and the Cyclone IV does not have that much memory. There may not be enough for everyone.

The second problem: in one of the previous articles we already made a system for communication between the Nios II processor and the central processor. It also takes up memory for its FIFOs, and besides, it clutters the diagrams, hiding the essence of what the article describes. If we can do without it, it is better to do so (though this is not always possible).

But who will control the system? And how can data be transferred without a processor core? Meet the Altera JTAG-to-Avalon-MM component.



On one side, it connects to the standard JTAG bus (nothing external is needed; the regular USB-Blaster is enough). On the other side, it connects to the Avalon bus, providing access to all devices, exactly the same access that the processor core provided. This block was originally intended for the initial debugging of custom cores, so that they could be quickly exercised from the host machine. But nothing prevents us from using it for other tasks, because a JTAG adapter is always present in the Redd complex!

Of course, a system based on this block cannot be fast at transactions. The JTAG channel is not fast in itself, and the microcontroller-based USB-Blaster adapter used in the Redd complex makes it downright leisurely. But we already know how to make a fast master. Nothing prevents us from choosing whichever is more convenient for each case.

In the projects planned for future articles, control speed is not critical. There will be an analyzer that receives a command to capture data into SDRAM and then works on its own until the RAM is full or until a stop command arrives. Whether these commands execute in microseconds or in tens of milliseconds makes little difference.

Data transfer through JTAG will, of course, be slower than through the FT245. But then again, we already know how to pump everything through the FT245, and learning a new option does not prevent us from choosing the old one. In return, the new option lets us build a system that is slower, but easier to develop and more economical in resources. The more techniques, good and different, the better! So let us proceed to the experiments.

Hardware


There are many articles online about building the hardware (as opposed to the software). Typically, the examples add internal FPGA memory, LEDs, and buttons to the system. Our organization, like the whole country, has urgently switched to remote work, so I have nobody to ask to connect an oscilloscope to the complex's connector, and in this example I will limit myself to memory only. Those who repeat the experiments on ordinary development boards can add an output GPIO to control LEDs and an input GPIO to read buttons. We create a processor system (I described how I do it here), but instead of the processor we place the Altera JTAG-to-Avalon-MM block. It is found here:



We also add memory, so that there is at least something to test with today. Next, we connect everything with buses, assign addresses automatically, and get something like this:



Note that the Reset signal is connected in a very non-standard way. This is because I do not want to spend time implementing it physically.

That is all the hardware for today. We generate the system and assign the clk pin (we do not use other pins today). As a reminder, I make the reset pin virtual (Virtual Pin); the procedure is described in this article (search for the phrase Virtual Pin). The result should look like this:



We build the project, load it into the FPGA, and proceed to the experiments.

Programming via TCL scripts


All the tutorials that cover the Altera JTAG-to-Avalon-MM block say that you should work with it through TCL scripts. Before rushing into the unknown, it is better to practice on something well documented, so we will not be an exception and will start our experiments with them too. There are many articles on Habr about what TCL is. I will link to one of them, since it is FPGA-related: https://habr.com/en/post/308962/ . The Intel FPGA website (formerly Altera) has a whole online course on this language. To practice issuing commands, you can launch the System Console from the development environment:



As a result, a gorgeous environment will open.



It can work wonders, up to developing GUI applications. But in this article we will only touch on it superficially (I will explain why a little later). We will just try issuing a couple of commands. Which ones? To find out, locate and download the document Analyzing and Debugging Designs with System Console (as always, I give names rather than links, since links keep changing).

We will experiment with the commands from this group in the document:



First, take a look at the messages shown to us. Since I loaded the “firmware” remotely, the system found the remote JTAG server by itself:



We will issue the commands in this window:



First, let's try to write a constant to the memory. To do this, you would give the command:

master_write_32

but the trouble is that its first argument is <service_path>.

Therefore, before issuing a useful command, we must first learn how to obtain this service_path. We look for the relevant section in the documentation:



Great! We try the corresponding command, following the example from the document.

% get_service_paths master
/devices/10CL006(Y|Z)|10CL010(Y|Z)|..@1#1-5.4.2.1#192.168.10.146/(link)/JTAG/alt_sld_fab_sldfabric.node_0/phy_0/master_0.master
%

That is rather hard to type by hand. Another option:
% lindex [get_service_paths master] 0
/devices/10CL006(Y|Z)|10CL010(Y|Z)|..@1#1-5.4.2.1#192.168.10.146/(link)/JTAG/alt_sld_fab_sldfabric.node_0/phy_0/master_0.master
%

Not any better. It would be useful only if there were several devices.

Good. Let's try the full version of the example from the document:

% set m_path [lindex [get_service_paths master] 0]
/devices/10CL006(Y|Z)|10CL010(Y|Z)|..@1#..@1#1-5.4.2.1#192.168.10.146/(link)/JTAG/alt_sld_fab_sldfabric.node_0/phy_0/master_0.master
%

And now we try to write:

% master_write_32 $m_path 0x0 0x01234567
error: master_write_32: /devices/10CL006(Y|Z)|10CL010(Y|Z)|..@1#..@1#1-5.4.2.1#192.168.10.146/(link)/JTAG/alt_sld_fab_sldfabric.node_0/phy_0/master_0.master is not an open master service
    while executing
"master_write_32 $m_path 0x0 0x01234567"
%

We keep studying the manual...

% open_service master $m_path

%

Will it work now?
% master_write_32 $m_path 0x0 0x01234567

% master_write_32 $m_path 0x4 0x89abcdef

%

No error messages. We try to read:

% master_read_32 $m_path 0x0 0x2
0x01234567 0x89abcdef
%

It works! We have learned to write and read memory. Device registers are mapped into the same address space, so later we will be able to control devices too.
In total, the correct sequence of actions looks like this:

set m_path [lindex [get_service_paths master] 0]
open_service master $m_path
master_write_32 $m_path 0x0 0x01234567
master_write_32 $m_path 0x4 0x89abcdef
master_read_32 $m_path 0x0 0x2

Those interested in the capabilities of the System Console and the TCL language can study the descriptions, examples, and videos, of which there are plenty online. In this article, I will move on to something I personally could not find online...

Moving from TCL to C++


TCL is a fairly powerful language. It is always useful to study it, especially if you plan to work closely with FPGAs. In general, fluency in one more powerful programming language is a big plus for a developer. But that is in general. In our particular case, the need to know another language runs counter to the principles formulated in this article. We consider the Redd complex an auxiliary tool that system developers should be able to master easily using widely known languages. That is, knowing an additional language is always welcome, but it should not be a prerequisite.

So I decided to find out how to work with the JTAG bridge from good old C/C++. Perhaps I am blind, but after spending a lot of time composing search queries, I found nothing worthwhile. The funniest thing is that many queries led me to my own article about the "fun Quartusel". That article, however, shows how the System Console is called from a Java program, and the script file must be created in advance, which will obviously degrade performance even further, and it is not high to begin with because of the serial JTAG link and the slow adapter. Nevertheless, those who work in Java can reread that article and adopt its methods.

Having found nothing, I decided to use the principle shown in one of my articles from the PSoC series. The approach is simple: if there is no documentation, study what comes in the package. And Quartus comes with a lot of TCL scripts. It was by studying them that I came to appreciate the power of the language. On the other hand, I also realized how much time it would take to master it. Yes, there is not much work right now, but, first, there is still some, and second, I really hope that by the time you read these lines the world will have returned to its fast pace.

In the end, I found a very interesting file: C:\intelFPGA_lite\17.1\quartus\bin64\tcl_server.tcl

Judging by its contents, it creates a server through which one can call the functions we have already used. That is, we can write our own client that connects over the network and starts issuing commands, and this client can be written in my favorite C++. The problem would be solved. So let us analyze this file.

Experiments with the tcl_server.tcl file on the local machine


We continue moving from the well documented toward a goal hidden in the dark. We try everything on Windows (on my home machine), while connected to the remote JTAG server (on the Redd complex in the office).

At first, the investigation went the wrong way. The file contains this interesting construct:

if { [info exists env(QUARTUS_ENABLE_TCL_SERVER)]  } {
	if { $env(QUARTUS_ENABLE_TCL_SERVER) == 1 } {
		_q_setup_server $_q_port
	}
}

Of course, I typed QUARTUS_ENABLE_TCL_SERVER into a search engine. There were not many results: only two, of which only one was worthwhile: https://www.intel.com/content/www/us/en/programmable/quartushelp/13.0/mergedProjects/eda/synthesis/synplicity/eda_pro_synplty_setup.htm .

Overjoyed, I created the corresponding environment variable, but the netstat -a command showed that no server had appeared on port 2589.

Good. Then I tried to run the whole script through the System Console, using this menu item:



And I got an error:



I am a simple person. Not finding a definition of this disgrace on my machine (but finding one on GitHub and suspecting that it was just a message output), I made a copy of the script file and deleted the offending line:



Along the way, I removed the check of the QUARTUS_ENABLE_TCL_SERVER environment variable described above, since we have to run the script manually anyway.

After that the script started, and listening port 2589 appeared in the system. You can connect to it via Telnet. Once again, I remind you that at the moment I am experimenting on a home Windows machine; we will move to the remote machine a little later. So, we connect:



We get an empty terminal window. After pressing a key a couple of times, we see that we are heard but not understood:



We ask for help.

help
1 couldn't find help for command . Try help help.

Already better. We refine the request.

I will show only a fragment of the answer:

help help 
<…>
get_service_paths
get_service_types
get_services_to_add
get_version
<…>
master_read_16
master_read_32
master_read_8
master_read_memory
master_read_to_file
master_write_16
master_write_32
master_write_8

Familiar words! Well, let's try to enter the line we already know:

get_service_paths master
1 Error: Invalid command

Too bad! But we were just told that the get_service_paths command is available!

We search the script for the string Invalid command, and there it is,
roughly in the middle of this construct:

	set ecmd [lindex $line 0]
	if { $ecmd != "project" && $ecmd != "device" && 
	$ecmd != "cmp" && $ecmd != "sim" && $ecmd != "show_main_window"
	&& $ecmd != "hide_main_window" && $ecmd != "get_version" 
	&& $ecmd != "help" && $ecmd != "convert" 
	&& $ecmd != "import_assignments_from_maxplus2" } {
		set res "Error: Invalid command";
	} elseif [catch { set res [eval $line] } result] {
		set res "Error: $result";
	} 
	if { $res == "" } {
		set res " "
	}

It turns out that the script accepts only a small set of commands from the terminal. This check is actually a very useful place: in the future, we can handle our own script-level commands here. As we master TCL, we can move more and more code into the script and merely call it from the external program (which will obviously be faster than pushing every line through the network). Commands that are not recognized there would still be passed straight to the interpreter. Ideally, we would study TCL bit by bit and gradually switch to it completely, if that works out.

In the meantime, we stop the script (by closing the System Console) and replace that fragment with this:

	set ecmd [lindex $line 0]
	if [catch { set res [eval $line] } result] {
		set res "Error: $result";
	} 
	if { $res == "" } {
		set res " "
	}

We start the script, connect, and try our reference sequence of commands:

set m_path [lindex [get_service_paths master] 0]
1 /devices/10CL006(Y|Z)|10CL010(Y|Z)|..@1#..@1#1-5.4.2.1#192.168.10.146/(link)/JTAG/alt_sld_fab_sldfabric.node_0/phy_0/master_0.master
open_service master $m_path
1 Error: can't read "m_path": no such variable

The variables we create are forgotten. On the one hand, that is annoying. On the other hand, we are going to feed the lines programmatically, so substituting the path into the argument is not much of a problem. We try a full command that uses the value instead of the variable.

open_service master /devices/10CL006(Y|Z)|10CL010(Y|Z)|..@1#..@1#1-5.4.2.1#192.168.10.146/(link)/JTAG/alt_sld_fab_sldfabric.node_0/phy_0/master_0.master 

The result is 1. Excellent! We continue the experiments.

master_write_32 /devices/10CL006(Y|Z)|10CL010(Y|Z)|..@1#..@1#1-5.4.2.1#192.168.10.146/(link)/JTAG/alt_sld_fab_sldfabric.node_0/phy_0/master_0.master  0x0 0x01234567

master_write_32 /devices/10CL006(Y|Z)|10CL010(Y|Z)|..@1#..@1#1-5.4.2.1#192.168.10.146/(link)/JTAG/alt_sld_fab_sldfabric.node_0/phy_0/master_0.master 0x4 0x89abcdef

master_read_32 /devices/10CL006(Y|Z)|10CL010(Y|Z)|..@1#..@1#1-5.4.2.1#192.168.10.146/(link)/JTAG/alt_sld_fab_sldfabric.node_0/phy_0/master_0.master 0x0 0x2

We get the answer:

1 0x01234567 0x89abcdef

The 1 is the result of executing the function, followed by the data that was read. So access over the network is not ideal, but it exists!
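Since a client program will receive such replies as plain text, splitting them up can be sketched in C++ roughly as follows. This is only an illustration: the Reply structure and the parse_read_reply function are names invented here, not part of Quartus, and the sketch assumes that a reply consists of a numeric first field followed by hexadecimal words. Note that the error replies shown earlier also begin with 1, so the sketch rejects any reply whose remaining fields are not hexadecimal numbers instead of trusting the first field alone.

```cpp
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one reply line such as
// "1 0x01234567 0x89abcdef" (names are illustrative only).
struct Reply {
    int status = 0;                    // leading numeric field
    std::vector<std::uint32_t> words;  // 32-bit values that follow it
};

// Returns true only if the line is "<number> <hex> <hex> ...".
// Replies such as "1 Error: Invalid command" are rejected.
bool parse_read_reply(const std::string& line, Reply& reply) {
    reply = Reply{};
    std::istringstream in(line);
    if (!(in >> reply.status)) {
        return false;                  // no numeric field at the start
    }
    std::string token;
    while (in >> token) {
        try {
            std::size_t used = 0;
            // Base 16 accepts an optional "0x" prefix.
            unsigned long value = std::stoul(token, &used, 16);
            if (used != token.size() || value > 0xFFFFFFFFul) {
                return false;          // trailing junk or too wide
            }
            reply.words.push_back(static_cast<std::uint32_t>(value));
        } catch (const std::exception&) {
            return false;              // not a number at all
        }
    }
    return true;
}
```

For the reply shown above, the function would fill words with the two values that were written to addresses 0x0 and 0x4.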

Experiments with tcl_server.tcl on a remote machine


In principle, we could already write a program and start working with the processor system, but that would add network slowness to the slowness of the JTAG hardware. It is obviously better to run the working program on the remote machine, so that it connects to localhost there. That will clearly be faster, and the program will then execute on the central processor of the Redd complex, as originally planned.
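A program running on the complex itself could talk to the script's socket along these lines. This is a minimal sketch for Linux (POSIX sockets), not a finished client: connect_local and exchange_line are hypothetical names, port 2589 is the one observed above, and the sketch assumes that the server answers every command with exactly one line ending in a newline. That matches the short replies seen so far, but not multi-line output such as the answer to help help.

```cpp
#include <string>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Connects to a TCP server on the local machine (127.0.0.1).
// Returns a socket descriptor, or -1 on failure.
int connect_local(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Sends one command line and reads one reply line.
// ASSUMPTION: one newline-terminated reply per command.
bool exchange_line(int fd, const std::string& command, std::string& reply) {
    const std::string out = command + "\n";
    std::size_t sent = 0;
    while (sent < out.size()) {
        ssize_t n = write(fd, out.data() + sent, out.size() - sent);
        if (n <= 0) return false;
        sent += static_cast<std::size_t>(n);
    }
    reply.clear();
    char ch = 0;
    while (true) {
        ssize_t n = read(fd, &ch, 1);
        if (n <= 0) return false;      // closed or failed before a newline
        if (ch == '\n') break;
        if (ch != '\r') reply.push_back(ch);
    }
    return true;
}
```

With a descriptor obtained from connect_local(2589), a call such as exchange_line(fd, "get_version", reply) would leave the text of the answer in reply.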

Which file should we run? After digging through some documentation, we find out that it is quartus_sh. Good. We will not play with scripts yet; as on Windows, we first try issuing commands interactively. Looking ahead, I will say that it is going to be fun...

So, we give the command:

user@redd:~$ sudo /opt/intelFPGA/18.1/qprogrammer/bin/quartus_sh -s

We are prompted to enter TCL lines:



But our favorite commands lead to an error:

tcl> get_service_paths master
invalid command name "get_service_paths"

It turns out that the missing library must be loaded with two commands:

load_package systemconsole 
initialize_systemconsole

True, the next command that is already familiar to us gives an error even with the library loaded:



Oh, how long I searched for the reason! I rummaged through all the libraries and through a sea of articles from search engines: there is no such command. There are others, but they do not work. But at least they exist. By chance I found an inconspicuous document with the solution. It turns out we need the claim_service command. So much for cross-platform compatibility, once again.

This sequence will lead us to success:
set m_path [lindex [get_service_paths master] 0]
set claim_path [claim_service master $m_path mylib]
master_write_32 $claim_path 0x0 0x01234567
master_write_32 $claim_path 0x4 0x89abcdef
master_read_32 $claim_path 0x0 0x2



Checking via Telnet


Everything works in interactive mode, so we check the script that creates the server. We add the missing lines to it (the same two lines that load the library) and run it:

user@redd:~$ sudo /opt/intelFPGA/18.1/qprogrammer/bin/quartus_sh -t tcl_server1.tcl

It terminates immediately. The listening socket does not appear. It turns out that we need to add a line that prevents the server setup function from returning:



The same as text:
proc _q_setup_server {port} {
	global _q_lsock;
	if [catch { set _q_lsock [socket -server _q_accept $port] } emsg] {
	}
	vwait forever
	return;
}


We run the corrected script and try to connect to the resulting server via Telnet:



We issue the following commands (each reply begins with 1; as on Windows, we substitute text from the previous replies in place of variables):

lindex [get_service_paths master]

1 {/devices/10CL006(Y|Z)|10CL010(Y|Z)|..@1#1-5.4.2.1/(link)/JTAG/(110:132 v1 #0)/phy_0/master}

claim_service master {/devices/10CL006(Y|Z)|10CL010(Y|Z)|..@1#1-5.4.2.1/(link)/JTAG/(110:132 v1 #0)/phy_0/master} mylib

1 /channels/remote1/mylib/master_1

master_write_32 /channels/remote1/mylib/master_1 0x0 0x01234567

1

master_write_32 /channels/remote1/mylib/master_1 0x4 0x89abcdef

1

master_read_32 /channels/remote1/mylib/master_1 0x0 0x2

1 0x01234567 0x89abcdef

Everything works!
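When the client is a program, the substitutions done by hand above become string formatting. The following C++ sketch shows one possible way to do it; all function names are invented for illustration. It wraps every path in Tcl braces, as the server itself did in its first reply, so that the spaces inside the part of the path beginning with (110:132 do not split the argument. It assumes that a reply holds a single path with balanced braces; the library name mylib is the one used above.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Drops the leading status field of a reply and, if present, one pair of
// surrounding braces: "1 {/a b}" -> "/a b". Handles a single path only.
std::string reply_payload(const std::string& reply) {
    std::size_t space = reply.find(' ');
    if (space == std::string::npos) return "";
    std::string text = reply.substr(space + 1);
    if (text.size() >= 2 && text.front() == '{' && text.back() == '}') {
        text = text.substr(1, text.size() - 2);
    }
    return text;
}

// Quotes a path for Tcl so that it is passed as one argument.
std::string tcl_brace(const std::string& path) {
    return "{" + path + "}";
}

// Formats a value as 0x... with at least `digits` hexadecimal digits.
std::string hex(std::uint32_t value, int digits) {
    char buf[16];
    std::snprintf(buf, sizeof(buf), "0x%0*x", digits,
                  static_cast<unsigned>(value));
    return buf;
}

std::string make_claim(const std::string& service_path,
                       const std::string& lib_name) {
    return "claim_service master " + tcl_brace(service_path) + " " + lib_name;
}

std::string make_write32(const std::string& claimed_path,
                         std::uint32_t address, std::uint32_t value) {
    return "master_write_32 " + tcl_brace(claimed_path) + " " +
           hex(address, 1) + " " + hex(value, 8);
}

std::string make_read32(const std::string& claimed_path,
                        std::uint32_t address, std::uint32_t count) {
    return "master_read_32 " + tcl_brace(claimed_path) + " " +
           hex(address, 1) + " " + hex(count, 1);
}
```

The path taken from the reply to claim_service would be passed to make_write32 and make_read32, repeating the manual session step by step.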

And then the forester came and chased everyone away... Or not everyone?


While working on the article, I found an interesting technique that does not require a script providing network access. C has the popen function (_popen on Windows). It lets a console program start a child process and capture its input or output stream. So on Windows, you can launch the System Console with the following arguments:

C:\intelFPGA_lite\17.1\quartus\sopc_builder\bin\system-console.exe --disable_readline --cli

And you can issue commands without any network. But the trouble is that I did not find how to send commands and receive replies at the same time. The mode argument of the _popen function can be only r or w, but not both at once. There is a Windows-specific technique, but it is far from cross-platform. As in the Quartusel article, one can pass scripts as an argument to the system-console program and read the reply as a stream, but each launch will not be very fast...
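For reference, the one-shot variant (start a process, read everything it prints) can be sketched with popen like this. The function name run_and_capture is invented here, and the command line is left entirely to the caller, because the exact system-console arguments for passing a script are not covered in this article.

```cpp
#include <stdio.h>
#include <string>

#ifdef _WIN32
// On Windows the same functions carry a leading underscore.
#define popen _popen
#define pclose _pclose
#endif

// Runs a command line in read mode ("r") and collects its standard output.
// Nothing can be written to the child through this handle.
// Returns true if the process could be started and exited with status 0.
bool run_and_capture(const std::string& command_line, std::string& output) {
    FILE* pipe = popen(command_line.c_str(), "r");
    if (pipe == nullptr) return false;
    output.clear();
    char buf[256];
    std::size_t n = 0;
    while ((n = fread(buf, 1, sizeof(buf), pipe)) > 0) {
        output.append(buf, n);
    }
    return pclose(pipe) == 0;
}
```

Every call starts a new process, so the slow start-up mentioned above is paid each time.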

In Linux, initializing the System Console functions is also slow, so it is better to do it once. Google suggests that in Linux the popen function accepts the argument “r+”, which opens a bidirectional channel. Then we could start quartus_sh and talk to it, but I am not very strong in Linux, and Google finds forum links with no solution, only heated debate about whether this works in all builds.

Therefore, I will not present a program today. Implementing Telnet functions may be overkill, but I do not know how to make an ideal program that intercepts both streams. If someone offers ready-made solutions in the comments, I will be grateful. First of all for Debian, so as not to push large data streams over the network. In the meantime, we limit ourselves to a working concept.

Conclusion


We have become acquainted with a technique for developing Nios II-based processor systems that do not contain the processor core itself. Such systems cannot be recommended as universal, since their speed leaves much to be desired, but in some cases the speed of the controlling core is not important, while the savings in internal FPGA RAM come to the fore. It is for such cases that the described technique is recommended.

The classic approach to programming such systems is TCL scripts, but in keeping with the concept of the Redd complex, the article develops and describes an approach suitable for programming in C++.

The original tcl_server.tcl script from the Quartus 17.1 Lite distribution can be downloaded here. The version obtained after editing is here. An example FPGA project can be taken here.
