20150412

Play with VMFlexArray

I explained VMFlexArray in my last post, and went to some effort to get it approved as a topic for this year's Google Summer of Code, kindly under the umbrella of the GNU Project, Classpath, and IcedTea.

Judging from the fact that only one Google Summer of Code student proposed to work on VMFlexArray, the concepts might be difficult for students to grasp.

For that reason, I built a small VMWare image that anyone can freely download to experiment with VMFlexArray.

In the process, I also converted FB4J to use JNA instead of JNI, and I'm very glad I did. The average refresh rate has since risen significantly, to above 40 fps, and I haven't even begun to optimize anything.

The way I would suggest someone begin experimenting with VMFlexArray is as follows:
  1. Download and uncompress the VMWare image (there is a .vmdk inside the .vmwarevm folder for those who do not use Mac OS X / VMWare Fusion)
  2. Run the VMFlexArrayLinux virtual machine
  3. Log in with user 'root' and an empty password
  4. Run 'ifconfig eth0' and write down the IP address
  5. Open a terminal session on the host OS and ssh into the IP address from above (e.g. ssh root@[ip_addr])
  6. Run the demo: ./fb4jdemo
You should see a white background and some colourful balls bouncing around on the VMWare's virtual SVGA II device.

I hope you're curious enough to begin to dissect the demo being run.

After some investigation, it should become obvious that JamVM and GNU Classpath are being used under the hood for Java. One might also notice that there is a Java compiler (javac) in the image, courtesy of the Eclipse Compiler for Java (ecj). Some clever people will probably also notice /var/db/pkg.sqfs, which is a SquashFS image of Gentoo's database of installed packages (tying all installed free software back to its source repositories).

Then, please take note that there is a Portage overlay located at /usr/local/portage/java_overlay, as well as a patch located in /usr/src. The Portage overlay contains the changes required to demonstrate VMFlexArray using JamVM and GNU Classpath. The patch in /usr/src is the one patch required to perform double-buffering using the VMWare frame buffer.

I went to some length to ensure that my patch set was easy to reproduce. All required changes to JamVM and to Classpath are located in my feature/vmflexarray-demo branch of each project, and all required changes to JNA-Posix are in my feature/ioctl branch.

My updated FB4J changes are currently in the feature/jna branch. With these changes, FB4J no longer requires any native components of its own. They will be merged into master eventually, but first I need to a) clean up the README, and b) possibly machine-generate enums and constants from /usr/src/linux/fb.h.
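Item b) could start as simply as scanning the header for plain numeric #define lines. The following is a hypothetical Java sketch (the class FbHeaderConstants is my own invention, not part of FB4J) that handles only literal hex or decimal values; ioctl numbers built from macros such as _IOR(...) are skipped and would need more work.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns simple "#define NAME <literal>" lines from a C
// header (e.g. linux/fb.h) into Java constants. Lines whose value is a macro
// expression are ignored. C octal literals (e.g. 010) would be misread as decimal.
class FbHeaderConstants {
    private static final Pattern DEFINE = Pattern.compile(
            "^\\s*#define\\s+(\\w+)\\s+(0[xX][0-9A-Fa-f]+|\\d+)\\b.*$");

    /** Maps each constant name to its numeric value, in header order. */
    static Map<String, Long> parse(String headerText) {
        Map<String, Long> constants = new LinkedHashMap<>();
        for (String line : headerText.split("\\R")) {
            Matcher m = DEFINE.matcher(line);
            if (m.matches()) {
                String literal = m.group(2);
                long value = literal.startsWith("0x") || literal.startsWith("0X")
                        ? Long.parseLong(literal.substring(2), 16)
                        : Long.parseLong(literal);
                constants.put(m.group(1), value);
            }
        }
        return constants;
    }

    /** Renders the parsed constants as Java source, one declaration per line. */
    static String toJava(Map<String, Long> constants) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : constants.entrySet()) {
            sb.append("public static final int ").append(e.getKey())
              .append(" = 0x").append(Long.toHexString(e.getValue())).append(";\n");
        }
        return sb.toString();
    }
}
```

Fed the line `#define FBIOGET_VSCREENINFO 0x4600`, toJava emits `public static final int FBIOGET_VSCREENINFO = 0x4600;`.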

This is where it gets fun.

The next task in understanding VMFlexArray, for anyone who is curious, is to take the Fb4jDemo code and modify it to draw a different video effect. The good news is that the VMWare image includes a Java compiler (ecj), so that's rather easy.
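As a possible starting point, here is a minimal, hypothetical sketch of an effect that renders an animated XOR pattern into an int[] of pixels. It does not use the actual Fb4jDemo API, which isn't reproduced here. It assumes a 32 bpp buffer of width * height ints in 0xAARRGGBB layout; the real channel layout is described by the framebuffer's fb_var_screeninfo.

```java
// Hypothetical effect, independent of the real Fb4jDemo classes.
class XorEffect {
    /** Fills `pixels` (row-major, width * height) with an XOR texture shifted by `frame`. */
    static void render(int[] pixels, int width, int height, int frame) {
        if (pixels.length < width * height) {
            throw new IllegalArgumentException("pixel buffer too small");
        }
        for (int y = 0; y < height; y++) {
            int row = y * width;
            for (int x = 0; x < width; x++) {
                int v = ((x + frame) ^ y) & 0xFF;          // scrolls horizontally as frame grows
                pixels[row + x] = 0xFF000000               // opaque alpha
                        | (v << 16)                        // red follows the pattern
                        | ((255 - v) << 8)                 // green is its inverse
                        | (v >>> 1);                       // blue at half intensity
            }
        }
    }
}
```

Calling render once per frame with an incrementing frame counter scrolls the pattern across the screen.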

Next, do something with FB4J that I haven't done yet: use it to read a Video4Linux device ;-) This will require you to roll your own kernel and add the UVC, V4L, and other required modules.

Here's a screenshot of my latest run. Don't forget to download the compressed VMWare image from here to experiment with.
Happy frame buffer drawing!

In the meantime, I will continue to experiment on my end. Once I have some free time, I might attempt an OpenJDK port of the VMFlexArray changes.

Please note that the software is still very much beta, e.g. I am currently getting an undiagnosed segfault after some time. I believe I ran into this problem before doing a major cleanup of the code, and I will look into it when I have some time.



20150217

MappedByteBuffer.hurray()!: Programming the Linux Framebuffer in Java & VMFlexArray Explained

I recently travelled to Belgium to participate in FOSDEM. This year, I gave two presentations:
  • MappedByteBuffer.hurray()!: Programming the Linux FrameBuffer in Java & VMFlexArray Explained. See here.
  • Internet of #allthethings: Using GNURadio Companion to Interact with an IEEE 802.15.4 Network. See here.
This post focuses on the former, specifically VMFlexArray.

Currently, no Java Virtual Machine in existence allows a developer to reference off-heap memory regions as Java arrays, e.g. via byte[] or int[]. That is, every Java array the VM deals with must have its data allocated contiguously with the array object at instantiation time.

What I mean is that when the VM instantiates an integer array object, e.g. int[] x = new int[ length ], it typically allocates memory (careful now, I'm going to use some C terminology here) for an object struct (2 uintptr_t in JamVM) representing the instance of the int[] object, followed by 1 uintptr_t representing the length of the int[] object, followed by exactly length uintptr_t items (on a 32-bit machine) or length / 2 uintptr_t items, rounded up (on a 64-bit machine), to hold the data.

VMFlexArray


VMFlexArrays are slightly different. For the same case as above, where a new int[] is allocated on the Java heap, the VM would allocate an object struct (2 uintptr_t in JamVM) which represents the instance of the int[] object, followed by 1 uintptr_t to represent the length of the int[] object, followed by 1 uintptr_t to point to the int[] data, followed by the data itself.

What makes VMFlexArrays different, and what makes them flexible (and arguably far better than what most JVMs use today), is the extra uintptr_t that points to the data, which could reside anywhere in virtual memory. A VMFlexArray can therefore point to the contiguous data that the JVM would allocate for a regular array, but it can equally point to an arbitrary location and still cooperate with the garbage collector. Indeed, the object lifecycle remains unchanged for VMFlexArrays, provided the garbage collector does not release the data with free(3) when the VMFlexArray's data pointer does not point to the next contiguous memory address (i.e. immediately after the pointer itself).
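To make the word counts concrete, here is an illustrative Java model (not JamVM code; the class ArrayLayout is hypothetical) of both int[] layouts in units of uintptr_t words. It assumes the 2-word object struct and 1-word length field described above and 4-byte Java ints. gcOwnsData expresses the rule that the collector should release the data together with the object only when the data pointer refers to the word immediately following the pointer field.

```java
// Illustrative model of the two int[] layouts, measured in uintptr_t words.
class ArrayLayout {
    static final int HEADER_WORDS = 2;  // object struct (2 uintptr_t in JamVM)
    static final int LENGTH_WORDS = 1;  // array length
    static final int POINTER_WORDS = 1; // VMFlexArray only: pointer to the data

    /** Words needed for `length` 4-byte ints with `wordBytes`-byte words, rounded up. */
    static long dataWords(long length, int wordBytes) {
        long bytes = length * 4L;
        return (bytes + wordBytes - 1) / wordBytes;
    }

    /** Conventional layout: header, length, then the data inline. */
    static long conventionalWords(long length, int wordBytes) {
        return HEADER_WORDS + LENGTH_WORDS + dataWords(length, wordBytes);
    }

    /** VMFlexArray layout: header, length, data pointer, plus the data if it is inline. */
    static long flexWords(long length, int wordBytes, boolean dataInline) {
        long base = HEADER_WORDS + LENGTH_WORDS + POINTER_WORDS;
        return dataInline ? base + dataWords(length, wordBytes) : base;
    }

    /** True if the data immediately follows the pointer field, i.e. the GC may free it with the object. */
    static boolean gcOwnsData(long objectAddress, long dataPointer, int wordBytes) {
        long inlineData = objectAddress
                + (long) (HEADER_WORDS + LENGTH_WORDS + POINTER_WORDS) * wordBytes;
        return dataPointer == inlineData;
    }
}
```

For example, a 10-element int[] on a 64-bit machine takes 8 words conventionally (2 + 1 + 5) and 9 words as an inline VMFlexArray, while an off-heap VMFlexArray needs only 4 words regardless of length.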


VMFlexArray is a solution I came up with that allows one to integrate off-heap memory regions into the Java Virtual Machine, e.g. memory that a native external thread allocates using malloc(3), or pages mapped from a device such as /dev/video0 using mmap(2).

Buffer Views


Perhaps the most useful aspect of VMFlexArrays, which I somehow forgot to mention during my talk, is that they rather trivially allow the following code snippet to work as expected. Specifically, an IntBuffer derived from a ByteBuffer with a backing array should be able to provide an int[] backing array that views the same virtual memory.


Currently this code, which should work pretty seamlessly, fails miserably.
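Since the snippet itself is not reproduced in this post, here is a minimal reconstruction of the failing case, assuming a stock OpenJDK runtime (class and method names are mine). The ByteBuffer has a backing byte[], but its IntBuffer view reports hasArray() == false, and array() throws UnsupportedOperationException.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

class BufferViews {
    /** Wraps a byte[] and returns an IntBuffer view of it (big-endian, the default order). */
    static IntBuffer intView(byte[] bytes) {
        return ByteBuffer.wrap(bytes).asIntBuffer();
    }

    /**
     * What one would like to write: an int[] aliasing the same memory as `bytes`.
     * On a stock JVM the view has no accessible backing array, so this throws.
     */
    static int[] intArrayView(byte[] bytes) {
        return intView(bytes).array();
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[16];
        System.out.println("ByteBuffer.hasArray(): " + ByteBuffer.wrap(bytes).hasArray());
        System.out.println("IntBuffer.hasArray():  " + intView(bytes).hasArray());
        try {
            int[] ints = intArrayView(bytes);
            System.out.println("int[] view of length " + ints.length);
        } catch (UnsupportedOperationException e) {
            System.out.println("IntBuffer.array() threw UnsupportedOperationException");
        }
    }
}
```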


ByteBuffers allow themselves to be viewed as IntBuffers, LongBuffers, or ShortBuffers. Pretty brilliant! Well... if it worked, it would be brilliant. The fact is, as shown above, one cannot wrap a byte[] in a ByteBuffer, view it as an IntBuffer, and then call IntBuffer.array() to get an int[] view of the original byte[]. In my opinion, that would make the NIO API complete, and this feature is sadly lacking.
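On a stock JVM, the only way to get an int[] out of such a view is to copy it. This sketch (class name hypothetical) shows that workaround and why it is unsatisfying: the copy does not alias the original byte[], so writes to one are invisible to the other, and every round trip costs a full O(n) copy in each direction.

```java
import java.nio.ByteBuffer;

class CopyWorkaround {
    /** Copies the IntBuffer view of `bytes` into a fresh int[] (big-endian by default). */
    static int[] copyInts(byte[] bytes) {
        java.nio.IntBuffer view = ByteBuffer.wrap(bytes).asIntBuffer();
        int[] copy = new int[view.remaining()];
        view.get(copy); // O(n) copy out of the byte[]
        return copy;
    }

    /** Writes `ints` back into `bytes` through a fresh IntBuffer view (another O(n) copy). */
    static void writeBack(int[] ints, byte[] bytes) {
        ByteBuffer.wrap(bytes).asIntBuffer().put(ints);
    }
}
```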

With VMFlexArrays, that problem is solved.


I've even used this code to memory map the Linux FrameBuffer and animate a bunch of bouncing balls :-) It works quite well.
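For reference, mapping a framebuffer-style region from plain Java needs nothing beyond java.nio: FileChannel.map returns a MappedByteBuffer, and asIntBuffer() gives a 32-bit pixel view of it (just not an int[]). This sketch (class name hypothetical) assumes 32 bpp pixels in native byte order and a caller-supplied width and height. On a real system the path would be the framebuffer device (e.g. /dev/fb0) and the geometry would come from the FBIOGET_VSCREENINFO ioctl, which is not performed here. I haven't verified that every runtime's FileChannel.map accepts a character device, so the sketch is exercised against a regular file.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedPixels {
    /** Maps width * height 32-bit pixels of `file` read/write and fills them with `argb`. */
    static void fill(Path file, int width, int height, int argb) throws IOException {
        long bytes = (long) width * height * 4;
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            // Pixel view in native byte order; no int[] is available on a stock JVM.
            IntBuffer pixels = map.order(ByteOrder.nativeOrder()).asIntBuffer();
            for (int i = 0; i < pixels.capacity(); i++) {
                pixels.put(i, argb);
            }
            map.force(); // flush changes to a regular backing file
        }
    }
}
```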


There is also a significant speedup when accessing the underlying byte[] of a ByteBuffer directly, and an even larger one when viewing the ByteBuffer as an IntBuffer and accessing the underlying int[].


I am definitely interested in getting these changes into OpenJDK, and I feel that the community at large would benefit greatly from them. As my time is rather limited these days, I might prefer to mentor a student making these changes in the Google Summer of Code 2015, if OpenJDK were a mentoring organization. Otherwise, I would be open to mentoring a student under the umbrella of JamVM or GNU Classpath as a mentoring organization.

20150208

Internet of #allthethings: Using GNURadio Companion to Interact with an IEEE 802.15.4 Network

I recently travelled to Belgium to participate in FOSDEM. This year, I gave two presentations:

  • MappedByteBuffer.hurray()! Programming the Linux Framebuffer in Java & VMFlexArray Explained. See here.
  • Internet of #allthethings: Using GNURadio Companion to Interact with an IEEE 802.15.4 Network. See here.
This post focuses on the latter.
The gist of my talk was that all of the tools needed to quickly prototype all sorts of 802.15.4 devices are already available. All that is needed is to integrate the following:

  • FreakZ: A BSD-licensed ZigBee stack (for non-commercial purposes)
    • Easily modified to communicate with a GNURadio device via UDP (github)
    • Note: this stack is not certified.
  • GNURadio
  • GNURadio IEEE 802.15.4 Out Of Tree (OOT) module
    • gr-ieee-802_15_4 is available today
    • based on work originally from UCLA
    • unofficially meets all of the mandatory requirements for the IEEE 802.15.4 PHY layer
    • meets some of the mandatory requirements for the IEEE 802.15.4 MAC layer
    • lacking mandatory MAC features such as
      • Beacon Management
      • Receive Beacons
      • Channel Access Mechanism
        • Carrier Sense Multiple Access with Collision Avoidance (CSMA-CA)
      • ACK Delivery
      • Security
      • Orphan Scanning
      • Store One Transaction
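The UDP glue between FreakZ and GNURadio isn't shown here, so the following is only a hypothetical Java sketch of the transport idea: each raw MAC frame travels as one UDP datagram, which a GNURadio flowgraph listening on a UDP port could consume. The class name, host, and port are assumptions; the 127-byte receive buffer reflects the 802.15.4 maximum PSDU size (aMaxPHYPacketSize).

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.util.Arrays;

class UdpFrameLink {
    static final int MAX_PSDU_BYTES = 127; // aMaxPHYPacketSize in IEEE 802.15.4

    /** Sends one raw MAC frame as a single UDP datagram to host:port. */
    static void sendFrame(byte[] frame, InetAddress host, int port) throws IOException {
        if (frame.length > MAX_PSDU_BYTES) {
            throw new IllegalArgumentException("frame exceeds 127 bytes");
        }
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(frame, frame.length, host, port));
        }
    }

    /** Blocks until one datagram arrives on `socket` and returns its payload. */
    static byte[] receiveFrame(DatagramSocket socket) throws IOException {
        byte[] buf = new byte[MAX_PSDU_BYTES];
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        socket.receive(packet);
        return Arrays.copyOf(packet.getData(), packet.getLength());
    }
}
```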
The primary barrier to entry for developers & researchers will most likely be the cost of an SDR. Even after buying an SDR capable of operating around 2.4 GHz at a sufficiently high sample rate, one still needs a minimal investment in other 802.15.4 equipment such as ZigBee-enabled thermostats, light bulbs, or gateways (I am only aware of ZigBee consumer products in the IEEE 802.15.4 market today).

To help would-be developers overcome that hurdle, I have used my USRP B200 to record real-world 802.15.4 traffic produced by an EM370 in NodeTest mode. This should make offline signal processing easy using e.g. GNURadio (see the File Source block), Matlab, Octave, or any other programming language. The block diagram for doing so is depicted below. I have intentionally made all of my variables explicit.


Keep in mind that the files are rather large (433 MB compressed with LZMA2), as the samples are complex float32 and I intentionally oversampled by a factor of 4 (8M samples per second) to better facilitate SDR receiver design. You may find them here.
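A minimal Java reader for these files might look like the following. It assumes GNURadio's usual File Sink layout for complex samples (interleaved float32 I and Q, 8 bytes per sample) in little-endian byte order, as written on an x86 host; the class name is hypothetical. At 8 Msps, one second of recording is 64 MB uncompressed, so the sketch reads a window of samples rather than the whole file.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ComplexFloat32Reader {
    static final int BYTES_PER_SAMPLE = 8; // one float32 I + one float32 Q

    /** Reads up to maxSamples samples starting at sample index `first`; returns {re, im}. */
    static float[][] read(Path file, long first, int maxSamples) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long available = ch.size() / BYTES_PER_SAMPLE - first;
            int n = (int) Math.max(0, Math.min(maxSamples, available));
            ByteBuffer buf = ByteBuffer.allocate(n * BYTES_PER_SAMPLE)
                                       .order(ByteOrder.LITTLE_ENDIAN); // assumed: recorded on x86
            ch.position(first * BYTES_PER_SAMPLE);
            while (buf.hasRemaining()) {
                if (ch.read(buf) < 0) throw new EOFException("file shrank while reading");
            }
            buf.flip();
            float[] re = new float[n];
            float[] im = new float[n];
            for (int i = 0; i < n; i++) {
                re[i] = buf.getFloat(); // in-phase
                im[i] = buf.getFloat(); // quadrature
            }
            return new float[][] {re, im};
        }
    }

    /** Time in seconds of sample index n at the given rate (8e6 for these recordings). */
    static double sampleTime(long n, double sampleRateHz) {
        return n / sampleRateHz;
    }
}
```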

The files are listed and described below:

  • ieee802154-channel14-txtone-complex-float32.dat
    • a recording of a tone at 2420 MHz in the presence of noise
    • note: there is a slight frequency offset which will need to be corrected


  • ieee802154-channel14-txstream-complex-float32.dat
    • a continuous random stream of valid channel symbols

  • ieee802154-channel14-tx-complex-float32.dat
    • a stream containing intermittent, complete IEEE 802.15.4 frames
    • frames are sent once every 25500 µs
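The tone file is a natural place to start on the frequency offset noted above. Channel 14 in the 2450 MHz band sits at 2405 + 5(14 - 11) = 2420 MHz. For a pure tone, the offset can be estimated from the average phase increment between consecutive samples, f = (fs / 2π) · arg(Σ x[n]·conj(x[n-1])), and then removed by multiplying by exp(-j2πfn/fs). This delay-and-multiply estimator is a simple technique I've chosen for illustration, not necessarily what any GNURadio block does internally; the class name is hypothetical.

```java
class ToneOffset {
    /** Center frequency (Hz) of 2450 MHz band channel k (11..26): 2405 + 5(k - 11) MHz. */
    static double channelCenterHz(int k) {
        if (k < 11 || k > 26) throw new IllegalArgumentException("2450 MHz band channels are 11..26");
        return (2405 + 5 * (k - 11)) * 1e6;
    }

    /** Estimates a tone's frequency offset (Hz) from the summed lag-1 autocorrelation. */
    static double estimateOffsetHz(float[] re, float[] im, double sampleRateHz) {
        double accRe = 0, accIm = 0;
        for (int n = 1; n < re.length; n++) {
            // accumulate x[n] * conj(x[n-1])
            accRe += re[n] * re[n - 1] + im[n] * im[n - 1];
            accIm += im[n] * re[n - 1] - re[n] * im[n - 1];
        }
        return sampleRateHz / (2 * Math.PI) * Math.atan2(accIm, accRe);
    }

    /** Removes a frequency offset in place: x[n] *= exp(-j * 2*pi * offset * n / fs). */
    static void derotate(float[] re, float[] im, double offsetHz, double sampleRateHz) {
        for (int n = 0; n < re.length; n++) {
            double phi = -2 * Math.PI * offsetHz * n / sampleRateHz;
            double c = Math.cos(phi), s = Math.sin(phi);
            double r = re[n] * c - im[n] * s;
            double i = re[n] * s + im[n] * c;
            re[n] = (float) r;
            im[n] = (float) i;
        }
    }
}
```

At 8 Msps the estimate is unambiguous for offsets within ±4 MHz, far more than any realistic crystal error.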

A Few Notes About the Current State of IEEE 802.15.4 in GNURadio

All of the open-source PHY implementations assume that Symbol and Timing Recovery (STR) has already been performed. This is fine for simulation (depicted below).

Indeed, clock recovery, frequency offset compensation, and phase offset compensation are often the most complicated parts of real-world wireless receiver architectures. Without frequency compensation, the constellation diagram appears to rotate around the unit circle, as shown below.


Those who would like to get started quickly may follow the PSK Symbol Recovery tutorial: use a Polyphase Clock Sync block, followed by the "blind" Constant Modulus Algorithm (CMA) equalizer, followed by a Costas Loop. The Polyphase Clock Sync block takes as arguments the taps for the matched filter, the filter length, and the number of samples per channel symbol, and outputs a configurable number of samples per symbol with a (somewhat) frequency-corrected clock. The CMA equalizer then forces all of the samples onto the unit circle, and finally the Costas Loop corrects the phase of the signal. This approach works well enough, but it carries some complexity: clock recovery, frequency compensation, and phase compensation all work independently. It has been suggested that using the Least Mean Squares Decision-Directed (LMS-DD) equalizer could improve performance. The Polyphase Clock Sync block can also be made more accurate, at the cost of additional processing time in the receive chain due to increased complexity.
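To illustrate what forcing the samples onto the unit circle means, here is a simplified, single-tap CMA sketch in Java (the class is hypothetical; GNURadio's CMA equalizer uses a multi-tap FIR). It implements only the core stochastic-gradient update w ← w - μ(|y|² - R²)·y·conj(x). Because the CMA cost depends only on |y|, it corrects amplitude but leaves carrier phase alone, which is why a Costas Loop still follows it in the chain.

```java
class CmaEqualizer {
    private double wRe = 1.0, wIm = 0.0; // single complex tap, starting at unity gain
    private final double mu;            // adaptation step size
    private final double r2;            // target modulus squared (1.0 for the unit circle)

    CmaEqualizer(double mu, double targetModulus) {
        this.mu = mu;
        this.r2 = targetModulus * targetModulus;
    }

    /** Returns the equalized sample {re, im} for input x, then adapts the tap. */
    double[] step(double xRe, double xIm) {
        double yRe = wRe * xRe - wIm * xIm;          // y = w * x
        double yIm = wRe * xIm + wIm * xRe;
        double err = yRe * yRe + yIm * yIm - r2;      // deviation of |y|^2 from the target
        double gRe = yRe * xRe + yIm * xIm;           // y * conj(x)
        double gIm = yIm * xRe - yRe * xIm;
        wRe -= mu * err * gRe;
        wIm -= mu * err * gIm;
        return new double[] {yRe, yIm};
    }

    double tapMagnitude() {
        return Math.hypot(wRe, wIm);
    }
}
```

Fed QPSK-like symbols received at half amplitude, the tap converges to a magnitude of about 2, so the output lands back on the unit circle.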

However, since we already know the preamble of an IEEE 802.15.4 packet in the 2450 MHz band, we are at liberty to implement a more sophisticated coherent architecture in our receiver. Specifically, because the preamble is known, we may use a Correlate and Sync block.
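The idea behind correlating against a known preamble can be sketched in a few lines: slide the known template along the received samples, compute the complex correlation at each offset, and pick the peak. The peak's position gives timing, and its phase gives an estimate of the carrier phase offset. This is a hypothetical illustration, not the Correlate and Sync block's implementation. The template here is a generic complex sequence; building the actual 802.15.4 preamble waveform (eight repetitions of symbol 0, O-QPSK with half-sine shaping) is left out.

```java
class PreambleCorrelator {
    /** Complex correlation sum_k x[n+k] * conj(p[k]) at offset n; returns {re, im}. */
    private static double[] correlate(float[] xRe, float[] xIm, float[] pRe, float[] pIm, int n) {
        double accRe = 0, accIm = 0;
        for (int k = 0; k < pRe.length; k++) {
            accRe += xRe[n + k] * pRe[k] + xIm[n + k] * pIm[k];
            accIm += xIm[n + k] * pRe[k] - xRe[n + k] * pIm[k];
        }
        return new double[] {accRe, accIm};
    }

    /** Offset with the largest correlation magnitude, i.e. the best match to the template. */
    static int findPeak(float[] xRe, float[] xIm, float[] pRe, float[] pIm) {
        int best = -1;
        double bestMag2 = -1;
        for (int n = 0; n + pRe.length <= xRe.length; n++) {
            double[] c = correlate(xRe, xIm, pRe, pIm, n);
            double mag2 = c[0] * c[0] + c[1] * c[1];
            if (mag2 > bestMag2) {
                bestMag2 = mag2;
                best = n;
            }
        }
        return best;
    }

    /** Phase (radians) of the correlation at offset n: an estimate of the carrier phase offset. */
    static double phaseAt(float[] xRe, float[] xIm, float[] pRe, float[] pIm, int n) {
        double[] c = correlate(xRe, xIm, pRe, pIm, n);
        return Math.atan2(c[1], c[0]);
    }
}
```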

I will be doing a bit more experimenting in the coming days and will post an update once available.