Linux Kernel Project Update: End of Semester Wrap-up

It’s been a fun semester working on this project, and I want to thank everyone at Microsoft and RCOS who helped support us!

My main takeaway is that OS dev is really hard and that after this project, I don’t really want to touch it again. I did, however, learn an incredible amount about a variety of things in the kernel including how drivers work, how devices communicate, SPI interfaces, and much more.

I agree with Max that Arch Linux may have been a poor choice to start with. During the summer, I’ll continue working on this to both add additional functionality and to document the install process on other flavors of Linux.

I’m glad that we were able to work with external contributors (cb22, roadrunner2, and more) to get the keyboard and trackpad working, as that really sets the stage for further work. There are still quite a few things that aren’t working: suspend/resume, speakers, Bluetooth, and webcam, with the first being miles more important than the rest.

Here are all of the loops that we have left to close:

Generally:

  • How do we eliminate the need to recompile the kernel when fiddling with DSDT?
  • What are the side effects of disabling LPSS? Can we rework something in LPSS so that it doesn’t hijack the keyboard output?
  • What are the steps to getting patches added to the upstream kernel? What do we need to change to get them accepted?

Keyboard:

  • Where in the keyboard drivers is rollover handled? Why is it not correctly dealing with our keyboard?
  • How is keyboard backlighting handled on other laptops in the kernel? Is there any major difference in the hardware (besides SPI) between our keyboard and others?

Suspend/resume:

  • What are those custom op-codes doing? Are they even relevant?
  • Is disabling LPSS affecting suspend?
  • Could it be that suspend is working, but there is a problem with resume?

Speakers:

  • At what level of the audio stack is communication to the speaker breaking down?

For now, I’ll be focusing on suspend/resume and Max will be working to get our work so far merged upstream.

Again, thank you to everyone who helped us along the way!

Linux Kernel Project Update: Touchpad is Working!

Big thanks to roadrunner2 for his work on getting functionality on his Macbook Pro (which shares a lot of the same components as our computers). Apparently, there was a communication breakdown between the device and driver, which is fixed by adding short delays between the setup messages. I’m not sure if this is the optimal way to solve it, but at least the trackpad has pretty close to full functionality now!

It makes me very happy to say that I am now typing this blog post from inside Linux on my Macbook.

What still doesn’t work is force touch (pressing harder on the trackpad) and right click via a two finger click (which I use a lot). Most likely force touch will be backburnered since it is high effort, low reward, and I can just add two finger click without having to do it from within the kernel itself.

I’ve put off work on the speakers for now since it is relatively unimportant, at least compared to getting resume working.

From what I’ve read and then confirmed for myself, the hard drive, which is NVMe, is not shutting down properly. I tried reverse engineering the Mac driver, but haven’t been able to glean much from that process (I’m not experienced enough at it). Someone (I apologize for not remembering your name or where you said this) reversed it and found custom, manufacturer-specific op-codes that are called right before OS X shuts down the hard drive.

My debugging of the problem is significantly hindered by the fact that every time I want to test anything, I have to reboot. Thankfully, the SSD is very quick and Arch has a small footprint.

Another possible part of the problem is the disabling of LPSS. From what I can tell from Intel’s original LPSS patent, it is used to put the computer in a state where data can be recovered later. However, the patent is super old (1999) and an entirely different system or process is being used to suspend.

It is the intersection of these problems (improper NVMe shutdown and the side effects of disabling LPSS) that makes fixing resume so finicky. I’m going to read as much as I can about LPSS and NVMe as well as try once again to reverse the Mac driver.

Linux Kernel Project Update: Keyboard is working!

Project introduction here: http://charlieyou.me/linux0/

Max has figured out that if you disable the low power subsystem (LPSS) inside of the SPI driver (pxa2xx.c), then the applespi driver written by cb22 is able to do its job. In other words: the keyboard now works!

Well… mostly. Key rollover still does not work, so you can’t press two keys at the same time and have both be detected. Wakeup from keyboard also doesn’t work. The latter we’ll tackle along with the sleep/hibernate issues. The former Max is trying to figure out now.

The immediate next step is to get the trackpad to work as well. Valid data is being read by the IRQ handler, so it should just be a matter of piping it to the correct place.

My focus is now on getting the internal speakers to work. Strangely enough, the headphone jack and internal mic work perfectly; it’s just the speakers that don’t output anything. There’s a Bugzilla post for this that’s been a fairly helpful start: https://bugzilla.kernel.org/show_bug.cgi?id=110561. For an overview of the Linux audio stack, see this article: http://voices.canonical.com/david.henningsson/2011/12/08/audio-debugging-techniques/.

So far, I’ve played around with the patches mentioned in the Bugzilla post as well as hdajackretask to see if I could reroute the output to a different pin that is hopefully the speaker. No luck with that. Next I tried to just route the output to all unconnected pins, and that did get some sound from the speaker. It was not the right sound, and the kernel promptly panicked. But it’s something…

After this, the next things to tackle are:

  • Screen tearing
  • Bluetooth
  • Sleep/hibernate

Newest Project: Working on the Linux Kernel

This is a long overdue post on our (Max Shavrick and my) work on the Linux kernel for RCOS. We are being supported by Microsoft through mentorship by Stephen Hemminger, who works on the kernel for a living.

Max and I both own the 2015 12″ Macbook (8,1), which unfortunately contains quite a few hardware items that do not yet have drivers in the Linux kernel. Our task is to try and fill these gaps.

The most important of these is getting the keyboard and trackpad to work. The issue is that they are both connected over SPI, and Linux does not currently have a driver for them. In addition, there is no DMA controller built into the SPI controller, as there is in the 2016 (9,1) Macbook. There are two posts on Bugzilla about it as well as one on Bountysource. There is also a WIP driver on GitHub from cb22 that apparently has basic functionality (no rollover or wakeup) on the 2016 Macbook.

By forcing the pxa2xx driver (the SPI controller driver used on this hardware) to not use DMA, Max has been able to detect keypresses and touchpad actions. However, all of the packets are filled with zeros. There are three hypotheses:

  1. We are not reading the correct number of bytes (currently reading 256 in chunks of 8).
  2. We are not correctly acknowledging that we have read the bytes, resulting in the last packet being sent.
  3. No bytes are being transferred and an empty buffer is being returned.
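
To make the three hypotheses easier to tell apart, here is a rough userspace illustration in Python (not kernel code, and not what our driver actually does). It reads one 256-byte packet in chunks of 8 and then labels the result. The `read_chunk` callable is a hypothetical stand-in for the SPI transfer, and the labels are only hints for logging; none of them proves a hypothesis on its own.

```python
from typing import Callable

PACKET_SIZE = 256  # bytes per packet, as in the post
CHUNK_SIZE = 8     # bytes per read, as in the post


def read_packet(read_chunk: Callable[[int], bytes]) -> bytes:
    """Assemble one packet by calling read_chunk until PACKET_SIZE bytes arrive.

    read_chunk is a hypothetical stand-in for the real SPI transfer.
    """
    packet = bytearray()
    while len(packet) < PACKET_SIZE:
        chunk = read_chunk(min(CHUNK_SIZE, PACKET_SIZE - len(packet)))
        if not chunk:
            # Nothing came back, so stop instead of looping forever.
            break
        packet.extend(chunk)
    return bytes(packet)


def diagnose(packet: bytes) -> str:
    """Label a packet so different failure modes stand out in a log."""
    if len(packet) == 0:
        return "empty"
    if len(packet) < PACKET_SIZE:
        return "short"
    if not any(packet):
        return "all zeros"
    return "has data"
```

For example, `diagnose(read_packet(lambda n: bytes(n)))` returns `"all zeros"`, which matches what we are seeing now, while a "short" or "empty" result would point toward a problem with the read length or the transfer itself.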

At this time, we are not sure how to proceed. We aren’t able to run kgdb since there is no (simple) way to connect via a serial connection.

I’m still wrapping my head around how all of the communication in the kernel works; I’ll have a blog post next week explaining as much as I know.

 

How I Found a Bug in HackerRank

For those not in the tech space, or for those who are but haven’t had to interview recently, one of the most recent trends in recruiting is to send candidates a coding challenge. The idea is to screen out those who can’t actually write code (apparently surprisingly common, see https://blog.codinghorror.com/why-cant-programmers-program/) so that you don’t have your engineers interviewing someone who is clearly not qualified.

There are a few companies that have developed platforms that make it easier for companies to do this and by far the most popular is HackerRank. They give you a problem description at the top, a text editor to write and run code in below that and then an input to test on as well as the expected output for it.

After you write code that is able to solve the known input, the server then runs the code on inputs that you are not supposed to be able to see. This is to give companies a gauge on how well you are able to write code that is able to deal with edge cases that are not given to you.

HackerRank lets you see how many hidden test cases there are as well as how many of them you have passed. This leads to many frustrating moments when you have written code that gives the correct output on every test case except for one.


About a week ago, [company name omitted], one of the largest and most well known tech companies, sent me one of these and I faced this exact situation. After working on the last problem for a bit, I had a solution that worked for eight out of nine test cases. Sigh.

I read the problem description again and identified four areas in which there could possibly be an edge case, with one of them requiring a rewrite of the main data structure that I had used.

Being lazy, I obviously did not want to do this. I really wanted to know what hidden input I was failing on so that I didn’t have to add code for all four of these edge cases when they were just testing for one.

I got up to get another cup of coffee and when I was walking back to my desk, I had a really stupid thought: “Since I know what the edge cases look like, can I just raise a runtime error if I see them?”

I really didn’t expect this to work, but added the two lines of code to test it:

[Screenshot: screenshot-from-2016-11-02-20-07-27]

Then I scrolled down and pressed the submit button:

[Screenshot: screenshot-from-2016-11-02-20-06-42]

Bingo, I knew which edge case the problematic input was and could skip writing the solutions for the other three.
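
The exact two lines from the screenshot aren’t reproduced in this post, so here is only a minimal Python sketch of the idea. The function names, the placeholder solution, and the particular edge case (an empty list) are all hypothetical; swap in whichever edge case you suspect.

```python
def is_suspected_edge_case(values):
    # Hypothetical check: replace with the edge case you want to probe for.
    return len(values) == 0


def solve(values):
    # The probe: crash on purpose so that a hidden test containing the
    # suspected edge case fails with a runtime error.
    if is_suspected_edge_case(values):
        raise RuntimeError("suspected edge case hit")
    # Placeholder for the real solution.
    return max(values)
```

If the one failing hidden test now reports a runtime error, that test contains the suspected edge case. If it still just fails, the problem is somewhere else and you can move on to the next suspect.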

HackerRank does have functionality to mask the error message, so you aren’t able to write a custom exception to dump the exact input, but this is still a massive bug and something that should be addressed urgently. I’ll be sending this to HackerRank and to the company that gave me the challenge.

Thanks for reading!

 Update (2016-11-07)

HackerRank’s CTO, Hari Karunanidhi, reached out to me:

Harishankaran Karunanidhi (HackerRank)
Nov 7, 03:51 MST

Hi Charlie,

Thanks for sharing the feedback. We have a tough job of finding the right balance between having the test secure and also candidate friendly. For example, hiding the type of error would make it really frustrating for a developer to solve a challenge.

In this case, kudos on tracing out the corner case. For example, your idea of “Since I know what the edge cases look like, can I just raise a runtime error if I see them?” is not something all candidates might get during an online assessment.

Once again, thanks for reporting this. But we’ll display the “Runtime Error” to make sure the product is easy for test solvers to solve. We’ll also work towards making better testcases, so that you are forced to take care of all the corner cases.

-Hari.
CTO,
HackerRank.

His reasoning is understandable, but I still feel that they should just mark test cases with errors as incorrect. Still, it’s cool that the CTO himself responded.