Advances In Computer Vision Propel Transportation Autonomy

Vision is a powerful human sensory input. It enables complex tasks and processes we take for granted. With an increase in AoT™ (Autonomy of Things) in diverse applications ranging from transportation and agriculture to robotics and medicine, the role of cameras, computing and machine learning in providing human-like vision and cognition is becoming significant. Computer vision as an academic discipline took off in the 1960s, primarily at universities engaged in the emerging field of artificial intelligence (AI) and machine learning. It progressed dramatically over the next four decades as significant advances were made in semiconductor and computing technologies. Recent advances in deep learning and artificial intelligence have further accelerated the application of computer vision to provide real-time, low-latency perception and cognition of the environment, enabling autonomy, safety and efficiency in various applications. Transportation is one area that has benefited significantly.

LiDAR (Light Detection and Ranging) is an active optical imaging approach that uses lasers to determine the 3D environment around an object. It is one of the technologies that computer vision solutions (which rely purely on ambient light and do not use lasers for 3D perception) are trying to disrupt. The common argument is that human drivers do not need LiDAR for depth perception, so neither should machines. Current commercial L3 autonomous driving products (conditional autonomy in specific geographies and weather conditions, with the driver ready to take control within seconds) use LiDAR. Purely vision-based techniques have still not been able to offer this capability commercially.

Tesla is a dominant proponent of using passive camera-based computer vision to provide passenger vehicle autonomy. During the company’s recent AI Day event, Elon Musk and his engineers gave an impressive presentation of the AI, data management and computing capabilities that support, amongst other initiatives, the Full Self Driving (FSD) feature on multiple Tesla models. FSD requires the human driver to be engaged in the driving task at all times, consistent with L2 autonomy. Currently, this option is available on 160,000 vehicles purchased by customers in the U.S. and Canada. A suite of 8 cameras on each vehicle provides a 360° occupancy map. Camera (and other) data from these vehicles is used to train Tesla’s neural network (which uses auto-labeling) to recognize objects, plot potential vehicle trajectories, select the optimum one and activate the appropriate control actions. ~75K updates of the neural network have occurred over the past 12 months (~1 update every 7 minutes) as new data is continually collected and labeling errors or maneuvering mistakes are detected. The trained network executes planning and control actions through an onboard, redundant architecture of purpose-built compute electronics. Tesla expects FSD to eventually lead to autonomous vehicles (AVs), which provide complete autonomy in certain operational design domains with no human driver engagement required (also referred to as L4 autonomy).

Other companies like Phiar, Helm.ai and NODAR are also pursuing the computer vision route. NODAR aims to significantly expand the imaging range and 3D perception of stereo camera systems by learning to adjust for camera misalignment and vibration effects through patented machine learning algorithms. It recently raised $12M to productize its flagship offering, Hammerhead™, which utilizes “off-the-shelf” automotive-grade cameras and standard compute platforms.

Apart from cost and size, a frequent argument against LiDAR is that it has limited range and resolution compared to cameras. For example, LiDARs with a 200 m range and 5-10M points/second (PPS, akin to resolution) are available today. At 200 m, small obstacles like bricks or tire debris will register very few points (maybe 2-3 in the vertical and 3-5 in the horizontal direction), making object recognition difficult. Things get even coarser at longer ranges. By comparison, a standard 1-megapixel camera running at 30 Hz generates 30M pixels/second, enabling superior object recognition even at long ranges. More advanced cameras (12 megapixels) increase this even further. The issue is how to process this massive data stream into actionable perception with millisecond-level latencies and low power consumption, even in degraded lighting conditions.
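
A back-of-envelope comparison of the raw data rates, using the figures quoted above (an illustrative sketch; real sensor specifications vary by vendor):

```python
# Back-of-envelope data-rate comparison using the figures quoted above.
# Illustrative only; actual sensor specifications vary by vendor.

lidar_pps_low, lidar_pps_high = 5e6, 10e6  # LiDAR: 5-10M points/second
camera_1mp = 1e6 * 30                      # 1 MP camera at 30 Hz
camera_12mp = 12e6 * 30                    # 12 MP camera at 30 Hz

print(f"LiDAR:        {lidar_pps_low/1e6:.0f}-{lidar_pps_high/1e6:.0f}M points/s")
print(f"1 MP @ 30Hz:  {camera_1mp/1e6:.0f}M pixels/s")
print(f"12 MP @ 30Hz: {camera_12mp/1e6:.0f}M pixels/s")

# Per the article, a brick at 200 m might subtend only ~3 x 5 LiDAR
# points, while a high-resolution camera sees hundreds of pixels there.
```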

Recogni, a California-based company, is trying to solve this problem. According to CEO Mark Bolitho, its mission is to “deliver superhuman visual perception for fully autonomous vehicles.” The company was founded in 2017, has raised $75M to date and has 70 employees. R.K. Anand, an alum of Juniper Networks, is one of the co-founders and Chief Product Officer. He believes that using higher-resolution cameras with > 120 dB dynamic range, running at high frame rates (from suppliers such as OnSemi, Sony and Omnivision), provides the data required to create the high-resolution 3D information critical for realizing AVs. The enablers of this are:

  1. Custom-designed ASICs to process the data efficiently and produce accurate and high-resolution 3D maps of the car environment. These are fabricated on a TSMC 7 nm process, with a chip size of 100 mm², operating at a 1 GHz frequency.
  2. Proprietary machine learning algorithms to process millions of data points offline to create the trained neural network, which can then operate efficiently and learn continuously. This network provides the perception and includes object classification and detection, semantic segmentation, lane detection, and traffic sign and traffic light recognition.
  3. Minimizing off-chip storage and multiplication operations, which are power-intensive and create high latency. Recogni’s ASIC design is optimized for logarithmic math and uses addition instead (see the sketch after this list). Further efficiencies are realized by clustering weights optimally in the trained neural network.
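
To illustrate the general idea behind the third point (a minimal sketch of log-domain arithmetic, not Recogni’s actual circuit design): in log space, multiplication becomes addition, and adders are far cheaper in silicon than multipliers.

```python
import math

# Minimal sketch of log-domain arithmetic; NOT Recogni's actual design.
# A product of two positive numbers reduces to a sum of logarithms:
#   log(a * b) = log(a) + log(b)

def log_domain_multiply(a: float, b: float) -> float:
    """Multiply two positive values using only an addition in log space."""
    return math.exp(math.log(a) + math.log(b))

print(log_domain_multiply(3.0, 4.0))  # ~12.0 (up to float rounding)
```

In hardware, the log and antilog conversions are typically handled once by small lookup tables, so the billions of weight-activation products in a neural network each reduce to a cheap addition.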

During the training phase, a commercial LiDAR is used as ground truth to train the network on high-resolution, high-dynamic-range stereo camera data to extract depth information and make it robust against misalignment and vibration effects. According to Mr. Anand, their machine learning implementation is so efficient that it can extrapolate depth estimates beyond the training ranges provided by the calibration LiDAR (which provides the ground truth to a range of 100 m).
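
For context, classical stereo vision recovers depth from the disparity between the two camera views; the standard pinhole relation below (a textbook formula, not the company’s proprietary method) shows why long ranges are hard: disparity shrinks with distance, so tiny calibration errors dominate.

```python
def stereo_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Classical pinhole stereo relation: depth = focal_length * baseline / disparity."""
    return focal_px * baseline_m / disparity_px

# Assumed example values: 4000 px focal length, 0.3 m camera baseline.
print(stereo_depth(6.0, 4000.0, 0.3))  # 200.0 m
print(stereo_depth(5.0, 4000.0, 0.3))  # 240.0 m: one pixel of error shifts depth by 40 m
```

This sensitivity is why learned correction for camera misalignment and vibration matters so much at range.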

The training described above was conducted in the daytime with a stereo pair of 8.3-megapixel cameras running at 30 Hz frame rates (~0.5B pixels per second). It demonstrates the ability of the trained network to extract 3D information in the scene beyond the 100 m range it was trained with. Recogni’s solution can also extrapolate its learning from daytime data to nighttime performance (Figure 2).

According to Mr. Anand, the range data is accurate to within 5% at long ranges and close to 2% at shorter ranges. The solution provides 1000 TOPS (trillion operations per second) with 6 ms latency and 25 W power consumption (40 TOPS/W), which leads the industry. Competitors using integer math are > 10X lower on this metric. Recogni’s solution is currently in trials at multiple automotive Tier 1 suppliers.
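
The efficiency figure follows directly from the quoted numbers:

```python
# Efficiency implied by the quoted figures: 1000 TOPS at 25 W.
tops, watts = 1000, 25
print(tops / watts)  # 40.0 TOPS/W, matching the article's figure
```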

Prophesee (“predicting and seeing where the action is”), based in France, uses its event-based cameras for AVs, Advanced Driver Assistance Systems (ADAS), industrial automation, consumer applications and healthcare. Founded in 2014, the company recently closed its C round of funding at $50M, for a total of $127M raised to date. Xiaomi, a leading manufacturer of mobile phones, is one of the investors. Prophesee’s goal is to emulate human vision, in which the receptors in the retina react to dynamic information and the brain focuses on processing changes in the scene (especially important for driving). The basic idea is to use camera and pixel architectures that detect changes in light intensity above a threshold (an event) and provide only this data to the compute stack for further processing. The pixels work asynchronously (rather than in fixed frames like regular CMOS cameras) and at much higher speeds, since they do not have to integrate photons across an entire frame before the data can be read out. The advantages are significant: lower data bandwidth, decision latency, storage and power consumption. The company’s first commercial-grade VGA event-based vision sensor featured high dynamic range (> 120 dB) and low power consumption (26 mW at the sensor level, or 3 nW/event). An HD (High Definition) version (jointly developed with Sony), with industry-leading pixel size (< 5 μm), has also been launched.
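
A minimal sketch of the event-generation principle (illustrative only, simulated from frames for clarity; not Prophesee’s implementation, and the threshold value is assumed):

```python
import numpy as np

# Minimal sketch of event-based sensing: a pixel emits an event only when
# its log intensity changes by more than a contrast threshold since its
# last event. Real event pixels are asynchronous; frames are used here
# only to make the simulation simple.

THRESHOLD = 0.2  # log-intensity contrast threshold (assumed value)

def events_from_frames(frames: np.ndarray) -> list:
    """Generate (t, y, x, polarity) events from a (T, H, W) stack of frames."""
    ref = np.log1p(frames[0].astype(np.float64))
    events = []
    for t, frame in enumerate(frames[1:], start=1):
        cur = np.log1p(frame.astype(np.float64))
        delta = cur - ref
        for y, x in zip(*np.where(np.abs(delta) > THRESHOLD)):
            events.append((t, y, x, 1 if delta[y, x] > 0 else -1))
            ref[y, x] = cur[y, x]  # update reference only where an event fired
    return events

# Static regions produce no events at all, which is where the bandwidth,
# latency and power savings come from.
```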

These sensors form the core of the Metavision® sensing platform, which uses AI to provide smart and efficient perception for autonomy applications and is under evaluation by multiple companies in the transportation space. Apart from forward-facing perception for AVs and ADAS, Prophesee is actively engaged with customers on in-cabin monitoring of the driver for L2 and L3 applications (Figure 4).

Automotive opportunities are lucrative, but the design-in cycles are long. Over the past two years, Prophesee has seen significant interest and traction in the machine vision space for industrial applications. These include high-speed counting, surface inspection and vibration monitoring.

Prophesee recently announced collaborations with leading developers of machine vision systems to exploit opportunities in industrial automation, robotics, automotive and IoT (Internet of Things). Other immediate opportunities include image blur correction for mobile phones and AR/VR applications. These use smaller-format sensors than the longer-term ADAS/AV opportunities, consume even less power, and operate with significantly lower latency.


Israel is a leading innovator in high technology, with significant venture investments and an active start-up environment. Since 2015, about $70B in venture-led investments in the technology sector have occurred, a portion of it in computer vision. Mobileye spearheaded this revolution in 1999, when Amnon Shashua, a leading AI researcher at Hebrew University, founded the company to focus on camera-based perception for ADAS and AVs. The company filed for an IPO in 2014 and was acquired by Intel in 2017 for $15B. Today, it is easily the leading player in the computer vision and AV domain and recently announced its intention to file for an IPO and become an independent entity. Mobileye had revenues of $1.4B/year and modest losses ($75M). It provides computer vision capabilities to 50 automotive OEMs, who deploy it across 800 car models for ADAS capabilities. In the future, it intends to lead in L4 vehicle autonomy (no driver required) using this computer vision expertise along with LiDAR capabilities based on Intel’s silicon photonics platform. Mobileye’s valuation is estimated at ~$50B when it finally goes public.

Champel Capital, based in Jerusalem, is at the forefront of investing in companies developing products based on computer vision for diverse applications, from transportation and agriculture to security and safety. Amir Weitman is a co-founder and managing partner; he started the venture firm in 2017. The first fund invested $20M in 14 companies. One of its investments was in Innoviz, which went public through a SPAC merger in 2021 and became a LiDAR unicorn. Led by Omer Keilaf (a veteran of the technology unit of the Intelligence Corps of the Israel Defense Forces), the company today is a leader in LiDAR deployments for ADAS and AVs, with multiple design wins at BMW and Volkswagen.

Champel Capital’s second fund (Impact Deep Tech Fund II) was initiated in January 2022 and has raised $30M to date (the target is $100M by the end of 2022). A dominant focus is on computer vision, with $12M deployed in five companies. Three of these use computer vision for transportation and robotics.

TankU, based in Haifa, started operations in 2018 and has raised $10M in funding. Dan Valdhorn, the CEO, is a graduate of Unit 8200, an elite high-tech group within the Israel Defense Forces responsible for signals intelligence and code decryption. TankU’s SaaS (Software as a Service) products automate and secure processes in complex outdoor environments that service vehicles and drivers. These products are used by owners of vehicle fleets, private cars, and fueling and electric charging stations to prevent theft and fraud in automated financial transactions. Vehicle fuel services generate ~$2T in global revenues annually, of which private and commercial vehicle fleet owners consume 40%, or $800B. Retailers and fleet owners lose ~$100B annually to theft and fraud (for example, a fleet fuel card used for an unauthorized private vehicle). CNP (card-not-present) fraud and fuel tampering or theft are additional sources of loss, especially when stolen card details are used in mobile payment apps.

The company’s TUfuel product facilitates one-tap secure payment, blocks most types of fraud and alerts customers when it suspects fraud. It does this with an AI engine trained on data from the existing CCTV cameras in these facilities and on digital transaction data (including POS and other back-end data). Attributes like vehicle trajectory and dynamics, vehicle ID, journey time, mileage, fueling time, fuel quantity, fuel history and driver behavior are monitored to detect fraud. This data also helps retailers optimize site operation, enhance customer loyalty and deploy vision-based marketing tools. According to CEO Dan Valdhorn, their solution detects 70% of fleet-related, 90% of credit-card and 70% of tampering-related fraud events.
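
A minimal sketch of the kind of cross-checks such a system implies, with field names and thresholds invented for illustration (this is not TankU’s actual API or logic):

```python
from dataclasses import dataclass

# Hypothetical cross-checks between camera-derived and transaction data;
# all field names and thresholds are invented for illustration.

@dataclass
class FuelingEvent:
    plate_from_camera: str    # vehicle ID read by CCTV computer vision
    plate_on_fuel_card: str   # vehicle registered to the fleet card
    liters_dispensed: float
    tank_capacity_l: float
    odometer_delta_km: float  # distance driven since last fueling

def flag_fraud(e: FuelingEvent) -> list:
    alerts = []
    if e.plate_from_camera != e.plate_on_fuel_card:
        alerts.append("fleet card used for a different vehicle")
    if e.liters_dispensed > e.tank_capacity_l:
        alerts.append("dispensed volume exceeds tank capacity")
    if e.odometer_delta_km == 0 and e.liters_dispensed > 5:
        alerts.append("refueling without driving suggests siphoning")
    return alerts
```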

Sonol is an energy services company that owns and operates a network of 240 stations and convenience stores across Israel. TUfuel is deployed at their sites and has demonstrated enhanced security, fraud prevention, and customer loyalty. Product trials are underway in the U.S. in collaboration with a leading global supplier of gas stations and convenience store equipment. Similar initiatives are also underway in Africa and Europe.

Tel-Aviv-based ITC was founded in 2019 by machine learning academics from Ben-Gurion University. ITC creates SaaS products that “measure traffic flow, predict congestion and mitigate it through smart manipulation of traffic lights – before jams begin to form.” Similar to TankU, it uses data from off-the-shelf cameras (already installed at numerous traffic intersections) to obtain live traffic data. Data from thousands of cameras across a city are analyzed, and parameters like vehicle type, speed, movement direction and sequence of vehicle types (trucks vs. cars) are extracted through the application of proprietary AI algorithms. Simulations predict traffic flow and potential traffic jam situations up to 30 minutes in advance. Traffic lights are adjusted using these results to smooth traffic flow and prevent jams.
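
A minimal sketch of the predict-then-adjust loop this implies, with an invented queueing model and thresholds (not ITC’s proprietary algorithms):

```python
# Illustrative predict-then-adjust loop; the model and thresholds are
# invented for this sketch, not ITC's proprietary algorithms.

def predict_queue(inflow_vph: float, outflow_vph: float,
                  queue_now: int, horizon_min: int = 30) -> float:
    """Project queue length assuming current flow rates persist."""
    return max(0.0, queue_now + (inflow_vph - outflow_vph) * horizon_min / 60)

def adjust_green_time(queue_pred: float, base_green_s: float = 30.0) -> float:
    """Lengthen the green phase ahead of a predicted jam (capped at 2x)."""
    return min(2 * base_green_s, base_green_s * (1 + queue_pred / 100))

queue_in_30 = predict_queue(inflow_vph=900, outflow_vph=780, queue_now=12)
print(queue_in_30, adjust_green_time(queue_in_30))  # 72.0 vehicles, 51.6 s
```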

Training the AI system takes one month of visual data across a typical city and involves a combination of supervised and unsupervised learning. ITC’s solution is already deployed in Tel-Aviv (ranked 25th among the world’s most congested cities in 2020), with thousands of cameras deployed at hundreds of intersections controlled by traffic lights. ITC’s system there currently manages 75K vehicles, a number that is expected to keep growing. The company is installing a similar capability in Luxembourg and is starting trials in major U.S. cities. Globally, its solution manages 300,000 vehicles, with operating sites in Israel, the U.S.A., Brazil and Australia. Dvir Kenig, the CTO, is passionate about solving this problem: giving people back personal time, reducing greenhouse gases, enhancing overall productivity and, most importantly, reducing accidents at congested intersections. According to Mr. Kenig, “our deployments demonstrate a 30% reduction in traffic jams, reducing unproductive driving time, stress, fuel consumption and pollution.”

Indoor Robotics was founded in 2018 and recently raised $18M in funding. The company, based near Tel-Aviv, Israel, develops and sells autonomous drone solutions for indoor security, safety and maintenance monitoring. The CEO and co-founder, Doron Ben-David, has significant robotics and aeronautics experience accumulated at IAI (a major defense prime contractor) and MAFAT (an advanced research organization within the Israeli Ministry of Defense, similar to DARPA in the United States). The growing investments in smart buildings and commercial security markets fuel the need for autonomous systems that can use computer vision and other sensory inputs in small and large interior commercial spaces (offices, data centers, warehouses and retail spaces). Indoor Robotics targets this market with indoor drones equipped with off-the-shelf cameras and thermal and infrared range sensors.

Ofir Bar-Levav is the Chief Business Officer. He explains that indoor drones have been hampered by the difficulty of localizing themselves inside buildings, which are typically GPS-denied or GPS-inaccurate, and by the lack of convenient and efficient docking and powering solutions. Indoor Robotics addresses this with four drone-mounted cameras (facing up, down, left and right) and simple range sensors that accurately map an indoor space and its contents. The camera data (which provides localization and mapping) and data from the drone-mounted thermal sensors are analyzed by an AI system to detect potential security, safety and maintenance issues and alert the customer. The drones power themselves through a ceiling-mounted “docking tile,” which saves valuable floor space and allows data collection while charging. The financial case for automating these mundane tasks is evident, given how complex and expensive human labor is to recruit, retain and train. Using aerial drones rather than ground-based robots also has significant advantages in capital and operating costs, better use of floor space, freedom of movement without encountering obstacles, and efficiency of camera data capture. According to Mr. Bar-Levav, Indoor Robotics’ TAM (Total Addressable Market) in indoor intelligent security systems will be $80B by 2026. Key customer locations today include warehouses, data centers and office campuses of leading global corporations.
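
A minimal sketch of GPS-free localization of the kind this setup suggests, blending drift-prone visual odometry with absolute range-sensor fixes (invented for illustration; not Indoor Robotics’ actual pipeline):

```python
import numpy as np

# Hypothetical complementary filter for GPS-denied indoor localization:
# blend visual odometry (smooth but drifting) with range-sensor fixes
# against known walls (noisy but absolute). Weights are assumed values.

def fuse_position(vo_estimate: np.ndarray, range_fix: np.ndarray,
                  vo_weight: float = 0.8) -> np.ndarray:
    """Blend drifting odometry with an absolute range-based position fix."""
    return vo_weight * vo_estimate + (1 - vo_weight) * range_fix

pos = fuse_position(np.array([4.9, 2.1, 1.5]), np.array([5.2, 2.0, 1.45]))
print(pos)  # [4.96 2.08 1.49]: smooth like odometry, anchored like the ranges
```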

Computer vision is revolutionizing the autonomy game in movement automation, security, smart-building monitoring, fraud detection and traffic management. Semiconductors and AI are powerful enablers. Once computers master this incredible sensory modality in a scalable fashion, the possibilities are endless.

Source: https://www.forbes.com/sites/sabbirrangwala/2022/10/04/advances-in-computer-vision-propel-transportation-autonomy/