Machine learning: LinkedIn publishes Dagli, the Java ML framework

Source: Heise.de, published 12 Nov 2020


With Dagli, LinkedIn has published a machine learning framework with some special features compared to established systems. For one, the underlying programming language is not Python, as with TensorFlow or PyTorch, but Java. For another, Dagli combines the training and the inference of models in standardized pipelines.

Developers create these pipelines as directed acyclic graphs, which give the framework its name: DAG is the acronym for directed acyclic graph.

A pipeline with no feedback

A DAG consists of nodes whose edges, or connections, are directed, i.e. traversed in a specific direction. Acyclic means that there is no connection leading back to a node that has already been passed through, so no endless loops can occur.

The pipelines are designed as directed acyclic graphs, so feedback (red arrow) is not permitted.
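The acyclicity requirement can be checked mechanically. The following is a minimal illustration in plain Java, not Dagli code: a depth-first search that reports whether a directed graph contains a cycle, i.e. whether feedback of the kind shown in the figure exists.

```java
import java.util.*;

// Illustrative only (not Dagli's API): detect whether a directed graph
// contains a cycle, i.e. whether it violates the "acyclic" requirement.
public class DagCheck {
    // Adjacency list: node -> list of successor nodes.
    static boolean hasCycle(Map<Integer, List<Integer>> graph) {
        Set<Integer> visiting = new HashSet<>(); // nodes on the current DFS path
        Set<Integer> done = new HashSet<>();     // nodes fully explored
        for (Integer node : graph.keySet()) {
            if (dfs(node, graph, visiting, done)) return true;
        }
        return false;
    }

    static boolean dfs(Integer node, Map<Integer, List<Integer>> graph,
                       Set<Integer> visiting, Set<Integer> done) {
        if (visiting.contains(node)) return true;  // back edge -> cycle found
        if (done.contains(node)) return false;
        visiting.add(node);
        for (Integer next : graph.getOrDefault(node, List.of())) {
            if (dfs(next, graph, visiting, done)) return true;
        }
        visiting.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        // 1 -> 2 -> 3 is a valid DAG; adding the edge 3 -> 1 creates feedback.
        Map<Integer, List<Integer>> dag =
                Map.of(1, List.of(2), 2, List.of(3), 3, List.of());
        Map<Integer, List<Integer>> cyclic =
                Map.of(1, List.of(2), 2, List.of(3), 3, List.of(1));
        System.out.println(hasCycle(dag));    // false
        System.out.println(hasCycle(cyclic)); // true
    }
}
```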

In Dagli, the roots of the graph are either Placeholder objects, which stand in for the example data during training or inference, or Generator objects, which automatically generate a value for each example. The latter include Constant, ExampleIndex and RandomDouble.

Preparation instead of training

The child nodes are transformers that either perform simple data transformations or consist of trainable models, including artificial neural networks, regression and classification methods in the form of NeuralNetwork, XGBoostRegression or LiblinearClassifier.

Transformer nodes can be either "prepared" or "unprepared". Dagli uses the term "prepared" instead of "trained" for its models. The readme in the GitHub repository justifies this choice of words by noting that not all PreparableTransformer objects are statistical models. A BucketIndex, for example, examines the examples during preparation to determine the boundaries that distribute the values as evenly as possible across the buckets.
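To make the idea of preparation concrete, here is a hypothetical sketch of what such a step could compute. This is not Dagli's actual BucketIndex implementation; the class and method names are invented for illustration. The "preparation" pass derives quantile-based bucket boundaries from the observed values, and the prepared result then maps new values to buckets at inference time.

```java
import java.util.*;

// Hypothetical sketch (not Dagli's BucketIndex): choose bucket boundaries
// so that observed values are distributed as evenly as possible.
public class BucketPrep {
    // "Preparation": returns the upper boundaries of quantile-based buckets.
    static double[] prepareBoundaries(double[] values, int buckets) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] bounds = new double[buckets - 1];
        for (int i = 1; i < buckets; i++) {
            // Index of the value separating bucket i-1 from bucket i.
            int idx = i * sorted.length / buckets;
            bounds[i - 1] = sorted[idx];
        }
        return bounds;
    }

    // "Inference": map a value to its bucket index using prepared boundaries.
    static int bucketOf(double value, double[] bounds) {
        for (int i = 0; i < bounds.length; i++) {
            if (value < bounds[i]) return i;
        }
        return bounds.length; // last bucket
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        double[] bounds = prepareBoundaries(data, 4);
        System.out.println(Arrays.toString(bounds)); // [3.0, 5.0, 7.0]
        System.out.println(bucketOf(2.5, bounds));   // 0
        System.out.println(bucketOf(6.0, bounds));   // 2
    }
}
```

The point of the sketch: the preparation step is statistics-free bookkeeping over the examples, which is why the readme avoids calling it "training".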

A construction kit of transformers

Each transformer starts with an input, which it typically receives from the previous node in the pipeline. On this basis it carries out its calculations or transformations and ultimately generates an output, which it passes to the next node in the graph.
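This input-to-output chaining can be sketched with plain Java function composition. This is illustrative only, not Dagli's API: each step consumes the previous node's output and hands its own result on, just as nodes do along the edges of the graph.

```java
import java.util.List;
import java.util.function.Function;

// Illustrative only (plain Java, not Dagli): three "transformers" chained
// so that each receives the previous step's output as its input.
public class Chain {
    static final Function<String, String> LOWERCASE = String::toLowerCase;
    static final Function<String, List<String>> TOKENIZE =
            s -> List.of(s.split("\\s+"));
    static final Function<List<String>, Integer> COUNT = List::size;

    // Compose the steps into one pipeline, like edges in the graph.
    static final Function<String, Integer> PIPELINE =
            LOWERCASE.andThen(TOKENIZE).andThen(COUNT);

    public static void main(String[] args) {
        System.out.println(PIPELINE.apply("Dagli builds ML pipelines")); // 4
    }
}
```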

At the start of the public beta, a number of ready-made transformer modules are available. The GitHub repository offers code samples and an overview of the modules. Custom functions can be integrated via the FunctionResultX transformers.

The documentation also shows how developers can create their own transformers. A few basic rules apply: every transformer must be immutable, thread-safe and serializable. In addition, it must be quasi-deterministic, meaning its result must not depend on outside context. As an example, the documentation notes that a transformer may receive and process a timestamp as its input, but using the current time within the transformation is prohibited.
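The following sketch illustrates those rules in plain Java; it does not use Dagli's real transformer interfaces, and the class name is invented. The class holds no mutable state (immutable, hence thread-safe), implements Serializable, and derives its result only from its input, never from the system clock.

```java
import java.io.Serializable;
import java.time.Instant;
import java.time.ZoneOffset;

// Sketch of the documented rules (not Dagli's real interfaces): a transformer
// must be immutable, serializable and quasi-deterministic -- its output may
// depend only on its inputs, never on context such as the current clock.
public final class HourOfDay implements Serializable {
    // No fields: the class is immutable and therefore trivially thread-safe.

    // Allowed: the timestamp arrives as an input value, so the same input
    // always yields the same output.
    public int apply(Instant timestamp) {
        return timestamp.atZone(ZoneOffset.UTC).getHour();
    }

    // Forbidden by the rules: reading Instant.now() inside the transformer
    // would make the result depend on when it runs, not only on its input.

    public static void main(String[] args) {
        Instant t = Instant.parse("2020-11-12T09:30:00Z");
        System.out.println(new HourOfDay().apply(t)); // 9
    }
}
```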

JVM as a basis

Dagli is written in Java, and its GitHub readme lists portability and broad IDE support among its advantages. In addition to Java version 9 or higher, it works with other JVM (Java Virtual Machine) languages and can therefore also be used from Kotlin, Scala or Clojure.

Even though Python is the de facto standard for ML applications, there are other machine learning approaches for Java and the JVM besides Dagli, including the deep learning library DeepLearning4j and the Tribuo library recently introduced by Oracle.

The Dagli team does list the framework's advantages in the readme, but at the same time frankly admits that no ML framework is perfect for all scenarios; depending on the area of application, TensorFlow, PyTorch or DeepLearning4j may be more suitable. In certain cases Dagli can work together with those frameworks: it can integrate DeepLearning4j architectures directly and use TensorFlow models with a customized wrapper. However, there is no mention of a general exchange format such as the Open Neural Network Exchange (ONNX), which recently appeared in version 1.8.

At the moment the framework is still marked as beta, and the current release bears the version number 15.0.0-beta3. Major changes, presumably accompanied by incompatibilities, are to be expected during the beta phase: breaking changes can potentially occur with every version change. As soon as Dagli is ready for production, the team wants to keep future versions backwards compatible so that larger projects can build on the framework in a future-proof manner.

(rme)

Read the full article at Heise.de

