[Nuxeo Tech Report] News from the Developer Front #9
Tue 30 July 2013 By Laurent Doguin
Here's the July tech report. Let me warn you that this one is quite dense!
- Infrastructure Improvements
- Optimizations and Misc Issues
- UI Framework
- CI and Build
IDE and Deployment Model
The plan (NXROADMAP-129) was to implement a whitelist/blacklist system. However, it no longer seems to be needed: we have found other ways to achieve the same goal.
This work includes:
- IDE bundles always override server bundles
- Make deployer deploy bundles defined as user lib in the IDE
- Make user libs/bundles part of the static class loader
- Will fix the initial deployment
Stephane will update NXROADMAP-129 accordingly. As a side note, for now we still have issues with the reload of some resources like JSF resources.
Even if we know there are solutions, it is not worth spending too much time on this: if a server restart fixes the issue, this is something that can be done when adding a new navigation case for example.
The cleanup for deployment fragment is part of the Tomcat 7 work: NXROADMAP-37.
This issue will be addressed after the Tomcat 7 alignment, hopefully in September.
For dependencies, we would sometimes need to make them optional. Typically, nuxeo-poll depends on preview (which is in DM) to disable the preview, but it should work on CAP (with the needed nuxeo jars, but without the preview).
To solve this, we would need to make some requirements optional so that unresolved dependencies may not always trigger errors and break the bundle deployment.
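As a rough illustration of the idea, here is a minimal sketch of a resolver that distinguishes hard requirements from optional ones. The Bundle structure, the optional field and the resolve function are assumptions made up for this example; they are not the actual Nuxeo deployment model.

```python
from dataclasses import dataclass, field


@dataclass
class Bundle:
    # Hypothetical descriptor: not the real Nuxeo bundle manifest.
    name: str
    requires: set = field(default_factory=set)   # hard requirements
    optional: set = field(default_factory=set)   # nice-to-have requirements


def resolve(bundles):
    """Return (deployable, skipped_optional, errors) for the given bundles.

    A missing hard requirement is an error and blocks the bundle.
    A missing optional requirement is only recorded, so it can be logged.
    Simplification: availability is checked by name only, not transitively.
    """
    available = {b.name for b in bundles}
    deployable, skipped, errors = [], {}, {}
    for b in bundles:
        missing_hard = b.requires - available
        if missing_hard:
            errors[b.name] = sorted(missing_hard)
            continue
        missing_optional = b.optional - available
        if missing_optional:
            skipped[b.name] = sorted(missing_optional)
        deployable.append(b.name)
    return deployable, skipped, errors


# A CAP-like distribution: nuxeo-poll is present, the preview bundle is not.
cap = [
    Bundle("nuxeo-core"),
    Bundle("nuxeo-poll", requires={"nuxeo-core"}, optional={"nuxeo-preview"}),
]
deployable, skipped, errors = resolve(cap)
```

With an optional requirement, nuxeo-poll still deploys on the CAP-like setup, and the missing preview bundle is reported instead of breaking the deployment.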
Guillaume rebuilt a clean version of RichFaces including:
- Our patches
- All official patches we found
Guillaume worked on NXP-11967 and NXP-11331 and thanks to that we now have:
- Automatic Conversation creation when the user opens a new tab/window
- No more 'previous' link: it has been removed
- A state manager that correctly handles using Nuxeo in multiple tabs, without running into the "can not restore view" issue
This work has been merged and validated by tests: this is a major improvement!
More Improvements to Come
We would like to improve our Seam layer to have:
- Better error handling when conversation locking issues occur
- At least know which requests are concurrent
- If we need to recreate a temporary conversation: create one that can work with Nuxeo!
- Provide Nuxeo Runtime Service injection
- This is easy and nice
- Review existing Seam beans to make them lighter
- Use Page rather than Conversation scopes
- Reduce invalidation requirements
- Reduce memory footprint
Another improvement we may want to make is to provide a simple way to integrate JS/HTML widgets with the existing Widget/Layout system. Most people find it complicated to build JSF components and prefer spending time debugging HTML/JS. We should help them do that if we have a clean solution.
The global approach is:
- Use a simple h:inputText to create the component in the JSF tree
- Use JSON serialization to init/reset the JS widget from the model
- Use Automation RPC to do the processing
- Use valueHolder to manage transient state in JSF tree
A first attempt was made with nuxeo-select2-integration and the associated task https://jira.nuxeo.com/browse/NXP-12018
The task for aligning on Tomcat 7 is still here: NXP-10071. It was first reverted because of some broken tests, and since then some overlapping work has been done on the datasources system (to support non XA mode). Julien (probably with some help from Stephane) will try to realign and merge the branch.
The goal is to have this ready for 5.7.2!
Since 5.7.1 we have an initial integration for Metrics and Graphite. However we still have some remaining work to do.
Cleanup and Finish the Work
There are still a lot of small tasks to be completed:
- Upgrade Metrics version NXP-11995
- Looks like we are using the version that was not properly tagged in GitHub!
- Better expose the Geronimo pool
- Rewrap application level probes NXP-11162
- Cleanup metrics NXP-11161
- Update documentation NXP-11164
The data collected by Metrics is available via JMX. However, JMX:
- Is not user friendly
- Is not easy to tunnel/map as a protocol
As a result, it is a pain to explain how to get this information. This means we need an additional layer to make it easy to collect the stats. Stephane started work on integrating a helper that includes:
- A webserver to give access to JMX via web pages
- Protocol adapters for JMX over http/https
- Deploy it by default
- Only enable it via a
- Reuse server key for authentication
Ideally, we should have all counters/probes exposed via JAX-RS.
Global Monitoring Solution
Mathieu is working on providing a global monitoring solution. So, in addition to Metrics + Graphite + Diamond, the current solution includes:
- Fetch the logs
- Clone the Metrics flow
- Manage alerts (based on rules written in Clojure)
- Elastic Search
- Index the logs
- Front end for Elastic Search
Optimizations and Misc Issues
Storage and Files
The current file upload widget relies on RichFaces. This implementation has some drawbacks:
- It relies on JSF/Seam
- It maintains access to Seam conversation: this can create lock timeouts in Seam
- Upload is done in the context of a Tx
- Upload progress is managed server side
- Several client/server round trips
- Progress is wrong when nuxeo is behind a buffering reverse proxy like nginx
This is an opportunity to start a test implementation for a replacement of the fileupload action:
- Just replacing the file import popup dialog is simpler than addressing the global Widget use case
The idea is to align with the way the import is currently done via Drag&Drop:
- Use Automation Batch API for upload
- Run an Operation to do the import
However, the upload part is currently done using a tweaked version of the jquery-upload plugin, so that we don't have a hard dependency on the JS File API. A prototype was done for 5.6 and is available here: nuxeo-jqueryfileupload-integration
We have improved the way big blobs are managed to relieve the server I/O. Florent did some optimizations to avoid local copies as far as possible: see NXP-11689
S3 Binary Manager
The S3 Binary Manager can easily become the bottleneck when dealing with a lot of file accesses and/or big files. Recent benchmarks showed that S3 can indeed become a limiting factor, especially when the network configuration is not optimal (Nuxeo and S3 being on separate networks). To make it really scale, we need some kind of async S3 Binary Manager:
- Manage staging on a local FS
- Send to S3 asynchronously
The implementation is not as simple as it may seem (Tiry did a very naive implementation that does not work!), since it does require some infrastructure:
- Distributed locking system
- Persistent Job system
For this we decided to use Redis (that Ben and Florent already tested): this may end up being the first Nuxeo component with a direct dependency on Redis. See NXP-11731
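To make the staging idea more concrete, here is a minimal single-process sketch. Everything in it is an assumption for illustration: the AsyncBinaryManager class, the dict standing in for S3, and the in-memory deque standing in for a persistent job queue (which is what Redis would provide). It deliberately leaves out the distributed locking that a real cluster would need.

```python
import hashlib
import tempfile
from collections import deque
from pathlib import Path


class AsyncBinaryManager:
    """Stage blobs on the local file system, push them to remote storage later."""

    def __init__(self, staging_dir, remote, queue=None):
        self.staging_dir = Path(staging_dir)
        self.remote = remote  # stand-in for S3: any dict-like store
        # Stand-in for a persistent job queue; an in-memory deque is lost on restart.
        self.queue = deque() if queue is None else queue

    def store(self, data):
        """Write locally and schedule the upload; returns the blob key."""
        digest = hashlib.sha256(data).hexdigest()
        staged = self.staging_dir / digest
        if digest not in self.remote and not staged.exists():
            staged.write_bytes(data)
            self.queue.append(digest)
        return digest

    def read(self, digest):
        """Serve from staging while the upload is pending, else from remote."""
        staged = self.staging_dir / digest
        if staged.exists():
            return staged.read_bytes()
        return self.remote[digest]

    def flush_one(self):
        """Upload one staged blob; meant to be called by a background worker."""
        if not self.queue:
            return False
        digest = self.queue.popleft()
        staged = self.staging_dir / digest
        self.remote[digest] = staged.read_bytes()
        staged.unlink()  # only drop the local copy once the upload succeeded
        return True


with tempfile.TemporaryDirectory() as tmp:
    s3 = {}
    manager = AsyncBinaryManager(tmp, s3)
    key = manager.store(b"hello")
    # The write returned without touching the remote store.
    served_from_staging = key not in s3 and manager.read(key) == b"hello"
    manager.flush_one()
    served_from_remote = manager.read(key) == b"hello" and not manager.queue
```

The sketch also hints at why the naive version falls short: with several nodes, two workers could pick up the same job, and a crash would lose the in-memory queue, hence the need for locking and persistent jobs.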
Worker and Blocking Queue
We know we need to improve the infrastructure of the WorkManager, especially in the context of a cluster infrastructure:
- Better manage synchronization between nodes
- Manage exclusion
- Avoid running concurrent jobs hitting the same document
- Avoid dirty updates and deadlock
- Make jobs persistent
The work on S3 Async Binary Manager may be the opportunity to bootstrap the infrastructure for that. Once we have that, Redis may become a requirement for:
- Cluster architecture
- Scalability of jobs
- Safety of jobs
This may end up being a 5.8 feature!
Long Running Processing
We (should) all know that long running transactions are bad:
- Because they create more contention on resources like the database
- Because they end up generating nasty issues like
- Dirty updates
- Transaction timeouts
The problem is all the more important when:
- Database sucks
- Here MS SQL Server does a very good job at being sucky
- Processes have bad granularity
- Typically processes that directly grow with number of users or number of documents
- Processes that use slow WebServices
- Processes that manage big files
The typical examples are:
- Async listeners that manage big file processing
- Async listeners that call WebServices
- Big Automation Chains that include a lot of processing in a big loop
Solving the Issue
Sadly there is no magic solution. The only way to avoid deadlock is to shorten the transactions. So when shortening the process is not an option, we have to split the process in smaller transactions.
So far, we have 2 patterns:
- Read/Process/Save: split the global process in 3 steps
- Read required data inside a Tx
- Run the long processing on disconnected objects (outside of TX)
- Save back result in the repository inside a TX
- Create a new TX for each iteration inside a big loop
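The Read/Process/Save pattern can be sketched as follows. This is only an illustration: the transaction context manager, the dict-based repository and the scan_for_viruses placeholder are hypothetical stand-ins, not Nuxeo APIs.

```python
from contextlib import contextmanager

tx_log = []  # records what ran inside each (simulated) transaction


@contextmanager
def transaction(name):
    """Stand-in for a real transaction: begin, then commit or roll back."""
    tx_log.append(f"begin {name}")
    try:
        yield
        tx_log.append(f"commit {name}")
    except Exception:
        tx_log.append(f"rollback {name}")
        raise


def scan_for_viruses(content):
    """Placeholder for a slow call (external scanner, web service...)."""
    return b"EICAR" in content


def handle_document(repository, doc_id):
    # 1. Read: a short transaction that copies out only what is needed.
    with transaction("read"):
        content = repository[doc_id]["content"]

    # 2. Process: the long part runs on disconnected data, outside any Tx.
    tx_log.append("process (no tx)")
    infected = scan_for_viruses(content)

    # 3. Save: another short transaction to write the result back.
    with transaction("save"):
        repository[doc_id]["infected"] = infected


repository = {"doc-1": {"content": b"plain text"}}
handle_document(repository, "doc-1")
```

The point of the split is that no transaction stays open while the slow step runs, so the database only sees two short transactions.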
The tricky part is that we still have a low level problem where Session can not be correctly re-associated to the TX in some cases. See NXP-11681.
A new base class for async listener was started using the read/process/save pattern to provide an example for a VirusScanner.
See nuxeo-virusscaner-sample and more specifically the AbstractLongRunningListener.java
The associated Jira task is NXP-11721. As soon as NXP-11681 is fixed, we will add this new base class to Nuxeo.
Worker could also provide a base class that handles the read/process/save pattern. We could extract a base class from it and at least we should verify that the picture conversion in 5.7 uses a similar pattern to avoid long running transactions.
See nuxeo-imaging-async and more specifically PictureViewsComputerWorker.java
Task NXP-11975 was created for that.
In the context of Automation, we have an operation that runs another operation in a loop. This is a good candidate for generating long running transactions, and therefore for testing the multi-tx solution. For that we created an initial implementation of RunOperationOnListInNewTransaction, see NXP-11885.
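The "one transaction per iteration" idea looks roughly like this. The function name and the begin/commit/rollback callbacks are assumptions for the sketch, and continuing after a failed item is a design choice made here, not necessarily what the actual operation does.

```python
def run_on_list_in_new_transactions(items, operation, begin, commit, rollback):
    """Run `operation` on each item, with one transaction per item.

    A failure only rolls back the current item; earlier items stay committed.
    Returns the list of (item, exception) pairs that failed.
    """
    failures = []
    for item in items:
        begin()
        try:
            operation(item)
        except Exception as exc:
            rollback()
            failures.append((item, exc))
        else:
            commit()
    return failures


events = []


def flaky(item):
    # Hypothetical operation that fails on one specific item.
    if item == "b":
        raise ValueError("boom")


failures = run_on_list_in_new_transactions(
    ["a", "b", "c"],
    flaky,
    begin=lambda: events.append("begin"),
    commit=lambda: events.append("commit"),
    rollback=lambda: events.append("rollback"),
)
```

Compared with a single transaction around the whole loop, each transaction stays short, and one bad item no longer rolls back the work done on all the others.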
Misc Issues Reported via Support
Metrics Bad Impact on VCS Caching
It looks like the VCS Cache is slowed down by Metrics timers. Stephane and Ben are investigating this point: NXP-11925.
Scalability Issue on FullText
As reported in NXP-11897 full text does not seem to scale well because of some migration issues on TS_VECTOR.
VCS and Dates
Most databases, even SQL Server starting with version 2005, allow us to store the time zone with datetime. However, for now, Nuxeo does not properly store the time zone with the date we save in the Database.
This means that, thanks to hacks, everything goes relatively well as long as we have the Nuxeo nodes and the database servers inside the same time zone. Fixing this won't be very complex, but will need some tooling to manage migration.
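The failure mode is easy to reproduce outside Nuxeo. The sketch below uses fixed UTC offsets chosen purely for illustration (they are not real zone definitions) and shows what happens when the offset is dropped on save and a node in another time zone reads the value back.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, for illustration only.
paris = timezone(timedelta(hours=2))
new_york = timezone(timedelta(hours=-4))

saved_at = datetime(2013, 7, 30, 10, 0, tzinfo=paris)

# Failure mode: the offset is dropped when the value is stored...
stored_naive = saved_at.replace(tzinfo=None)
# ...and a node in another time zone assumes its own offset when reading.
read_in_new_york = stored_naive.replace(tzinfo=new_york)
drift = read_in_new_york - saved_at

# Storing the offset (or normalizing to UTC) keeps the instant unambiguous.
stored_utc = saved_at.astimezone(timezone.utc)
read_back = stored_utc.astimezone(new_york)
```

Here drift is the full offset difference between the two nodes, while read_back still designates the same instant as saved_at.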
ACL Missing Prefixes
From the very beginning, we have neglected to use a prefix inside ACLs to differentiate users and groups. This has led to several problems:
- Having users and groups with the same name creates issues!
- We have some services that use prefix and some that don't
This can end up generating very strange issues:
- The Tasks API expects prefixed users/groups for assignees
- But it can be called with simple string extracted from an ACL
- Tasks assignments are not correct!
A good fix would be:
- Upgrade ACL
- Provide a migration script
- Force to use prefix everywhere
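A migration along these lines could look like the sketch below. The "user:" and "group:" prefix values, the tuple-based ACL representation and the function names are assumptions for the example. Note the limitation: when a user and a group share the same name, the existing entry is ambiguous, and this sketch arbitrarily treats it as a group.

```python
USER_PREFIX = "user:"    # assumed prefix values
GROUP_PREFIX = "group:"


def add_prefix(name, known_groups):
    """Return `name` with an explicit user/group prefix."""
    if name.startswith((USER_PREFIX, GROUP_PREFIX)):
        return name  # already migrated
    # Ambiguous names (same user and group name) are treated as groups here.
    prefix = GROUP_PREFIX if name in known_groups else USER_PREFIX
    return prefix + name


def migrate_acl(acl, known_groups):
    """Prefix the principal of every (principal, permission, granted) entry."""
    return [
        (add_prefix(who, known_groups), permission, granted)
        for who, permission, granted in acl
    ]


acl = [
    ("members", "Read", True),
    ("jdoe", "Write", True),
    ("user:alice", "Read", False),
]
migrated = migrate_acl(acl, known_groups={"members", "administrators"})
```

Because already prefixed entries are left untouched, running the migration twice gives the same result as running it once.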
We have started a new documentation page that aims at defining some kind of TCK and registering all known Automation Clients and their level of compliance. For now, the TCK is nothing more than a suite of simple tests described in English, JS and Java.
This page is just a draft for now, but we'll be working on this.
Vlad will externalize the test resources that are, for now, in automation-test so that we can share a simple Nuxeo plugin that only contributes more complex schemas than what we have in default Nuxeo: this is used in the TCK.
Java Automation Client and Code Sharing
The Java Automation Client is evolving at the same time as the server side. But it becomes more and more obvious that there are some classes that should be shared by the server and the client:
- Blob classes
- Properties classes
- DocumentRef classes
We also have some "almost duplicated" classes for the marshaling code. Moreover, the fact that we now support the concept of business adapter will push users to use the same class on the client and server side.
This task will not be easy to do without breaking anything on the client side API.
NXP-12010 was created for now.
JS Automation Lib
Inside Nuxeo we have an automation.js that contains primitives for doing automation calls. This script is mainly used by the Drag&Drop and some extensions. However, the code was neither cleaned up nor documented. To address that, a new sandbox, nuxeo-automation-jsshell, was created.
The goal is to prepare in one module:
- A better
- Improved namespace
- Improved api
- The tests for the TCK
- A JS based Shell that could replace the applet based shell
The current code base is a mix of:
- The original
- The uploader from the Drag and Drop script
- Some new code
The first version includes tests done using qunitjs. Thomas forked the repository in troger/nuxeo-automation-jsshell to continue the work:
For now, we have decided to keep the callback model (rather than switching to promises). Umbrella task is NXP-11860.
The target is to have everything working with Node.js, but this will require some additional work since some important APIs (Blob, File, XHR) are not completely available via Node.js. We'll also work on providing Bower and Node descriptors.
Python Automation Client
We will need to work on extracting it from the Drive client.
Dart Automation Lib
Nelson is moving forward on the Dart Automation Client. Hopefully, we will also have a playground in Dart + WebUI.
.Net Automation Client
Astone Solutions is moving forward on implementing the .Net client, and they are switching the Outlook extension to use it.
We have a lot of code that handles JSON marshaling for Nuxeo objects:
- Inside Automation Server
- Inside Automation Client
These classes are useful everywhere: so we will factorize the code in
Damien has started the work on making the Automation API HATEOAS. This new module will be based on WebEngine:
- Using WebAdapter model
- Leveraging the shared JSON marshaling
Vlad (with the help of Stephane) added support for business adapters in the Automation API. Damien tested it and made some adjustments to be able to reuse the same adapter system inside the REST bindings.
Vlad added some doc with an example in Confluence.
For now, the system handles only one type of Business Adapter at a time. This means you cannot have a heterogeneous list as input/output, unless you create a dedicated Business Adapter to wrap the list.
We'll see later if we have real uses like that where having support for multi-part would help...
Operation versus Chain Alignment
Vlad (with the help of Stephane) worked on aligning the models for Operations and Chains.
The idea is to:
- Be able to call Chains or Operations the same way
- Define documentation and parameters for Chains too
The final target will be to be able to bind forms to Automation Chains (ask Alain about this!).
Still to be added to the Chain:
- Add input/output definition (infer from first and last operations)
- Add label + description in the XMap descriptor
- Add MVEL parsing at the right place (i.e. not in the descriptor)
Impact on existing API
- Update the Flow operations (like RunChains...): done
- Update the DnD import chains to leverage new features: to be done
Studio impacts to plan:
- Add Chain Category
- Add parameter definition in chain builder
- Add doc edit in chain builder
The site/automation/doc built-in documentation system has been updated to include the chains.
We still need to see with Lise if we can quickly make it nicer (it should not be a challenge to make it look better):
- Make a CSS that does not look like it is from the 90's
- Provide nicer navigation (now we have a lot of operations/chains)
- Provide better display of JSON definitions
Exception management in Automation will be addressed. As a first step, we'll focus on being able to get a proper exception call stack; we'll look at the control flow next.
We've added a type notion and a properties notion to Actions. This work is ongoing but won't be completed before 5.7.2.
Action types and related properties are not documented yet, but this would be easier by using widget types to render actions of a given type.
Involved JIRA task NXDOC-188
Action Widget Types
Making actions use widget types for rendering would:
- Improve pluggability: no need to override a nuxeo default template to add a custom action type
- Ease up documentation: just need to integrate action widgets types to a layout showcase-like website
- Ease up integration in Studio: action types configuration screens would be generated from widget types configuration
- Allow defining complex filterable UI elements like the widget type displaying tabs (as it needs a notion of mode to handle the content of the tab, and this notion is already handled by widget types)
Work has already been started in a branch.
Involved JIRA tasks are: NXP-11142, NXP-9673, NXP-9384
Action Filters Evaluation Context
Action filters currently rely on a restricted context.
- Make them rely on a context provided by the action: this will make actions more generic and reusable in UI contexts other than Seam/JSF.
- Make the current document used in action filters pluggable, so that widget types displaying actions do not have to be specific to it (e.g. the "currentDocumentActions" widget type)
This will ease up usability. Users will be able to handle standard JSF expressions in action filters. This will also make it possible to define generic widget types.
Other features that this would unblock for the future (but not for now):
- Possibility to add action widget types in layout listing rows.
- Possibility to use actions displaying tabs outside of a current document context (e.g. admin center)
- Possibility to extend the action context, to provide current selected elements for instance (useful when adding action selection widget types)
Target task is NXP-10566.
A notion of control has been added to widgets. For now, controls mostly handle:
These controls are usually handled by default layout templates and default widget types accepting subwidgets.
Widget Type Definitions
The widget type definition has already been improved, but we need to continue to extend it so that we can handle more use cases in a cleaner way. The requirements come from the fact that we need to handle better:
- What field types can be bound to a widget
- If widget has a hard coded title or not
- What are the default widget properties
We also need to be able to filter/present them better in Studio and improve their management.
For DAM we have requirements for adding some validation, especially on the file attributes. See NXP-11940
However, we should not try to handle this kind of validation server side: as much as possible, everything should be managed on the client side. On recent browsers, at least, this won't be an issue, and for file uploads, server side validation is really a pain.
Drive, Extensions and Profiles
We added some pluggable Java factories so that users can have a different roots layout. For now, pluggability is only at the Java/factory level.
We should add the concept of profiles:
- So that we can activate/deactivate profiles
- So that tests can cover several layouts
Antoine will update Jira accordingly.
PyQT vs PySide
Testers have experienced stability issues on the QT bindings. After doing some digging, we found out that:
- There are known bugs on PySide, the Python QT binding we use
- The PySide project is almost dead
For testing purposes, we created a branch using PyQT, the original Python QT binding. This works and seems to fix the stability issue that testers reported.
Drive and Automation
For now the Python Automation is directly part of Nuxeo Drive. With the license switch, ideally, we would like to extract the Automation Client so that it can remain LGPL.
Upload and Download
Upload and Download of big files should now be fixed. Drive client now uses the 'automation batch upload api'.
Antoine took some time to fix the build and tests
- fixed some encoding on Windows
- fixed 5.7 issues with thumbs (double events in audit)
- scalability issues on recursive adapter (may explain timeouts on Windows servers)
The scalability issue is probably just a side effect of the test that creates a 20-level-deep hierarchy.
Ben is currently doing performance tests on Drive.
- Add support for http proxies
- Improve scan system
- Improve Drive hosting/deployment policy
- Liveedit integration
We have integrated some new widgets (player, picture view, video info). Complete configuration will require finishing the work on incremental widgets.
The integration of BoxListing in DM is also in progress.
ImportRoot doc type was renamed AssetLibrary for consistency, but we should keep ImportRoot type for compatibility.
- The new doctype is AssetLibrary to make it consistent and understandable in Studio
- We keep backward compat on ImportRoot, but there are no subtypes for ImportRoot (for now)
After having the issue on the Intranet, we created NXP-11853. For now, we'll keep it like that, at least for the next FastTrack, even if this means adding the same subtypes to ImportRoot as AssetLibrary.
However, in the scope of 5.8, we should be able to handle automatic migration at SQL level.
Allowed Asset Types
Inside the DAM import dialog, we need to filter available types based on the container. We cannot rely on the default allowedTypes since there may be DM or SC types that we don't want to be available from the DAM UI.
For now, we use a dedicated
- This works
- This is already integrated inside Studio
So, even if this is not very clean, and probably means we should have better flexibility in the allowedTypes definitions, we'll keep it like that for now...
NB: real option would be to extend the configuration of
Import and DnD
The import system has been aligned on the Automation/DnD system of DM. This is still a transitional solution since:
- We'll need to align everything on the
- We want to add more features to the import/dnd (like picture previews via Canvas)
We'll try to add support for:
- Bulk tagging
- define tags at import time (integrated with DnD)
Modernizr and HTML5 Features
For now, we are inferring browser features from the browser user agent using some server side helper. This does work for most cases, but still in some cases:
- We need the info from client side without having to call the server
- We need very precise feature detection, like checking the history.push support that is used in REST linking
Modernizr does provide:
- Clean html5 feature detection
- Help to manage browser compatibility issues
We know that we need to do some cleanup in Imaging components. High level list of tasks is:
- Rework PictureBook type
- Align on ContentView
- Align on BoxListing
- Rethink UI and slideshow
- Cleanup adapters
- Adapters should hold less automatic logic
- Logic should be forwarded to service
- Adapt the picture view generation to multi-tx mode
- Fix meta-data extraction:
- Need to fix field size overflow
- Dependency issues between meta-data/views (error on meta => rollback!)
- Remove mistral for real
- Integrate Laurent's work
- Merge Nelson Silva Pull Request NXP-11582
- Make Picture views configurable.
We know we need to improve the Social Collab feature, but we must be careful not to break everything.
DM and Social Collab integration
We want to add some of the Social Collab feature to a 'standard workspace':
- Activity stream
This work could also include adding the rich profile in DM. Thomas initially started this work in a branch: at some point we should resume it and do the merge. This should result in 2 separate Marketplace packages:
- One that adds basic social features to a Nuxeo DM
- One that includes the collaboration model of the current Social Collab module
Local Groups are one of the cool features that come with Social Collab, and Presales often use them inside DM. Ideally, Benjamin should package this work in an addon and submit it.
Ideas on Product Evolutions
Alain would like to replace/improve the OpenSocial Gadget system. However, this is not really the time to go back to plain server side rendering for dashboard: this would be anachronistic!
What we could do is:
- Improve the JS container to also support Non OpenSocial Gadgets
- Provide a way to make Non OpenSocial Gadgets using Automation + Angular
We are likely to have more and more HTML/JS widgets. We have already started doing work on Select2. This is very likely to be continued.
CI and Build
Maven 3 upgrade
We need to start this upgrade ASAP, and for this we must fix the distribution issue. Julien checked whether there are quick fixes to align our existing nuxeo-distribution Maven plugin: it looks like there are none.
Before starting the work on building a brand new Maven3 plugin, we should be sure that there is no good alternative:
- Use gradle to mix maven dependencies with procedural commands
- Fix an ant/maven plugin that does work
If we really have no valid option other than a rewrite, we should:
- Create the Git Repo
- Update the ReadMe to explain what we want to do
- Start some initial code (at least some tests)
- Push on Maven3 Mailing list to see if we can have more players