Hello everyone! Here is the first Tech Report of the year – as usual taken straight from the meetings of the Nuxeo Dev Team. You'll see there are many many different things going on right now.

2013 Roadmap


The global roadmap is being slightly updated.

DAM 2


This is the big part of this report. Work on DAM 2 has started and there is a lot to talk about :)

Web application

Progress report


Thomas started rebuilding the DAM web application from the ground up last week. The main goal is to align on Layouts and ContentViews so that the DAM view is configurable via Studio.

The good news is that after 6 days of work, we already have a working DAM view:


  • using Layout for page layout

  • using Actions, Widgets, and Grid Layout for content

  • with support for Ajax

  • with support for bookmarkable URLs


The DAM2 project only contains 2 small beans and a few templates. This means we are mostly relying on the Nuxeo CAP infrastructure.

Ajax navigation


DAM2 will be used as a sandbox to improve Ajax integration inside Nuxeo.

This covers several aspects:

REST URL update
This is done with a generic fragment calling the HTML5 history.pushState() method in an Ajax-rendered panel. This is something we will push into Nuxeo CAP soon.

ContentView Pagination

This requires adding contentViewName, pageIndex and pageSize inside the URL pattern and inside the associated codec. This is being done inside DAM for now, but will likely be integrated inside Nuxeo CAP default codecs.
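To make the idea more concrete, here is a minimal, self-contained sketch of pagination state round-tripped through a URL. It is not the Nuxeo codec API: the class name `PaginationUrlSketch`, the query-string layout, the sample path and the content view name `dam_search` are all assumptions made for the example, and values are assumed to be URL-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the Nuxeo codec API: round-trips pagination state
// through URL query parameters.
final class PaginationUrlSketch {

    // Appends the three pagination parameters to a base URL that has no query string.
    // Assumption: contentViewName only contains URL-safe characters.
    static String encode(String baseUrl, String contentViewName, long pageIndex, long pageSize) {
        return baseUrl
                + "?contentViewName=" + contentViewName
                + "&pageIndex=" + pageIndex
                + "&pageSize=" + pageSize;
    }

    // Parses the query string back into a name -> value map (insertion-ordered).
    static Map<String, String> decode(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0 || q == url.length() - 1) {
            return params; // no query string, nothing to restore
        }
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Sample path and view name are made up for the example.
        String url = encode("/nuxeo/dam/asset", "dam_search", 2, 20);
        System.out.println(url);
        System.out.println(decode(url));
    }
}
```

With the state in the URL, a bookmarked link can restore the same content view, page and page size.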

Switching tabs
This was already prototyped in DM. Now that the infrastructure is ready (via better management of JSF forms), we should be able to integrate this in DAM and then in CAP/DM.

UX


We may want DAM2 to be usable via the keyboard.

The goal is to allow browsing assets in a "natural way":


  • using arrows to navigate

  • using PageUp/PageDown for pagination

  • using Enter to select an asset

Problems / Issues

Facets and merge


DAM2 displays all documents having the Asset facet. This means the DAM plugin should add the Asset facet to all Picture/Video/Audio documents that already exist. This was a problem because Doc Types didn't support merge for Facets. This is now fixed.

Faceted search dependency


DAM2 will have a dependency on FacetedSearch, so the DAM package will need to embed faceted search in the Marketplace Package if we want to be able to install DAM on top of CAP (i.e. without DM).

This may require some changes in Faceted Search to be sure it won't give a funny UI or a broken contrib on CAP (i.e. don't embed the FacetedSearch-DM part).

Codec


Some work was done on the DAM codec (based on docPath) to be able to manage the "null document" case. We need to add this fallback directly in the DocumentPathCodec.

HiddenInFacetedSearch


Faceted Search contributes the HiddenInFacetedSearch facet for the saved search documents.
This facet is used to filter the saved search from the result listing.

We should remove this facet and simply add the HiddenInNavigation facet. As for edit / manage saved search, we should propose a dedicated screen with a dedicated contentView that will simply bypass the HiddenInNavigation filtering.

ContentView, PageProvider and currentDocument


DAM contains a Master/Detail view type:


  • the ContentView displays the master / listing part

  • the currentDocument is displayed on the right panel


This means that when doing prev/next page on the ContentView, we must update the currentDocument to one that is part of the current page.

We will use an Event driven system based on Seam events:
contentViewChanged => update the currentDocument to be in the currentPage

The PageProvider itself cannot do it since it does not have access to Seam JSF.

The current idea is:


  • add a contributable listener in AbstractPageProvider

  • make ContentView add a Seam-aware listener class (that will fire a Seam event)

  • make DAM add a Seam listener that will update the currentDocument according to the currentPage
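The three steps above can be sketched as follows. This is only an illustration under stated assumptions: none of the types below are Nuxeo or Seam classes, and `SimplePageProvider`, `EventBus` and `Selection` are hypothetical in-memory stand-ins for AbstractPageProvider, Seam events and the DAM bean holding the currentDocument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the listener chain; none of these types are Nuxeo or Seam classes.
final class PageChangeSketch {

    // Step 1: a listener type that can be contributed to a page provider.
    interface PageChangedListener {
        void pageChanged(List<String> currentPageDocIds);
    }

    // Stand-in for a page provider: document ids split into fixed-size pages.
    static final class SimplePageProvider {
        private final List<String> docIds;
        private final int pageSize;
        private final List<PageChangedListener> listeners = new ArrayList<>();
        private int pageIndex = 0;

        SimplePageProvider(List<String> docIds, int pageSize) {
            this.docIds = new ArrayList<>(docIds);
            this.pageSize = pageSize;
        }

        void addListener(PageChangedListener listener) {
            listeners.add(listener);
        }

        List<String> getCurrentPage() {
            int from = Math.min(pageIndex * pageSize, docIds.size());
            int to = Math.min(from + pageSize, docIds.size());
            return new ArrayList<>(docIds.subList(from, to));
        }

        // Moves to the next page if there is one, then notifies the listeners.
        void nextPage() {
            if ((pageIndex + 1) * pageSize < docIds.size()) {
                pageIndex++;
                List<String> page = getCurrentPage();
                for (PageChangedListener listener : listeners) {
                    listener.pageChanged(page);
                }
            }
        }
    }

    // Step 2: stand-in for an event bus such as Seam events (in-memory only).
    static final class EventBus {
        private final List<Consumer<List<String>>> observers = new ArrayList<>();

        void observe(Consumer<List<String>> observer) {
            observers.add(observer);
        }

        void raise(List<String> payload) {
            for (Consumer<List<String>> observer : observers) {
                observer.accept(payload);
            }
        }
    }

    // Step 3: stand-in for the UI bean that holds the current document.
    static final class Selection {
        String currentDocument;

        // Keeps the selection if it is still visible, else picks the first document of the page.
        void onContentViewChanged(List<String> page) {
            if (!page.isEmpty() && !page.contains(currentDocument)) {
                currentDocument = page.get(0);
            }
        }
    }

    public static void main(String[] args) {
        SimplePageProvider provider = new SimplePageProvider(Arrays.asList("a", "b", "c", "d", "e"), 2);
        EventBus bus = new EventBus();
        Selection selection = new Selection();
        selection.currentDocument = "a";

        provider.addListener(bus::raise);             // bridge: provider listener -> event
        bus.observe(selection::onContentViewChanged); // UI side reacts to the event

        provider.nextPage();
        System.out.println(selection.currentDocument); // prints "c"
    }
}
```

The page provider only knows about the plain listener interface, so it stays free of any UI dependency; the bridge to the event system lives on the ContentView side.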

Drive

Automation changes

Object2JSON


For Drive requirements, Olivier had to add support for native Java Object 2 JSON marshaling.

Complex types


At the same time, Olivier started looking at the complex property limitations in Automation API. The create/update Automation API is really not easy to use to manage complex properties.

Historically this was simply not possible. Later, a hack was added for the Android Client, but the full fix was never done.

A complete solution includes:


  • managing JSON encoded complex properties inside the PropertyMap object (already partially done)

  • sending the PropertyMap object as a real JSON object when possible (i.e. remove the limitation for \n!)

  • updating the Java Automation Client to manage complex properties and dirty states

Automation protocol version and non-regression


The changes introduced in Automation Marshaling may at some point impact compatibility.

This is something we ideally don't want.
If we call Automation V2 the new Automation that manages Objects and Complex properties, Automation Server V2 should be able to handle calls from Automation V1 and V2 clients.

This would be a good opportunity to add a REST (Funkload) test using the Automation API that is recorded against 5.6, and to check that it still runs against a 5.7 server:


  • depending on the result, we will know if we must really manage Automation Version in the protocol via a Client/Server negotiation

  • this will be useful for non-regression.

Repository Work

Quota Module

Performance impact


Managing Quotas implies additional checks and additional Write operations in the repository.

In addition, Quotas are mainly useful when there is a lot of data and a lot of users.

These 2 points raise a warning for performance, so Ben did some tests :)

Funkload Benchmarking

A Funkload test for document creation was run to compare with and without Quota add-ons.

The results show that adding Quota management:


  • reduces scalability by 20%

  • increases response time by 25%


Most of the work is done inside Listeners:


  • quotaStatsListener is slow: it accounts for 83% of the time spent in all sync listeners

  • quotaProcessor accounts for 15% of the time spent in all async listeners

DB processing

There is a high number of calls of quotaStatsListener on document creation:


  • for the created document:


    • DOCUMENT_CREATED

    • BEFORE_DOC_UPDATE

    • DOCUMENT_UPDATED



  • for each parent:


    • BEFORE_DOC_UPDATE

    • DOCUMENT_UPDATED




3 + depth *2 calls = 11 calls for a creation on depth 4.
The work is done only in BEFORE_DOC_UPDATE and DOCUMENT_CREATED (nothing on DOCUMENT_UPDATED).
The number of SQL UPDATEs to set the size and count is correct (no duplicate commands).
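
The call count above can be written out as a small calculation. This is just the arithmetic from the list of events; the class and method names are illustrative, and the "calls doing work" figure is derived from the statement that only DOCUMENT_CREATED and BEFORE_DOC_UPDATE trigger work.

```java
// Sketch of the listener call count on document creation; names are illustrative.
final class QuotaListenerCallCount {

    // DOCUMENT_CREATED, BEFORE_DOC_UPDATE, DOCUMENT_UPDATED on the created document.
    static final int EVENTS_ON_CREATED_DOC = 3;

    // BEFORE_DOC_UPDATE, DOCUMENT_UPDATED on each parent.
    static final int EVENTS_PER_PARENT = 2;

    // Total listener calls for creating a document with `depth` parents.
    static int callsForCreation(int depth) {
        if (depth < 0) {
            throw new IllegalArgumentException("depth must be >= 0");
        }
        return EVENTS_ON_CREATED_DOC + depth * EVENTS_PER_PARENT;
    }

    // Calls that actually do work: DOCUMENT_CREATED and BEFORE_DOC_UPDATE on the
    // created document, plus one BEFORE_DOC_UPDATE per parent (derived, see lead-in).
    static int callsDoingWork(int depth) {
        if (depth < 0) {
            throw new IllegalArgumentException("depth must be >= 0");
        }
        return 2 + depth;
    }

    public static void main(String[] args) {
        for (int depth = 0; depth <= 4; depth++) {
            System.out.println("depth " + depth + ": " + callsForCreation(depth)
                    + " calls, " + callsDoingWork(depth) + " doing work");
        }
    }
}
```

At depth 4 this gives 11 calls, of which 6 do actual work.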

VCS optimization

UUIDs (NXP-4803 uuid for document id)


The branch NXP-4803-db-uuid implements changes for PostgreSQL.

The data migration from varchar to uuid works in 4 steps:


  1. sql dump

  2. create a new db

  3. import a uuid schema

  4. import the sql dump


Migration speed on a good PGSQL server is about 2000 docs / s.

A bench is in progress using the CI.
Initial tests were done on octopussy with 600k docs:


  • uuid index size is 45% smaller

  • index is certainly less bloated after a mass import.


Once the CI job update is done, we should have a bench with big volumes and a diff between the master and the uuid branch.

SQL Server 2012

NXP-9660 support of mssql 2012:


The default collation is case sensitive and there are system tables that cannot be dropped. Changes have been made on the master. Unit tests and funkload tests are ok.
=> we can consider that MSSQL 2K12 is now supported (YAY!)

NXP-10640 Avoid issues on concurrent write:


Switching the transaction isolation level from "SNAPSHOT" to "READ COMMITTED" removes the "Snapshot isolation transaction aborted" error, but reintroduces a deadlock on the ACL optimization update. The deadlock is not easy to reproduce, but it happens with the importer add-on or on a long bench.

One solution is to synchronize the updateReadACL call in SessionImpl.doFlush. This works so far.

Note that the synchronized code prevents having multiple update read ACLs at the same
time (at least for each Nuxeo node), which should be fine for all databases.
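
As an illustration of this kind of per-node serialization, here is a minimal sketch. It is not the actual SessionImpl code: `ReadAclFlushSketch`, `READ_ACL_LOCK` and the simulated `updateReadAcls()` body are assumptions made for the example. The counters only exist to show that at most one update runs at a time.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the actual SessionImpl code.
final class ReadAclFlushSketch {

    // One lock per JVM, i.e. per node; it gives no protection across a cluster.
    private static final Object READ_ACL_LOCK = new Object();

    private static final AtomicInteger running = new AtomicInteger();
    private static final AtomicInteger maxConcurrent = new AtomicInteger();

    // Stand-in for the read ACL update; records how many run at the same time.
    private static void updateReadAcls() throws InterruptedException {
        int now = running.incrementAndGet();
        maxConcurrent.accumulateAndGet(now, Math::max);
        Thread.sleep(5); // simulate some database work
        running.decrementAndGet();
    }

    static void doFlush() throws InterruptedException {
        // ... other flush work would happen here, outside the lock ...
        synchronized (READ_ACL_LOCK) {
            updateReadAcls();
        }
    }

    // Runs several flushes in parallel and returns the highest number of
    // read ACL updates observed running at the same time.
    static int runConcurrentFlushes(int threads) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    doFlush();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return maxConcurrent.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent updates: " + runConcurrentFlushes(8));
    }
}
```

Because the lock is a JVM-level object, two nodes of a cluster can still run the update at the same time.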

Ben will test it with the ondemand-bench job to see if there is performance regression.

The way we handle the updateReadACL for now is just a first step:


  • phase 1: add the synchronized Java block

  • phase 2: run processing in async

  • phase 3: provide a cluster-wide lock / sync system.

Clustered index


It looks like MSSQL Server needs to have a clustered index, and by default uses the PK.
=> we should add an auto-incremented int column and mark it as clustered
==> this should not impact any java code and may improve the MSSQL performance!

JDBC and cast


The JDBC driver transfers all string parameters as Unicode. As a result, there may be a cast on the database side when comparing against ASCII columns like UUIDs, and this makes SQL Server skip the indexes and scan the table.

Async update


Some operations in VCS can start long running transactions. This is typically the case when a change inside the repository triggers a recomputation of ReadACL or Ancestors tables. The worst case is to move a folder on top of a big hierarchy:

  • trigger rebuilds ACLs
  • trigger rebuilds ancestors

This long running TX leads to 2 problems:


  • slow UI and possible TX timeout

  • concurrency issue, because on some databases, the tables end up being locked by the update process.


The ideal solution would be:


  • optimize ancestors update (ex: for rename)

  • run update in async + mono-thread.
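
The "async + mono-thread" idea can be sketched with a single-threaded executor. This is an assumption-laden illustration, not VCS code: `AsyncSingleThreadUpdateSketch`, `scheduleRebuild` and the sample folder paths are hypothetical, and the "rebuild" is simulated by recording the path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not VCS code: heavy rebuilds are queued and run one at a
// time on a single background thread instead of inside the caller's transaction.
final class AsyncSingleThreadUpdateSketch {

    // A single worker thread means rebuilds never run concurrently on this node.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    private final List<String> processed = Collections.synchronizedList(new ArrayList<>());

    // Returns immediately; the rebuild runs later on the worker thread.
    void scheduleRebuild(String folderPath) {
        worker.execute(() -> {
            // A real implementation would rebuild ACLs / ancestors here.
            processed.add(folderPath);
        });
    }

    // Stops accepting work, waits for queued rebuilds, and returns them in run order.
    List<String> shutdownAndGetProcessed() {
        worker.shutdown();
        try {
            if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("rebuilds did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        AsyncSingleThreadUpdateSketch updates = new AsyncSingleThreadUpdateSketch();
        // Sample paths are made up for the example.
        updates.scheduleRebuild("/workspaces/a");
        updates.scheduleRebuild("/workspaces/b");
        System.out.println(updates.shutdownAndGetProcessed());
    }
}
```

The caller's transaction stays short, and the single worker keeps the rebuilds from competing with each other for the same tables. This sketch does not cover a cluster, where each node would have its own worker.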

Infrastructure

Deployment fragment


The goal is to align on the Servlet 3 spec to manage modular web deployment.

DataSource internalization


We recently discovered that the DataSources managed directly by Tomcat (i.e. all except VCS) are not correctly enlisted in transactions. Stephane started to manage DataSources via an extension point system.

Advantages:


  • we can correctly enlist in the Tx

  • this makes the configuration more flexible

  • the configuration no longer depends on the application server


Drawbacks:


  • Default Tomcat monitoring won't see our DataSources

  • we don't leverage the application server infrastructure.


We'll see how we manage this with respect to JBoss. We'll try to keep the option of using application server level DataSources, but in the default Tomcat distribution we'll use Nuxeo "internal DataSources".

Doing this forces the service that needs to initialize persistence to wait for the DataSource services to be initialized. This forced Stephane to make changes in Nuxeo Runtime to better manage the ApplicationStarted event. This results in better management of the Component LifeCycle:


  • we now manage a "started" state (like in the OSGI model :) )

  • bundle notification will be usable during reload.


This will be committed in a branch and we'll wait for QA to validate this.

Tomcat 7


Tomcat 7 support is available in a pending branch. We must merge this branch so that 5.7 will be aligned on Tomcat 7.

Critical section


As you may already have experienced, managing concurrency between several threads trying to create overlapping subtrees in the repository is not easy (NXP-10707).

We already had the issue on several projects where import jobs are using the personal workspace. At some point, 2 import jobs will create the same UserWorkspace (because of MVCC and isolation) but one will fail.

Avoiding this requires a critical section pattern that:


  • works cluster-wide (JVM-level locks don't fix the issue)

  • manages Transaction visibility constraints.


Stephane started the work in order to provide some code samples for support.

This code is in the NXP-10707-critical-section branch.

Add-ons

Metadata


The goal is to manage metadata extract / writeback for some file types (pictures, videos). For now, inside Nuxeo we have basic support:

  • based on ImageMagick and FFmpeg
  • extract only
  • maps file metadata to a fixed Nuxeo schema (IPTC, EXIF)

In the last weeks:

  • some work was done for the blog:

    • define a service using an external tool
    • manage extract AND writeback
    • manage configurable mapping (no need for a fixed schema)

  • Fred integrated another metadata extractor for a customer POC (on Flash files).

This means that we will plan some integration work to:


  • package this inside DAM 2

  • remove the deprecated items from DAM 2.

=> need to integrate with DAM
==> avoid hard-coded metadata schemas
==> provide default mapping
==> schedule work after DAM refactoring.

Deck.js


Laurent also worked on a template-rendering extension to add support for Deck.js PDF generation directly inside Nuxeo. The work includes:

  • evolution of template rendering
  • integration of, and dependency on, PhantomJS.