Patent Applicant: Accenture Global Solutions Limited

Case

[2022] APO 34

13 May 2022


IP AUSTRALIA

AUSTRALIAN PATENT OFFICE

Accenture Global Solutions Limited [2022] APO 34

Patent Application:             2019200051

Title:                            Utilizing an artificial intelligence model determined for a target domain based on a dataset associated with a source domain

Patent Applicant:                Accenture Global Solutions Limited

Delegate:                        Greg Powell

Decision Date:  13 May 2022

Hearing Date:  Written submissions filed on 17 November 2020

Catchwords:  PATENTS – examiner objection – inventive step – applicant submission not addressing objection but seeking explanation – explanation given in examiner’s objection – no evidence from applicant – explanation correct on balance of probabilities – opportunity to continue examination

Representation:                   Patent attorney for the applicant: Murray Trento & Associates Pty Ltd, Victoria


DECISION

The invention defined in any one of the claims as presently proposed to be amended, on the balance of probabilities, does not involve an inventive step in light of prior art document US 2017/0193392 (D1).

The applicant may still be able to provide, before the final date of acceptance, evidence in response to the 4th adverse report to shift the balance of probabilities. Under sub-regulation 13.4(1)(g) of the Patents Regulations 1991, the period to gain acceptance of the patent request and complete specification in relation to patent application 2019200051 is 3 (three) months from the date of this decision.

REASONS FOR DECISION

Background

  1. Patent application 2019200051 (the present application) was filed by Accenture Global Solutions Limited (the applicant) on 4 January 2019 claiming priority from US application 15/864,257 filed on 8 January 2018.

  2. The present application was filed after 15 April 2013. The fate of the present application is, as a consequence, governed by the Patents Act 1990 (the Act) as amended by the Intellectual Property Laws Amendment (Raising the Bar) Act 2012. These amendments included the introduction of new section 49(1). Under this provision, I must accept the present application if satisfied on the balance of probabilities that it complies with the requirements of the Act. If I am not so satisfied, I can refuse the present application. However, I will only refuse the present application if I am also satisfied that providing the applicant with an opportunity to address the objection or amend will serve no useful purpose.

  3. A first examination report was issued on 11 September 2019 raising an objection in relation to inventive step. The applicant responded to the first examination report on 24 December 2019 by way of written submissions and proposed amendments to the specification. A second examination report issued on 28 January 2020 maintaining the objection to inventive step. Another response was filed on 19 June 2020 with further proposed amendments and arguments. A third examination report issued on 10 July 2020 maintaining the inventive step objection. A response to this report was filed on 28 August 2020 and proposed further amendments and provided further arguments as to the inventiveness of the claimed invention. A fourth examination report issued on 7 September 2020, again maintaining the inventive step objection.

  4. The applicant subsequently requested to be heard on 10 September 2020. Following notice that the hearing would be held via written submissions, the applicant filed written submissions on 17 November 2020.

  5. As amendments have been proposed, this decision is based upon the statements of proposed amendments filed up to and including 28 August 2020. That is, this decision is based upon the specification as proposed to be amended.

  6. Finally, while the final date for acceptance of the present application was 11 September 2020, sub-regulation 13.4(1)(g) of the Patents Regulations 1991 extends the time for gaining acceptance to 3 months (or longer if appropriate) from the date of the present decision.

    The Specification

    Background to the invention

  7. The present invention relates to the area of artificial intelligence. The single paragraph setting out the background to the invention states:

    “Artificial intelligence describes different ways that a machine interacts with a world around it. Through advanced, human-like intelligence (e.g., provided by software and hardware), an artificial intelligence model can mimic human behavior or perform tasks as if the artificial intelligence model were human. Machine learning is an approach, or a subset, of artificial intelligence, with an emphasis on learning rather than just computer programming. In machine learning, a device utilizes complex models to analyze a massive amount of data, recognize patterns among the data, and make a prediction without requiring a person to program specific instructions. Deep learning is a subset of machine learning, and utilizes massive amounts of data and computing power to simulate deep neural networks. Essentially, these networks classify datasets and find correlations between the datasets. With newfound knowledge (acquired without human intervention), deep learning can apply the knowledge to other datasets. Artificial intelligence models have found great success in practical applications. Computer vision, speech recognition, and language translation have all seen a near human level performance with the help of artificial intelligence models.”

  8. The specification notes that prediction systems that rely on artificial intelligence (AI) models have been trained on “cleansed and representative data” for a particular domain. It is noted that robust prediction systems rely on AI models that are robust to changing (i.e. unseen) data. However, the specification notes that:

    “many AI models are non-robust and non-applicable to domains outside of the particular domains used to train the AI models (e.g., predicting United States flight data using an AI model trained with European flight data). Therefore, many AI models cannot be applied or transferred across different domains.”

  9. While this is not entirely clear given the paucity of discussion in the background, the invention seeks to determine an AI model for a (target) domain based on data from a different (source) domain.

    Implementation

  10. An example implementation is described. In a broad sense, the system of the invention is shown in Figure 1A:

  11. The process followed is set out in the flowchart shown in Figure 4:

  12. As the specification states:

    “Some implementations described herein provide a model determination platform that determines an AI model for a target domain based on a dataset associated with a source domain that is different than the target domain. For example, the model determination platform may receive source data (e.g., associated with a source domain), target data (e.g., associated with a target domain that is different than the source domain), external data (e.g., weather data, calendar data, and/or the like), and a target task associated with the source data and the target data. The model determination platform may generate features of and differentiators between the source data and the target data, and may identify a set of mappings between the source data and the target data based on the features. The model determination platform may determine different clusters of the source data based on the features, the differentiators, and/or the external data, and may generate, based on the external data, a set of AI models to perform the target task. The model determination platform may generate a performance measure for the set of AI models based on the features, the differentiators, the set of mappings, and/or the different clusters, and may identify an AI model from the set of AI models based on the set of mappings, the different clusters, and/or the performance measure. The model determination platform may utilize the identified AI model to perform the target task.”

  13. As shown in step 410 of the flowchart, the model determination platform receives a large, well-defined dataset with respect to, for example, airline A, and a small, undefined dataset with respect to airline B. In the implementation described, the source data is the dataset associated with airline A, and the target data is the dataset associated with airline B. It is noted in the specification that both the source data and target data may include data, associated with the source or target domain respectively, which may be used to train AI models. The data could be quite extensive and include things like quantity of daily flights offered by each airline, destination locations of the daily flights, departure locations of the daily flights, times associated with the daily flights, an average delay time for all flights of each airline, causes of flight delays, passenger data, departure locations for flights, destination locations for flights, number of bags that can be brought on board flights and so on.

  14. The platform also receives external data. As described, this could be data indicating weather conditions associated with the destination locations and the departure locations of the airlines, data indicating whether the flights associated with the airlines occur during weekends, holidays, etc., and social media data indicating customers’ sentiments about the airlines, such as complaints by customers about the timeliness of the flights. Also supplied is a target task. The example given is a task of predicting flight delays for airline A and airline B. This task might be supplied by the user. The other data supplied to the platform is what is called “expert data”, which is described as:

    “data indicating AI models that may be used with the source data and the target data, such as AI models that may be used to predict flight delays for airlines, data indicating different factors that may have an impact on a target task, such as the effect of weather information on predicting flight delay, and/or the like”

  15. It is noted in the specification that the platform might convert the source data, the target data, and the external data from a format received by the platform into another format (the example given being a resource descriptive framework (RDF) format).

  16. Once the data is received, as noted in step 420, a feature engine within the platform generates “features” of the source and target data. These “features” are exemplified as:

    “a feature indicating that airline A provides passenger gender information and airline B does not, a feature indicating that airline A and airline B both fly to city X, a feature indicating weather for city X, a feature indicating that airline A and airline B follow the same calendar, a feature indicating that airline A services fifty cities that airline B does not service, a feature indicating that both airline A and airline B provide passenger age information, and/or the like.”

  17. The specification notes that the features could be generated by a schema matching technique seeking correspondences and conflicts between the source and target data. It also states that features may be generated by other techniques such as machine learning techniques like a Bayesian or multivariate Gaussian mixture model.

  18. The determined features are sent to a differentiator engine which generates differentiators between the source data and the target data based on the features of the source data and the target data. The differentiators indicate where the source and target datasets are “misaligned”. Examples of misalignments might be that airline A provides passenger gender information and airline B does not, that airline A provides information about baggage and airline B does not, and that airline A and airline B provide airport data, but the airlines cover different airports. As with the features, the specification notes that the differentiators could be generated by a schema matching technique or with machine learning techniques.
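By way of illustration only, the following Python sketch shows the simplest form of schema comparison that could yield “features” shared by two datasets and “differentiators” where they are misaligned. It is not drawn from the specification, which does not say how its schema matching technique works; the field names and the function `compare_schemas` are hypothetical.

```python
# Illustrative sketch only: compares two record layouts by field name.
# Field names are hypothetical and are not taken from the specification.

def compare_schemas(source_fields, target_fields):
    """Return (shared, source_only, target_only) sets of field names.

    Shared fields correspond loosely to "features" common to both datasets;
    fields present in only one dataset indicate a misalignment, i.e. a
    candidate "differentiator".
    """
    source, target = set(source_fields), set(target_fields)
    return source & target, source - target, target - source


airline_a = ["departure_airport", "destination_airport",
             "passenger_gender", "passenger_age", "bags"]
airline_b = ["departure_airport", "destination_airport", "passenger_age"]

shared, a_only, b_only = compare_schemas(airline_a, airline_b)
print(sorted(shared))  # ['departure_airport', 'destination_airport', 'passenger_age']
print(sorted(a_only))  # ['bags', 'passenger_gender']
print(sorted(b_only))  # []
```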

  19. At step 430, a mapping function engine of the platform receives the differentiators and identifies a set of mappings between the source data and the target data based on the differentiators. The mapping techniques exemplified are a data-driven mapping technique using heuristics and statistics to automatically discover complex mappings between the two datasets, a semantic mapping technique using a metadata registry to look up data element synonyms, and a nonlinear data mapping technique, using neural networks to identify the set of mappings between the source data and the target data. The neural network is said to be trained (unsurprisingly) via unsupervised techniques (e.g., vector quantization techniques, subspaces techniques, probability density functions, and/or the like), or supervised techniques (e.g., learning vector quantization techniques, subspaces techniques, probability density functions).
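Of the three mapping techniques mentioned, the semantic one is the easiest to illustrate. The Python sketch below assumes a small, hypothetical metadata registry in which each canonical data element lists its known synonyms; source and target fields that resolve to the same canonical element are mapped to each other. The registry contents and the function names are hypothetical and do not come from the specification.

```python
# Hypothetical metadata registry: canonical data element -> known synonyms.
REGISTRY = {
    "departure_airport": {"origin", "dep_airport", "from_airport"},
    "delay_minutes": {"delay", "late_by", "arrival_delay_min"},
}


def canonical_name(field, registry=REGISTRY):
    """Return the canonical element a field name refers to, or None if unknown."""
    if field in registry:
        return field
    for canonical, synonyms in registry.items():
        if field in synonyms:
            return canonical
    return None


def semantic_mapping(source_fields, target_fields, registry=REGISTRY):
    """Map each source field to the target field sharing its canonical element."""
    target_by_canonical = {}
    for field in target_fields:
        canon = canonical_name(field, registry)
        if canon is not None:
            target_by_canonical.setdefault(canon, field)
    mapping = {}
    for field in source_fields:
        canon = canonical_name(field, registry)
        if canon in target_by_canonical:  # unknown fields (None) never match
            mapping[field] = target_by_canonical[canon]
    return mapping


print(semantic_mapping(["origin", "delay", "gender"], ["dep_airport", "late_by"]))
# {'origin': 'dep_airport', 'delay': 'late_by'}
```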

  20. The set of mappings could be used to embed or transfer data from the source data to the target data, depending on the level of misalignment between the source data and the target data.

  21. Having generated features, differentiators and a set of mappings, at step 440, these are sent, along with the external data, to a cluster engine of the platform which determines different clusters of the source data that would have an effect on the target task, which might be predicting flight delays for airline A and airline B.

  22. Cluster analysis techniques which are exemplified include grouping a set of data in such a way that data in a same cluster is more similar to each other than to data in other clusters, a hierarchical clustering technique whereby objects are more related to nearby objects than the objects are to objects farther away, a centroid-based clustering technique where clusters may be represented by a central vector, which may not be a member of the dataset, a distribution-based clustering technique where clusters may be defined as objects belonging most likely to a same distribution, and a density-based clustering technique where clusters may be defined as areas of higher density than a remainder of a dataset. The cluster engine associates a “cluster importance factor” (e.g. a weighting factor) to each cluster. For example, if seeking to predict the flight delay for a flight of airline B, the cluster engine might cluster the source data based on a departure airport of the flight, weather conditions and a trip time associated with a previous leg of the flight, a type of an aircraft for the flight, and a quantity of legs for the flight.
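As an illustration of the centroid-based technique mentioned, the following Python sketch implements plain k-means clustering (Lloyd’s algorithm). The specification does not say how the cluster importance factor is calculated; purely as an assumption, the sketch weights each cluster by its share of the data points. The points, the function names and the deterministic initialisation are hypothetical choices.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on tuples of floats.

    Centroids start at the first k points so the result is deterministic.
    Returns (centroids, labels).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


def importance_factors(labels, k):
    """Assumed weighting: each cluster's share of all points."""
    return [labels.count(c) / len(labels) for c in range(k)]


# Hypothetical 2-D points, e.g. (scaled departure hour, scaled trip time).
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0)]
centroids, labels = kmeans(points, k=2)
print(labels)                         # [0, 1, 0, 0, 1]
print(importance_factors(labels, 2))  # [0.6, 0.4]
```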

  23. In step 450, a model generation engine of the platform receives the external data and generates a set of AI models to perform the target task (which might be to predict flight delays for airline B). The model generation engine could use domain expert knowledge data, provided in the external data, to generate the set of multiple AI models. The domain expert knowledge data includes data indicating which types of AI models are useful to perform the target task (to predict flight delays). The AI models listed include decision tree learning models, association rule learning models, neural network models, inductive logic programming models, support vector machine models, deep learning models, long short-term memory (LSTM) deep learning models (e.g. a simple recurrent neural network which may be a building block for a larger recurrent neural network), multilayer perceptron (MLP) deep learning models (such as a feedforward artificial neural network with at least three layers of nodes that each use a nonlinear activation function), and/or combinations of LSTM deep learning models and MLP deep learning models.

  24. At step 460, a measure engine of the platform, having received the features of the source and target data, the differentiators between the source and target data, the set of mappings, and the different clusters of the source data, generates a performance measure for the set of AI models. The performance measure provides an indication of the performance of the AI models in performing the target task using the target dataset, based on the cluster importance factor and/or the set of mappings.

  25. The measure engine is exemplified as using a Euclidean loss function to determine the performance measure. The loss function can map an event or values of one or more variables onto a real number that represents a cost associated with the event. The loss function could be a quadratic loss function, or a 0-1 loss function, associated with the cluster importance factor, mappings for the source data, mappings for the target data, and an AI model.
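For readers unfamiliar with the terms, the following Python sketch shows a quadratic loss and a 0-1 loss, together with one plausible way of weighting per-example losses by a cluster importance factor. The weighting scheme (an importance-weighted mean) and the function names are assumptions; the specification does not state how the factor enters the loss.

```python
def quadratic_loss(predicted, actual):
    """Squared error: the cost grows with the square of the miss."""
    return (predicted - actual) ** 2


def zero_one_loss(predicted, actual):
    """Cost of 1 for any wrong prediction, 0 for a correct one."""
    return 0.0 if predicted == actual else 1.0


def weighted_loss(predictions, actuals, weights, loss=quadratic_loss):
    """Importance-weighted mean loss: sum(w * L(p, y)) / sum(w)."""
    total = sum(w * loss(p, y) for p, y, w in zip(predictions, actuals, weights))
    return total / sum(weights)


# Predicted vs actual delays in minutes; weights stand in for cluster importance.
print(weighted_loss([10, 20], [12, 20], [0.5, 1.5]))  # 1.0
print(weighted_loss(["late", "on_time"], ["late", "late"], [1, 1],
                    loss=zero_one_loss))               # 0.5
```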

  26. In step 470, the AI models, the performance measures, the mappings and the clusters are then passed to an optimisation engine which is to identify the best AI model, the best set of mappings, and the best cluster performance factor. The optimisation engine could use a stochastic gradient descent (SGD) technique such as a first-order iterative optimization technique for finding a minimum of a function, or an iterative method for minimizing an objective function that is written as a sum of differentiable functions. The specification also notes that the SGD technique could include a momentum method technique, an averaged stochastic gradient descent technique, an adaptive gradient technique, a root mean square propagation technique, an adaptive moment estimation technique, or a Kalman-based stochastic gradient descent technique.
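The “momentum method” variant of stochastic gradient descent named in the specification can be sketched briefly. The Python below is an illustration only: it fits a straight line y ≈ w·x + b by minimising a sum of squared-error terms, updating the parameters one term at a time with a momentum (velocity) term. The model, data and hyperparameters are hypothetical and unrelated to the specification’s optimisation engine.

```python
import random


def sgd_momentum(data, lr=0.01, momentum=0.9, epochs=200, seed=0):
    """Minimise sum_i (w*x_i + b - y_i)**2 by SGD with classical momentum."""
    rng = random.Random(seed)
    data = list(data)
    w = b = 0.0
    vw = vb = 0.0  # velocities: decaying running sums of past steps
    for _ in range(epochs):
        rng.shuffle(data)  # visit the summands in a random order each epoch
        for x, y in data:
            err = w * x + b - y
            gw, gb = 2 * err * x, 2 * err  # gradient of this single summand
            vw = momentum * vw - lr * gw
            vb = momentum * vb - lr * gb
            w += vw
            b += vb
    return w, b


# Noise-free points on the line y = 2x + 1.
points = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
w, b = sgd_momentum(points)
print(round(w, 3), round(b, 3))
```

Because the points lie exactly on a line, every summand is minimised at the same parameters, so the iterates settle close to w = 2 and b = 1.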

  27. The best set of mappings and the best cluster performance factor are used to map particular source data to the target data to generate enhanced target data. This enhanced target data can then be used by the platform to train the best AI model so that it can handle the target task.
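One reading of this step is that source records are re-expressed in the target’s field names and added to the (small) target dataset before training. The Python sketch below assumes that reading; the `mapping` dictionary, the per-row `weight` key and the function name are hypothetical, and the specification gives no detail on how the mapping is actually applied.

```python
def enhance_target(target_rows, source_rows, mapping, source_weight=0.5):
    """Append source rows, renamed into the target schema, to the target rows.

    mapping: {source_field: target_field}. Unmapped source fields are dropped.
    Each row gets a 'weight' so that transferred rows can count for less
    than genuine target rows during training (an assumed design choice).
    """
    enhanced = [dict(row, weight=1.0) for row in target_rows]
    for row in source_rows:
        renamed = {mapping[f]: v for f, v in row.items() if f in mapping}
        if renamed:
            renamed["weight"] = source_weight
            enhanced.append(renamed)
    return enhanced


target = [{"dep_airport": "MEL", "late_by": 15}]
source = [{"origin": "SYD", "delay": 40, "gender": "F"}]
rows = enhance_target(target, source, {"origin": "dep_airport", "delay": "late_by"})
# rows is now:
# [{'dep_airport': 'MEL', 'late_by': 15, 'weight': 1.0},
#  {'dep_airport': 'SYD', 'late_by': 40, 'weight': 0.5}]
```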

  28. The specification states that the optimisation engine, in identifying the best AI model, the best set of mappings, and the best cluster performance factor, could use the following equation:

    where L is a loss function, f_P represents the AI model, Y_S corresponds to a flight delay associated with the source data, Y_T corresponds to a flight delay associated with the target data, a_j corresponds to a weight (e.g., the cluster importance factor), λ_S corresponds to a regularization factor for the source data, λ_T corresponds to a regularization factor for the target data, λ_P corresponds to a regularization factor for the best AI model, W_S corresponds to mappings for the source data, W_T corresponds to mappings for the target data, X_S corresponds to a predicted flight delay (e.g., calculated by the best AI model) based on the source data, X_T corresponds to a predicted flight delay (e.g., calculated by the best AI model) based on the target data, and ‖ ‖ corresponds to a weighting method for the mapping (e.g., a Frobenius norm of a transformation).

  29. Having identified the best AI model, this AI model is then used to perform the target task in step 480. In the specification, this is exemplified by the user requesting a predicted flight delay for a flight of airline B.

  30. The specification then states:

    “In this way, several different stages of the process for determining an AI model for a target domain based on a dataset associated with a source domain are automated, which may remove human subjectivity and waste from the process, and which may improve speed and efficiency of the process and conserve computing resources (e.g., processors, memory, and/or the like). Furthermore, implementations described herein use a rigorous, computerized process to perform tasks or roles that were not previously performed or were previously performed using subjective human intuition or input. These roles may include mapping source domain data to target domain data, determining an AI model for a target domain based on the source domain data, and/or the like. Finally, automating the process for determining an AI model for a target domain based on source domain data conserves computing resources (e.g., processors, memory, and/or the like) that would otherwise be wasted in attempting to determine the AI model.”

  1. The environment that the invention is implemented in is described in entirely generic terms. The user device is said to be a mobile phone, a laptop computer, a tablet computer, a desktop computer, a handheld computer, a gaming device, a wearable communication device (e.g., a smart wristwatch), and the like. The model determination platform is said to contain “computing resources” such as personal computers, workstation computers, server devices, or “other types of computation and/or communication devices”. These resources communicate with each other via wired and/or wireless connections. The platform could be a cloud-hosted computing environment with applications (accessible by the user), virtual machines executing programs for the user, virtualized storage storing user data, and one or more hypervisors allowing multiple operating systems to execute concurrently for multiple users. The network is said to be one or more wired and/or wireless networks. Examples given are a cellular network, a public land mobile network, a local area network, a wide area network, a metropolitan area network, a telephone network, a private network, an ad hoc network, an intranet, the Internet, a fiber optic-based network, or a combination of these or other types of networks. The description of the individual devices (e.g. user device, model determination platform, and/or computing resource) is also entirely generic. Figure 3 shows the level of detail:

  2. The examples given for each individual element of this figure are also completely generic and unremarkable. For example, inter alia, the storage component could be a hard disk, the input component could be a keyboard and the output component could be a display.

  3. It is important to note that the level of technical detail in the specification goes no higher than what I have presented above. Elements are mentioned in very broad terms and exemplified by entirely generic examples of that element. The described mode of operation uses terms such as “generate”, “determine”, “receive”, “utilise”, and the like. The techniques applied are mentioned by name only with little to no explanation of what they are or do. The actual implementation is left entirely up to the reader.

  4. Moreover, the specification is drafted with an overuse of the word “may”. The impression one is left with is that the invention could be composed of the things described in the specification, could operate the way it is described in the specification, or could use certain techniques exemplified in the specification, or it could be composed of, operated by, and use, other, completely different elements entirely.

  5. In the language used in the specification as a whole there is a deliberate lack of specificity, making it clear that there is no individual element outside the common general knowledge of the person skilled in the art.

    Examiner’s objection

  6. Through all four reports, the examiner has maintained an objection that the claims lack an inventive step in light of US 2017/0193392 (D1). It is appropriate at this stage to discuss what D1 discloses.

    US 2017/0193392 (D1)

  7. D1 discloses a method (and system) for generating a computer model. The method is best represented by figure 2:

  8. The system that this method uses is presented at figure 1:

  9. The system 100 has a machine learning (ML) service 110 which interacts with a cluster system 150 to generate and execute computer models. Various elements of this system will be mentioned when discussing the method employed by D1 to generate computer models. D1 mentions that the system 100 may be provided by a social network provider such as LinkedIn, Facebook or Google+.

  10. In the method, with a job 164 being created, the first step 210 is to generate a data mart 156 from raw data (in data sources 152, 154) that was provided when users signed up to a social network. As an example, D1 states that, in a social networking context, the raw data could be profile data of users and activity data of users. Profile data might include personal information, such as first and last names, current residence, marital status and personal interests, and/or business information, such as academic history, work history, employer name, job title, employment status, skills, and endorsements by other users or members of a social network. The data might also contain “change information” such as when work history changed, when employment status changed, when job title changed, etc. Other change information that is exemplified includes:

    “when a user sent an invitation to connect (in a social graph) to another user, when a user sent a message to another user, when a user received a message to another user, when a user visited a particular page, when a user purchased a particular product or service, when a user commented on an online posting from another user, when a user was restricted from performing certain actions (such as accessing certain web pages, logging in, or commenting on other users' online postings), when a user was contacted by a recruiter, and when a user registered with a web site.”

    D1 also notes that the data may pertain to “animals, astronomy, ecosystems and the weather”.

  11. D1 states that each “problem domain” can have its own data mart 156, noting, as examples:

    “one group of developers may be working on identifying users of an online system who are illegitimate. Evidence of illegitimate activity may be include “scraping” data from other users’ profiles, creating fake accounts to invite a large number of members of a social network to be connected, and uploading a long list of email addresses to which a web site will automatically send invitations to join a social network. Concurrently, another group of developers may be working on determining which users are most likely to purchase a particular product. Yet another group of developers may be working on predicting which digital content (e.g., a web page) will be viewed most frequently. Thus, each of these three groups working on a different problem domain. Each problem domain is likely to rely on very different features to train a respective computer model. Thus, a different data mart may be created (by someone familiar with the problem domain) for each group of developers.”

  12. D1 states that the raw data is analysed to identify and generate data to include in the data mart (“feature data”). The jobs could be created by a user interacting (possibly with a laptop, tablet, smartphone or desktop computer) with the ML service 110 through an interface 112. Or the job could be transmitted directly to a job scheduler 162.

  13. In the next step 220, a user indicates to the system, via the user interface 112, an intention to generate a computer model. Depending on the “problem domain” selected by the user (which might be, for example, finding users who are most likely to purchase a particular service, or predicting whether a member is a real person or a bot), an appropriate data mart is selected from which to generate a computer model.

  14. In response to receiving the intention, feature data in the data mart is integrated with training data (which may be stored in input database 158) to generate a data set from which a computer model may be generated. Such a process is said by D1 to traditionally require the user to know how the underlying data is stored and how to access it. This is described as being labour-intensive and error-prone. In D1, the system automatically combines (integrates) multiple features into a data set. In this step 230, a driver 118 might generate a job which is sent to the job scheduler 162 and training data is combined with data from the data mart to create integrated data.

  15. In step 240, transformation operations are performed on the integrated data to produce “additional features” (i.e. transformed data) which are used for later model training. Traditionally, transformations require users to specify the transformations applied. It is also necessary for the user to manually assign an engineering strategy for each desired feature. In D1, multiple transformation operations such as, for example, maximum, minimum, average, sum, logarithm and binary are stored in the system and, depending on the type of feature, one is applied to one of the features selected from the integrated data to create the transformed feature value. Further transformation operations may be performed on the selected feature, and other features could be selected and transformed to generate further transformed data. The transformed data is stored.
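D1’s stored transformation operations can be pictured as a lookup table keyed by operation name, with the operations chosen according to the type of feature. The Python sketch below is illustrative only: D1 does not specify how feature types map to operations, so `OPERATIONS_BY_TYPE` and the function names are hypothetical, and “logarithm” is implemented here as log(1 + v), an assumption made so that zero counts are tolerated.

```python
import math

# Hypothetical registry of stored transformation operations.
TRANSFORMS = {
    "max": max,
    "min": min,
    "average": lambda values: sum(values) / len(values),
    "sum": sum,
}

# Hypothetical rule choosing operations by feature type.
OPERATIONS_BY_TYPE = {"count": ["sum", "max", "binary"], "amount": ["average", "log"]}


def transform_feature(values, operation):
    """Apply one stored transformation to a selected feature's values."""
    if operation in TRANSFORMS:
        return TRANSFORMS[operation](values)
    if operation == "log":
        return [math.log1p(v) for v in values]  # log(1 + v) avoids log(0)
    if operation == "binary":
        return [1 if v > 0 else 0 for v in values]
    raise ValueError(f"unknown operation: {operation}")


def engineer(features):
    """features: {name: (feature_type, values)} -> {name_op: transformed value}."""
    out = {}
    for name, (ftype, values) in features.items():
        for op in OPERATIONS_BY_TYPE.get(ftype, []):
            out[f"{name}_{op}"] = transform_feature(values, op)
    return out


print(engineer({"messages_sent": ("count", [0, 3, 5])}))
# {'messages_sent_sum': 8, 'messages_sent_max': 5, 'messages_sent_binary': [0, 1, 1]}
```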

  16. At step 250, one or more computer models are generated and trained on the data previously generated. While, traditionally, a user seeking to generate a computer model would be required to know which features the model would be based on and how to extract/generate the feature values, the system in D1 can pre-select features and values. The user of the system of D1 may be given the opportunity to review the selections and “deselect” one or more features prior to the model being generated, where the feature may be counter-intuitive to the context for which the computer model is being built. While, again, in traditional processes, a user may be required to specify portions of training data to train and portions to validate models, the system of D1 can automatically determine this division.
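D1 does not say how the division between training and validation data is determined automatically. A common approach, shown here only as an assumed example, is a seeded random shuffle followed by a fixed-fraction cut; the 20% validation fraction and the function name are hypothetical.

```python
import random


def split_train_validation(records, validation_fraction=0.2, seed=42):
    """Shuffle a copy of the records reproducibly and cut off a validation slice."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


train, validation = split_train_validation(range(100))
print(len(train), len(validation))  # 80 20
```

Fixing the seed means the same split is produced on every run, which keeps validation results comparable between models.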

  17. D1 notes that “[a]ny technique for generating and training the first computer model may be used.” Nevertheless, D1 notes that one type of analysis that may be used to generate a computer model is regression analysis and notes that Liblinear and Libsvm are examples of open-source ML libraries that can be used to generate the models. As part of this process, the selection of parameters such as learning rate, regularisation, iteration and termination tolerance can be chosen by the user, be pre-defined by another (earlier) user, or be stored as different sets of parameters.

  18. The performance of each model is validated and the performance metrics for each validation are stored along with the trained models. D1 states that performance metrics like AUC (Area Under of the Curve), accuracy (ACC), average precision (APR), and Precision (PRE) could be used, noting that the more metrics provided to the user, the more information the user has to make an informed decision about whether the computer models are sufficient and which computer model to select.
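Several of the metrics D1 lists are simple to compute. The following Python sketch, offered only as an illustration, computes accuracy, precision and the area under the ROC curve, using the rank interpretation of AUC (the probability that a randomly chosen positive example is scored above a randomly chosen negative one). Reading D1’s “AUC” as ROC AUC is an assumption, as are the labels, scores and 0.5 threshold; D1 does not specify how its metrics are calculated.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    """Of the examples predicted positive, the fraction that really are positive."""
    hits = [t for t, p in zip(y_true, y_pred) if p == 1]
    return sum(hits) / len(hits) if hits else 0.0


def roc_auc(y_true, scores):
    """Probability a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred), precision(y_true, y_pred), roc_auc(y_true, scores))
# 0.75 1.0 0.75
```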

  19. At 260 a computer model is selected, based on one or more of the performance metrics associated with that model. The selection could be done by the user or automatically by the system. At this point, the model can be tested based on a portion of the training data that was not used to train the model. This determines if the selected model performs sufficiently well. If the model does not perform well, then another model might be selected and tested.

  20. In the next step 270, either by user input or automatically, the selected model is deployed and executed against a “live” data set which, in the social networking context, could be records of many member accounts (stored in input database 158). Results indicating whether the predictions made by the model match reality are stored (in 160).

  21. At 280, when one or more criteria are met (e.g. a certain amount of time has passed since the first model has been deployed), a second computer model is generated using the same steps 220–270. It should be noted that the data mart 156 may store different data to the data that was used in generating and training the first model that was selected (for example, the data may be activity that occurred over a more-recent time period than the activity data used for the first model), and the second model could be generated using this different data. The second model is executed against a different portion of “live” data and the results are stored again.

  22. At step 290, after one or more criteria are again met (e.g. a certain amount of time has passed since the second model was deployed), the results of the two models (champion/challenger) are analysed to generate a performance metric for each model. The performance metric might be what is termed a “conversion rate”, with a “conversion” indicating whether a member identified as possibly undertaking a target action, such as purchasing a particular product, actually performed that action.

  23. As a result of the comparison of the performance metrics, one of the models might be discarded, or both might be retained and applied to different portions of the live data.

  24. D1 notes that, traditionally, a user seeking to create a computer model would have to formulate a query that extracts certain data from raw data sources (152–154) and then specify how a portion of the extracted data was to be aggregated or combined. This required the user to be familiar with how the underlying data was stored and what information was to be extracted. The process described in D1 allows a user with knowledge of the “problem domain” of interest to instruct that data be extracted and aggregated into a data mart 156 (the exemplified aggregation operations being sum, average, median and mode), without requiring familiarity with how the data is stored. Moreover, a user seeking to train and deploy a computer model using the process of D1 needs no knowledge of the raw data, because the data mart already exists. This reduces the time needed to create such models: D1 states that, while a manual approach might take 4-6 weeks to generate and deploy a model, some embodiments take only a few days, or even less than two hours. The establishment of data mart 156, the simultaneous training of multiple computer models, and a pipeline comprising feature selection, feature integration, feature engineering, model training, model selection, model deployment and model comparison stages all contribute to this greater efficiency.

  25. The system could be implemented as a computer system programmed to carry out the process, or as one or more hard-wired, special-purpose devices such as application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs).

    Claim 1

  26. The claims as proposed to be amended consist of 12 claims with 3 independent claims. All independent claims are essentially identical, with each being directed to a different aspect of the invention – device, method and non-transitory computer-readable medium storing instructions. Consideration of one independent claim will be equivalent to considering all others. I will consider claim 1. Below is claim 1 as it is currently proposed to be amended. The underlined features are ones which have been added to originally filed claim 1 throughout the course of examination:

    A device, including:
    one or more memories; and
    one or more processors, communicatively coupled to the one or more memories, that:

    receive source data, target data, external data, and a target task associated with the source data and the target data,

    the external data including at least one of:

    weather data,
    calendar data,
    social media data, or
    expert data, and

    the external data received from one or more resource devices;

    generate features of the source data and the target data by analyzing the source data and the target data to determine correspondences among concepts and possible conflicts;
    generate, based on the generated features, differentiators between the source data and the target data,

    the differentiators including misalignment data indicating at least one of:

    granularity data misalignment,
    type data misalignment, or
    coverage data misalignment, and
    wherein generating the features of and the differentiators between the source data and the target data includes utilizing a Bayesian or multivariate Gaussian mixture model;

    identify, utilizing one of a data mapping technique or a nonlinear data mapping technique, which utilizes neural networks, a set of mappings between the source data and the target data based on the differentiators between the source data and the target data,
    each mapping dependent upon a level of misalignment between the source data and the target data,
    the set of mappings used to embed or transfer data from the source data to the target data;
    determine, utilizing at least one of a hierarchical clustering technique, a centroid-based clustering technique, or a distribution-based clustering technique, different clusters of the source data based on the external data, the features of the source data and the target data, the differentiators between the source data and the target data, and the sets of mappings, the sets of mappings being utilized to force alignment of the source data and the target data based on the misalignment data included in the differentiators;
    generate, based on the external data, a set of artificial intelligence models as candidates to perform the target task;
    generate a performance measure for the set of artificial intelligence models based on the features of the source data and the target data, the differentiators between the source data and the target data, the sets of mappings, and the different clusters of the source data,
    generating the performance measure utilizing at least one of:

    a Euclidean loss function to determine the performance measure,
    a quadratic loss function to determine the performance measure, or
    a 0-1 loss function to determine the performance measure;

    refine the set of mappings based on the different clusters of the source data and based on the performance measure to generate enhanced target data;
    identify, utilizing a stochastic gradient descent (SGD) technique, an artificial intelligence model, from the set of artificial intelligence models, to perform the target task based on the different clusters of the source data and based on the performance measure,

    the enhanced target data used to train the identified artificial intelligence model;

    receive a request to perform the target task; and
    utilize the trained identified artificial intelligence model to perform the target task based on the request; and
    transmit information associated with the target task to a user device.

    Examiner’s objection

  27. There have been four adverse reports, each objecting to a lack of inventive step in light of D1. The objection in the fourth adverse report was as follows (italics and strikethroughs in original):

    Novelty and inventive step

    4     The applicant is thanked for their submissions and their amendments in their response dated the 28th August 2020.

    In their response, the applicant raises their concerns that the examiner is considering features in isolation when assessing inventive step, rather than in combination; with reference to at least Section 2.5.3.5. of the Patent Manual of Practice and Procedure and Minnesota Mining and Manufacturing Co v Beiersdorf (Australia) Limited (1980) 144 CLR 253 at page 293.

    It appears that the response from the applicant (in particular page 5) suggests that the claimed invention is directed to a particular selection of features and that the objection of lack of inventive step raised in previous reports is a result of impermissible ex post facto dissection of the invention and/or consideration of the features of the claimed invention in isolation rather than in combination.

    The examiner confirms that the claimed invention has been considered both individually as well as a combination of features. Furthermore, the examiner has appropriately used the recommended “problem-solution” approach when considering the claimed invention which is the preferred approach to apply when considering inventive step. For ease of reference, the problem-solution approach is described in the relevant section under the Patent Manual of Practice and Procedure at Section 2.5.1.6 and 2.5.1.6A.

    As has been suggested by the applicant in their response dated 24 December 2019, the problem to be solved is: “determining an AI model for a target domain based on a data set associated with a source domain that is different to the target domain.”

    The person skilled in the art is a data scientist familiar with both machine learning (i.e. artificial intelligence) and statistical data modelling.

    Claim 1 now includes further features (which were present in previous dependent claims):

    ·     Generating the features of the differentiators between the source and data and the target data includes utilizing a Bayesian or multivariate Gaussian mixture model

    ·     Utilizing one of a data mapping technique or nonlinear data mapping technique, which utilizes neural networks

    ·     Utilizing at least one of a hierarchical clustering technique, a centroid-based clustering technique, or a distribution-based clustering technique

    ·     Generating the performance measure utilizing at least one of a Euclidean loss function… a quadratic loss function… or a 0-1 loss function to determine the performance measure.

    ·     Utilizing a stochastic gradient descent (SGD) technique

    Each of these features represent the utilization of methods/techniques and models which are widely known in the art of machine learning, all of which would be well within the common general knowledge of the person skilled in the art.

    For example:

    ·     Each of a Bayesian mixture model and a multivariate Gaussian mixture model is common general knowledge in the art (see for example: D7, “multivariate gaussian mixture model... typical bayesian mixture model”)

    ·     Both linear and non-linear mapping techniques are common general knowledge in the art (see for example: D8 and D9)

    ·     Hierarchical clustering, centroid-based clustering and distribution-based clustering are common general knowledge in the art. (see for example: D10 and D11)

    ·     Performance measures of the types mentioned (Euclidean loss, quadratic loss, 0-1 loss function are common general knowledge in the art and well-known to be used to optimise/select machine learning models (i.e. by minimising the loss function). (see for example: D12, “...Euclidean loss...” (also known as L2 Norm, Square Loss, or, Euclidean Distance); See also D13, quadratic and 0-1 loss functions;)

    ·     Stochastic gradient descent (SGD) is common general knowledge in the art, and is well-known to be used in machine learning (neural networks) in concert with backpropagation (see for example: D14)

    The options provided in the claim constitute combinations of methods/techniques and models which are well-known in the art, and therefore the person skilled in the art would immediately be aware of these methods/techniques and arrive at the appropriate combinations (and eliminate inappropriate combinations) merely by reference to what is already known in the common general knowledge in the art.

    Given the context of the problem which lies in the area of machine learning, it is considered that the selections claimed are options (being methods and techniques of the common general knowledge in the art of machine learning) that would at once suggest themselves to the person skilled in the art, and furthermore, the prior art (including the common general knowledge) does not teach away from the particular selections. It also appears that there is no practical difficulty in implementing the particular solution claimed.

    Consequently, these features do not contribute to providing an inventive step, either individually or considered as a whole with other features of the claims which were addressed in previous examination reports.

    5     The invention defined by claim 1 as proposed to be amended does not involve an inventive step when compared with the disclosure of D1 in light of the common general knowledge in the art.

    Regarding claim 1, D1 discloses:

    A device, including: one or more memories; and one or more processors, communicatively coupled to the one or more memories, (see [0173], “…computer systems… processors…”; see also [0179], “…storage media… memory…”)

    that: receive source data, target data, external data, and a target task associated with the source data and the target data, the external data including at least one of: weather data, calendar data, social media data, or expert data, and the external data received from one or more resource devices; ([0017], “data mart is a result of extracting and aggregating data from one or more data sources…”; [0036], “generate data mart based on raw data sources… in the social networking context, raw data may include profile data of multiple profiles created by different users and activity data that indicates behavior of multiple users relative to an online system… raw data is analysed to identify and generate data to include in data mart 156…”; [0037], “problem domain…”; problem domain as described is considered to be the target task; see also [0047]-[0049])

    generate features of the source data and the target data by analyzing the source data and the target data to determine correspondences among concepts and possible conflicts; generate, based on the generated features, differentiators between the source data and the target data, the differentiators including misalignment data indicating at least one of: granularity data misalignment, type data misalignment, or coverage data misalignment, and wherein generating the features of and the differentiators between the source data and the target data includes utilizing a Bayesian or multivariate Gaussian mixture model; identify, utilizing one of a data mapping technique or a nonlinear data mapping technique, which utilizes neural networks, a set of mappings between the source data and the target data based on the differentiators between the source data and the target data, each mapping dependent upon a level of misalignment between the source data and the target data, the set of mappings used to embed or transfer data from the source data to the target data; determine, utilizing at least one of a hierarchical clustering technique, a centroid-based clustering technique, or a distribution-based clustering technique, different clusters of the source data based on the external data, the features of the source data and the target data, the differentiators between the source data and the target data, and the sets of mappings, the sets of mappings being utilized to force alignment of the source data and the target data based on the misalignment data included in the differentiators; (D1 discloses feature selection ([0095]-[0098], [0113]-[0120]), feature integration ([0099]-[0102]) and feature engineering ([0104]-[0112]) which inherently involve the generating of features and differentiators, and mappings, otherwise the disclosure of D1 would not be able to be worked;

    The person skilled in the art would recognise feature integration as a process of aligning or combining data from multiple sources to provide unified data for further use. This process involves analysing data from multiple sources to determine correspondences (i.e. similarities as claimed) and disparities (i.e. differentiators as claimed). The person skilled in the art would also recognised [sic] that, when performed, feature integration uses these similarities and differentiators (both similarities and differentiators may also be considered to be feature values extracted from data) to establish a mapping (or a set of mappings) between two heterogenous sources of data based on a similarity/disparity threshold (i.e. level of misalignment). The mappings allow the alignment of data of different types (or formats) from different sources or repositories.

    See also [0035]-[0040], “…cluster system executing a job, generating and training the one or more computer models based on the feature values generated previously…”; the cluster system inherently performs clustering of various feature values and differentiators (differentiators are also considered to be feature values extracted from data) during the generation and training of the one or more computer models;)

    generate, based on the external data, a set of artificial intelligence models as candidates to perform the target task; generate a performance measure for the set of artificial intelligence models based on the features of the source data and the target data, the differentiators between the source data and the target data, the sets of mappings, and the different clusters of the source data ([0040]-[0041], “one or more computer models are generated… validating each computer model, resulting in one or more performance metrics”; see also [0079]-[0088])

    generating the performance measure utilizing at least one of: a Euclidean loss function to determine the performance measure, a quadratic loss function to determine the performance measure, or a 0-1 loss function to determine the performance measure;

    refine the set of mappings based on the different clusters of the source data and based on the performance measure to generate enhanced target data; (See for example D1 [0113]-[0120] which discloses suggesting to the user different features as candidates to select or deselect (i.e. to cluster), after the user deselects one or more features, the set of feature values (which are considered to be mappings based on different clusters of the source data as claimed) are updated; see also D1 [0159]-[0169], which discloses updating (i.e. refining) the training data to train and deploy a new machine learning model.)

    identify, utilizing a stochastic gradient descent (SGD) technique, an artificial intelligence model, from the set of artificial intelligence models, to perform the target task based on the different clusters of the source data and based on the performance measure, the enhanced target data used to train the identified artificial intelligence model; ([0041], “a particular computer model from among multiple computer models (if multiple were generated) is selected…Thus, the best performing computer model as it pertains to the performance metric(s) may be selected.”)

    receive a request to perform the target task; and utilize the trained identified artificial intelligence model to perform the target task based on the request; and transmit information associated with the target task to a user device. ([00150]-[00157], “the computer model is deployed… executing the job… passing the feature values to the computer model… computer model generating a score, which reflects a prediction of… Results of each prediction may be stored to output… sends one or more results to ML service…”)

    D1 does not explicitly disclose the features of:

    ·     the differentiators including misalignment data indicating at least one of: granularity data misalignment, type data misalignment, or coverage data misalignment,

    ·     and wherein generating the features of and the differentiators between the source data and the target data includes utilizing a Bayesian or multivariate Gaussian mixture model;

    ·     utilizing one of a data mapping technique or a nonlinear data mapping technique, which utilizes neural networks

    ·     utilizing at least one of a hierarchical clustering technique, a centroid-based clustering technique, or a distribution-based clustering technique

    ·     generating the performance measure utilizing at least one of: a Euclidean loss function to determine the performance measure, a quadratic loss function to determine the performance measure, or a 0-1 loss function to determine the performance measure

    ·     utilizing a stochastic gradient descent (SGD) technique

    D1 does not disclose granularity data misalignment, type data misalignment or coverage data misalignment. However, it is considered that the selection of the particular types of differentiators is obvious to the person skilled in the art in light of the common general knowledge in the art. The present specification only mentions that the differentiators may be of the types appearing in the claim, but makes no mention how these differentiators are generated save through the generic use of ‘schema matching’ techniques or ‘machine learning’ techniques (which are well-known in the art - see description [0071]-[0076]) and thus these types of differentiators are considered to be similarly well-known and therefore obvious.

    Each of the features of a Bayesian or multivariate Gaussian mixture model, data mapping technique or non-linear data mapping technique, which uses neural networks, hierarchical/centroid-based/distribution-based clustering techniques, Euclidean/quadratic/0-1 loss functions, and stochastic gradient descent techniques are well-known both individually and in combination (as addressed above in the previous section). Furthermore, the combination of (and working interrelationship between) all of these features would immediately suggest themselves to the person skilled in the art based on their prior knowledge and understanding in the art.

    The prior art does not teach away from the claimed solution, nor does there appear to be any practical difficulty in implementing such a claimed solution without an inventive step. Additionally, there does not appear to be any surprising or unforeseen effect.

    Therefore, in consideration of the balance of probabilities, claim 1 does not involve an inventive step.

    Claims 5 and 9 (as proposed to be amended) do not involve an inventive step for similar reasoning as discussed above for claim 1.

    Appended claims 2-4, [6]-8, and 10-12 (as proposed to be amended) add only features that are either disclosed in D1 or are obvious in light of common general knowledge in the art or mere obvious implementation details and which therefore cannot contribute to providing an inventive step. For example:

    ·     The feature of claim 2 is disclosed by D1 at [0037](“The user may specify a problem domain…”) and [0043], “…the selected computer model is deployed…”)

    ·     The feature of claim 3 is disclosed by D1 at [00156](“the computer model generating, based on the feature values, a score, which reflects a prediction of, in the context of user behavior, an action that the corresponding user will take.”)

    ·     The feature of claim 4 is disclosed by D1 at [0035]-[0040] (“…cluster system executing a job, generating and training the one or more computer models based on the feature values generated previously…”; the cluster system inherently performs clustering of various feature values and differentiators (differentiators are also considered to be feature values extracted from data) during the generation and training of the one or more computer models.)

    6.    The applicant notes that: “The US Examiner was aware of D1, yet considered 11 additional prior references to represent disclosures of greater reference as compared with D1. Accordingly, the US Examiner allowed former claim 1 in view of 11 prior references whilst also having regard to D1.”

    The applicant has also requested the examiner to: “provide detailed explanation in relation to same, including the specific differences in the assessment applied to the preset claims which gave rise to the previous objections as compared with the US Examiner’s assessment which included that the claims were non-obvious”.

    The applicant’s request for a detailed explanation of the difference between the US examiner’s approach and the office’s approach to be provided is noted. However, there is no requirement in the official practice and procedure to provide such an explanation. In the first instance the office is not privy to any such information from other jurisdictions. More importantly, if the inference from this is that the claims are consequently automatically acceptable in Australia then such an inference would be incorrect. The attorney will appreciate that even if a foreign country grants a patent on a similar application, you still have to comply with Australian patent law for your Australian application.”

    Applicant’s submissions

  28. The applicant filed written submissions after requesting to be heard. The applicant did not address the objection. Rather, they submitted that, following the fourth adverse report, they had less understanding of the objection and were unclear about its basis.

  29. The applicant’s submissions were, verbatim (italics, bolding and underlining in original):

    “We refer to the abovementioned Australian Patent Application and the recently filed Request for a Hearing. In support of the Applicant’s request for a Hearing, the Applicant provides the following comments.

    The instant application has been the subject of four adverse Examination Reports issued by the Australian Case Examiner. At the time of requesting a Hearing, the Examiner accepted the presently pending 12 claims as comprising patentable subject matter but maintained objection to the claims for lack of an inventive step in view of fourteen prior references all designated as category ‘A’ type documents.

    When issuing the Fourth Examination Report, the Examiner newly cited references D7 to D14 which, according to Examination Report No. 4, were located in a “top-up search” thus representing documents that the Examiner considered to define the general state of the art but not comprising documents considered to be of particular relevance. In view of the previous cited prior references (D1 to D3) similarly being designated at category ‘A’ type documents, the Applicant is presently confronted with an outstanding objection to the presently pending claims for lack of an inventive step in view of fourteen category ‘A’ type prior references.

    Whilst the Applicant is prepared to engage with Examiners in the Australian Patent Office through their locally appointed Attorneys to resolve objections, in the present case, after issuance of the Fourth Examination Report, the Applicant considers that they have less of an understanding regarding the Examiner’s objections and have a lack of clarity regarding the basis of the objection in view of the Examiner’s reliance upon a large number of category ‘A’ type documents.

    A brief history of the four Examination Reports that have issued in respect of the present application is detailed below.

    EXAMINATION REPORT NO. 1

    In the first Examination Report, the Examiner indicated that the claims comprised patentable subject matter although raised objections with respect to the claims for lack of an inventive step in view of four prior references.

    In this regard, reference D1 (LinkedIn Corporation) was identified as a category ‘X’ type document with references D2 (SHI et al), D3 (Feurer, M. et al) and D4 (Swearingen, T. et al) all classified as category ‘A’ type documents. In response to the First Examination Report, the Applicant amended the claims to substantially conform same with the allowed claims in the Applicant’s counterpart US filing. In the accompanying response, it was noted that prior reference D1 was included in the “List of references cited by the Applicant and considered by the Examiner” in respect of their US counterpart filing but was not one of the eleven prior references which the US Examiner considered to represent the “closest prior art”.

    It is also noted in the Notice of Allowance dated 31 July 2019 that issued in respect of the Applicant’s counterpart US filing, the US Examiner stated that the steps of receiving source data, target data, external data, and a target associated with the source data and the target data, generating, based on the external data, a set of artificial intelligence models as candidates to perform the target task, identifying an artificial intelligence model to perform the target task, and training the identified artificial intelligence model using enhanced target data to perform the target task, represented limitations that were not taught in the prior art.

    EXAMINATION REPORT NO. 2

    In the Second Examination Report that issued in respect of the instant application, the Examiner continued to affirm the presently pending claims as comprising patentable subject matter but maintained objection to the claims for lack of an inventive step. In support of the maintained objection, the Examiner identified two new prior references, namely, D5 (RUSSELL et al.) and D6 (AZVINE et al.). Both references D5 and D6 were classified as Category ‘A’ type documents.

    In response to the Second Examination Report, the Applicant submitted further claim amendments to introduce further claim limitations to each of the independent claims. As a result of the additional limitations incorporated into the independent claims, the claims of the present AU patent application at that time were of reduced scope as compared with the Applicant’s issued counterpart US patent claims.

    EXAMINATION REPORT NO. 3

    In the Third Examination Report that issued in respect of the instant applicant, the Examiner once again confirmed that the claims comprised patentable subject matter. However, objection to the claims for lack of an inventive step was maintained once again on the basis of the previously cited six prior references with D1 comprising a Category ‘X’ type document and the remaining five cited prior references comprising Category ‘A’ type documents.

    In response to the Third Examination Report, the applicant incorporated yet further limitations to each of the independent claims and submitted their concern regarding the examination of the presently pending claims at that time on the basis that the Examiner appeared to be conducting an impermissible ex post facto dissection of the claimed features and failing to consider all of the claim features in combination when assessing the claims regarding the presence, or otherwise, of an inventive step.

    EXAMINATION REPORT NO. 4

    Upon receipt of the Fourth Examination Report, the Applicant noted that the Examiner once again confirms that the claims comprise patentable subject matter. However, in the Fourth Examination Report, the Examiner yet again maintains objection to the claims for lack of an inventive step with yet further newly cited prior references in support of the maintained objection.

    As a result, the Fourth Examination Report identifies fourteen prior references (D1 – D14) all of which comprise Category ‘A’ type documents such that they are considered to define the general state of the art without particular relevance to the claims.

    Further, the Applicant notes that in response to the submitted concerns that the Examiner may be considering claim features in isolation when assessing inventive step, the Examiner states that “… the Examiner confirms that the claimed invention has been considered both individually as well as a combination of features …”. Whilst the Examiner also states that the recommended “problem-solution” approach has been adopted when considering the presently claimed invention, it remains a concern to the Applicant that the Examiner has inappropriately examined the claims on the basis of considering individual features in isolation and has potentially failed to consider all of the working interrelationships between the presently claimed features that contribute to the technical effect arising from performance of the invention.

    In any event, whilst the Applicant remains prepared to engage with the Case Examiner to understand the basis of the Examiner’s maintained objection and address same with either an explanatory response indicating the reasons why the Applicant considers that the balance of probabilities should lie in their favour regarding the presently pending claims, the ongoing issuance of further adverse reports with an increased number of Category ‘A’ type citations reduces the Applicant’s ability to understand the Examiner’s objection and address same with either an explanatory response or further proposed amendments to advance the application.

    According to our review of the Fourth Examination Report, the Examiner provides references to passages in document D1 in which the Examiner considers presently claimed features to have been disclosed. However, whilst conceding that D1 does not explicitly disclose a range of presently claimed features, the Examiner does not provide references to passages in the 14 Category ‘A’ type documents cited in support of the Examiner’s maintained objection to enable the Applicant to determine the relevant passages of the documents which the Examiner considers to evidence claim features as comprising the common general knowledge. More particularly, the Applicant is uncertain whether the Examiner considers one or more of the 14 cited references render any one or more combinations of features recited in the presently pending claims comprise the common general knowledge in view of the 14 cited prior references.

    Accordingly, in the local Attorney’s view, the Applicant is deprived of a clear understanding of the case they have to answer in order to address the Case Examiner’s maintained objection.

    Further, in item 6 of the Fourth Examination Report, the Applicant notes that comments submitted to the AU Examiner which were an attempt to obtain an understanding of the AU Examiner’s view as compared with that of the US Patent Office in respect of the counterpart US filing, this attempt has seemingly been interpreted by the AU Case Examiner as a suggestion that an allowed patent claim in the United States should be automatically allowed [in] Australia.

    The Applicant does not suggest to the AU Case Examiner that an accepted counterpart US patent filing should automatically be accepted by the Australian patent Office.

    In response, the Examiner states that “… there is no requirement in the official practice and procedure to provide such an explanation. In the first instance, the Officer is not privy to any such information from other jurisdictions. More importantly, if the inference from this is that the claims are consequently automatically acceptable in Australia, then such an inference would be incorrect. The Attorney will appreciate that even if a foreign country grants a patent on a similar application, you still have to comply with the Australian Patent law for your Australian application…”.

    Accordingly, it is apparent that the Applicant has reached an impasse with the Australian Examiner regarding the Applicant’s attempt to achieve allowed patent claims in Australia. Further, in the local Attorney’s view, the Applicant has now been presented with a greater challenge to understand the Examiner’s objection and how to address same.

    As a result, the Applicant seeks clarification from a Hearing Officer regarding the basis of the Examiner’s objection, or a return of the application to the Case Examiner with a request to provide greater detail regarding the passages in the 14 cited prior references in which the Examiner considers features not explicitly disclosed in D1 are evidenced as part of the common general knowledge. Further, the Applicant seeks to understand which combinations of features as presently claimed are considered to have become part of the common general knowledge according to the Examiner.

    In the event that a Hearing Officer, or the Case Examiner, can provide greater guidance regarding why a document not considered to be of particular relevance by the US Examiner is considered to be of primary importance in view of the presently pending AU claims, in addition to the information requested above, such an explanation would likely provide the Applicant with a better understanding regarding the basis of the AU Examiner’s maintained objection and how to address the AU Examiner’s concerns despite the introduction of claim limitations above and beyond those that were necessary in the counterpart US filing in order to achieve allowed US claims.

    Once the Applicant has a better understanding regarding the case that they are required to answer, the Applicant is prepared to further engage with the Case Examiner in an attempt to advance the present claims of the present AU patent application.

    We look forward to your response.”

  3. As such, it is clear that the applicant is seeking clarification as to the objection that is being taken, particularly in light of the fact that only ‘A’ documents are listed, and the equivalent US application has been accepted with claims that are somewhat broader than the current claims of the present application.

    Number of ‘A’ documents

  4. By itself, I could understand the confusion around how it could be possible that an inventive step objection could be raised on the basis of only ‘A’ documents. While it is possible for lack of inventive step to be found on the basis of the common general knowledge alone, an inventive step objection is generally based around at least one ‘X’ document, or at least two ‘Y’ documents.

  5. However, any confusion falls away if one simply has regard to what is said in the report, and the reasoning set out in the previous reports is not ignored. When this is done, it is very clear that very similar wording to that in the 4th report set out above has been used in each previous report with respect to D1. In those previous reports, D1 was classified as an ‘X’ document. It is very clear that D1 has been mislabelled as an ‘A’ document in the examiner’s fourth report. It is clearly the ‘X’ document around which the inventive step objection is taken, noting in particular the explicit statement that: “The invention defined by claim 1 as proposed to be amended does not involve an inventive step when compared with the disclosure of D1 in light of the common general knowledge in the art.” I find it surprising that the “local attorney” would not understand this to be the case.

    Relevance of acceptance of US application

  6. The applicant states that they do not believe that “an accepted counterpart US patent filing should automatically be accepted by the Australian patent Office”. However, they profess confusion as to why the AU examiner could maintain an objection based on a citation that the US examiner felt was, effectively, 12th in line in order of importance. This is effectively asking why the AU application has not been accepted given the US application was accepted in light of the same citation. In this regard, the comments on this point by the examiner in the 4th report are germane.

  7. Moreover, it is not unheard of for a document which was considered unimportant by one patent office to be considered important by another patent office. Indeed, it is possible (but rare) for a document which was considered unimportant by an individual examiner in a patent office to be considered important by another examiner in the same office. In any event, this is not relevant. What occurs in other patent offices, while interesting, is not binding and may not be persuasive.

    The law

  8. There appears to be no dispute as to the applicable law.

  9. The test for obviousness is whether it would have been a matter of routine to proceed to the claimed invention.

    “The test is whether the hypothetical addressee faced with the same problem would have taken as a matter of routine whatever steps might have led from the prior art to the invention, whether they be the steps of the inventor or not.” (Aickin J in Wellcome Foundation Ltd v VR Laboratories (Aust) Pty Ltd [1981] HCA 12 at [45]; (1981) 148 CLR 262 at 286)

  10. The High Court in Aktiebolaget Hässle v Alphapharm Pty Ltd [2002] HCA 59 at [51] - [53] also approved the approach taken in Olin Mathieson Chemical Corporation v Biorex Laboratories Ltd [1970] RPC 157 at 187, in which Graham J had posed the reformulated Cripps question:

    “Would the notional research group at the relevant date in all the circumstances directly be led as a matter of course to try [the claimed invention] in the expectation that it might well produce a useful [desired result]?”

    Explanation

  11. The applicant states that they seek an explanation that provides the applicant “with a better understanding regarding the basis of the AU Examiner’s maintained objection and how to address the AU Examiner’s concerns”. However, I do not see any need to carry out such a task in light of the comprehensive examination report that was supplied to the applicant. In my opinion, and as should be clear from the discussion of D1, the examiner’s identification is accurate.

  12. D1 discloses that a device with the same components as is claimed receives different types of data associated with a “problem domain”. As stated by the examiner, and not disagreed with by the applicant, the “problem domain” is considered to be equivalent to the “target task” of the claim. The data is aggregated into a data mart (an important concept in D1). D1 also discloses the steps of feature selection, feature integration and feature engineering. As noted by the examiner, these steps inherently involve the generation of features, differentiators, and mappings as is claimed.

  13. With respect to “feature selection”, D1 states that this can be achieved by the user being presented with a list of features that have previously been generated (as is claimed) and the user deselecting some features that are then not used in the generation of the model. The process can also happen without any user input.

  14. As noted by the examiner, the person skilled in the art would recognise feature integration as:

    “a process of aligning or combining data from multiple sources to provide unified data for further use. This process involves analysing data from multiple sources to determine correspondences (i.e. similarities as claimed) and disparities (i.e. differentiators as claimed) … when performed, feature integration uses these similarities and differentiators (both similarities and differentiators may also be considered to be feature values extracted from data) to establish a mapping (or a set of mappings) between two heterogenous sources of data based on a similarity/disparity threshold (i.e. level of misalignment). The mappings allow the alignment of data of different types (or formats) from different sources or repositories”,

    and that the “cluster system” of D1 “inherently performs clustering of various feature values and differentiators (differentiators are also considered to be feature values extracted from data) during the generation and training of the one or more computer models”, such that the “set of feature values” in D1 is equivalent to the mappings based on different clusters of the source data as claimed. Again, I do not take the applicant to disagree with this.

  15. It is clear that D1 generates AI models based on the features of the source data and the target data, the differentiators between the source data and the target data, the sets of mappings, and the different clusters of the source data, and generates a performance measure (metric) for the models. Furthermore, D1 discloses allowing the user to refine the mappings by giving the option to select or deselect different features, giving an updated (i.e. “refined”) set of feature values (i.e. mappings based on different clusters of the source data as claimed).

  16. D1 states that the best performing computer model, as far as the performance metric(s) with respect to the “problem domain” (i.e. the “target task”) is concerned, can be selected from among multiple computer models (if multiple were generated). This model is then deployed (i.e. “utilised”) to generate “predictions” (i.e. undertake a target task) as to what action (e.g. click a link, buy a product, etc) a user might take. The prediction is stored and later analysed to determine the computer model’s performance.

  17. As noted in the examiner’s report, it is correct to state that the specific techniques claimed are not stated to be used in D1. However, as I have noted when discussing the present specification, it is clear that each of the specific techniques claimed is one of many that can be used and, from the language used in the specification, the lack of detail about these techniques makes it very clear that they are well known to the person skilled in the art. For example, as noted by the examiner, while D1 does not explicitly disclose granularity data misalignment, type data misalignment or coverage data misalignment, the fact that the specification simply mentions these by name with the pronouncement that they may be generated by generic “schema matching” or “machine learning” techniques (which are well known in the art) makes it clear that these types of differentiators are similarly well known and would be obvious inclusions depending on what task is to be undertaken.

  18. Similarly, with respect to:

    · utilising a Bayesian or multivariate Gaussian mixture model to generate features, which include differentiators;

    · using a data mapping technique or a nonlinear data mapping technique, which utilises neural networks, to establish mappings;

    · using a hierarchical clustering technique, a centroid-based clustering technique, or a distribution-based clustering technique in generating clusters;

    · using a Euclidean loss function, a quadratic loss function, or a 0-1 loss function to determine the performance measure; and

    · using a stochastic gradient descent (SGD) technique to identify the computer model based on performance metrics,

    these are all well-known techniques in the art. This is evidenced by the ‘A’ documents listed by the examiner in their report. While it is true that the examiner has not identified isolated parts of the various documents, such identification is not warranted. The examiner is not combining the documents per se with D1 to establish a lack of inventive step. Rather, the examiner is showing that these techniques are well-known and would, as a matter of routine, be part of the tool kit utilised by the person skilled in the art when seeking to implement D1. As is clear from the present specification, the applicant seems to consider all techniques as equivalent, with no single technique being essential to the correct operation of the system claimed.

  19. While the applicant may disagree with these positions, it has not established that such is not the case. In this regard, I note Commissioner of Patents v Emperor Sports Pty Ltd [2006] FCAFC 26 (“Emperor Sports”), where the Full Court stated at [24]:

    “The Commissioner is an administrative decision-maker equipped with technical expertise. Subject to the rules of natural justice both common law and statutory (see e.g. s 101(2)), he or she is entitled to make use of that expertise, and draw inferences that may be rationally drawn from technical knowledge, including how skilled persons of various descriptions may act in their respective occupations … On an appeal by way of hearing de novo the judge would not be a person credited with technical expertise of his or her own. In such event the judge may be able to take into account conclusions of the Commissioner based on his or her expertise, subject of course to the rights of other parties to call rebutting or supporting evidence” (my emphasis).

  20. There seems to be little, if any, evidence disputing that these techniques are well known. As I have alluded to before, the present specification makes it quite clear that the lack of specificity is deliberate. Many different techniques may be brought to bear.

  21. As such, we have a situation where D1 discloses at a general level the same method that is claimed, but without the specificity. However, the specifics included in the claim are nothing more than techniques that would be immediately available and to hand to the person skilled in the art. That is, given the lack of evidence from the applicant, on the balance of probabilities, the person skilled in the art would, as a matter of routine, make use of the claimed techniques when seeking to implement the system of D1.

    Conclusion

  22. It follows that the claims as proposed to be amended lack an inventive step in light of D1. However, this conclusion has been reached on the basis of the balance of probabilities, taking into account the “technical expertise” of the examiner being used to “draw inferences that may be rationally drawn … including how skilled persons of various descriptions may act in their respective occupations” (see Emperor Sports), with no evidence being supplied by the applicant in response. There may well be evidence that establishes that the techniques used are not part of the common general knowledge and that their use is not obvious, but such evidence is not in play at the moment.

  23. Given that situation, it is not appropriate to refuse the present application. Rather, it is now up to the applicant to engage with the argument presented by the examiner in their 4th report and respond, providing, as necessary, “rebutting or supporting evidence” (as per Emperor Sports).

  24. In line with sub regulation 13.4(1)(g) of the Patents Regulations 1991, the period to gain acceptance of the patent request and complete specification in relation to the present application is 3 (three) months from the date of this decision.

    Greg Powell

    Delegate of the Commissioner of Patents
