AI Writing Tools

Explore the best AI Writing Tools — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • LRE Map

    LRE Map

    The LRE Map (Language Resources and Evaluation) is a freely accessible large database on resources dedicated to Natural language processing. The original feature of LRE Map is that the records are collected during the submission of different major Natural language processing conferences. The records are then cleaned and gathered into a global database called "LRE Map". The LRE Map is intended to be an instrument for collecting information about language resources and to become, at the same time, a community for users, a place to share and discover resources, discuss opinions, provide feedback, discover new trends, etc. It is an instrument for discovering, searching and documenting language resources, here intended in a broad sense, as both data and tools. The large amount of information contained in the Map can be analyzed in many different ways. For instance, the LRE Map can provide information about the most frequent type of resource, the most represented language, the applications for which resources are used or are being developed, the proportion of new resources vs. already existing ones, or the way in which resources are distributed to the community. == Context == Several institutions worldwide maintain catalogues of language resources (ELRA, LDC, NICT Universal Catalogue, ACL Data and Code Repository, OLAC, LT World, etc.) However, it has been estimated that only 10% of existing resources are known, either through distribution catalogues or via direct publicity by providers (web sites and the like). The rest remains hidden, the only occasions where it briefly emerges being when a resource is presented in the context of a research paper or report at some conference. Even in this case, nevertheless, it might be that a resource remains in the background simply because the focus of the research is not on the resource per se. == History == The LRE Map originated under the name "LREC Map" during the preparation of LREC 2010 conference. More specifically, the idea was discussed within the FlaReNet project, and in collaboration with ELRA and the Institute of Computational Linguistics of CNR in Pisa, the Map was put in place at LREC 2010. The LREC organizers asked the authors to provide some basic information about all the resources (in a broad sense, i.e. including tools, standards and evaluation packages), either used or created, described in their papers. All these descriptors were then gathered in a global matrix called the LREC Map. The same methodology and requirements from the authors has been then applied and extended to other conferences, namely COLING-2010, EMNLP-2010, RANLP-2011, LREC 2012, LREC 2014 and LREC 2016. After this generalization to other conferences, the LREC Map has been renamed as the LRE Map. == Size and content == The size of the database increases over time. The data collected amount to 4776 entries. Each resource is described according to the following attributes: Resource type, e.g. lexicon, annotation tool, tagger/parser. Resource production status, e.g. newly created finished, existing-updated. Resource availability, e.g. freely available, from data center. Resource modality, e.g. speech, written, sign language. Resource use, e.g. named entity recognition, language identification, machine translation. Resource language, e.g. English, 23 European Union languages, official languages of India. == Uses == The LRE map is a very important tool to chart the NLP field. Compared to other studied based on subjective scorings, the LRE map is made of real facts. The map has a great potential for many uses, in addition to being an information gathering tool: It is a great instrument for monitoring the evolution of the field (useful for funders), if applied in different contexts and times. It can be seen as a huge joint effort, the beginning of an even larger cooperative action not just among few leaders but among all the researchers. It is also an "educational" means towards the broad acknowledgment of the need of meta-research activities with the active involvement of many. It is also instrumental in introducing the new notion of "citation of resources" that could provide an award and a means of scholarly recognition for researchers engaged in resource creation. It is used to help the organization of the conferences of the field like LREC. == Derived matrices == The data were then cleaned and sorted by Joseph Mariani (CNRS-LIMSI IMMI) and Gil Francopoulo (CNRS-LIMSI IMMI + Tagmatica) in order to compute the various matrices of the final FLaReNet reports. One of them, the matrix for written data at LREC 2010 is as follows: English is the most studied language. Secondly, come French and German languages and then Italian and Spanish. == Future == The LRE Map has been extended to Language Resources and Evaluation Journal and other conferences.

    Read more →
  • Conference on Computer Vision and Pattern Recognition

    Conference on Computer Vision and Pattern Recognition

    The Conference on Computer Vision and Pattern Recognition is an annual conference on computer vision and pattern recognition. == Affiliations == The conference was first held in 1983 in Washington, DC, organized by Takeo Kanade and Dana H. Ballard. From 1985 to 2010 it was sponsored by the IEEE Computer Society. In 2011 it was also co-sponsored by University of Colorado Colorado Springs. Since 2012 it has been co-sponsored by the IEEE Computer Society and the Computer Vision Foundation, which provides open access to the conference papers. == Scope == The conference considers a wide range of topics related to computer vision and pattern recognition—basically any topic that is extracting structures or answers from images or video or applying mathematical methods to data to extract or recognize patterns. Common topics include object recognition, image segmentation, motion estimation, 3D reconstruction, and deep learning. The conference generally has less than 30% acceptance rates for all papers and less than 5% for oral presentations. It is managed by a rotating group of volunteers who are chosen in a public election at the Pattern Analysis and Machine Intelligence-Technical Community (PAMI-TC) meeting four years before the meeting. The conference uses a multi-tier double-blind peer review process. The program chairs, who cannot submit papers, select area chairs who manage the reviewers for their subset of submissions. == Location and time == The conference is usually held in June in North America. == Awards == === Best Paper Award === These awards are picked by committees delegated by the program chairs of the conference. === Longuet-Higgins Prize === The Longuet-Higgins Prize recognizes papers from ten years ago that have made a significant impact on computer vision research. === PAMI Young Researcher Award === The Pattern Analysis and Machine Intelligence Young Researcher Award is an award given by the Technical Committee on Pattern Analysis and Machine Intelligence of the IEEE Computer Society to a researcher within 7 years of completing their Ph.D. for outstanding early career research contributions. Candidates are nominated by the computer vision community, with winners selected by a committee of senior researchers in the field. This award was originally instituted in 2012 by the journal Image and Vision Computing, also presented at the conference, and the journal continues to sponsor the award. === PAMI Thomas S. Huang Memorial Prize === The Thomas Huang Memorial Prize was established at the 2020 conference and is awarded annually starting from 2021 to honor researchers who are recognized as examples in research, teaching/mentoring, and service to the computer vision community.

    Read more →
  • Waffles (machine learning)

    Waffles (machine learning)

    Waffles is a collection of command-line tools for performing machine learning operations developed at Brigham Young University. These tools are written in C++, and are available under the GNU Lesser General Public License. == Description == The Waffles machine learning toolkit contains command-line tools for performing various operations related to machine learning, data mining, and predictive modeling. The primary focus of Waffles is to provide tools that are simple to use in scripted experiments or processes. For example, the supervised learning algorithms included in Waffles are all designed to support multi-dimensional labels, classification and regression, automatically impute missing values, and automatically apply necessary filters to transform the data to a type that the algorithm can support, such that arbitrary learning algorithms can be used with arbitrary data sets. Many other machine learning toolkits provide similar functionality, but require the user to explicitly configure data filters and transformations to make it compatible with a particular learning algorithm. The algorithms provided in Waffles also have the ability to automatically tune their own parameters (with the cost of additional computational overhead). Because Waffles is designed for script-ability, it deliberately avoids presenting its tools in a graphical environment. It does, however, include a graphical "wizard" tool that guides the user to generate a command that will perform a desired task. This wizard does not actually perform the operation, but requires the user to paste the command that it generates into a command terminal or a script. The idea motivating this design is to prevent the user from becoming "locked in" to a graphical interface. All of the Waffles tools are implemented as thin wrappers around functionality in a C++ class library. This makes it possible to convert scripted processes into native applications with minimal effort. Waffles was first released as an open source project in 2005. Since that time, it has been developed at Brigham Young University, with a new version having been released approximately every 6–9 months. Waffles is not an acronym—the toolkit was named after the food for historical reasons. == Advantages == Some of the advantages of Waffles in contrast with other popular open source machine learning toolkits include: Waffles automatically takes care of many issues related to data format in order to simplify its tools. Because it is implemented in C++, many of its algorithms are particularly fast. Also, the lack of dependency on any virtual machine makes it easier to deploy in conjunction with other applications. The functionality included in Waffles is very broad, including algorithms for dimensionality reduction, collaborative filtering, visualization, clustering, supervised learning, optimization, linear algebra, data transformation, image and signal processing, policy learning, and sparse matrix operations. == Disadvantages == Although Waffles provides significant breadth, it lacks the depth of many toolkits that focus on a particular area of machine learning. The Weka (machine learning) toolkit, for example, provides many more classification algorithms than Waffles provides. Waffles only has a limited graphical interface.

    Read more →
  • International Conference on Computer Vision

    International Conference on Computer Vision

    The International Conference on Computer Vision (ICCV) is a research conference sponsored by the Institute of Electrical and Electronics Engineers (IEEE) held every other year. It is considered to be one of the top conferences in computer vision, alongside CVPR and ECCV, and it is held on years in which ECCV is not. The conference is usually spread over four to five days. Typically, experts in the focus areas give tutorial talks on the first day, then the technical sessions (and poster sessions in parallel) follow. Recent conferences have also had an increasing number of focused workshops and a commercial exhibition. == Awards == === Azriel Rosenfeld Lifetime Achievement Award === The Azriel Rosenfeld Award, or Azriel Rosenfeld Lifetime Achievement Award, recognizes researchers who have made significant contributions to the field of computer vision over their careers. It is named in memory of computer scientist and mathematician Azriel Rosenfeld. The following people have received this award: === Helmholtz Prize === The ICCV Helmholtz Prize, known as the Test of Time Award before 2013, is awarded every other year at the ICCV, recognizing ICCV papers from ten or more years earlier that had a significant impact on computer vision research. Winners are selected by the IEEE Computer Society's Technical Committee on Pattern Analysis and Machine Intelligence. The award is named after the 19th century physician and physicist Hermann von Helmholtz, and the ICCV's award is not related to the various Helmholtz Prizes in physics, or the Hermann von Helmholtz Prize in neuroscience. === Marr Prize === The ICCV best-paper award is the Marr Prize, named after British neuroscientist David Marr. === Mark Everingham Prize === The Mark Everingham Prize is an award given yearly by the Technical Committee on Pattern Analysis and Machine Intelligence of the IEEE Computer Society at the IEEE International Conference on Computer Vision or the European Conference on Computer Vision to commemorate the late Mark Everingham, "one of the rising stars of computer vision", and to encourage others to follow in his footsteps by acting to further progress in the computer vision community as a whole. The prize is given to a researcher, or a team of researchers, who have made a selfless contribution of significant benefit to other members of the computer vision community. The Mark Everingham Prize for Rigorous Evaluation was an award given in 2012 at the British Machine Vision Conference. === PAMI Distinguished Researcher Award === The PAMI Distinguished Researcher Award (until 2013 called Significant Researcher Award) is awarded to candidates whose research projects have significantly contributed to the progress of computer vision. Awards are made based on major research contributions, as well as the role of those contributions in influencing and inspiring other research. Candidates are nominated by the community. The following people have received this award: == Conference list == The conference is usually held in the Spring in various international locations.

    Read more →
  • Cloud-based quantum computing

    Cloud-based quantum computing

    Cloud-based quantum computing refers to the remote access of quantum computing resources—such as quantum emulators, simulators, or processors—via the internet. Cloud access enables users to develop, test, and execute quantum algorithms without the need for direct interaction with specialized hardware, facilitating broader participation in quantum software development and experimentation. In 2016, IBM launched the IBM Quantum Experience, one of the first publicly accessible quantum processors connected to the cloud. In early 2017, researchers at Rigetti Computing demonstrated programmable quantum cloud access through their software platform Forest, which included the pyQuil Python library. Since the early-2020s, cloud-based quantum computing has grown significantly, with multiple providers offering access to a variety of quantum hardware modalities, including superconducting qubits, trapped ions, neutral atoms, and photonic systems. Major platforms such as Amazon Braket, Azure Quantum, and qBraid aggregate quantum devices from hardware developers like IonQ, Rigetti Computing, QuEra, Pasqal, Oxford Quantum Circuits, and IBM Quantum. These platforms provide unified interfaces for users to write and execute quantum algorithms across diverse backends, often supporting open-source SDKs such as Qiskit, Cirq, and PennyLane. The proliferation of cloud-based access has played a key role in accelerating quantum education, algorithm research, and early-stage application development by lowering the barrier to experimentation with real quantum hardware. Cloud-based quantum computing has expanded access to quantum hardware and tools beyond traditional research laboratories. These platforms support educational initiatives, algorithm development, and early-stage commercial applications. == Applications == Cloud-based quantum computing is used across education, research, and software development, offering remote access to quantum systems without the need for on-site infrastructure. === Education === Quantum cloud platforms have become valuable tools in education, allowing students and instructors to engage with real quantum processors through user-friendly interfaces. Educators use these platforms to teach foundational concepts in quantum mechanics and quantum computing, as well as to demonstrate and implement quantum algorithms in a classroom or laboratory setting. === Scientific Research === Cloud-based access to quantum hardware has enabled researchers to conduct experiments in quantum information, test quantum algorithms, and compare quantum hardware platforms. Experiments such as testing Bell's theorem or evaluating quantum teleportation protocols have been performed on publicly available quantum processors. === Software Development and Prototyping === Developers use cloud-based platforms to prototype quantum software applications across fields such as optimization, machine learning, and chemistry. These platforms offer SDKs and APIs that integrate classical and quantum workflows, enabling experimentation with quantum algorithms in real-world or simulated environments. === Public Engagement and Games === Quantum cloud tools have also been used to create educational games and interactive applications aimed at increasing public understanding of quantum concepts. These efforts help bridge the gap between theoretical content and intuitive learning. == Existing platforms == qBraid Lab by qBraid is a cloud-based platform for quantum computing. It provides software tools for researchers and developers in quantum, as well as access to quantum hardware. qBraid provides cloud based access to Microsoft Azure Quantum and Amazon Braket devices including IQM, QuEra, Pasqal, Rigetti, IonQ, QIR simulators, Amazon Braket simulators, and the NEC Vector Annealer, as of August 2025. qBraid's base version is free, where unlimited hardware and simulator access is available with the purchase of credits. Quandela Cloud by Quandela is the platform to access first cloud-accessible European photonic quantum computer. The computer is interfaced using the Perceval scripting language, with tutorials and documentation available online for free. Xanadu Quantum Cloud by Xanadu is a platform with cloud-based access to three fully programmable photonic quantum computers. Forest by Rigetti Computing is a tool suite for cloud-based quantum computing. It includes a programming language, development tools and example algorithms. LIQUi> by Microsoft is a software architecture and tool suite for quantum computing. It includes a programming language, example optimization and scheduling algorithms, and quantum simulators. Q#, a quantum programming language by Microsoft on the .NET Framework seen as a successor to LIQUi|>. IBM Quantum Platform by IBM, providing access to quantum hardware as well as HPC simulators. These can be accessed programmatically using the Python-based Qiskit framework, or via graphical interface with the IBM Q Experience GUI. Both are based on the OpenQASM standard for representing quantum operations. There is also a tutorial and online community. Quantum in the Cloud by The University of Bristol, which consists of a quantum simulator and a four qubit optical quantum system. Quantum Playground by Google is an educational resource which features a simulator with a simple interface, and a scripting language and 3D quantum state visualization. Quantum in the Cloud is an experimental quantum cloud platform for access to a four-qubit nuclear magnetic resonance-NMRCloudQ computer, managed by Tsinghua University. Quantum Inspire by Qutech is the first platform in Europe providing cloud-based quantum computing to two hardware chips. Next to a 5-qubit transmon processor, Quantum Inspire is the first platform in the world to provide online access to a fully programmable 2-qubit electron spin quantum processor. Amazon Braket is a cloud-based quantum computing platform hosted by AWS which, as of June 2025, provides access to quantum computers built by IonQ, Rigetti, IQM, and QuEra. Braket also provides a quantum algorithm development environment and simulator. Forge by QC Ware is a cloud-based quantum computing platform that provides access to D-Wave hardware, as well as Google and IBM simulators. The platform offers a 30-day free trial, including one minute of quantum computing time. Quantum-as-a-Service by Scaleway is a cloud-based platform created in 2022 to access to real quantum hardware from IQM Quantum Computers, Alpine Quantum Technologies, Quandela and Pasqal. It also include access to GPU-powered emulators such as Aer, Qsim and Quandela proprietary emulation.

    Read more →
  • Gaussian process emulator

    Gaussian process emulator

    In statistics, Gaussian process emulator is one name for a general type of statistical model that has been used in contexts where the problem is to make maximum use of the outputs of a complicated (often non-random) computer-based simulation model. Each run of the simulation model is computationally expensive and each run is based on many different controlling inputs. The variation of the outputs of the simulation model is expected to vary reasonably smoothly with the inputs, but in an unknown way. The overall analysis involves two models: the simulation model, or "simulator", and the statistical model, or "emulator", which notionally emulates the unknown outputs from the simulator. The Gaussian process emulator model treats the problem from the viewpoint of Bayesian statistics. In this approach, even though the output of the simulation model is fixed for any given set of inputs, the actual outputs are unknown unless the computer model is run and hence can be made the subject of a Bayesian analysis. The main element of the Gaussian process emulator model is that it models the outputs as a Gaussian process on a space that is defined by the model inputs. The model includes a description of the correlation or covariance of the outputs, which enables the model to encompass the idea that differences in the output will be small if there are only small differences in the inputs.

    Read more →
  • Ordinal regression

    Ordinal regression

    In statistics, ordinal regression, also called ordinal classification, is a type of regression analysis used for predicting an ordinal variable, i.e. a variable whose value exists on an arbitrary scale where only the relative ordering between different values is significant. It can be considered an intermediate problem between regression and classification. Examples of ordinal regression are ordered logit and ordered probit. Ordinal regression turns up often in the social sciences, for example in the modeling of human levels of preference (on a scale from, say, 1–5 for "very poor" through "excellent"), as well as in information retrieval. In machine learning, ordinal regression may also be called ranking learning. == Linear models for ordinal regression == Ordinal regression can be performed using a generalized linear model (GLM) that fits both a coefficient vector and a set of thresholds to a dataset. Suppose one has a set of observations, represented by length-p vectors x1 through xn, with associated responses y1 through yn, where each yi is an ordinal variable on a scale 1, ..., K. For simplicity, and without loss of generality, we assume y is a non-decreasing vector, that is, yi ≤ {\displaystyle \leq } yi+1. To this data, one fits a length-p coefficient vector w and a set of thresholds θ1, ..., θK−1 with the property that θ1 < θ2 < ... < θK−1. This set of thresholds divides the real number line into K disjoint segments, corresponding to the K response levels. The model can now be formulated as Pr ( y ≤ i ∣ x ) = σ ( θ i − w ⋅ x ) {\displaystyle \Pr(y\leq i\mid \mathbf {x} )=\sigma (\theta _{i}-\mathbf {w} \cdot \mathbf {x} )} or, the cumulative probability of the response y being at most i is given by a function σ (the inverse link function) applied to a linear function of x. Several choices exist for σ; the logistic function σ ( θ i − w ⋅ x ) = 1 1 + e − ( θ i − w ⋅ x ) {\displaystyle \sigma (\theta _{i}-\mathbf {w} \cdot \mathbf {x} )={\frac {1}{1+e^{-(\theta _{i}-\mathbf {w} \cdot \mathbf {x} )}}}} gives the ordered logit model, while using the CDF of the standard normal distribution gives the ordered probit model. A third option is to use an exponential function σ ( θ i − w ⋅ x ) = 1 − exp ⁡ ( − exp ⁡ ( θ i − w ⋅ x ) ) {\displaystyle \sigma (\theta _{i}-\mathbf {w} \cdot \mathbf {x} )=1-\exp(-\exp(\theta _{i}-\mathbf {w} \cdot \mathbf {x} ))} which gives the proportional hazards model. === Latent variable model === The probit version of the above model can be justified by assuming the existence of a real-valued latent variable (unobserved quantity) y, determined by y ∗ = w ⋅ x + ε {\displaystyle y^{}=\mathbf {w} \cdot \mathbf {x} +\varepsilon } where ε is normally distributed with zero mean and unit variance, conditioned on x. The response variable y results from an "incomplete measurement" of y, where one only determines the interval into which y falls: y = { 1 if y ∗ ≤ θ 1 , 2 if θ 1 < y ∗ ≤ θ 2 , 3 if θ 2 < y ∗ ≤ θ 3 ⋮ K if θ K − 1 < y ∗ . {\displaystyle y={\begin{cases}1&{\text{if}}~~y^{}\leq \theta _{1},\\2&{\text{if}}~~\theta _{1} Read more →

  • Generalized multidimensional scaling

    Generalized multidimensional scaling

    Generalized multidimensional scaling (GMDS) is an extension of metric multidimensional scaling, in which the target space is non-Euclidean. When the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another. GMDS is an emerging research direction. Currently, main applications are recognition of deformable objects (e.g. for three-dimensional face recognition) and texture mapping.

    Read more →
  • Frameserver

    Frameserver

    A frameserver is any program that acts as a media source in the process called frameserving, which transfers digital video data from one computer program to another without intermediate files. The program that receives the data – the frameclient – could be any type of video application. The process is controlled by the frameclient: the frameclient requests audio/video frames and the frameserver serves them. The client can request frames in any order, allowing it to pause or jump to an arbitrary frame, just as a media player does with a file on disk. The client is most commonly a media encoder, a non-linear editing system, or a media player. == Frameservers == AviSynth VirtualDub VapourSynth Debugmode FrameServer

    Read more →
  • Nonlinear dimensionality reduction

    Nonlinear dimensionality reduction

    Nonlinear dimensionality reduction (NLDR), also known as manifold learning, is any of various related techniques that aim to project high-dimensional data, potentially existing across non-linear manifolds which cannot be adequately captured by linear decomposition methods, onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa) itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis. == Applications of NLDR == High dimensional data can be hard for machines to work with, requiring significant time and space for analysis. It also presents a challenge for humans, since it's hard to visualize or understand data in more than three dimensions. Reducing the dimensionality of a data set, while keeping its essential features relatively intact, can make algorithms more efficient and allow analysts to visualize trends and patterns. The reduced-dimensional representations of data are often referred to as "intrinsic variables". This description implies that these are the values from which the data was produced. For example, consider a dataset that contains images of a letter 'A', which has been scaled and rotated by varying amounts. Each image has 32×32 pixels. Each image can be represented as a vector of 1024 pixel values. Each row is a sample on a two-dimensional manifold in 1024-dimensional space (a Hamming space). The intrinsic dimensionality is two, because two variables (rotation and scale) were varied in order to produce the data. Information about the shape or look of a letter 'A' is not part of the intrinsic variables because it is the same in every instance. Nonlinear dimensionality reduction will discard the correlated information (the letter 'A') and recover only the varying information (rotation and scale). By comparison, if principal component analysis, which is a linear dimensionality reduction algorithm, is used to reduce this same dataset into two dimensions, the resulting values are not so well organized. This demonstrates that the high-dimensional vectors (each representing a letter 'A') that sample this manifold vary in a non-linear manner. It should be apparent, therefore, that NLDR has several applications in the field of computer-vision. For example, consider a robot that uses a camera to navigate in a closed static environment. The images obtained by that camera can be considered to be samples on a manifold in high-dimensional space, and the intrinsic variables of that manifold will represent the robot's position and orientation. Invariant manifolds are of general interest for model order reduction in dynamical systems. In particular, if there is an attracting invariant manifold in the phase space, nearby trajectories will converge onto it and stay on it indefinitely, rendering it a candidate for dimensionality reduction of the dynamical system. While such manifolds are not guaranteed to exist in general, the theory of spectral submanifolds (SSM) gives conditions for the existence of unique attracting invariant objects in a broad class of dynamical systems. Active research in NLDR seeks to unfold the observation manifolds associated with dynamical systems to develop modeling techniques. Some of the more prominent nonlinear dimensionality reduction techniques are listed below. == Important concepts == === Sammon's mapping === Sammon's mapping is one of the first and most popular NLDR techniques. === Self-organizing map === The self-organizing map (SOM, also called Kohonen map) and its probabilistic variant generative topographic mapping (GTM) use a point representation in the embedded space to form a latent variable model based on a non-linear mapping from the embedded space to the high-dimensional space. These techniques are related to work on density networks, which also are based around the same probabilistic model. === Kernel principal component analysis === Perhaps the most widely used algorithm for dimensional reduction is kernel PCA. PCA begins by computing the covariance matrix of the m × n {\displaystyle m\times n} matrix X {\displaystyle \mathbf {X} } C = 1 m ∑ i = 1 m x i x i T . {\displaystyle C={\frac {1}{m}}\sum _{i=1}^{m}{\mathbf {x} _{i}\mathbf {x} _{i}^{\mathsf {T}}}.} It then projects the data onto the first k eigenvectors of that matrix. By comparison, KPCA begins by computing the covariance matrix of the data after being transformed into a higher-dimensional space, C = 1 m ∑ i = 1 m Φ ( x i ) Φ ( x i ) T . {\displaystyle C={\frac {1}{m}}\sum _{i=1}^{m}{\Phi (\mathbf {x} _{i})\Phi (\mathbf {x} _{i})^{\mathsf {T}}}.} It then projects the transformed data onto the first k eigenvectors of that matrix, just like PCA. It uses the kernel trick to factor away much of the computation, such that the entire process can be performed without actually computing Φ ( x ) {\displaystyle \Phi (\mathbf {x} )} . Of course Φ {\displaystyle \Phi } must be chosen such that it has a known corresponding kernel. Unfortunately, it is not trivial to find a good kernel for a given problem, so KPCA does not yield good results with some problems when using standard kernels. For example, it is known to perform poorly with these kernels on the Swiss roll manifold. However, one can view certain other methods that perform well in such settings (e.g., Laplacian Eigenmaps, LLE) as special cases of kernel PCA by constructing a data-dependent kernel matrix. KPCA has an internal model, so it can be used to map points onto its embedding that were not available at training time. === Principal curves and manifolds === Principal curves and manifolds give the natural geometric framework for nonlinear dimensionality reduction and extend the geometric interpretation of PCA by explicitly constructing an embedded manifold, and by encoding using standard geometric projection onto the manifold. This approach was originally proposed by Trevor Hastie in his 1984 thesis, which he formally introduced in 1989. This idea has been explored further by many authors. How to define the "simplicity" of the manifold is problem-dependent, however, it is commonly measured by the intrinsic dimensionality and/or the smoothness of the manifold. Usually, the principal manifold is defined as a solution to an optimization problem. The objective function includes a quality of data approximation and some penalty terms for the bending of the manifold. The popular initial approximations are generated by linear PCA and Kohonen's SOM. === Laplacian eigenmaps === Laplacian eigenmaps uses spectral techniques to perform dimensionality reduction. This technique relies on the basic assumption that the data lies in a low-dimensional manifold in a high-dimensional space. This algorithm cannot embed out-of-sample points, but techniques based on Reproducing kernel Hilbert space regularization exist for adding this capability. Such techniques can be applied to other nonlinear dimensionality reduction algorithms as well. Traditional techniques like principal component analysis do not consider the intrinsic geometry of the data. Laplacian eigenmaps builds a graph from neighborhood information of the data set. Each data point serves as a node on the graph and connectivity between nodes is governed by the proximity of neighboring points (using e.g. the k-nearest neighbor algorithm). The graph thus generated can be considered as a discrete approximation of the low-dimensional manifold in the high-dimensional space. Minimization of a cost function based on the graph ensures that points close to each other on the manifold are mapped close to each other in the low-dimensional space, preserving local distances. The eigenfunctions of the Laplace–Beltrami operator on the manifold serve as the embedding dimensions, since under mild conditions this operator has a countable spectrum that is a basis for square integrable functions on the manifold (compare to Fourier series on the unit circle manifold). Attempts to place Laplacian eigenmaps on solid theoretical ground have met with some success, as under certain nonrestrictive assumptions, the graph Laplacian matrix has been shown to converge to the Laplace–Beltrami operator as the number of points goes to infinity. === Isomap === Isomap is a combination of the Floyd–Warshall algorithm with classic Multidimensional Scaling (MDS). Classic MDS takes a matrix of pair-wise distances between all points and computes a position for each point. Isomap assumes that the pair-wise distances are only known between neighboring points, and uses the Floyd–Warshall algorithm to compute the pair-wise distances between all other points. This effectively estimates the full matrix of pair-wise geodesic distances between all of the points. Isomap th

    Read more →
  • TabPFN

    TabPFN

    TabPFN (Tabular Prior-data Fitted Network) is a machine learning model for tabular datasets proposed in 2022. It uses a transformer architecture. It is intended for supervised classification and regression analysis on tabular datasets, particularly focusing on small- to medium-sized datasets. The latest version, TabPFN-3, was released in May 2026 and supports datasets with up to one million rows and 200 features. == History == TabPFN was first introduced in a 2022 pre-print and presented at ICLR 2023. TabPFN v2 was published in 2025 in Nature by Hollmann and co-authors. The source code is published on GitHub under a modified Apache License and on PyPi. Writing for ICLR blogs, McCarter states that the model has attracted attention due to its performance on small dataset benchmarks. TabPFN v2.5 was released on November 6, 2025. TabPFN-3 was released on May 12, 2026. Prior Labs, founded in 2024, aims to commercialize TabPFN. As of April 2026, the open-source TabPFN repository had more than 6,000 stars on GitHub. == Overview and pre-training == TabPFN supports classification, regression and generative tasks. It leverages "Prior-Data Fitted Networks" models to model tabular data. By using a transformer pre-trained on synthetic tabular datasets, TabPFN avoids benchmark contamination and costs of curating real-world data. TabPFN v2 was pre-trained on approximately 130 million such datasets. Synthetic datasets are generated using causal models or Bayesian neural networks; this can include simulating missing values, imbalanced data, and noise. Random inputs are passed through these models to generate outputs, with a bias towards simpler causal structures. During pre-training, TabPFN predicts the masked target values of new data points given training data points and their known targets, effectively learning a generic learning algorithm that is executed by running a neural network forward pass. The new dataset is then processed in a single forward pass without retraining. The model's transformer encoder processes features and labels by alternating attention across rows and columns. TabPFN v2 handles numerical and categorical features, missing values, and supports tasks like regression and synthetic data generation, while TabPFN-2.5 scales this approach to datasets with up to 50,000 rows and 2,000 features. TabPFN-3 introduced a redesigned architecture with row-compression, an attention-based many-class decoder, native missing-value handling, and inference optimizations such as row chunking and a reduced key-value cache, with benchmark-validated regimes of up to 1 million rows with 200 features, 100,000 rows with 2,000 features, or 1,000 rows with 20,000 features. Since TabPFN is pre-trained, in contrast to other deep learning methods, it does not require costly hyperparameter optimization. == Research == TabPFN is the subject of on-going research. Applications for TabPFN have been investigated for domains such as chemoproteomics, insurance risk classification, and metagenomics. In clinical research, TabPFN was used in a study on the early detection of pancreatic cancer from blood samples, where it was combined with metabolomic data and reported high diagnostic performance. == Applications == TabPFN has been used in industrial and biomedical contexts. Hitachi Ltd. has been reported to use the model for predictive maintenance in rail networks, with its use described as helping to identify track issues earlier and reduce manual inspections. In the biomedical domain, Oxford Cancer Analytics has used TabPFN in the analysis of proteomic data in lung disease research. A 2025 ML Contests report noted that the winners of DrivenData's PREPARE challenge used TabPFN to generate features for gradient-boosted decision tree models. == Limitations == TabPFN has been criticized for its "one large neural network is all you need" approach to modeling problems. Further, its performance is limited in high-dimensional and large-scale datasets. == Scaling Mode == In late November 2025, Prior Labs introduced ‘‘Scaling Mode’’, an operating mode for TabPFN designed to remove the fixed upper bound on training set size. Earlier versions of TabPFN had been optimized and validated primarily for datasets of up to 100,000 rows, whereas Scaling Mode was reported to extend support to substantially larger datasets, with benchmarked experiments on datasets containing up to 10 million rows. According to Prior Labs, Scaling Mode preserves the existing TabPFN architecture, including its alternating row-attention and column-attention design, as well as the same feature-count limits as prior releases. Inference remains based on a single forward pass rather than dataset-specific gradient-descent training, while scalability is described as being constrained primarily by available compute and memory resources. Prior Labs reported benchmark results on an internal collection of datasets ranging from 1 million to 10 million rows across industry and scientific applications. In these benchmarks, Scaling Mode was compared with CatBoost, XGBoost, LightGBM, and TabPFN 2.5 using 50,000-row subsampling. The company stated that predictive performance improved monotonically with increasing training set size and that no diminishing returns from scaling were observed within the tested range. Prior Labs also announced the release through company and executive social media channels. TabPFN-3 later incorporated related scaling improvements, including row chunking and a reduced key-value cache, into the model architecture and inference pipeline.

    Read more →
  • Minimum Population Search

    Minimum Population Search

    In evolutionary computation, Minimum Population Search (MPS) is a computational method that optimizes a problem by iteratively trying to improve a set of candidate solutions with regard to a given measure of quality. It solves a problem by evolving a small population of candidate solutions by means of relatively simple arithmetical operations. MPS is a metaheuristic as it makes few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. For problems where finding the precise global optimum is less important than finding an acceptable local optimum in a fixed amount of time, using a metaheuristic such as MPS may be preferable to alternatives such as brute-force search or gradient descent. MPS is used for multidimensional real-valued functions but does not use the gradient of the problem being optimized, which means MPS does not require for the optimization problem to be differentiable as is required by classic optimization methods such as gradient descent and quasi-newton methods. MPS can therefore also be used on optimization problems that are not even continuous, are noisy, change over time, etc. == Background == In a similar way to Differential evolution, MPS uses difference vectors between the members of the population in order to generate new solutions. It attempts to provide an efficient use of function evaluations by maintaining a small population size. If the population size is smaller than the dimensionality of the search space, then the solutions generated through difference vectors will be constrained to the n − 1 {\displaystyle n-1} dimensional hyperplane. A smaller population size will lead to a more restricted subspace. With a population size equal to the dimensionality of the problem ( n = d ) {\displaystyle (n=d)} , the “line/hyperplane points” in MPS will be generated within a d − 1 {\displaystyle d-1} dimensional hyperplane. Taking a step orthogonal to this hyperplane will allow the search process to cover all the dimensions of the search space. Population size is a fundamental parameter in the performance of population-based heuristics. Larger populations promote exploration, but they also allow fewer generations, and this can reduce the chance of convergence. Searching with a small population can increase the chances of convergence and the efficient use of function evaluations, but it can also induce the risk of premature convergence. If the risk of premature convergence can be avoided, then a population-based heuristic could benefit from the efficiency and faster convergence rate of a smaller population. To avoid premature convergence, it is important to have a diversified population. By including techniques for explicitly increasing diversity and exploration, it is possible to have smaller populations with less risk of premature convergence. === Thresheld Convergence === Thresheld Convergence (TC) is a diversification technique which attempts to separate the processes of exploration and exploitation. TC uses a “threshold” function to establish a minimum search step, and managing this step makes it possible to influence the transition from exploration to exploitation, convergence is thus “held” back until the last stages of the search process. The goal of a controlled transition is to avoid an early concentration of the population around a few search regions and avoid the loss of diversity which can cause premature convergence. Thresheld Convergence has been successfully applied to several population-based metaheuristics such as Particle Swarm Optimization, Differential evolution, Evolution strategies, Simulated annealing and Estimation of Distribution Algorithms. The ideal case for Thresheld Convergence is to have one sample solution from each attraction basin, and for each sample solution to have the same relative fitness with respect to its local optimum. Enforcing a minimum step aims to achieve this ideal case. In MPS Thresheld Convergence is specifically used to preserve diversity and avoid premature convergence by establishing a minimum search step. By disallowing new solutions which are too close to members of the current population, TC forces a strong exploration during the early stages of the search while preserving the diversity of the (small) population. == Algorithm == A basic variant of the MPS algorithm works by having a population of size equal to the dimension of the problem. New solutions are generated by exploring the hyperplane defined by the current solutions (by means of difference vectors) and performing an additional orthogonal step in order to avoid getting caught in this hyperplane. The step sizes are controlled by the Thresheld Convergence technique, which gradually reduces step sizes as the search process advances. An outline for the algorithm is given below: Generate the first initial population. Allowing these solutions to lie near the bounds of the search space generally gives good results: s k = ( r s 1 ∗ b o u n d 1 / 2 , r s 2 ∗ b o u n d 2 / 2 , . . . , r s n ∗ b o u n d n / 2 ) {\displaystyle s_{k}=(rs_{1}bound_{1}/2,rs_{2}bound_{2}/2,...,rs_{n}bound_{n}/2)} where s k {\displaystyle s_{k}} is the k {\displaystyle k} -th population member, r s i {\displaystyle rs_{i}} are random numbers which can be −1 or 1, and the b o u n d i {\displaystyle bound_{i}} are the lower and upper bounds on each dimension. While a stop condition is not reached: Update threshold convergence values ( m i n _ s t e p {\displaystyle min\_step} and m a x _ s t e p {\displaystyle max\_step} ) Calculate the centroid of the current population ( x c {\displaystyle x_{c}} ) For each member of the population ( x i {\displaystyle x_{i}} ), generate a new offspring as follows: Uniformly generate a scaling factor ( F i {\displaystyle F_{i}} ) between − m a x _ s t e p {\displaystyle -max\_step} and m a x _ s t e p {\displaystyle max\_step} Generate a vector ( x o {\displaystyle x_{o}} ) orthogonal to the difference vector between x i {\displaystyle x_{i}} and x c {\displaystyle x_{c}} Calculate a scaling factor for the orthogonal vector: m i n _ o r t h = s q r t ( m a x ( m i n _ s t e p 2 − F i 2 , 0 ) ) {\displaystyle min\_orth=sqrt(max(min\_step^{2}-F_{i}^{2},0))} m a x _ o r t h = s q r t ( m a x ( m a x _ s t e p 2 − F i 2 , 0 ) ) {\displaystyle max\_orth=sqrt(max(max\_step^{2}-F_{i}^{2},0))} o r t h _ s t e p = u n i f o r m ( m i n _ o r t h , m a x _ o r t h ) {\displaystyle orth\_step=uniform(min\_orth,max\_orth)} Generate the new solution by adding the difference and the orthogonal vectors to the original solution n e w _ s o l u t i o n = x i + F i ∗ ( x i − x c ) ∗ o r t h _ s t e p ∗ x o {\displaystyle new\_solution=x_{i}+F_{i}(x_{i}-x_{c})orth\_stepx_{o}} Pick the best members between the old population and the new one by discarding the least fit members. Return the single best solution or the best population found as the final result.

    Read more →
  • AdBlock

    AdBlock

    AdBlock is an ad-blocking browser extension for Google Chrome, Apple Safari (desktop and mobile), Firefox, Samsung Internet, Microsoft Edge and Opera. AdBlock allows users to prevent page elements, such as advertisements, from being displayed. It is free to download and use, and it includes optional donations to the developers. The AdBlock extension was created on December 8, 2009, which is the day that supports for extensions was added to Google Chrome. It was one of the first Google Chrome extensions that was made. Since 2016, AdBlock has been based on the Adblock Plus source code. In July 2018, AdBlock acquired uBlock, a commercial ad-blocker owned by uBlock LLC and based on uBlock Origin. In April 2021, eyeo GmbH (developer of Adblock Plus) announced its purchase of AdBlock, Inc (formerly BetaFish, Inc). == Crowdfunding == Gundlach launched a crowdfunding campaign on Crowdtilt in August 2013 in order to fund an ad campaign to raise awareness of ad-blocking and to rent a billboard at Times Square. After the one-month campaign, it raised $55,000. == Sales and acceptable ads == AdBlock was sold to an anonymous buyer in 2015 and on October 15, 2015, Gundlach's name was taken down from the site. In the terms of the deal, the original developer Michael Gundlach left operations to Adblock's continuing director, Gabriel Cubbage, and as of October 2, 2015, AdBlock began participating in the Acceptable Ads program. Acceptable Ads identifies "non-annoying" ads, which AdBlock shows by default. The intent is to allow non-invasive advertising, to either maintain support for websites that rely on advertising as a main source of revenue or for websites that have an agreement with the program. == Filters == AdBlock uses EasyList, the same filter syntax as Adblock Plus for Firefox, and natively supports the use of a number of filter lists. == Partnership with Amnesty International == On March 12, 2016, in support of World Day Against Cyber Censorship, and in partnership with Amnesty International, instead of blocking ads, AdBlock replaced ads with banners linked to articles on Amnesty's website, written by prominent free speech advocates such as Edward Snowden, to raise awareness of government-imposed online censorship and digital privacy issues around the world. The campaign was met with both praise and criticism, with AdBlock's CEO, Gabriel Cubbage, defending the decision in an essay on AdBlock's website, saying "We’re showing you Amnesty banners, just for today, because we believe users should be part of the conversation about online privacy. Tomorrow, those spaces will be vacant again. But take a moment to consider that in an increasingly information-driven world, when your right to digital privacy is threatened, so is your right to free expression." Meanwhile, Simon Sharwood of The Register characterized Cubbage's position as "'You should control your computer except when we feel political', says AdBlock CEO". == AdBlock for Firefox == On September 13, 2014, the AdBlock team released a version for Firefox users, ported from the code for Google Chrome, released under the same free software license as the original Adblock. The extension was removed on April 2, 2015, by an administrator on Mozilla Add-ons. On December 7, 2015, the official AdBlock site's knowledge base article stated that with version 44 or higher of Firefox desktop and Firefox Mobile, AdBlock will not be supported. The last version of Adblock for those platforms will work on older versions of Firefox. AdBlock was released again on Mozilla Add-ons on November 17, 2016. On April 1, 2012, Adblock developer Michael Gundlach tweaked the code to display LOLcats instead of simply blocking ads. Initially developed as a short-lived April Fools joke, the response was so positive that CatBlock was continued to be offered as an optional add-on supported by a monthly subscription. On October 23, 2014, the developer decided to end official support for CatBlock, and made it open-source, under GPLv3 licensing, as the original extension.

    Read more →
  • International Conference on Acoustics, Speech, and Signal Processing

    International Conference on Acoustics, Speech, and Signal Processing

    ICASSP, the International Conference on Acoustics, Speech, and Signal Processing, is an annual flagship conference organized by IEEE Signal Processing Society. Ei Compendex has indexed all papers included in its proceedings. The first ICASSP was held in 1976 in Philadelphia, Pennsylvania, based on the success of a conference in Massachusetts four years earlier that had focused specifically on speech signals. As ranked by Google Scholar's h-index metric in 2016, ICASSP has the highest h-index of any conference in the Signal Processing field. The Brazilian ministry of education gave the conference an 'A1' rating based on its h-index. == Conference list ==

    Read more →
  • Evolutionary programming

    Evolutionary programming

    Evolutionary programming is an evolutionary algorithm, where a share of new population is created by mutation of previous population without crossover. Evolutionary programming differs from evolution strategy ES( μ + λ {\displaystyle \mu +\lambda } ) in one detail. All individuals are selected for the new population, while in ES( μ + λ {\displaystyle \mu +\lambda } ), every individual has the same probability to be selected. It is one of the four major evolutionary algorithm paradigms. == History == It was first used by Lawrence J. Fogel in the US in 1960 in order to use simulated evolution as a learning process aiming to generate artificial intelligence. It was used to evolve finite-state machines as predictors.

    Read more →